4.1. Model Comparison of Team Performances
The database included both KRIRM (n₁ = 55) and SDSUHC (n₂ = 9) teams since 2004 (total n = 64). Initial screening of the team performances revealed a strong fit between modeled and observed performances for both total costs (Figure 4a) and team rank (Figure 4b). Overall, the expected costs and ranks of teams fit fairly well, with r² values of 0.90 and 0.89, respectively (Table 2).
Table 2). Despite the overall strong correlations, we identified 20 teams that clearly did not fit the expected costs pattern between the modeled and observed costs of the majority of teams, resulting in a total
n of 44 (n
1 = 38; n
2 = 6). Importantly, the discarded teams were fairly normally distributed throughout the database, with three teams removed from the top quartile, six from the third quartile, eight from the second quartile, and three from the bottom quartile.
Removing these teams significantly improved the match between observed and expected team performances and ranks, with r² values of 0.97 and 0.98 (Table 2; Figure 4c,d). Average errors decreased by $75 in terms of total costs and by 2 in terms of $/week or cases/week. Removing the teams with inconsistent costs relative to the remaining teams also significantly improved the fit in team ranks (e.g., exact rank matches increased from six to 11, or from 9 to 25 percent, and only five teams had rank discrepancies greater than three positions, 11 percent, down from 55 percent). Utilizing the Beer Game model in this way allowed us to screen the database for the teams that most likely had the greatest accounting errors and gave added confidence that the remaining teams, although not perfect in their accounting, were accurate enough to allow comparison across the dataset. The proportion of teams discarded due to likely accounting errors (31% of the original database) was therefore much smaller than in [31], which discarded 75% of that database due to errors, indicating that players may do a better job of accounting than was previously expected.
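As an illustration of how such screening can be automated, a minimal sketch follows; it is an assumed reconstruction, not the authors' procedure, and the residual cutoff, synthetic data, and `screen_teams` helper are ours. It regresses observed team costs on model-expected costs and discards teams with outsized residuals:

```python
import numpy as np

def screen_teams(expected, observed, z_cut=2.0):
    """Return a boolean mask of teams to keep, plus r^2 before screening."""
    expected = np.asarray(expected, dtype=float)
    observed = np.asarray(observed, dtype=float)
    slope, intercept = np.polyfit(expected, observed, 1)   # linear fit
    residuals = observed - (slope * expected + intercept)  # misfit per team
    r2 = np.corrcoef(expected, observed)[0, 1] ** 2
    keep = np.abs(residuals) < z_cut * residuals.std()     # flag likely accounting errors
    return keep, r2

# Synthetic example with 64 teams and a few planted "accounting errors".
rng = np.random.default_rng(0)
expected = rng.uniform(1000, 9000, 64)                # model-expected team costs ($)
observed = 1.05 * expected + rng.normal(0, 400, 64)   # observed costs with noise
observed[:5] += 5000                                  # planted accounting errors
keep, r2_before = screen_teams(expected, observed)
r2_after = np.corrcoef(expected[keep], observed[keep])[0, 1] ** 2
print(f"kept {keep.sum()} of 64 teams; r2 {r2_before:.2f} -> {r2_after:.2f}")
```

On data like the paper's, rerunning the fit on the kept teams would show the kind of r² improvement reported above (0.90 to 0.97).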
4.2. Participants’ Performance across the Database
The team average total costs relative to the benchmark costs (identified in [31]) are shown in Table 3. The average team cost was over 23 times the benchmark and twice the average reported in [31] (Table 3; Figure 5), although that study only reported scores of the best performing teams. The wholesaler, distributor, and factory ratios of actual to benchmark costs were as high as 30 times optimal cost levels; however, the retailers in our group performed similarly to those in other studies (Table 3). The differences between observed and benchmark costs, both in total and for each sector, were all highly significant; compared to Sterman [31], all sectors differed significantly except the retailer. To identify how well the best performing teams in our database performed relative to previous studies, total team and individual position costs were summarized into quartiles (Table 3). The top performing teams in [31] had a team average cost of $2028 and position average costs of $383 (retailer), $635 (wholesaler), $630 (distributor), and $380 (factory), which fell most closely between our third and fourth quartiles of team performances, indicating similar performance among the above-average teams. These results held across positions (Table 3).
Similar oscillations, amplifications, and phase lags were observed between our team performances and common Beer Game results (Table 4; Figure 6). Orders and inventories exhibited large fluctuations, with an average inventory recovery of 25.5 weeks. Backlogs of inventories migrated from the retailer to the factory, similar to typical Beer Game results (Figure 6), with the peak order rate at the factory over three times the peak order rate of the retailer. Closed-loop gains (Δ[factory orders]/Δ[customer orders]) averaged nearly 1400%, or double that reported by Sterman [31]. Maximum backlogs averaged 35 cases and occurred between weeks 34 and 35 (Table 4). As expected, inventories overshot initial levels, peaking at week 35. Phase lags were more evenly distributed than in typical Beer Game runs; however, this was likely due to the larger sample size smoothing out the week of peak order rates. Participants' anticipation of minimum inventory (the week of minimum inventory minus the week of peak order rate) was generally delayed by one or two weeks, indicating reactive strategies that did not account for orders in the supply line (orders placed but not yet received) and perpetuated extreme inventory levels later in the game.
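For reference, a small sketch of how the closed-loop gain defined above could be computed from weekly order series; the series and the `amplification_metrics` helper are invented for illustration, not taken from the database:

```python
# Hypothetical helper (not from the study) computing the closed-loop gain
# defined in the text: change in factory orders / change in customer orders.
def amplification_metrics(customer_orders, factory_orders):
    d_customer = max(customer_orders) - min(customer_orders)  # step size (cases/week)
    d_factory = max(factory_orders) - min(factory_orders)     # swing at the factory
    gain = d_factory / d_customer
    peak_week = factory_orders.index(max(factory_orders))     # week of peak orders
    return gain, peak_week

# Invented series: customer orders step from 4 to 8 cases/week; the factory,
# amplifying the step, swings from 4 up to 60 before collapsing to zero.
customer = [4] * 4 + [8] * 31
factory = [4] * 10 + [12, 24, 40, 60, 36, 12, 0, 0] + [8] * 17
gain, week = amplification_metrics(customer, factory)
print(f"closed-loop gain = {gain:.0%}, factory orders peak in week {week}")
```

A gain on the order of 1400% thus means that a 4-case step in customer orders produced roughly a 56-case swing in factory orders.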
Although the overall scores were poorer than team performances reported in the literature, the top 10 teams in our database (≈25%) performed better than the top 25% reported in Sterman [31], and this result held across all game positions except the factory (Table 5). Retailer, wholesaler, and distributor costs were all significantly lower (which contributed to an overall significantly lower team total cost), while factory costs were significantly higher. Periodicity and phase lags were noticeably shorter, and amplification lower, than in the Sterman [31] teams. The SDSUHC groups were disproportionately represented among the top 10 teams in our database: eight of the top 10 teams came from the KRIRM participants (≈21% of the KRIRM sample), while two came from the SDSUHC participants (≈33% of the SDSUHC sample).
4.3. Comparison of Performances from More and Less Experienced Participants
We hypothesized that the older, more experienced group (KRIRM) would perform better on the Beer Game task than the less experienced players, primarily undergraduate students (SDSUHC). We found no evidence to support this (Table 6), as neither the team total costs nor any of the player position costs were significantly different. This corroborates previous conclusions that management experience may not mitigate misperceptions of feedback [38]. However, qualitative analyses of the trends in effective inventory and order rates tell a more interesting story (Figure 7). The SDSUHC teams appeared to reach maximum inventory earlier than the KRIRM groups and, by week 35, were reducing their overall inventory levels back toward the ‘anchored’ inventory level of 12. This was achieved through overall lower average order rates (Table 6; Figure 7). Although retailer orders were similar, wholesaler, distributor, and factory average order rates differed by as little as one to as many as six cases per week. After initial inventory recovery, discrepancies in order rates were even larger (up to eight cases at the factory level) and were statistically significant across all positions (Table 6). Based on the change in slope of order rates and effective inventories after week 29 for the SDSUHC teams, it appears the younger players began accounting for cases in delivery much sooner than the KRIRM groups, whose maximum effective inventory levels continued to rise. Several interesting features may be at work in creating this divergence in effective inventory trends between the groups.
First, the older KRIRM participants could have continued to order more cases after the initial inventory recovery as a way to accumulate “coordination stock” to hedge against the risk that customer orders would significantly change in the future (based on their perception of customer orders as well as experience in the real world), or in case the other players deviated from the near-equilibrium (but suboptimal) position that the game reaches by week 30 (i.e., to compensate for obvious weaknesses in their teammates) [68]. Relying on real-world experience requires participants to determine strategy by comparing the game to previous experience by analogy; however, decision makers who reason by analogy in complex dynamic situations have not performed as well as those who do not [71].
Second, the older participants were likely less inclined to lower their order rates after inventory recovery, since the initial strategy (increase the order rate to get out of backlog) eventually paid off. In other words, so long as they achieved zero backlog, they were not as heavily anchored to the initial inventory level as the younger players. It has been shown that experience with a particular set of behaviors improves performance, but that as the opportunity costs of trying new strategies rise, individuals experiment with fewer decisions and are less likely to identify methods superior to their status quo [91,92]. It is likely that the opportunity costs of changing strategies appeared too high for the older players.
Third, the younger, less experienced players in the SDSUHC teams significantly lowered their order rates after inventory recovery compared to the KRIRM group (Table 6). Although inventories are affected by the choices of the other players, participants must discretely place new orders based on each new inventory level, and new order rates represent the desired change in the stock of the individual player. Therefore, each order choice is aimed at closing the gap between desired and actual states of inventory (albeit with the necessary receiving and shipping delays).
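This gap-closing logic has a compact standard form in the stock-management literature (cf. Sterman [31]); the notation below is ours, not the paper's:

$$ O_t = \max\!\big(0,\; \hat{D}_t + \alpha\,[(S^{*} - S_t) + \beta\,(SL^{*} - SL_t)]\big) $$

where $O_t$ is the order placed in week $t$, $\hat{D}_t$ the expected demand, $S^{*}$ and $S_t$ the desired and actual inventory, $SL^{*}$ and $SL_t$ the desired and actual supply line of orders placed but not yet received, $\alpha$ the fraction of the gap closed each week, and $\beta$ the weight given to the supply line. A player who ignores the supply line effectively sets $\beta \approx 0$, the error that drives the over-ordering described above.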
Our older players increased order rates to get out of backlog and, rather than decreasing order rates once effective inventories recovered, continued to order at relatively high rates (i.e., they were heavily anchored to the choices that had worked to get them out of backlog). Our younger participants, by contrast, made a more abrupt shift to lower order rates as inventories recovered and escalated. The younger players in our sample were more heavily anchored to the initial inventory level and were therefore more responsive to escalating inventory levels (and therefore costs), lowering their order rates significantly (Table 6).
Recent psychology research strengthens these conclusions. For example, research on the dynamic decision making of younger versus older adults has shown that older adults (age 60–84) perform better on choice-dependent tasks, which require learning how previous choices influence current performance and making new decisions based on that knowledge [93,94]. Older players in our sample were more heavily anchored to the previous strategy that had worked (order more cases to get out of backlog) and, because of that success, continued to use it. Research on younger decision makers (age 18–23) has shown that they perform better on choice-independent tasks (where learning requires exploiting the options that give the highest reward on each new trial [93,94]), and students have best learned dynamic decision making in systems by ‘doing’ and ‘failure’ rather than by ‘knowing’ or relying on experience [70]. Older adults have also been shown to base their decisions on changes in states, compared to younger adults, who are more apt to change decisions based on comparisons of the expected values of new trials [95].
Several cognitive mechanisms or learning impairments may underlie these patterns. For example, work has shown that age-related impairments in learning may result from declines in phasic dopaminergic signals in older versus younger adults [96], likely contributing to deficits in feedback-driven reinforcement learning in older adults [97]. In two exploratory choice task experiments, Blanco et al. found that the strategies of younger and older adults were qualitatively different (with older adults performing worse), in part because older adults applied a strategy shaped by their wealth of real-world decision-making experience that may be ill-suited to some decision environments due to increased working memory loads [98]. Worthy et al. suggested that older adults’ departures from state-based decision strategies in favor of immediate-reward strategies were due to age-related declines in the neural structures needed for more computationally demanding (e.g., goal-oriented) decision making [99]. This burden on working memory likely leads participants to focus more on the immediate rather than the delayed consequences of decisions [100]. Lastly, Kurnianingsih et al. found that older adults (aged 61–80) were significantly more uncertainty-averse for both risky and ambiguous choices and exhibited strategies with decreased use of maximizing information [101], which likely contributes to the learning deficits observed in healthy older adults driven by a diminished capacity to represent and use uncertainty to guide learning [102]. This corroborates others who have shown that younger adults more willingly explore task structures when unexpected rewards or costs indicate a need to shift decision strategy, whereas older adults show perseverative behavior and have deficits in updating the expected values of alternative decisions [103].
Our results coincide with those observed in these age-related studies [93,94,95,96,97,98,99,100,101,102,103], which likely explain the discrepancy in order rates between the groups (Table 6; Figure 8). Our results are also strengthened by the conclusions of Rouwette et al. [34], who found that: (1) there are few to no fundamental differences between system dynamics-oriented tasks and performance task games from other social science disciplines, and (2) simulation players have primarily been sampled from university student populations. The psychology literature supports a difference in task performance by age, and we have overcome the weakness of relying on university student populations by including a majority of teams composed of working professionals in AGNR fields.
4.4. Implications for Agricultural and Natural Resource Management
There are a number of key lessons from the Beer Game in general, and from this study in particular, that are of interest for AGNR management. The boom–bust nature of the Beer Game arises from the inherent ordering and shipping delays coupled with the overwhelming tendency of players to ignore their supply line. Natural resource managers embedded in real-world systems with extremely long time delays (e.g., yearly to decadal scales) performed just as badly as, if not worse than, managers from corporate contexts at identifying and managing the delayed inventory management task in the Beer Game. Results closely corresponded to typical results of other Beer Game trials, indicating that our participants, despite intimate knowledge of AGNR systems, had adopted the decision rule identified by Sterman [31], in which participants anchor their initial expectations to the starting inventory level, inevitably producing extremely poor results. This stems from the misperception of the delayed feedback between placing and receiving orders and from not fully accounting for cases in the supply line, both of which lead to over-ordering and instability in even the best performing teams (Figure 5). Even those who recognize and manage systems with time delays that often vary from months to years in length commit the same errors as those without such experience with delays.
What are some examples from AGNR systems of failures to account for such delays and supply-on-order, and what implications might there be for AGNR management in the 21st century? Unfortunately, numerous AGNR cases can be found. First, it is important to recognize how supply lines are adjusted in AGNR systems. Producers typically have two leverage mechanisms: adjusting the number of units in production (e.g., total land under cultivation, total animal inventory, etc.) or adjusting the production per unit (e.g., production per unit of cultivated area in cropping systems, yield per head in livestock systems, etc.). Either option poses interesting trade-offs in the ability to adequately adjust the supply line. Increasing the number of units in production subjects producers to delays on the order of two to four years, while reducing units in production can occur quite rapidly (within a year). On the other hand, increasing the production per unit (through selective plant or animal breeding to enhance production potential) shortens the delay in increasing the supply line, but the genetic enhancement of the overall population makes reductions in per-unit productivity extremely difficult, if not impossible. To illustrate the importance of these two mechanisms to AGNR systems, consider two recent examples from the United States: the corn market boom-and-bust and the contraction of the dairy industry.
For decades, the U.S. corn market saw prices oscillate between $2 and $4 per bushel, and producers’ land use decisions remained relatively stable at around 78 million acres (Figure 9). In response to a step change in demand in the mid-2000s arising from renewed energy policies incentivizing ethanol production (similar to the step change observed in the Beer Game), prices rose to a peak of $6–7 per bushel between 2011 and 2013 due to the inventory shortages resulting from the surge in capacity utilization to fill the increased demand. Producers, aiming to capitalize on the rising prices, expanded the planted area of corn by 20%, much of it onto land that had previously been retired from cultivation. Inevitably, there was a delay in productivity (which continues to increase with investment in crop production potential) as these areas came out of retirement. Failing to account for the supply line (i.e., newly converted land that had not yet reached its full production potential), total production overshot the increased demand, resulting in a collapse of corn prices back to historical levels by 2014. As of 2020, no significant land use correction has occurred (Figure 9).
The corn market example is a conspicuous case. A more subtle but just as powerful example may be found in the U.S. dairy industry. Dairy production is highly seasonal, peaking in late spring and bottoming in winter. Dairy consumption is likewise seasonal, peaking in late fall during the holiday season. Because of the mismatch between peak supply and demand periods, managing inventory is critical for a stable market environment. As a result, prior to 1960, the U.S. dairy industry experienced cycles of expansion and contraction similar to many other livestock industries as a result of its commodity cycle (Figure 10). Farm policy interventions in the U.S. began managing these dynamics by purchasing and storing large volumes of milk inventory to buffer seasonal variations in supply and by establishing minimum price supports that helped minimize price volatility. Under these conditions, dairy herds were able to consolidate, with 50% fewer head in 1980 compared to 1960. Simultaneously, investments in animal potential yielded a 200% increase in per-head productivity. In the late 1980s, U.S. farm policy lowered support prices, and government inventories (or coordination stocks) ceased to function as a buffer against seasonal supply and demand imbalances. This increased price volatility (which has weakened farm business planning, debt repayment, and dairy farm solvency) and the importance of private inventory holdings [104,105].
Why the increasing price volatility (amplitude) despite a stable dairy herd level? In part, the seasonality of milk production inevitably creates oscillations in inventory and therefore in price. However, the amplitude has significantly increased, with greater gaps between seasonal highs and lows, indicating large shifts in inventory (booms and busts similar to the Beer Game). Booms in supply (which drive price declines) have resulted not from increasing animal units but from increasing production per head (up 400% compared to 1960), and the industry has not counteracted this productivity by reducing total animal units. Instead, inventory corrections have been made through dumping (119 million pounds in 2016, 170 million pounds in 2017, and over 145 million pounds in 2018, with a greater dumping rate expected in 2020 due to the coronavirus pandemic) [106,107,108]. Clearly, as indicated by farm gate milk prices, this is a low-leverage strategy: it only temporarily corrects inventory and prices, and it prolongs the stress on remaining dairy producers as volatility rises with the continual growth in incoming inventory (which necessitates increased dumping). That growth will not soon change, given investments over time in herd productivity (i.e., permanent gains in genetic potential that have raised milk yield per head) that have accrued or have yet to be realized due to delays in the system.
What are the implications for the future of AGNR systems management? Without accounting for the supply line on order in AGNR supply chains, AGNR managers will continue to respond in ways that perpetuate the problems stemming from inherent oscillations and will continue to look for external causes to blame (e.g., environmental variability, government policy change, consumer behavior, etc.) for internal industry dilemmas [104]. System structure can be defined by the basic interrelationships that influence, regulate, or control behavior (including external constraints), but, more importantly, structure is the endogenous decision-making rules, operating policies, goals, and modus operandi, many of which are unwritten and embedded in the culture of industries and organizations. For example, given the productivity-driven goals and mental models of the dairy industry, the order rate (i.e., investment in per-head productivity) has not slowed, despite the recognition that the market is over-supplied. Failure to recognize how our decisions interact with the system as a whole hinders our ability to find and effectively apply leverage to systemic problems (leverage often comes from new ways of thinking [109]).
AGNR professionals must overcome the same common learning disabilities that are seen in humans across cultures and contexts [60,95], as well as the barriers that impede our learning about complex systems [33]. Almost regardless of history or experience, when we are inserted into a given position within a system or organization, the structure incentivizes us to “become our position.” In AGNR, managers often view their position as “producers” or those who “feed the world,” reinforcing tendencies to measure success by their own productivity rather than by how effectively they have met consumer expectations or balanced socio-economic and environmental concerns (e.g., the soil and water externalities cited above). Since many externalities are never felt by those who made the decisions that created the problems, and because AGNR delays are particularly lengthy, our ‘knee-jerk’ reaction is to assign blame to others around us, and we fail to effectively learn from experience and the collective wisdom of others in the system:
“To oscillate, the time delay must be (at least partially) ignored. The manager must continue to initiate corrective actions in response to the perceived gap between the desired and actual state of the system even after sufficient corrections to close the gap are in the pipeline… Learning to recognize and account for time delays goes hand in hand with learning to be patient, to defer gratification, and to trade short-run sacrifice for long-term reward. These abilities do not develop automatically. They are part of a slow process of maturation. The longer the time delays and the greater the uncertainty over how long it will take to see the results of your corrective actions, the harder it is to account for the supply line.”
Similar learning disabilities, and the consequences they exert on decision making, have been observed in other natural resource management studies [111,112,113].
To overcome these disabilities and barriers, the SD profession has prioritized and advocated for systems-based education from the K-12 grade levels up to university graduate programs (see the Creative Learning Exchange at clexchange.org, as well as the works of Forrester [114,115,116,117,118,119] and others [50,55,120,121]). Given the results of our Beer Game database and our experience in the AGNR professions, systems education in these disciplines is as desperately needed as ever if effective change is to be expected and the gaps in the 21st century challenge are to begin to sustainably close.
AGNR professionals with systems education could likely achieve significantly different results than professionals without it. For example, thinking in systems forces us to recognize the interconnectedness and dynamic complexity of the problem at hand, the physical stocks and flows central to the issue, and the time delays between decisions and results. Systems thinking and system dynamics modeling also encourage us to maintain an unwavering commitment to the highest standards and rigor of the scientific method by recognizing and correcting our hidden biases and by documenting and testing our assumptions about the problem. By doing so, we can explore a wider decision space for new or previously unrecognized leverage points to achieve our goals [59]. Achieving the 21st century agriculture challenge requires input and collaboration across disciplines and cultures. System dynamics can provide a common unifying language to facilitate such collaboration.