A Preliminary Fuzzy Inference System for Predicting Atmospheric Ozone in an Intermountain Basin

Lawson, John R.; Lyman, Seth N.

doi:10.3390/air2030020

by John R. Lawson ¹,²,* and Seth N. Lyman ¹,³

¹ Bingham Research Center, Utah State University, Vernal, UT 84078, USA
² Department of Mathematics and Statistics, Utah State University, Logan, UT 84078, USA
³ Department of Chemistry and Biochemistry, Utah State University, Logan, UT 84078, USA
* Author to whom correspondence should be addressed.

Air 2024, 2(3), 337-361; https://doi.org/10.3390/air2030020

Submission received: 1 August 2024 / Revised: 7 September 2024 / Accepted: 10 September 2024 / Published: 18 September 2024

Abstract

High concentrations of ozone in the Uinta Basin, Utah, can occur when sufficient snowfall and a strong atmospheric anticyclone create a persistent cold pool that traps emissions from oil and gas operations; sustained photolysis of these precursors then builds ozone to unhealthy concentrations. The basin’s winter-ozone system is well understood by domain experts and supported by archives of atmospheric observations. Rules of the system can be formulated in natural language (“sufficient snowfall and high pressure lead to high ozone”), lending the problem to analysis with a fuzzy-logic inference system. This method encodes human expertise as machine intelligence in a more prescribed manner than more complex, black-box inference methods such as neural networks, increasing user trust in our model prototype before further optimization. Herein, we develop an ozone forecasting system, Clyfar, informed by an archive of meteorological and air-chemistry measurements. This prototype successfully demonstrates proof-of-concept despite rudimentary tuning. We describe our framework for predicting future ozone concentrations when Clyfar input values are drawn from numerical weather prediction forecasts rather than observations. We evaluate inferred values for one winter, finding that our prototype demonstrates mixed performance but shows promise, after optimization, to deliver useful forecast guidance for decision-makers when forecast data are used as input. This early-version model is the basis of ongoing optimization through machine learning.

Keywords:

fuzzy logic; machine intelligence; ozone; forecasting; air quality; meteorology; decision-making

1. Introduction

High, unhealthy concentrations of ozone in the Uinta Basin, Utah [1] (Figure 1) within the U.S. Intermountain West can occur in some winters. If substantial snow cover persists in the wake of a snow-bearing extratropical cyclone in tandem with increasing surface pressure, a persistent cold pool may form that traps emissions from oil and gas operations [2,3,4]. Insolation then drives ozone production through photolysis of these precursor pollutants, primarily nitrogen oxides (NO_x) and volatile organic compounds (VOCs). High ozone is typically an urban summertime problem due to intense human activity (pp. 90–95, 884 [5]). However, the mechanism is different in locations with winter episodes that are dependent on snow [6] such as the Uinta Basin winter-ozone (UBWO) system [2,3]. The Uinta Basin is one of only two locations in North America with winter ozone episodes [7] due to the delicate balance of latitude, elevation, and terrain shape that enables simultaneous persistent snow cover and insolation strong enough to raise ozone concentration to unhealthy levels [6].

While snowfall predictions are difficult due to sensitivity to temperature and altitude in mountain regions, the UBWO physical system is well understood by domain experts [2,4,8,9]. After snowfall—the prime prerequisite—the UBWO system can evolve into two possible states: the development of ozone concentrations that exceed the U.S. Environmental Protection Agency (EPA) regulation or not. Quantities such as snow depth, ozone concentration, and wind speed are continuous and subject to error, and rules of the system behavior can be described with adverbs of degree (e.g., “quite”, “sufficient”). The scientific problem of modeling a well-understood physical system with sparse, imperfect data lends itself to a fuzzy-logic inference prediction system [10,11]. Fuzzy logic, an unfamiliar but relatively elemental form of machine intelligence, swaps familiar two-valued logic (True or False) for a continuum between those Boolean limits of zero and unity. Hence, snowfall can be partly sufficient and partly not, in contrast to the nuance lost if Boolean logic is used (i.e., a snowfall is entirely sufficient or not).

The EPA regulates ozone concentration via National Ambient Air Quality Standards (NAAQS). We define a high-ozone episode as occurring when our representative observation for daily maximum ozone concentration in the Uinta Basin exceeds the 70 ppb NAAQS limit. Multiple exceedances of this threshold can trigger sanctions, leading to limits on and higher costs for industrial development. As such, forecasts of cold pool and high-ozone events are critical to warn the oil and gas industry, protect public health, and support the regional economy. Since winter ozone episodes only occur during relatively rare and particular meteorological conditions, useful predictions would better inform decision-makers responsible for reductions in emissions. The Ozone Alert system, run at the Bingham Research Center (BRC) in Vernal, Utah (Figure 1), since 2017, provides qualitative winter ozone forecasts to a network of over 100 oil and gas operators, other stakeholders, and local residents. The program followed a request by oil and gas companies, and members of regional oil and gas trade associations are encouraged to participate. However, the system is entirely manual and disseminated via a one-way email list. Seeking improved guidance to support the Ozone Alert program, we aim to ultimately replace some workload and subjectivity in the status quo of a manually administered email list.

1.1. Seeking an Alternative to Traditional Air-Quality Models

An obstacle to issuing accurate ozone forecasts stems from inability of traditional grid point numerical weather prediction (NWP) models to capture cold pools [3,12], snowfall [13], and when coupled with atmospheric chemistry models, high concentrations of ozone [14,15]. Further, approximations of sub-grid-scale processes can perform poorly in mountainous regions [16]; hence, forecast systems might be better developed specifically for mountainous applications [17] given compounding errors in atmospheric chemistry and dynamical processes [18]. Observational data are relatively sparse in the basin for radar and in situ observations (for example, basin-level snowfall depth in Figure 1), posing an issue for training and/or evaluation of grid-based prediction systems.

More importantly, there is an unavoidable trade-off between sampling uncertainty in a forecast (achieved by a Monte Carlo or ensemble system) and the fineness of grid spacing (hence better resolving elements such as shallow cold pools [19]). Any physical NWP model must resolve complex terrain to capture the persistent cold pools that form within it; this is generally taken to require a horizontal grid spacing finer than $\Delta x = 3$ km (e.g., ref. [20]). However, increasing the fineness of NWP grid spacing in two or three spatial dimensions rapidly raises computational demand, particularly once the shorter timestep needed to satisfy the Courant–Friedrichs–Lewy criterion (i.e., information does not cross more than one grid cell per timestep) is considered. Given finite compute resources, a resolution increase reduces the maximum number of ensemble members, and in turn, the ensemble prediction system more sparsely samples the uncertainty of future states, increasing the risk of an extreme event not captured by this limited set of forecast members.
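As a rough illustration of the trade-off described above, the following sketch assumes that model cost is proportional to the number of grid points multiplied by the number of timesteps, and that the timestep shrinks in proportion to the grid spacing to satisfy the Courant–Friedrichs–Lewy criterion. The function names and the example numbers are hypothetical and are not taken from any particular NWP system.

```python
def relative_cost(dx_coarse_km, dx_fine_km, refined_dims=2):
    """Cost multiplier when refining grid spacing.

    Assumes cost ~ (grid points) x (timesteps), with the timestep
    shortened in proportion to grid spacing (CFL criterion).
    """
    ratio = dx_coarse_km / dx_fine_km
    return ratio ** (refined_dims + 1)  # +1 accounts for the shorter timestep


def affordable_members(baseline_members, dx_coarse_km, dx_fine_km, refined_dims=2):
    """Ensemble members affordable at the finer spacing for a fixed budget."""
    cost = relative_cost(dx_coarse_km, dx_fine_km, refined_dims)
    return int(baseline_members // cost)


# Hypothetical example: halving horizontal spacing from 6 km to 3 km
print(relative_cost(6, 3))           # cost multiplier
print(affordable_members(40, 6, 3))  # members left from a 40-member budget
```

Under these assumptions, halving the horizontal grid spacing multiplies cost by eight, so a fixed budget supports one-eighth as many ensemble members.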

This reduction in ensemble membership comes despite more members required to capture the variation in finer-scale phenomena captured by a finer scale model [21]. Wind flow across the basin, a complex landscape carved with canyons and surrounded by mountain ranges, is subject to diurnal reversal of slope flows, channeling in the canyons, and other small-scale patterns that cannot be observed with the sparse network of observations in the basin [2]. While fine resolution is required to capture the mechanisms leading to cold pools and high ozone in the UBWO system, the intricacy of streamlines across the basin flow is an unknown unknown, not captured by sparse observations; yet, this high-resolution model will produce a prediction. This appears irreconcilable for resolving a shallow planetary boundary layer such as a UBWO cold pool under high uncertainty. What are our alternatives to this traditional configuration of NWP ensembles?

1.2. From Machine Intelligence to Ozone Prediction

Atmospheric scientists are quickly embracing state-of-the-art methods in AI suitable for operational forecasting (e.g., ref. [22,23]), including those relevant to air quality (e.g., ref. [24,25]). Alternatives to traditional NWP can range in complexity from simpler statistical relationships [26] to pure deep learning AI models [27,28,29]. Through the information age, AI and machine learning (data-focused) techniques have become more powerful and accessible through open-source software such as scikit-learn [30], large language models (LLMs) such as ChatGPT (chatgpt.com, accessed on 1 August 2024), PaLM [31,32] and BLOOM [33]; and so on. While powerful models have never been more accessible, a potential pitfall is black-box behavior where the human supervisor cannot fully trust generated output because they are not sure how the conclusion was reached [34]. Adopting an Ockham’s Razor approach to constructing an FIS (e.g., ref. [35]), herein we seek the simplest model that gives useful guidance and no simpler; at this point, developers may use post-processing or deep learning to fine-tune model performance by optimizing parameters [36,37].

We outline below a prototype ozone concentration prediction model for the UBWO system, which implements a fuzzy-logic inference system (FIS) that infers the possibility of a cold pool from meteorological input. Its rules are drawn from human expertise and archived observations. We refer to our fuzzy-logic prediction system as Clyfar. This is Welsh for “clever” to reflect our focus of codifying human expertise as machine intelligence and is a loose abbreviation of “Computational Logic Yielding Forecasts for Atmospheric Research”.

2. Data and Methodology

2.1. Data Sources and Pre-Processing

We obtained atmospheric measurements from the compilation of sensor networks archived by Synoptic Weather (www.synopticdata.com, accessed on 1 July 2024), a spin-off from MesoWest [38]. The geographical domain is a 72-km (45-mile) radius around Pelican Lake (UCL21) with coordinates (40.1742, −109.6666), shown by the red circle in Figure 1. As we will use a rule-based system where permutations of variables and their categories build rapidly, we limit this preliminary version of Clyfar to four input variables deemed most important for predicting high ozone concentrations in the basin:

Snow cover
Mean sea-level pressure (MSLP)
Insolation
Surface wind

The rationale for the above might be summarized as “after a heavy snowfall, if wind calms under a strengthening high-pressure system and daytime skies are mostly clear, ozone is possible”. Future iterations of Clyfar may include additional variables such as ground heat flux (available for snow melt), actinic irradiation [3], and a “memory” of cold-pool strength and ozone concentration.

Our output (target) variable, Uinta Basin daily ozone concentration maximum, is defined for the local time-zone period of midnight to midnight. We must therefore engineer representative input variables in the same time period. Observation stations can have different suites of atmospheric sensors, and use of only one station leaves the analysis susceptible to spurious error. We therefore use the following functions to reduce observation sets to a Basin-representative value configured after extensive preliminary testing and discussion between domain experts:

Snow cover data are sparse in the basin (stations reporting snow depth at basin level are marked with black squares in Figure 1), where most stations are operated by volunteers in the Cooperative Observation Program (COOP; https://www.ncei.noaa.gov/products/land-based-station/cooperative-observer-network, accessed on 1 July 2024). A station that reports once a day may not sample at a time most representative for that solar day. Therefore, our snow value is the 90th percentile of the set of maximum snow-depth reports from basin floor stations on the COOP network taken at least once a day.
Raw pressure data are reduced to mean sea-level pressure (MSLP) on Synoptic Weather’s server before download, and we use the median value from all stations’ daily maximum as representative. The computation of MSLP becomes less reliable with height, and preliminary work revealed absolute values of MSLP in the dataset to be excessively large. The excessive MSLP values appear to be a systematic, additive offset that did not preclude good performance in preliminary testing. Current work is investigating alternative calculations of MSLP and the source of high bias.
Insolation is affected by both optical depth (humidity and clouds, particulate matter) and the solar angle. Passing clouds make the data temporally variable, and spatially, higher elevation stations will receive more radiation under clear skies. To generate a representative value for the basin, we employ a “near-zenith mean” that takes the mean downwelling solar radiation for each station between 1000 and 1400 local time. From this set of all stations, we then take the median value.
Wind data. We want to identify wind strong enough to disperse pollutants and/or the cold pool while ignoring transient gusts from storms (mainly a result of evaporative cooling and attendant downdrafts). Hence, we assume that the Vernal Regional Airport (KVEL) is representative and take its daily median 10 m maximum reported wind value, with the benefit of a long, reliable archive of observations. The airport is approximately 4.5 km (2.8 miles) from the nearest foothills east of the runway and even further from canyon exits north of the town. As such, we neglect effects from downslope winds, drainage flows, or wind funneling; we take KVEL wind reports as representative of the basin as a whole. Future versions will consider more stations’ reports.
Ozone data. While internal data show that there is occasionally considerable variation in ozone concentrations from west to east in the basin, for the purpose of this initial study we choose one value by taking the 99th percentile of each station’s ozone observations and then taking the median of this set.
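The reductions described above can be sketched in a few lines of Python. This is a minimal illustration rather than the Clyfar implementation: the function names, the dictionary-based input layout, the linear-interpolation percentile, and the inclusive 1000 to 1400 window are all assumptions made for the example. The wind reduction is omitted because it relies on a single station (KVEL).

```python
from statistics import mean, median


def percentile(values, q):
    """Linearly interpolated q-th percentile (0-100) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def representative_snow(daily_max_depth_by_station):
    """90th percentile of the stations' daily maximum snow depths (mm)."""
    return percentile(list(daily_max_depth_by_station.values()), 90)


def representative_mslp(mslp_by_station):
    """Median of each station's daily maximum MSLP (hPa)."""
    return median(max(obs) for obs in mslp_by_station.values())


def representative_insolation(solar_by_station):
    """Median over stations of the near-zenith mean (1000-1400 local time).

    solar_by_station maps station -> list of (local_hour, W m^-2) pairs.
    """
    station_means = []
    for obs in solar_by_station.values():
        near_zenith = [value for hour, value in obs if 10 <= hour <= 14]
        if near_zenith:  # skip stations with no midday reports
            station_means.append(mean(near_zenith))
    return median(station_means)


def representative_ozone(ozone_by_station):
    """Median over stations of each station's 99th-percentile ozone (ppb)."""
    return median(percentile(obs, 99) for obs in ozone_by_station.values())
```

Each function takes one local day of observations keyed by a station identifier and returns a single basin-representative value.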

2.2. Fuzzy Logic: Background and Justification

Fuzzy logic differs from traditional two-valued logic (True or False) by allowing variables to have continuous set membership between 0 and 1. For example, in traditional logic, we might categorize a day as either “rainy” or “not rainy” based on a fixed threshold of precipitation. However, this binary classification fails to capture the nuances of weather conditions. Fuzzy logic allows us to define a “rainy day” as a continuous spectrum:

0 mm (trace amounts) of rain: definitely not rainy (membership = 0);
0.1 mm of rain: mostly not rainy (membership = 0.1);
1 mm of rain: somewhat rainy (membership = 0.5);
5 mm of rain: quite rainy (membership = 0.9);
Over 10 mm of rain: definitely rainy (membership = 1).

This approach allows for a more nuanced representation of weather conditions, where a day with 1 mm of rain is not simply “not rainy” or “rainy” but rather “somewhat rainy”, with a membership of 0.5 in the “rainy” set. Fuzzy logic has many advantages over bivalent (two-valued) logic. While its use in consumer products and control systems has integrated with AI and ML techniques [39,40,41], the philosophy of fuzzy set theory still holds and is still deployed in many applications outside of control systems, with predictions of rainfall [42] and fog [43] serving as meteorological examples. Outputs from FISs have numerous advantages, such as lower sensitivity to small perturbations versus probabilistic models due to smoothing [44] and acceptance of conflicting information [45]. Output can also be considered to be an upper bound on probability [45], usually preferred by risk-averse users.

We can encode nuance in our ruleset with membership functions that determine how much a given input value belongs to a particular category of the variable. For instance, in our rainfall example, we might define overlapping membership functions for each variable’s category, where the observed rainfall might have partial membership in multiple sets, allowing the system to reckon with ambiguity or conflict.
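The rainfall example can be made concrete with a short sketch. It assumes a piecewise-linear membership function drawn through the listed points, with 10 mm treated as the onset of full membership; both choices, and the function names, are illustrative only. The complementary “not rainy” set shows how one observation holds partial membership in two categories at once.

```python
# (rain in mm, membership in "rainy") taken from the example list above
RAINY_POINTS = [(0.0, 0.0), (0.1, 0.1), (1.0, 0.5), (5.0, 0.9), (10.0, 1.0)]


def rainy_membership(rain_mm):
    """Degree of membership in the 'rainy' set (assumed piecewise linear)."""
    if rain_mm <= RAINY_POINTS[0][0]:
        return RAINY_POINTS[0][1]
    if rain_mm >= RAINY_POINTS[-1][0]:
        return RAINY_POINTS[-1][1]
    for (x0, y0), (x1, y1) in zip(RAINY_POINTS, RAINY_POINTS[1:]):
        if x0 <= rain_mm <= x1:
            return y0 + (y1 - y0) * (rain_mm - x0) / (x1 - x0)


def not_rainy_membership(rain_mm):
    """Complementary set: membership in 'not rainy'."""
    return 1.0 - rainy_membership(rain_mm)


for rain in (0.0, 1.0, 3.0, 12.0):
    print(rain, rainy_membership(rain), not_rainy_membership(rain))
```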

We can use domain expertise and archived observations to determine numerical values for adjectives/adverbs when creating a ruleset for the system at hand. For example, researchers might use their experience and historical data to define what constitutes “high pressure” or “calm wind” in a particular region, translating these linguistic terms into specific membership functions. The use of an FIS is motivated by multiple characteristics of the UBWO system:

The formation of UBWO cold pools—and usually high ozone concentration—is a well-known system but hinges on sufficient snowfall. As a complex system with two basins of attraction, the sensitivity of cold-pool formation is lower when snow is either absent or very deep, whereas near the cusp of the two potential future states (near the bifurcation point), chaotic growth means small changes grow rapidly [46,47]. Setting and predicting representative values of snow depth is difficult due to drifting snow, sparse data observations, and inherent limitations of human knowledge and ability to represent UBWO system complexity. Fuzzy logic effectively smooths some noise, making its behavior more resilient in presence of error [45], trading some specificity for the estimate of uncertainty.
Evolution of an AI system with ongoing development and optimization that can be increased in complexity to optimize output utility to Ozone Alert forecasters and decision makers. Machine learning techniques can be deployed with rulesets and parameter tuning [39] to leverage benefits from different AI/ML techniques, while the FIS ruleset remains understandable by the human.
Capturing both complex terrain and uncertainty is a trade-off when running expensive NWP models. As grid spacing becomes finer, timesteps between integrations must become closer together, and we might consider a finer grid in the vertical direction to better capture shallow cold pools in simulations. However, a rare event (e.g., a heavy snowfall that occurs 1 in 5 winters) requires ample sampling of the uncertainty distribution. The fewer members in a forecast ensemble, the less chance of capturing the true nature of uncertainty, and the more difficult to calibrate the system to optimize balance between sharpness and reliability of uncertainty estimates. Further, fine-scale atmospheric flow and state is an unknown unknown: a high-resolution NWP model may be overkill. However, we lack the observations to diagnose such a scenario: the so-called curse of dimensionality. Running many lightweight statistical simulations may spend finite computer resources more effectively than unfalsifiable and demanding high-resolution NWP models.

We are further motivated to use an FIS to follow best practices of explainable AI [34], albeit fuzzy logic being only an elementary form of AI [48,49]. An FIS encodes domain knowledge explicitly, enabling explainable and transparent construction of its workings and can be extended with a fuzzy neural network (e.g., ref. [50]) or fine-tuned with deep learning (e.g., ref. [51]). Herein, we create a prototype model to demonstrate the potential of forecasting ozone concentrations for the purpose of automation, optimization, and greater insight into UBWO system behavior. Comprehensive reviews of fuzzy logic can be found in, e.g., [52]. We perform inference with the so-called Mamdani method, which the authors found more accessible than, e.g., the Sugeno method; the choice of inference is outside the scope of this manuscript, but the method is discussed further in [53] and references therein.

3. Configuration of Clyfar: A Fuzzy Inference System for Ozone Prediction

Written completely in Python, Clyfar comprises a module for pre-processing input data, an inference system based on a fuzzy ruleset, and a planned post-processing module that will optimize output further based on observations.

We define membership functions for each category in each variable informed by an archive of meteorological and ozone concentration observations. Though the authors had access to 20 years of data for this region, the present study will focus on the winter of 2021/2022 as an illustrative case study to demonstrate the promising (but mixed) results of our prototype. To simplify our prototype for sake of understanding, we restrict our system to four input weather variables with ozone as the sole output variable. Further Clyfar iterations will consider more rules and variables. The authors stress this single winter is not a representative evaluation of long-term performance, but a foundation for future versions via lessons learned.

3.1. Overview of Approach

Some users seek a deterministic forecast, perhaps interpreted as a hedged ‘best guess’. However, other decision-makers benefit from information about uncertainty, increasing the chances of detecting an early, low-risk, high-impact event [54,55] by accounting for chaotic error growth [46,56]. Inference of both a single value and uncertainty distribution follows this method:

Pre-processing: Process observational data to create a representative value of the basin state per input variable and time (feature engineering).
Define Membership Functions: Define the distribution of membership of the variable to a category (“adverbs of degree”, e.g., sufficient snow). These functions (curves) map the input data (e.g., 250 mm snow) to their corresponding fuzzy sets with non-zero memberships (e.g., 1.0 sufficient snow).
Construct Fuzzy Rules: Develop a set of if–then rules that define the relationship between input and output variables based on domain expertise (e.g., “Sufficient snow and calm winds lead to elevated ozone”).
Fuzzification: Convert the crisp input values into fuzzy values using the defined membership functions. For instance, snowfall at the cusp of negligible and sufficient for cold-pool formation will have non-zero membership to both categories.
Apply Inference Rules: For each fuzzy rule, we compute an activation in the range $[0, 1]$ of the target variable’s category. We use the fuzzy “AND” operator to combine multiple activations with an infimum (a minimum in finite sets). This matches intuition that it is harder to activate multiple rules at a higher level. Further, “OR” operators are combined with the supremum (maximum), and this is used to create an aggregated activation or possibility distribution [45].
Possibility distribution: The supremum is also used to aggregate the rule outputs (i.e., the maximum value from each rule output for each point in the output’s numerical range). Then, each category has an activation level that represents a possibility [57,58], conceptually an upper bound on probability [44,45] that can be considered a likelihood (but not a probability).
Defuzzification: To generate a single, deterministic value in native units, we convert the aggregated activation distribution back into crisp values using defuzzification methods such as the centroid method (a sort of weighted average or center of gravity). We might also preserve the possibility distribution by skipping this final step.
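The steps above can be sketched end to end with a deliberately small Mamdani-style system. The example below uses only two inputs (snow and wind) and two output categories; the sigmoid slopes, Gaussian centers and widths, and the ozone range are hypothetical values chosen for illustration and are not the Clyfar configuration.

```python
import math


def sigmoid(x, midpoint, slope):
    """Sigmoid membership; positive slope rises with x, negative slope falls."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def gaussian(x, center, sigma):
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Universe of discourse for ozone (ppb): hypothetical bounds and step
OZONE_PPB = [20 + 0.5 * i for i in range(201)]  # 20 to 120 ppb

# Hypothetical output categories: (center, sigma) in ppb
OZONE_SETS = {"background": (40.0, 8.0), "elevated": (80.0, 10.0)}


def infer_ozone(snow_mm, wind_ms):
    # Fuzzification: crisp inputs -> membership degrees
    snow_sufficient = sigmoid(snow_mm, 100.0, 0.05)
    snow_negligible = sigmoid(snow_mm, 100.0, -0.05)
    wind_calm = sigmoid(wind_ms, 2.5, -3.0)
    wind_breezy = sigmoid(wind_ms, 2.5, 3.0)

    # Rule activation: AND = minimum, OR = maximum
    activation = {
        "elevated": min(snow_sufficient, wind_calm),
        "background": max(snow_negligible, wind_breezy),
    }

    # Clip each output set at its activation, then aggregate with the maximum
    aggregated = [
        max(min(activation[name], gaussian(x, *OZONE_SETS[name]))
            for name in OZONE_SETS)
        for x in OZONE_PPB
    ]

    # Centroid defuzzification: membership-weighted mean of the universe
    crisp = sum(x * mu for x, mu in zip(OZONE_PPB, aggregated)) / sum(aggregated)
    return activation, crisp


print(infer_ozone(250.0, 1.0))  # deep snow, calm wind
print(infer_ozone(20.0, 5.0))   # thin snow, breezy
```

Returning the activation dictionary alongside the crisp value preserves the possibility distribution for users who want uncertainty information rather than a single number.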

To gauge performance of Clyfar, we will compare inferred values (resembling forecasts) with observed ozone concentrations. Our system is assumed stationary; therefore, the model should capture the UBWO key behavior with observations before forecasts can be issued. As there is no machine learning occurring at this stage of the FIS, there is no concern with training and testing over the same dataset.

3.2. Pre-Processing and Membership Functions

Input variables were chosen by inspecting our archive of observations as detailed in Section 2. Clusters or bifurcations in scatter plots of daily representative values of ozone concentration against various input variables, as shown for wind speed in Figure 2, represent potential regimes or areas of nonlinear behavior in the UBWO atmospheric state known to domain experts. For instance, Figure 2 shows that even a moderate wind speed can disperse the pollution and lower concentrations. In the following figures, the x-axis represents the possible range considered by the inference system (also called the universe of discourse); values outside either bound are clipped to the appropriate minimum or maximum.

We construct membership functions as follows (and shown in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7):

Wind speed. As seen in Figure 2, exceedance events in winter 2021–2022 only occurred if the representative wind was calm enough. Preliminary testing showed this was common to numerous stations and seasons, matching domain expertise. We chose two opposing sigmoid distributions crossing close to 2.5 m s$^{-1}$ as advised by observations and adjusted slightly during preliminary testing.
Snow depth. Similarly to the wind variable, we choose two opposing sigmoid functions that cross around a region of “sufficient snow”. This is around 100 mm (3.9 inch). Although difficult to directly compare, the sigmoid shapes were shallower, resulting in more likely overlap when more frequently observed in the UBWO system (see the inset of Figure 4) to represent more uncertainty around what constitutes “sufficient” snow depth.
Mean sea-level pressure (MSLP). Rising pressure behind a snowstorm reinforces the surface anticyclone in cold air, often in tandem with warm air advection aloft (e.g., ref. [12]). We choose three categories: two extremes are conducive to dissipation or formation of cold pools, while the middle category essentially increases specificity (an additional membership function curve) at the cost of increasing the ruleset complexity. Regarding magnitudes of mean sea-level pressure (MSLP), values appear too high, perhaps due to calculation error, but preliminary testing showed no obvious errors. This will be adjusted in the future. The authors also tested for sensitivity to normalization of input data (i.e., pressure in [0,1]) due to the large gap in ranges between MSLP and the other variables. There was no observed improvement in performance, with some loss of transparency due to the required transform to and from the normalized range [0,1].
Solar insolation. The authors found the most subjective uncertainty and sensitivity when considering downwelling solar radiation, which is critical for photolysis and the process leading to unhealthy ozone concentrations. Solar insolation measured at the surface is highly sensitive to cloud cover, factored nonlinearly by the time of day when solar obscuration occurred. Further complexity in the ozone–insolation relationship arises because increasing insolation increases photolysis and ozone production but eventually mixes out the cold pool due to melting snow and thermal mixing of the planetary boundary layer. We encode this large uncertainty with larger overlap of membership functions (Figure 6). We define four periods to reflect the four main months of the UBWO system (December to March inclusive) and parallel the ozone output categories discussed next. We label the solar insolation categories as seasons as these ranges are typical of those seasons in the Uinta Basin. There is much overlap between a cloudy spring day and a clear mid-winter’s day in terms of insolation. Given the importance of actinic irradiation to the UBWO [3], these estimates may be required to narrow bounds of uncertainty regarding photolysis rates.
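To illustrate how two opposing sigmoids encode a category pair, and how a shallower slope widens the region of overlap, consider the sketch below. The crossover values of 2.5 m s$^{-1}$ (wind) and 100 mm (snow) come from the text; the slope values and function names are assumptions for illustration, with the actual parameters given in Appendix A.

```python
import math


def sigmoid(x, crossover, slope):
    """Rising sigmoid equal to 0.5 at the crossover value."""
    return 1.0 / (1.0 + math.exp(-slope * (x - crossover)))


def opposing_pair(x, crossover, slope):
    """Return (low_category, high_category) memberships crossing at 0.5."""
    high = sigmoid(x, crossover, slope)
    return 1.0 - high, high


def overlap_width(slope, level=0.1):
    """Width of the input range where both categories exceed `level`.

    Solves sigmoid(x) = 1 - level analytically for the half-width.
    """
    half = math.log((1.0 - level) / level) / slope
    return 2.0 * half


calm, breezy = opposing_pair(1.0, 2.5, 3.0)         # wind in m/s
negligible, sufficient = opposing_pair(100.0, 100.0, 0.05)  # snow in mm
print(calm, breezy, negligible, sufficient)
print(overlap_width(0.05), overlap_width(0.02))  # shallower slope, wider overlap
```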

The output is ozone concentration in four categories: background, moderate, elevated, and extreme. We choose not to match the EPA Air Quality Index categories (https://www.airnow.gov/aqi/aqi-basics/, accessed on 1 August 2024) but instead opt for fewer categories to focus on understandable FIS configuration. The choice of four categories strikes a balance between complexity (required to capture extremes) and simplicity (to reduce the size of the FIS ruleset). Not all permutations of these rules are needed as they are either physically inconsistent (e.g., snow is sufficient and solar is summer) or already captured by another rule (optimizing the ruleset is outside the scope of the current text). Again, it is human expertise that can hand-pick or modify rules, adding trustworthy complexity. Next, we use relationships between input variables and ozone concentration samples to determine membership functions. The membership functions are constructed using Gaussian or sigmoid functions (Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7). See Appendix A for details on function construction for each variable and category. Below, we italicize variable categories (e.g., sufficient snow) to differentiate fuzzy variable descriptors from body text.

3.3. Ruleset of UBWO Behavior

In natural language, we can describe the UBWO system with the following rules. We define a limited list known to human experts [2,4,8,9], and the list does not exhaust the permutations of all variables and categories for our Clyfar prototype:

If there is negligible snow, pressure is low, or wind is breezy, then the ozone level will be at background levels. This is because pollutants are blown away from the region of interest.
If there is sufficient snow, pressure is high, wind is calm, and the solar radiation is typical for spring, then the ozone level will be extreme (typical high-ozone case).
If there is sufficient snow, pressure is high, wind is calm, and the solar radiation is typical for winter, then the ozone level will be elevated. There is still sufficient sunlight for photolysis to build ozone to unhealthy levels, but it may take longer to build, for example.
If there is sufficient snow, pressure is high, wind is calm, and the solar radiation is low (midwinter) or high (summer), then the ozone level will be moderate.
If there is sufficient snow, pressure is average, wind is calm, and the solar radiation is low to moderate (winter into spring), then the ozone level will be elevated.
If there is sufficient snow, pressure is average, wind is calm, and the solar radiation is lowest (midwinter) or highest (late spring into summer), then ozone level will be moderate. This is because insolation is either too weak for prolific ozone generation or so strong it may mix out the boundary layer.

We render this ruleset using logic operators in Appendix B.
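As a hedged sketch of how the six natural-language rules might be evaluated, the function below combines membership degrees using the minimum for “and” and the maximum for “or”, then keeps the strongest activation per ozone category. The category keys (for example `"midwinter"` and `"summer"` for solar) and the reading of Rule 5 as “winter or spring” are assumptions made for this illustration.

```python
def rule_activations(snow, mslp, wind, solar):
    """Each argument maps a category name to a membership degree in [0, 1]."""
    cold_pool_high = min(snow["sufficient"], mslp["high"], wind["calm"])
    cold_pool_avg = min(snow["sufficient"], mslp["average"], wind["calm"])
    solar_extremes = max(solar["midwinter"], solar["summer"])
    solar_mid = max(solar["winter"], solar["spring"])

    rules = [
        ("background", max(snow["negligible"], mslp["low"], wind["breezy"])),  # Rule 1
        ("extreme", min(cold_pool_high, solar["spring"])),    # Rule 2
        ("elevated", min(cold_pool_high, solar["winter"])),   # Rule 3
        ("moderate", min(cold_pool_high, solar_extremes)),    # Rule 4
        ("elevated", min(cold_pool_avg, solar_mid)),          # Rule 5
        ("moderate", min(cold_pool_avg, solar_extremes)),     # Rule 6
    ]

    # Aggregate rules that share an output category with the maximum
    activation = {"background": 0.0, "moderate": 0.0, "elevated": 0.0, "extreme": 0.0}
    for category, degree in rules:
        activation[category] = max(activation[category], degree)
    return activation


example = rule_activations(
    snow={"negligible": 0.1, "sufficient": 0.9},
    mslp={"low": 0.0, "average": 0.2, "high": 0.8},
    wind={"calm": 0.95, "breezy": 0.05},
    solar={"midwinter": 0.0, "winter": 0.3, "spring": 0.7, "summer": 0.1},
)
print(example)
```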

4. Illustrative Examples

Here, we assess our system with synthetic examples to demonstrate expected behavior for four scenarios whereby unhealthy levels of ozone are deemed (1) likely, (2) unlikely, (3) on the cusp of occurring or not, and (4) an implausible scenario for snow in summer.

4.1. Case 1: Ozone Likely

We begin with an example in a situation where ozone levels are expected to be higher than the NAAQS limit, given deep snow, high pressure, weak winds, and insolation strong enough to instigate ample ozone production but weak enough not to mix out the cold pool. This is shown in Figure 8.

snow = 250 mm (9.8 inches);
mslp = 1045 hPa;
wind = 1.0 $\mathrm{m\,s^{-1}}$;
solar = 640 $\mathrm{W\,m^{-2}}$.

The crisp value predicts an ozone level of 87 ppb. Looking at the four categories, there is little support (possibility) for background and elevated levels of ozone, but a strong possibility for extreme levels. The centroid method is used to generate a most-likely value, but by its nature of computing a weighted average over rule-activation aggregation, it cannot generate a crisp value to the right of the extreme Gaussian curve’s center value.
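The ceiling imposed by the centroid method can be illustrated with a minimal sketch. It assumes the Gaussian ozone categories and the 20–140 ppb universe of discourse listed in Table A1 (with the final row read as the extreme category, mean 95 ppb), clips each curve at its activation, aggregates with a pointwise maximum, and integrates numerically. The names `OZONE_MF`, `gauss`, and `centroid` are illustrative and are not taken from the Clyfar code base.

```python
import math

# (mean, sigma) in ppb for each ozone category, as read from Table A1.
OZONE_MF = {
    "background": (40.0, 6.0),
    "moderate": (52.0, 5.5),
    "elevated": (67.0, 6.0),
    "extreme": (95.0, 10.0),
}


def gauss(x, mean, sigma):
    """Gaussian membership value of x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def centroid(activation, lo=20.0, hi=140.0, step=0.05):
    """Clip each category curve at its activation, aggregate with the
    pointwise maximum, and return the centroid of the aggregate."""
    n = int(round((hi - lo) / step)) + 1
    num = den = 0.0
    for i in range(n):
        x = lo + i * step
        mu = max(
            min(activation.get(name, 0.0), gauss(x, mean, sigma))
            for name, (mean, sigma) in OZONE_MF.items()
        )
        num += x * mu
        den += mu
    return num / den


# Hypothetical activations, chosen only to illustrate the ceiling.
only_extreme = centroid({"extreme": 1.0})
weak_extreme = centroid({"extreme": 0.3})
mixed = centroid({"extreme": 1.0, "background": 0.1})
print(only_extreme, weak_extreme, mixed)
```

Even when extreme is the only activated category, the crisp value sits at the center of that curve, and any activation of a lower category pulls the centroid further to the left.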

4.2. Case 2: Ozone Unlikely

Next, a case unlikely to yield ozone above a typical background level is presented in Figure 9. In the input data, we prescribe a thin snow depth and a breeze that would likely blow a portion of pollutants from the basin and/or initiate mechanical mixing of the cold pool and dissipation into the free troposphere.

snow = 50 mm (2.0 inches);
mslp = 1025 hPa;
wind = 4.0 $\mathrm{m\,s^{-1}}$;
solar = 600 $\mathrm{W\,m^{-2}}$.

The inferred ozone concentration suggests it is entirely possible (likely) to remain near background levels. The impossibility of any outcome other than background is triggered by Rule 1 (breezy wind → background ozone). We recall that possibilities across categories can sum to more than one, unlike probabilities. Here, not only is the possibility (activation) of background close to 1.0, but the possibilities of the other categories are also equal to or near zero. A background level is not only totally possible but entirely necessary due to the impossibility of all other outcomes. Further information on possibility and necessity (dual measures that represent bounds on uncertainty) is found in [45,58].
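As a hedged illustration of the duality between possibility and necessity, the sketch below uses the standard relation from possibility theory, N(A) = 1 − Π(not A), where the possibility of "not A" is taken as the largest possibility among the other categories. The activation values are hypothetical (they only resemble the situation described for Case 2), and the function name `necessity` is illustrative.

```python
def necessity(possibility, category):
    """Necessity of a category: one minus the largest possibility
    assigned to any other category."""
    others = [p for name, p in possibility.items() if name != category]
    return 1.0 - max(others)


# Hypothetical activations resembling the "ozone unlikely" case.
poss = {"background": 0.99, "moderate": 0.01, "elevated": 0.01, "extreme": 0.0}

for name in poss:
    print(name, poss[name], round(necessity(poss, name), 2))
```

With these values, background is both highly possible and highly necessary, whereas every other category has a necessity near zero. The relation is usually stated for normalized distributions, so it should be interpreted with care for the subnormal cases discussed later.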

4.3. Case 3: On the Cusp

We consider a case where it is deliberately not immediately clear which ozone level is most possible due to variables on the cusp of the membership function’s intersection (i.e., close to a potential tipping point in the physical system, such as sufficient snow).

snow = 100 mm (3.9 inches);
mslp = 1040 hPa;
wind = 1.5 $\mathrm{m\,s^{-1}}$;
solar = 500 $\mathrm{W\,m^{-2}}$.

We see in Figure 10 the maximum of all possibilities is less than unity: a so-called subnormal distribution [59]. While further discussion is outside the scope of this study, this signals insufficient rule coverage as something must happen (i.e., at least one category must be entirely possible before it may necessarily occur). Confirming a weakness in the ruleset’s construction, we find that moderate was not activated, and the adjacent categories were activated instead, which seems counterintuitive. Alternatively, this distribution and the two basins of attraction to ultimate states may indicate a bifurcation in solutions (i.e., it is difficult to discriminate between the two states).

There is a substantial possibility of elevated ozone, but there is also a considerable possibility of ozone being limited to background levels. The activated range (filled area of membership functions in figures) around the crisp value (black line) is large, suggesting considerable uncertainty (i.e., a wide range of output values is possible). In this case, a centroid value does not communicate the high uncertainty (i.e., the substantial possibility of other sets, particularly background). Beyond this crisp (deterministic) value, stakeholders who are risk-averse would benefit from information that extreme levels are still possible in case evasive action is financially sensible (e.g., ref. [60]).

4.4. Case 4: Ignorance

A common mantra for statistical processing states “garbage in, garbage out”; unfortunately, “garbage out” and useful guidance are often indistinguishable before the event occurs (or does not). A supervising human in the loop or automatic quality control may prevent nonsensical values from reaching Clyfar; however, let us consider raw output in an implausible scenario of summer snow.

snow = 83 mm (3.3 inches);
mslp = 1050 hPa;
wind = 1.0 $\mathrm{m\,s^{-1}}$;
solar = 1100 $\mathrm{W\,m^{-2}}$.

We know that Clyfar cannot offer a useful prediction (Figure 11). The lack of support in the data and a near-uniform distribution of (not very) possible outcomes represents substantial ignorance, which may be preferable over a deterministic, crisp ozone concentration that is extricated from scarce information: we obtain a (meta-)confidence in the confidence of an event. Cases that fall outside the ruleset (i.e., little activation of few rules) but still result in high-impact events resemble black swans [61] in that they have not been considered due to their absence in observation records. In a stationary climate with a long record of observation, we can confidently say some events—such as snow on July 1 at the basin floor—are impossible, or “off the attractor” in the paradigm of chaotic, complex systems [62]. While we find that Clyfar suggests that all outcomes are not very plausible, which is true, a non-optimized or restrictive model will continue to suffer from these problems if the set of rule permutations is not explored sufficiently. Indeed, an FIS with complete ruleset coverage would show ignorance during a nonsensical event (like this example) with low activations across all categories. The knowledge of a lack of knowledge is useful to know!

5. Case Study: Winter 2021/2022

We now present inferred values (resembling predictions) from this preliminary version of Clyfar, here marked as version 0.1 (v0.1), using observed weather variables and evaluating against collected ozone data for the same daily periods.

The advantage of choosing this winter is the two clear spikes in Clyfar ozone forecasts during the season, with only one event being observed. High ozone was associated with typical precursors familiar to Ozone Alert forecasters, such as calm wind and antecedent snowfall. We highlight three regions of the 2021/2022 season that illustrate the good, bad, and typical (null) performance quality of the Clyfar prototype. We order these subsections in chronological order; each event is labeled with a black arrow above the axis in Figure 12. We include the rank (a description of the percentile in which this possibility fell for this winter) for reference. It is intuitive that achieving an extreme value of ozone is more difficult than a background level—we see a background value more often (regression to the mean).

5.1. 14 December 2021: Example of Background Signal

As noted above, crisp (deterministic) forecast values generated from Clyfar cannot exceed the center of the Gaussian curve for each category book-ending the universe of discourse (i.e., background and extreme ozone). This hard limit is an artifact of the defuzzification method (here the centroid method, a sort of weighted average), and can be addressed by changing this method [63] or perhaps post-processing with another algorithm or model. We configure Clyfar in a modular manner to allow for modification of algorithms or pre-/post-processing independently during optimization.

When we view output as the possibility of each category (Figure 13), Clyfar suggests that background ozone levels are almost entirely possible (≈0.95), in contrast to the almost impossible occurrence of the other, higher concentrations. Similarly to Figure 9, the impossibility of other categories makes background levels necessary—not just possible. Despite a high possibility value for background levels, we find this value to still be in the lowest quartile of possibility for the season. This is sensible: all else equal, it is more possible to achieve typical, background levels of ozone than rare, extreme levels. However, further interpretation is needed whether percentiles rather than raw possibility values are more useful to signal a potential low-risk, high-impact event at long (less predictable) lead times (e.g., ref. [64]).

5.2. 2 January 2022: Poor Forecast

In this event, Clyfar predicted that an elevated level of ozone was most possible but without high support in the data (evidenced by the possibility value ≈0.3).

We see in Figure 14 that extreme levels of ozone, while deemed not likely by Clyfar, are predicted with a possibility in the top 2% of values for this winter (this is possible with hindsight; operationally, percentiles would be computed from a longer archive). The long tail of the extreme ozone category (e.g., Figure 7) allocates possibilistic weight towards high values, drawing the centroid towards a larger crisp-value forecast. In this poorly forecasted event, one sees the benefit of preserving uncertainty of a possibility distribution as an additional source of forecast information to the deterministic prediction. A decision-maker would arguably avoid the worst losses from a missed event if there were an expression of uncertainty. In the context of the entire winter (Figure 12), we see the crisp predicted value (blue) is a stark false alarm in contrast to observed (orange), but also note the comparatively lower possibility of extreme values for the 2 January event compared to 27 February, as discussed next.
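The rank of a possibility value within a season can be computed as a simple percentile rank. The sketch below is one possible definition (the share of days with a value less than or equal to the one in question); the paper does not state which definition was used, and the season values shown are hypothetical.

```python
def percentile_rank(value, season_values):
    """Percentage of values in the season that are <= value."""
    count = sum(1 for v in season_values if v <= value)
    return 100.0 * count / len(season_values)


# Hypothetical daily possibilities of extreme ozone for a short season.
season = [0.0, 0.1, 0.2, 0.3, 0.4]
rank = percentile_rank(0.3, season)
print(rank)
```

A small raw possibility can therefore still rank highly if most days of the season have even smaller values, which is the situation described for the extreme category.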

5.3. 27 February 2022: Good Forecast

Here, Clyfar excels in the magnitude of the crisp value and the sudden (nonlinear) increase in ozone levels on the same day as observed values rose substantially in tandem. We note that the peak is not sustained in forecasts as long as the period observed; this prototype has no memory, and each forecast day is computed independently. Current work is underway coupling this prototype with, e.g., the previous day’s ozone concentration, given the common strong auto-correlation between yesterday’s and today’s ozone values as a cold pool strengthens [3].

To further understand the utility of a possibility distribution, compare the 2 January and 27 February cases in Figure 12 and Figure 15: while the time series of crisp values (deterministic ozone forecasts) have stark performance disparities, the 27 February case (high ozone observed) had larger possibility values for extreme ozone (red bars).

6. Conclusions and Future Work

In summary, the performance of the preliminary version is promising. Unsurprisingly, there is a need for optimization, potentially with gradient descent [30] and other machine learning methods, and data mining may continue to provide insight into variables that explain more variance in the ozone time series [65]. The low possibility values in aggregate seen in activation output (e.g., Figure 10) suggest that larger coverage of the ruleset permutations is needed. Further, users would find the impact of a missed high-ozone event much worse than a false alarm due to the risk aversion inherent in oil and gas operations.

The display of nonlinearity in a prototype model is encouraging, exemplified by a sudden increase in ozone concentrations for the well forecast event in Figure 12 and Figure 15, despite a lack of day-to-day memory. It may be more common for Clyfar to infer higher levels of ozone if we include the previous day’s maximum as another input. Despite this, the deterministic crisp value of ozone concentration is a hedged forecast [66]. The defuzzification is not a representation of the most likely forecast, but rather a value that minimizes a perceived loss function (such as mean square error). Throughout development, the authors will use individual members from ensemble NWP models to drive instances of Clyfar. Ultimately, this yields an ensemble of possibility distributions and an ensemble of crisp values, from which users can generate an accessible summary of uncertainty in addition to a deterministic forecast. Members of a Monte Carlo collection of Clyfar forecasts are cheap to run in large numbers, enabling a wide sampling of forecast uncertainty.

It is difficult to communicate uncertainty [67], and the concept of possibilities (rather than probabilities) is not a familiar one for many stakeholders and air quality scientists. However, we leave discussion of risk communication for a future manuscript. We decide not to normalize our possibility distribution (i.e., the heights of each bar chart or height of color fill in activation results). Doing so would give a false sense that the rule coverage is sufficient to cover all outcomes, leaving the user susceptible to “black swan” (unforeseen; failure of imagination) events. The authors consider it more useful for this iteration of Clyfar to leave a non-zero possibility value assigned to an unknown category (conceptually, “unknown”) rather than normalizing the bars (i.e., stretching the possibility values until at least one bar equals unity). However, this lack of rule coverage is information in its own right, stems from poor support in the data, and represents a lack of confidence in the possibility distribution: the uncertainty of uncertainty!
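The difference between the two choices can be sketched as follows. Here, normalization is taken to mean dividing every possibility by the largest one, which is only one of several schemes in the literature, and the possibility left for an "unknown" outcome is taken as one minus the largest category possibility. Both definitions, the function names, and the activation values are assumptions made for illustration.

```python
def unknown_residual(possibility):
    """Possibility left unassigned to any defined category."""
    return 1.0 - max(possibility.values())


def normalize(possibility):
    """Stretch all values so that the largest equals unity.
    Assumes at least one category has a non-zero possibility."""
    peak = max(possibility.values())
    return {name: p / peak for name, p in possibility.items()}


# Hypothetical subnormal distribution (maximum below unity).
poss = {"background": 0.4, "moderate": 0.0, "elevated": 0.5, "extreme": 0.1}

print(unknown_residual(poss))   # information kept when not normalizing
print(normalize(poss))          # residual is hidden after normalizing
```

After normalization, the largest bar equals one and the residual vanishes, which is the loss of information the authors wish to avoid.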

Accordingly, small differences in the categories’ possibility values from day to day may represent large anomalies in terms of percentiles. Figure 16 shows distributions for the same season. Circles are single days, and the boxes indicate the interquartile range. Short boxes most likely indicate the extreme is difficult to achieve and the majority of days have a small, similar value that manifests a small range of possibility values. However, it may also be a sign of a lack of ruleset coverage: output categories are insufficiently activated to capture the full complexity of the UBWO system.

Future Work: Optimizing and Deployment

The rudimentary version of Clyfar herein is a baseline for future versions and other models to surpass in performance and skill. A more complex model should only supplant a previous version when it shows a skill increase worthy of an increase in computational demand or complexity (the latter of which comes at the loss of explainability of results). Use of machine learning techniques such as gradient descent [30] can optimize a fit faster if the areas of sampling are constrained closer to human-defined regions of phase space. Further, neural networks can optimize FISs [51], yielding a hybrid prediction system known as a neurofuzzy (e.g., ref. [68]). During the optimization of Clyfar, data sparsity will hamper training of machine learning methods. While satellite imagery is accessible and covers a wide area, it is most useful for identifying snowfall when it is already snowing (therefore unable to identify surface snow during storm passage).

The upcoming first operational version of Clyfar will ingest pre-processed NWP forecasts, rather than observations, such that inferences represent predictions. We intend to use Global Ensemble Forecast System (GEFS) data [69,70] as input, generating 14-day forecasts of daily maximum ozone concentration. These forecasts will be made available to the public via a website currently in development as part of Ozone Alert. We will improve the dissemination of Clyfar by holding site-user surveys and will continue research deploying LLMs to tailor atmospheric hazard risk communication appropriately for the end-user, advised by recent studies in LLM translation and communication skill [71,72].

Author Contributions

Conceptualization, J.R.L. and S.N.L.; methodology, J.R.L. and S.N.L.; software, J.R.L.; validation, J.R.L. and S.N.L.; formal analysis, J.R.L. and S.N.L.; investigation, J.R.L. and S.N.L.; resources, J.R.L. and S.N.L.; data curation, J.R.L. and S.N.L.; writing—original draft preparation, J.R.L.; writing—review and editing, J.R.L. and S.N.L.; visualization, J.R.L. and S.N.L.; supervision, S.N.L.; project administration, S.N.L.; funding acquisition, S.N.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by Uintah County Special Service District 1 and the Utah Legislature.

Data Availability Statement

All data and methods, along with archived observations and worked Jupyter notebooks that generate figures herein, are available upon request from the corresponding author. Code for Clyfar is documented and updated at https://github.com/Bingham-Research-Center/clyfar (accessed on 1 August 2024) and more information on the Bingham Research Center data collection is found at https://www.usu.edu/binghamresearch/ (accessed on 1 July 2024).

Use of Artificial Intelligence

Brainstorming with OpenAI GPT-4 output accelerated project development and helped link disparate concepts. GitHub Copilot output was used to assist Python code development. No generative AI was used verbatim in the writing of this paper.

Acknowledgments

The authors thank the editor and two anonymous reviewers for their critique in improving this paper. The authors further thank Brian Blaylock of the U.S. Naval Research Laboratory for his continued work developing critical open-source Python packages at https://github.com/blaylockbk (accessed on 1 July 2024), with Trevor O’Neil and Michael Davies both assisting with previous and ongoing data collection and processing, respectively. JRL thanks his wife Taylor for delivering baby Finn during completion of this manuscript, and the patience of the editorial team, coauthor, and wife alike.

Conflicts of Interest

The authors have no outside conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BRC	Bingham Research Center
Clyfar	Computational Logic for Yielding Atmospheric Research
COOP	Cooperative Observation Program
EPA	Environmental Protection Agency
FIS	Fuzzy-logic Inference System
GEFS	Global Ensemble Forecast System
KVEL	Vernal Regional Airport
LLM	Large Language Model
MSLP	Mean Sea-level Pressure
NWP	Numerical Weather Prediction
NAAQS	National Ambient Air Quality Standards
UBWO	Uinta Basin Winter Ozone
VOC	Volatile Organic Compound

Appendix A

Gaussian curves are each defined by mean ($\bar{x}$) and standard deviation ($σ$) values. This approach is implemented with the scikit-fuzzy Python module [73] (https://github.com/scikit-fuzzy/scikit-fuzzy, accessed on 1 January 2024). The general formula for each ozone level’s membership function is given by:

$$\mathrm{VARIABLE}_{\mathrm{level}}(x) = \exp\left[-\frac{(x - \bar{x})^{2}}{2σ^{2}}\right] \tag{A1}$$

where “level” is replaced by the descriptive term for each membership function. We also use sigmoid (“S-shaped”) functions in the FIS mechanics to represent variables that asymptote to 0 or 1. The sigmoid membership function is generated with the equation

$$μ_{x} = \frac{1}{1 + \exp[-c \cdot (x - b)]} \tag{A2}$$

where $μ_{x}$ is the membership value with respect to $x$; $x$ is the variable of interest; $b$ is the center value of the sigmoid (where $μ_{x} = \frac{1}{2}$); and $c$ controls the width of the sigmoidal region about $b$ (through its magnitude) and determines the function’s shape. A positive value of $c$ implies the left side approaches 0.0 while the right side approaches 1.0, and vice versa for a negative value. We show numerical values for each variable category’s membership function in Table A1. The range of that variable (formally the universe of discourse) considered by the FIS is also shown. Values outside of this range are clipped to the nearest value in that range.
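The two membership functions and the clipping step can be written in a few lines of plain Python. This is a minimal sketch of Equations (A1) and (A2) rather than the scikit-fuzzy implementation used by the authors; the function names are illustrative, and the parameter values in the example are those listed for wind, snow, and ozone in Table A1.

```python
import math


def gauss_mf(x, mean, sigma):
    """Gaussian membership function, Equation (A1)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def sigmoid_mf(x, b, c):
    """Sigmoid membership function, Equation (A2).
    b is the center (membership 0.5); the sign of c sets the direction."""
    return 1.0 / (1.0 + math.exp(-c * (x - b)))


def clip(x, lo, hi):
    """Clip an input to the universe of discourse [lo, hi]."""
    return min(max(x, lo), hi)


# Wind categories: calm (c = -3.0) and breezy (c = 3.0), both centered on 2.5 m/s.
wind = clip(1.0, 0.0, 20.0)
calm = sigmoid_mf(wind, 2.5, -3.0)
breezy = sigmoid_mf(wind, 2.5, 3.0)

# A snow depth beyond the 0-750 mm range is clipped before fuzzification.
snow = clip(900.0, 0.0, 750.0)

# Membership at the center of a Gaussian category (mean 95 ppb, sigma 10 ppb).
peak = gauss_mf(95.0, 95.0, 10.0)
print(calm, breezy, snow, peak)
```

Because the calm and breezy sigmoids share a center and have opposite signs of $c$, their memberships sum to one at any wind speed.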

Table A1. Parameters for membership functions shown graphically in Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7. A hyphen denotes this is a non-applicable constant for this row’s membership function.

Variable	Range	Units	Category	Function	$\bar{x}$	$σ$	b	c
wind	0–20	$\mathrm{m\,s^{-1}}$	calm	sigmoid	-	-	2.5	−3.0
			breezy	sigmoid	-	-	2.5	3.0
snow	0–750	mm	negligible	sigmoid	-	-	70	−0.07
			sufficient	sigmoid	-	-	100	0.07
mslp	1000–1070	Pa	low	sigmoid	-	-	101,300	−0.005
	($\times 10^{2}$)		average	Gaussian	102,900	800	-	-
			high	sigmoid	-	-	104,500	0.005
solar	0–1100	$\mathrm{W\,m^{-2}}$	midwinter	sigmoid	-	-	300	−0.03
			winter	Gaussian	450	100	-	-
			spring	Gaussian	650	100	-	-
			summer	sigmoid	-	-	750	0.03
ozone	20–140	ppb	background	Gaussian	40	6.0	-	-
			moderate	Gaussian	52	5.5	-	-
			elevated	Gaussian	67	6.0	-	-
			extreme	Gaussian	95	10.0	-	-

Appendix B

We can construct rulesets for the ozone system as follows (mslp denoting MSLP):

snow = negligible ∨ mslp = low ∨ wind = breezy
→ ozone = background
snow = sufficient ∧ mslp = high ∧ wind = calm ∧ solar = spring
→ ozone = extreme
snow = sufficient ∧ mslp = high ∧ wind = calm ∧ solar = winter
→ ozone = elevated
snow = sufficient ∧ mslp = high ∧ wind = calm ∧ solar = (midwinter ∨ summer)
→ ozone = moderate
snow = sufficient ∧ mslp = average ∧ wind = calm ∧ solar = (winter ∨ spring)
→ ozone = elevated
snow = sufficient ∧ mslp = average ∧ wind = calm ∧ solar = (midwinter ∨ summer)
→ ozone = moderate
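To show how the ruleset and the membership functions fit together, the following is a compact, hedged sketch of a Mamdani-style inference in plain Python. It is not the authors' scikit-fuzzy code: the function name `clyfar_sketch` is hypothetical, the membership parameters are read from Table A1 (with the final ozone row taken as extreme), and two choices are assumptions: rules sharing a consequent are combined with a maximum, and the clipped output curves are aggregated with a pointwise maximum before centroid defuzzification.

```python
import math


def gauss(x, mean, sigma):
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def sigmoid(x, b, c):
    return 1.0 / (1.0 + math.exp(-c * (x - b)))


def clyfar_sketch(snow_mm, mslp_hpa, wind_ms, solar_wm2, step=0.1):
    """Return (activation per ozone category, crisp ozone value in ppb)."""
    # Clip inputs to each universe of discourse; mslp parameters are in Pa.
    snow = min(max(snow_mm, 0.0), 750.0)
    mslp = min(max(mslp_hpa, 1000.0), 1070.0) * 100.0
    wind = min(max(wind_ms, 0.0), 20.0)
    solar = min(max(solar_wm2, 0.0), 1100.0)

    # Fuzzification with the Table A1 parameters.
    negligible = sigmoid(snow, 70.0, -0.07)
    sufficient = sigmoid(snow, 100.0, 0.07)
    low = sigmoid(mslp, 101300.0, -0.005)
    average = gauss(mslp, 102900.0, 800.0)
    high = sigmoid(mslp, 104500.0, 0.005)
    calm = sigmoid(wind, 2.5, -3.0)
    breezy = sigmoid(wind, 2.5, 3.0)
    midwinter = sigmoid(solar, 300.0, -0.03)
    winter = gauss(solar, 450.0, 100.0)
    spring = gauss(solar, 650.0, 100.0)
    summer = sigmoid(solar, 750.0, 0.03)

    # The six rules: AND -> min, OR -> max.
    r1 = max(negligible, low, breezy)
    r2 = min(sufficient, high, calm, spring)
    r3 = min(sufficient, high, calm, winter)
    r4 = min(sufficient, high, calm, max(midwinter, summer))
    r5 = min(sufficient, average, calm, max(winter, spring))
    r6 = min(sufficient, average, calm, max(midwinter, summer))
    activation = {
        "background": r1,
        "moderate": max(r4, r6),   # assumption: max over shared consequents
        "elevated": max(r3, r5),
        "extreme": r2,
    }

    # Clip output curves, aggregate with max, and defuzzify by centroid.
    ozone_mf = {"background": (40.0, 6.0), "moderate": (52.0, 5.5),
                "elevated": (67.0, 6.0), "extreme": (95.0, 10.0)}
    n = int(round((140.0 - 20.0) / step)) + 1
    num = den = 0.0
    for i in range(n):
        o3 = 20.0 + i * step
        mu = max(min(activation[k], gauss(o3, *ozone_mf[k])) for k in ozone_mf)
        num += o3 * mu
        den += mu
    crisp = num / den if den > 0.0 else float("nan")
    return activation, crisp


# Inputs of the "ozone likely" and "ozone unlikely" synthetic cases.
acts_likely, crisp_likely = clyfar_sketch(250.0, 1045.0, 1.0, 640.0)
acts_unlikely, crisp_unlikely = clyfar_sketch(50.0, 1025.0, 4.0, 600.0)
print(acts_likely, crisp_likely)
print(acts_unlikely, crisp_unlikely)
```

Outputs from this sketch need not match the figures exactly, because the implication, aggregation, and integration details of the published implementation may differ from the assumptions made here.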

Table A2. Logical operators and associated functions for bivalent logic and fuzzy equivalents, where A and B represent independent events.

Description	Rendered	Bivalent Function	Fuzzy Function
Implication (IF...THEN)	→
A AND B	A ∧ B	minimum	infimum
A OR B	A ∨ B	maximum	supremum
NOT A	$\neg A$	( $1 - A$ )	( $1 - A$ )
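A short check, sketched under the assumption that fuzzy AND, OR, and NOT are implemented as minimum, maximum, and complement over a finite set of values, shows that these operators reduce to ordinary Boolean logic when memberships are restricted to 0 and 1. The function names are illustrative.

```python
def fuzzy_and(a, b):
    """Fuzzy conjunction (minimum)."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy disjunction (maximum)."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy negation (complement)."""
    return 1.0 - a


# With memberships of exactly 0 or 1, the operators match Boolean logic.
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        assert fuzzy_and(a, b) == float(bool(a) and bool(b))
        assert fuzzy_or(a, b) == float(bool(a) or bool(b))
    assert fuzzy_not(a) == float(not bool(a))

# With partial memberships, the result is graded rather than binary.
partial_and = fuzzy_and(0.8, 0.3)
partial_or = fuzzy_or(0.8, 0.3)
print(partial_and, partial_or)
```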

References

1. Bader, J.W. Structural and tectonic evolution of the Douglas Creek arch, the Douglas Creek fault zone, and environs, northwestern Colorado and northeastern Utah: Implications for petroleum accumulation in the Piceance and Uinta basins. Rocky Mt. Geol. 2009, 44, 121–145.
2. Lyman, S.; Tran, T. Inversion structure and winter ozone distribution in the Uintah Basin, Utah, U.S.A. Atmos. Environ. 2015, 123, 156–165.
3. Neemann, E.M.; Crosman, E.T.; Horel, J.D.; Avey, L. Simulations of a cold-air pool associated with elevated wintertime ozone in the Uintah Basin, Utah. Atmos. Chem. Phys. 2015, 15, 135–151.
4. Mansfield, M.L. Statistical analysis of winter ozone exceedances in the Uintah Basin, Utah, USA. J. Air Waste Manag. Assoc. 2018, 68, 403–414.
5. Finlayson-Pitts, B.J.; Pitts, J.N., Jr. Chemistry of the Upper and Lower Atmosphere: Theory, Experiments, and Applications; Elsevier: Amsterdam, The Netherlands, 1999.
6. Schnell, R.C.; Oltmans, S.J.; Neely, R.R.; Endres, M.S.; Molenar, J.V.; White, A.B. Rapid photochemical production of ozone at high concentrations in a rural site during winter. Nat. Geosci. 2009, 2, 120–122.
7. Mansfield, M.L.; Hall, C.F. A survey of valleys and basins of the western United States for the capacity to produce winter ozone. J. Air Waste Manag. Assoc. 2018, 68, 909–919.
8. Mansfield, M.L.; Hall, C.F. Statistical analysis of winter ozone events. Air Qual. Atmos. Health 2013, 6, 687–699.
9. Oltmans, S.; Schnell, R.; Johnson, B.; Pétron, G.; Mefford, T.; Neely, R., III. Anatomy of wintertime ozone associated with oil and natural gas extraction activity in Wyoming and Utah. Elementa 2014, 2, 000024.
10. Zadeh, L.A. Fuzzy sets. Inf. Control. 1965, 8, 338–353.
11. Zadeh, L.A. The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets Syst. 1983, 11, 199–227.
12. Lareau, N.P.; Crosman, E.; David Whiteman, C.; Horel, J.; Hoch, S.W.; Brown, W.O.J.; Horst, T.W. The Persistent Cold-Air Pool Study. Bull. Am. Meteorol. Soc. 2013, 94, 51–63.
13. Terzago, S.; Andreoli, V.; Arduini, G.; Balsamo, G.; Campo, L.; Cassardo, C.; Cremonese, E.; Dolia, D.; Gabellani, S.; von Hardenberg, J.; et al. Sensitivity of snow models to the accuracy of meteorological forcings in mountain environments. Hydrol. Earth Syst. Sci. 2020, 24, 4061–4090.
14. Matichuk, R.; Tonnesen, G.; Luecken, D.; Gilliam, R.; Napelenok, S.L.; Baker, K.R.; Schwede, D.; Murphy, B.; Helmig, D.; Lyman, S.N.; et al. Evaluation of the Community Multiscale Air Quality Model for Simulating Winter Ozone Formation in the Uinta Basin. J. Geophys. Res. D Atmos. 2017, 122, 13545–13572.
15. Tran, T.; Tran, H.; Mansfield, M.; Lyman, S.; Crosman, E. Four dimensional data assimilation (FDDA) impacts on WRF performance in simulating inversion layer structure and distributions of CMAQ-simulated winter ozone concentrations in Uintah Basin. Atmos. Environ. 2018, 177, 75–92.
16. Herrero, J.; Polo, M.J. Parameterization of atmospheric longwave emissivity in a mountainous site for all sky conditions. Hydrol. Earth Syst. Sci. 2012, 16, 3139–3147.
17. Awan, N.K.; Truhetz, H.; Gobiet, A. Parameterization-induced error characteristics of MM5 and WRF operated in climate mode over the alpine region: An ensemble-based Analysis. J. Clim. 2011, 24, 3107–3123.
18. Gilliam, R.C.; Hogrefe, C.; Rao, S.T. New methods for evaluating meteorological models used in air quality applications. Atmos. Environ. 2006, 40, 5073–5086.
19. Squitieri, B.J.; Gallus, W.A. On the forecast sensitivity of MCS cold pools and related features to horizontal grid-spacing in convection-allowing WRF simulations. Weather Forecast. 2019, 35, 325–346.
20. Skamarock, W.C.; Klemp, J.B.; Dudhia, J.; Gill, D.O.; Liu, Z.; Berner, J.; Wang, W.; Powers, J.G.; Duda, M.G.; Barker, D.M.; et al. A Description of the Advanced Research WRF Model Version 4; Technical Report; National Science Foundation: Alexandria, VA, USA, 2019.
21. Tennekes, H. Turbulent Flow In Two and Three Dimensions. Bull. Amer. Meteor. Soc. 1978, 59, 22–28.
22. Bommer, P.L.; Kretschmer, M.; Hedström, A.; Bareeva, D.; Höhne, M.M.C. Finding the right XAI method—A guide for the evaluation and ranking of Explainable AI methods in climate science. Artif. Intell. Earth Syst. 2024, 3, 1–26.
23. Potvin, C.K.; Flora, M.L.; Skinner, P.S.; Reinhart, A.E.; Matilla, B.C. Using machine learning to predict convection-allowing ensemble forecast skill: Evaluation with the NSSL Warn-on-Forecast System. Artif. Intell. Earth Syst. 2024, 3, 1–22.
24. Casallas, A.; Ferro, C.; Celis, N.; Guevara-Luna, M.A.; Mogollón-Sotelo, C.; Guevara-Luna, F.A.; Merchán, M. Long short-term memory artificial neural network approach to forecast meteorology and PM2.5 local variables in Bogotá, Colombia. Model. Earth Syst. Environ. 2022, 8, 2951–2964.
25. Park, M.; Zheng, Z.; Riemer, N.; Tessum, C.W. Learned 1D passive scalar advection to accelerate chemical transport modeling: A case study with GEOS-FP horizontal wind fields. Artif. Intell. Earth Syst. 2024, 3.
26. Lindsey, D.; McNoldy, B.; Finch, Z.O.; Henderson, D.; Lerach, D.; Seigel, R.; Steinweg, J.; Stuckmeyer, E.A.; Van Cleave, D.T.; Williams, G.; et al. A high wind statistical prediction model for the northern Front Range of Colorado. Electron. J. Oper. Meteorol. 2011.
27. Keisler, R. Forecasting global weather with graph neural networks. arXiv 2022, arXiv:2202.07575.
28. Jeon, H.J.; Kang, J.H.; Kwon, I.H.; Lee, O.J. CloudNine: Analyzing Meteorological Observation Impact on Weather Prediction Using Explainable Graph Neural Networks. arXiv 2024, arXiv:cs.LG/2402.14861.
29. Hakim, G.J.; Masanam, S. Dynamical tests of a deep-learning weather prediction model. Artif. Intell. Earth Syst. 2024, 3, 1–11.
30. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
31. Driess, D.; Xia, F.; Sajjadi, M.S.M.; Lynch, C.; Chowdhery, A.; Ichter, B.; Wahid, A.; Tompson, J.; Vuong, Q.; Yu, T.; et al. PaLM-E: An Embodied Multimodal Language Model. arXiv 2023, arXiv:cs.LG/2303.03378.
32. Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling Language Modeling with Pathways. arXiv 2022, arXiv:cs.CL/2204.02311.
33. Le Scao, T.; Fan, A.; Akiki, C.; Pavlick, E.; Ilić, S.; Hesslow, D.; Castagné, R.; Luccioni, A.S.; Yvon, F.; Gallé, M.; et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv 2022, arXiv:2211.05100.
34. Flora, M.L.; Potvin, C.K.; McGovern, A.; Handler, S. A Machine Learning Explainability Tutorial for Atmospheric Sciences. Artif. Intell. Earth Syst. 2024, 3, e230018.
35. Camastra, F.; Ciaramella, A.; Giovannelli, V.; Lener, M.; Rastelli, V.; Staiano, A.; Staiano, G.; Starace, A. A fuzzy decision system for genetically modified plant environmental risk assessment using Mamdani inference. Expert Syst. Appl. 2015, 42, 1710–1716.
36. Chase, R.J.; Harrison, D.R.; Lackmann, G.M.; McGovern, A. A Machine Learning Tutorial for Operational Meteorology. Part II: Neural Networks and Deep Learning. Weather Forecast. 2023, 38, 1271–1293.
37. Höhlein, K.; Schulz, B.; Westermann, R.; Lerch, S. Postprocessing of Ensemble Weather Forecasts Using Permutation-Invariant Neural Networks. Artif. Intell. Earth Syst. 2024, 3, e230070.
38. Horel, J.; Splitt, M.; Dunn, L.; Pechmann, J.; White, B.; Ciliberi, C.; Lazarus, S.; Slemmer, J.; Zaff, D.; Burks, J. Mesowest: Cooperative mesonets in the western United States. Bull. Am. Meteorol. Soc. 2002, 83, 211–225.
39. Shapiro, A.F. The merging of neural networks, fuzzy logic, and genetic algorithms. Insur. Math. Econ. 2002, 31, 115–131.
40. Zadeh, L.A. Is there a need for fuzzy logic? Inf. Sci. 2008, 178, 2751–2779.
41. Sarker, I.H. AI-based modeling: Techniques, applications and research issues towards automation, intelligent and smart systems. SN Comput. Sci. 2022, 3, 158.
42. Asklany, S.A.; Elhelow, K.; Youssef, I.K.; Abd El-wahab, M. Rainfall events prediction using rule-based fuzzy inference system. Atmos. Res. 2011, 101, 228–236.
43. Mitra, A.K.; Nath, S.; Sharma, A.K. Fog forecasting using rule-based fuzzy inference system. J. Ind. Soc. Remote Sens. 2008, 36, 243–253.
44. Zadeh, L. A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination. AI Mag. 1985, 7, 85–90.
45. Dubois, D.; Prade, H. Possibility Theory: An Approach to Computerized Processing of Uncertainty; Plenum Press: New York, NY, USA, 1988.
46. Lorenz, E.N. Deterministic Nonperiodic Flow. J. Atmos. Sci. 1963, 20, 130–141.
47. May, R.M. Simple mathematical models with very complicated dynamics. Nature 1976, 261, 459–467.
48. Dubois, D.; Prade, H. Fuzzy set and possibility theory-based methods in artificial intelligence. Artif. Intell. 2003, 148, 1–9.
49. Nedjah, N.; de Macedo Mourelle, L. Fuzzy Systems Engineering: Theory and Practice, 2005 ed.; Studies in Fuzziness and Soft Computing; Springer: Berlin/Heidelberg, Germany, 2005.
50. Chang, F.J.; Chang, Y.T. Adaptive neuro-fuzzy inference system for prediction of water level in reservoir. Adv. Water Resour. 2006, 29, 1–10.
51. Abraham, A. Adaptation of fuzzy inference system using neural learning. In Fuzzy Systems Engineering; Studies in Fuzziness and Soft Computing; Springer: Berlin/Heidelberg, Germany, 2005; pp. 53–83.
52. Zadeh, L.A.; Klir, G.J.; Yuan, B. Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers; World Scientific: Singapore, 1996.
53. Mamdani, E.H. Advances in the linguistic synthesis of fuzzy controllers. Int. J. Man. Mach. Stud. 1976, 8, 669–678.
54. Williams, R.M.; Ferro, C.A.T.; Kwasniok, F. A comparison of ensemble post-processing methods for extreme events. Quart. J. Roy. Meteor. Soc. 2014, 140, 1112–1120.
55. Sterk, A.E.; Stephenson, D.B.; Holland, M.P.; Mylne, K.R. On the predictability of extremes: Does the butterfly effect ever decrease? Q. J. R. Meteorol. Soc. 2016, 142, 58–64.
56. Palmer, T.N.; Döring, A.; Seregin, G. The real butterfly effect. Nonlinearity 2014, 27, R123.
57. Zadeh, L.A. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. 1978, 1, 3–28.
58. Le Carrer, N. Possibly extreme, probably not: Is possibility theory the route for risk-averse decision-making? Atmos. Sci. Lett. 2021, 22, 1–13.
59. Oussalah, M. On the normalization of subnormal possibility distributions: New investigations. Int. J. Gen. Syst. 2002, 31, 277–301.
60. Buizza, R. Accuracy and Potential Economic Value of Categorical and Probabilistic Forecasts of Discrete Events. Mon. Weather Rev. 2001, 129, 2329–2345.
61. Taleb, N.N. The Black Swan: The Impact of the Highly Improbable, 1st ed.; Random House: New York, NY, USA, 2007.
62. Palmer, T.N. Quantum Reality, Complex Numbers, and the Meteorological Butterfly Effect. Bull. Am. Meteorol. Soc. 2005, 86, 519–530.
63. Chakraverty, S.; Sahoo, D.M.; Mahato, N.R. Defuzzification. In Concepts of Soft Computing; Springer Singapore: Singapore, 2019; pp. 117–127.
64. Li, H.; Wang, X.; Choy, S.; Jiang, C.; Wu, S.; Zhang, J.; Qiu, C.; Zhou, K.; Li, L.; Fu, E.; et al. Detecting heavy rainfall using anomaly-based percentile thresholds of predictors derived from GNSS-PWV. Atmos. Res. 2022, 265, 105912.
65. Han, J.; Pei, J.; Tong, H. Data Mining: Concepts and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2022.
66. Wilks, D.S. Statistical Methods in the Atmospheric Sciences; Academic Press: Cambridge, MA, USA, 2011.
67. Demuth, J.L.; Morss, R.E.; Palen, L.; Anderson, K.M.; Anderson, J.; Kogan, M.; Stowe, K.; Bica, M.; Lazrus, H.; Wilhelmi, O.; et al. “Sometimes da #beachlife ain’t always da wave”: Understanding People’s Evolving Hurricane Risk Communication, Risk Assessments, and Responses Using Twitter Narratives. Weather Clim. Soc. 2018, 10, 537–560.
Zounemat-Kermani, M.; Teshnehlab, M. Using adaptive neuro-fuzzy inference system for hydrological time series prediction. Appl. Soft Comput. 2008, 8, 928–936. [Google Scholar] [CrossRef]
Zhou, X.; Zhu, Y.; Hou, D.; Fu, B.; Li, W.; Guan, H.; Sinsky, E.; Kolczynski, W.; Xue, X.; Luo, Y.; et al. The development of the NCEP global ensemble forecast system version 12. Weather Forecast. 2022, 37, 1069–1084. [Google Scholar] [CrossRef]
Harrison, L.; Landsfeld, M.; Husak, G.; Davenport, F.; Shukla, S.; Turner, W.; Peterson, P.; Funk, C. Advancing early warning capabilities with CHIRPS-compatible NCEP GEFS precipitation forecasts. Sci. Data 2022, 9, 375. [Google Scholar] [CrossRef]
Trujillo-Falcón, J.E.; Reedy, J.; Klockow-McClain, K.E.; Berry, K.L.; Stumpf, G.J.; Bates, A.V.; LaDue, J.G. Creating a Communication Framework for FACETs: How Probabilistic Hazard Information Affected Warning Operations in NOAA’s Hazardous Weather Testbed. Weather. Clim. Soc. 2022, 14, 881–892. [Google Scholar] [CrossRef]
Lawson, J.R.; Flora, M.L.; Goebbert, K.H.; Lyman, S.N.; Potvin, C.K.; Schultz, D.M.; Stepanek, A.J.; Trujillo-Falcón, J.E. Pixels and Predictions: Potential of GPT-4V in Meteorological Imagery Analysis and Forecast Communication. arXiv 2024, arXiv:cs.CL/2404.15166. [Google Scholar]
Warner, J. JDWarner/Scikit-Fuzzy: Scikit-Fuzzy Version0.4.2. Available online: https://github.com/scikit-fuzzy/scikit-fuzzy (accessed on 1 January 2024).

Figure 1. Geographical domain of the present study. Panel (a) is a satellite image showing the approximate bounding box of the Uinta Basin. The red circle and text denote the radius from which all available observations were obtained for the study period. The red cross marks the center of that radius. Blue circles are towns, green points are geological features, and black squares mark observation stations reporting snow depth via the COOP network. Major orographic features bounding the basin’s perimeter are labeled with a cyan background. The black vertical line marks the Utah–Colorado boundary (Utah to the west). In panel (b), the context of the Uinta Basin (whose bounding box is labeled and marked in red) is shown within the Intermountain West of the continental United States.

Figure 2. Scatter plot of representative ozone against wind speed for the 2021–2022 winter. The purple dashed line indicates the NAAQS limit. Red scatter markers denote days exceeding the NAAQS limit; orange markers denote days within 10 ppb of the limit; other days are in blue.

Figure 3. Membership functions for the representative basin value for 10 m wind speed. The x-axis range is zoomed to capture the salient aspects of the sigmoids.

Figure 4. Membership functions for daily median snow depth. As in Figure 3, the x-axis range is zoomed to capture the salient aspects of the sigmoids.

Figure 5. Membership functions for daily median mean sea-level pressure (MSLP).

Figure 6. Membership functions for incoming short-wave solar radiation.

Figure 7. Membership functions for daily maximum of atmospheric ozone concentration.

Figure 8. The possibility of each ozone fuzzy set (filled color) with its membership function overlaid (colored line), both generated by the inference system for a likely high-ozone day. The solid black line indicates the centroid-derived crisp value, while the dashed black line is a typical background ozone-concentration level for reference (35 ppb). The dashed magenta line indicates the NAAQS limit for ozone.

Figure 9. As Figure 8 but for a scenario unlikely to yield ozone in excess of background levels.

Figure 10. As Figure 8 but for a scenario at the cusp of yielding predictions of elevated ozone levels.

Figure 11. As Figure 8 but for an impossible scenario with summer snow, unforeseen by Clyfar.

Figure 12. Full forecast of centroid-derived and observed values, shown as blue and orange lines, respectively. The four possibility levels, color-coded as in the legend, are overlaid so that more extreme levels are plotted higher in the stack of bars for conspicuousness. The arrows denote the three days examined in further depth in the text. Horizontal lines mark the NAAQS limit (in magenta) and typical background concentrations (in black).

Figure 13. Inferred possibility of ozone categories valid 14 December 2021, a case in which background levels were predicted well. F and O denote the rough category that the forecast and observed values fell into, respectively. The annotated rank displays that possibility value’s percentile in this winter’s set.

Figure 14. Forecast of ozone categories valid 2 January 2022, subjectively a poorly forecast case.

Figure 15. Forecast of ozone categories valid 27 February 2022. This was a subjectively good forecast, including in the deterministic time series (Figure 12).

Figure 16. Box-and-whisker plot of the distribution of possibility for each category of daily maximum ozone concentration for winter 2021/2022. Circles are individual events. Blue boxes represent the interquartile range, and the green horizontal line is the median value per category.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
