Classification and Prediction of Nitrogen Dioxide in a Portuguese Air Quality Critical Zone
Abstract
:1. Introduction
1.1. Context
1.2. Motivation
- 1.
- North Metropolitan Area of Lisbon (MAL) or, similarly, Lisboa Norte in Portuguese;
- 2.
- Coastal Porto or, similarly, Porto Litoral in Portuguese; and
- 3.
- Between Douro and Minho or, equivalently, Entre Douro e Minho in Portuguese.
1.3. Goals and Research Questions
- Other air pollutants, namely:
- 1.
- Particle matter with a diameter of less than 2.5 m (PM);
- 2.
- Particle matter with a diameter of less than 10 m (PM);
- 3.
- Carbon monoxide (CO); and
- 4.
- Ozone (O).
- Information about meteorological conditions, namely:
- 1.
- Hourly measurements of the atmospheric pressure (AP);
- 2.
- Hourly measurements of the relative humidity (RH);
- 3.
- Hourly measurements of the temperature (T);
- 4.
- Hourly measurements of the precipitation (P);
- 5.
- Hourly measurements of the wind intensity (WI);
- 6.
- Hourly measurements of the wind direction (WD); and
- 7.
- Hourly measurements of the ultraviolet index (UVI).
- Information about noise, measured by the equivalent continuous sound level (ECSL); and
- Information about road traffic, measured by the total hourly traffic volume (THTV).
- 1.
- Which impact in terms of sign and magnitude does each explanatory variable have on the selected target?
- 2.
- Can the internationally patented VSCA model adequately predict the evolution of NO2 in a Portuguese air quality critical zone?
1.4. Literature and Research Hypotheses
- H4 guarantees that the classification analysis is categorized according to different peak traffic hours and/or time periods;
- H5 tests whether the use of electric vehicles, which are absent of CO emissions, plays a role on influencing NO concentration levels in the atmosphere; and
- H6 ensures a credible prediction of NO in one of the Portuguese cities with the highest level of air pollution caused by this specific pollutant.
1.5. Summary of Results
- 8–16 h is the period where compliance with the NO’s hourly ceiling is most compromised for a given increase in the level of CO emissions;
- Several regularization procedures confirm that the set of input variables used in the benchmark analysis has explanatory power on the target;
- The input-oriented model aimed at minimizing NO emissions, which is solved through several variants of stochastic frontier analysis, confirms the influence and significance of input variables on the target;
- The two-step continual learning approach also validates benchmark findings from a qualitative point of view; and
- Considering multiple scenarios, a variance importance analysis shows that CO unambiguously has the highest relevance on explaining NO concentration levels.
- 1.
- The level of NO concentrations in the atmosphere is expected to be volatile; and
- 2.
- This air pollutant does not have a downward trend in test and/or validation sets.
2. Materials and Methods
2.1. Overview of Atmospheric Pollution Caused by Nitrogen Dioxide
2.1.1. Description
2.1.2. Targets
- Ensure compliance with objectives established at the EU level in terms of air quality, which aim to avoid and prevent harmful effects of air pollutants on human health and the environment;
- Assess the air quality throughout the national territory, with a special focus on urban centers; and
- Portuguese legislation materialized by Decree-Law No. 102/2010 of 2010 establishes the obligation to preserve air quality in cases where it is good and improve it in other cases; moreover, it imposes two ceilings on NO—hourly limit value (HLV) and annual limit value (ALV)—that cannot be exceed:
- 1.
- HLV (i.e., a limit value for the average hourly concentration of 200 g/m of NO, which must not be exceeded more than 18 times per calendar year); and
- 2.
- ALV (i.e., a limit value for the average annual concentration of NO of 40 g/m).
2.1.3. Evolution
2.2. Data
- 1.
- Air quality;
- 2.
- Noise;
- 3.
- Road traffic; and
- 4.
- Meteorological conditions
- A grid with a spacing of 2 × 2 km;
- Inclusion of requirements related to the Sharing Cities project;
- Incorporation of geographical boundaries of each parish and Reduced Emission Zones (REZs);
- Areas with a homogeneous climate response, land use and occupation rate; and
- The road network proximity to sources of pollution.
2.3. Preliminary Statistical Tests and Hypotheses Testing
- 1.
- Multicollinearity;
- 2.
- Normality, skewness and kurtosis;
- 3.
- Homoscedasticity;
- 4.
- Autocorrelation or, similarly, serial correlation;
- 5.
- Stationary;
- 6.
- Specification; and
- 7.
- Hypotheses testing.
2.3.1. Multicollinearity
2.3.2. Normality, Skewness and Kurtosis
2.3.3. Homoscedasticity
2.3.4. Autocorrelation
- OLS combined with the Newey–West estimator, after [22] proposing a estimator that is consistent in the presence of heteroscedasticity and/or autocorrelation generated by stationary processes and that asymptotically enables the statistical inference conducted from the OLS results; this estimator is based on a function of estimated residuals obtained by OLS and, unlike White’s procedure, takes into account not only sample variances, but also sample covariances;
- First-order generalized least squares (GLS) method, either in GLS (AR(1)) or EGLS (AR(1)) form, respectively depending on whether is known or not; hence, the GLS estimation is equivalent to the OLS estimation of the original model after this being transformed by the first-order generalized difference method;
- Two-step Cochrane–Orcutt method [23] and subsequent extensions where, in first place, the original model is estimated by OLS to obtain the residuals to estimate the equation , to obtain and then, in second place, use this estimate to construct a set of new input variables from which results a transformed model by first differences; and
- Nonlinear least squares (NLS) method, where the minimization condition cannot be obtained analytically, being necessary to resort to a search algorithm that is developed in several iterations, starting with predefined initial values. The search process ends when a certain convergence criterion defined in advance is satisfied.
2.3.5. Stationary
2.3.6. Specification
2.3.7. Hypotheses Tests
2.4. Classification Analysis
2.4.1. Benchmark Model
- 1 = HLV is satisfied since the observation i stands below the HLV (alternative); or
- 0 = HLV is surpassed and, thus, not satisfied (base category),
- 1.
- Marginal effects vary across distinct observations given that different explanatory variables influence the probability density function ; and
- 2.
- Marginal effects can vary for a given observation since the magnitude of the effect may be different from one value of to another.
2.4.2. Technical Extensions
- 1.
- Application of a supervised machine learning (ML) model combined with several regularization methods to the set of inputs used in the classification exercise;
- 2.
- Use of a two-step continual learning (CL) approach;
- 3.
- Execution of a variable importance analysis (VIA);
- 4.
- Evaluation of the ordered multinomial discrete choice model (OM-DCM) by increasing the number of categories that define the target; and
- 5.
- Development of a stochastic frontier analysis (SFA);
- 1.
- Initially, a supervised ML model—logistic LASSO—is exogenously chosen;
- 2.
- The objective function of this supervised ML model accommodates a penalty component that must be optimized. To achieve this purpose, a certain penalized regression technique (a.k.a. regularization method or tuning process) is applied to the initial set of covariates in order to find the general degree of penalization (a.k.a. penalty level) that optimizes the objective function, which is represented by . After identifying the optimal value of this parameter, one can find which input variables have a statistically significant impact on the target and, for each of these, the weight or coefficient on the target which, similarly to what was verified in the initial exercise, can only be interpreted in terms of sign. In this extension, three regularization techniques are combined with the logistic LASSO model:
- k-fold cross-validation, whose purpose is to assess the out-of-sample classification performance;
- AIC, BIC and Extended Bayesian information criterion (EBIC), by finding the penalty level that corresponds to the lowest value taken by each of these information criteria; and
- Rigorous penalization, which corresponds to a penalty level formally given by , where c is a slack parameter with default value of 1.1, is the standard normal cumulative distribution function (CDF) and is the significance level, whose default value is . According to [26], this approach requires standardized predictors and the penalty level is motivated by self-normalized moderate deviation theory to overrule the noise associated with the data-generating process.
- 3.
- After identifying covariates with explanatory power on the target, coefficients are estimated; and
- 4.
- Although not mandatory because classification metrics (e.g., cross-entropy loss) can be reported, we convert the classification problem into a regression problem in order to run the Wilcoxon signed-rank test and provide the following metrics: RMSE, MAE, Mean Absolute Percentage Error (MAPE) and Theil’s U statistic. The later is a relative accuracy measure that compares the predicted outcomes with results of forecasting with minimal historical data.
- 1.
- In a first step, random forest (RF) is applied to the set of covariates holding a strong relative importance on explaining the dependent variable in order to mitigate the risk of overfitting; and
- 2.
- In a second step, a probit model estimated by the maximum likelihood method is applied to predicted target values obtained in the first step to mitigate the risk of underfitting.
- Applied to the entire set of input variables;
- Only applied to meteorological data;
- Only applied to ai pollutants; and
- Only applied to statistically significant covariates.
2.4.3. Normative Improvements through Sample Segmentation
- 1.
- Peak traffic hours (8 h–16 h, 16 h–24 h and 24 h–8 h);
- 2.
- Seasons (summer, spring, autumn and winter); and
- 3.
- The weekend–weekdays dichotomy.
2.5. Prediction Analysis
Framework
- 1.
- AP because the respective Wald test indicates that this covariate is statistically significant for a critical p-value of 10% when NO residuals follow a logistic distribution, which corresponds to the inclusion of a softmax layer in DLNN frameworks;
- 2.
- The target lagged in time by one period, , based on the presence of autocorrelation identified by Durbin–Watson, Breusch–Godfrey and autoregressive conditional heteroscedasticity (ARCH) tests; and
- 3.
- ECSL, which is not individually significant according to the Wald test, so that the DLNN models used in the prediction exercise are expected to assign a null weight to this input variable; hence, its inclusion is justified by the need to identify the consistency between the statistical robustness associated with the pre-processing phase and the set of different DLNN models used in the forecast analysis.
- WaveNet defined by [31], where the application of a grid search to find its optimal depth shows that this is given by . A relevant aspect from this model is the use of causal convolutions, which ensures that the order of data modeling is not violated. Hence, the prediction generated by the WaveNet at time-step cannot depend on future time-steps. Another key characteristic is the use of dilated convolutions to gradually increase the receptive field. A dilated convolution is a convolution where the kernel K is applied over one area larger than its length k by skipping input values with a certain step. It is equivalent to a convolution with a larger filter derived from the original filter by dilating it with zeros. Hence, when using a dilation rate for any , the causal padding has size given by . Finally, as recognized by [32], the residual block is the heart of a WaveNet, being constituted by two convolutional layers, one using the sigmoid activation function and another using the tanh activation function, which are multiplied by each other. Then, inside the block, the result is pass through into another convolution with and . This technique is referred to as the projection operation or channel pooling layer. Both residual and parametrized skip connections are used throughout the network to accelerate convergence and enable training of much deeper models. This block is executed a given number of times in the depth of the network, with depth}. The dilatation increases exponentially according to the formula .
- Temporal Convolutional Network (TCN) defined by [33], where the application of a grid search allows us to conclude that one should optimally consider , with and a global kernel size given by . The TCN is characterized by three main characteristics:
- -
- Convolutions in the architecture are causal, which means that there is no information moving from future to past since causal convolutions imply that an output at time t is convolved only with elements from time t and earlier (i.e., sequence modeling) from the previous layer;
- -
- The architecture can take an input sequence of any length and map it to an output sequence of the same length similarly to RNNs; hence, similar to the WaveNet model, causal padding of length is added to keep subsequent layers with the same length as previous ones; and
- -
- It is possible to build a long and effective history size using deep neural networks augmented with residual blocks and dilated convolutions.
- Standard Long Short-Term Memory (LSTM) of [34], where the application of a grid search reveals as optimal strategy the inclusion of a first LSTM layer with 64 neurons, input shape of , active return sequences and ReLu activation function, followed by a second LSTM layer with 32 neurons and without active return sequences; these layers are followed by three dense layers, where the last one for the output is defined with tanh activation function; the remaining optimal technical options include 100 epochs with a batch size of 64 and applying the Adam optimizer. Conceptually, the LSTM has three basic requirements:
- -
- The system should store information for arbitrary durations;
- -
- The system should be resistant to noise (i.e., fluctuations of random or irrelevant inputs to predict the correct output); and
- -
- Parameters should be trainable in a reasonable time.
The LSTM adds a cell state c to the standard RNN that runs straight down the entire recursive chain with only some minor linear interactions to control the information that needs to be remembered. These control structures are called gates. Activations are stored in the internal state of an LSTM unit (i.e., recursive neuron), which holds long-term temporal contextual information. A common architecture consist of a memory cell c and its update input gate i, output gate o and forget gate f. Gates have in and out connections. Weights of connections, which need to be learned during training, are used to orient the operation of each gate. - LSTM of [34] complemented by the standard attention mechanism of [35]. A standard attention mechanism is a residual block that multiplies the output with its own input and then reconnects to the main pipeline with a weighted scaled sequence. Scaling parameters are called attention weights and the result is called context weights for each value i of the input sequence. As such, c is the context vector of sequence size n. The computation of results from applying a softmax activation function to the input sequence on layer l, thereby meaning that input values of the sequence compete with each other to receive attention. Knowing that the sum of all values obtained from the softmax activation corresponds to 100%, attention weights in the attention vector have values within .
- Bi-Dimensional Convolutional Long Short-Term Memory (ConvLSTM2D) defined by [36] considering 7 segments, where each segment represents 1 day, each day consists of 24 time-step hours and each time-step corresponds to 1 h. The ConvLSTM2D is a recurrent layer similar to the LSTM, but internal matrix multiplications are exchanged with convolution operations. Since we are dealing with a MTS problem, the concept of sequence segment is introduced to make inputs compatible with ConvLSTM2D layers, which have to deal with segments of temporal sequences. Data flow through the ConvLSTM2D cells by keeping a 3D input dimension composed by Segments × TimeSteps × Variables rather than merely applying a 2D input dimension composed by TimeSteps × Variables as observed in the standard LSTM.
- 1.
- Inclusion of convolutional layers inside the attention block to ensure that input variables are split to assess the individual influence of each input variable on the target [1]; and
- 2.
- Incorporation of roll padding, which is a new padding method that allows us to dissuade the specification problem identified by the Ramsey test.
- One dimensional (1D) convolutional (i.e., Conv1D) layers; and
- Two dimensional (2D) convolutional (i.e., Conv2D) layers.
2.6. Conceptual Framework and Methodological Pipeline
3. Results
3.1. Classification
3.1.1. Performance, Model Selection, Interpretation of Coefficients and Marginal Effects
- Negative sign of O and CO indicates that satisfying the HLV of NO is less likely with additional level of O and CO emissions;
- Positive sign of WD reveals that satisfying the HLV of NO is more likely when the wind direction changes from north to east; and
- Even more interestingly, the null value taken by the representative coefficient of THTV suggests that higher road traffic does not influence the fulfilment of NO’s legal hourly ceiling. Consequently, this result allows us to conclude that THTV has a neutral character on satisfying the HLV associated with the target.
- In average terms, in the set of 7824 observations of the sample, it is estimated that an additional g/m of CO emitted to the atmosphere decreases the probability of satisfying the HLV of NO by 19.1 percentage points (p.p.); however, at the mean observation level, the probability of satisfying the HLV of NO only decreases 2.3 p.p.;
- In average terms, in the set of 7824 observations of the sample, it is estimated that an additional g/m of ozone concentrated in the air decreases the probability of satisfying the HLV of NO by 0.04 p.p. while, at the mean observation level, this probability drops 0.01 p.p.;
- At the level of wind direction, the exact value of the average marginal effect is 0.0000575 which suggests that, for each change in wind direction from north to east in the magnitude of 100 degrees, the probability of satisfying the HLV concentration of NO increases by 0.58 p.p.; at the average observation level, the increase is only 0.07 p.p. given the value 7.02 × 10 taken by the respective marginal effect; and
- With regard to road traffic, the exact value of the average marginal effect is 4.12 × 10, which suggests that, for every 10,000 additional vehicles circulating per hour in the city of Lisbon, the probability of satisfying the HLV of NO concentration decreases by 4.12 p.p.; at the mean observation level, the reduction is only 0.05 p.p. given the value 5.03 × 10 taken by the respective marginal effect.
- H1 and H3 are verified and thus unequivocally validated; notwithstanding, despite the magnitude being low, results indicate that a change in wind direction can influence the probability of satisfying NO’s legal hourly ceiling;
- H2 is only partially verified because, although O negatively influences the probability of compliance with the HLV associated with the target, the magnitude of the effect is redundant; and
- H5 is also ratified such that, during the period between 1 September 2021 and 23 July 2022, results suggest that the relationship between THTV and NO is characterized by the approximation to what, in the absence of a better terminology, can be considered a principle of neutrality.
3.1.2. Standard Regularization Techniques Combined with Logistic LASSO
- 1.
- Depending on the penalization technique adopted to reduce input dimensionality, it can be concluded that only AP and WI are included on top of the input variables used in the initial exercise;
- 2.
- Results show that the sign exhibited by the estimated coefficients is coherent across different regularization techniques.; and
- 3.
- More importantly, the sign of each coefficient does not change relative to the benchmark scenario, which qualitatively reinforces the initial findings.
3.1.3. Continual Learning
3.1.4. Variable Importance Analysis
- Regardless of the scenario, CO has the highest relative importance on the target;
- WD has the strongest relative importance on NO if considering only weather conditions; and
- VIA outcomes are clearly consistent with those yielding in the benchmark analysis.
3.1.5. Ordered Multinomial Discrete Choice Model
- With the probit model, one unit increase in CO is associated with being 88.65 p.p. less likely to fall in the category of low NO concentration levels, 87.26 p.p. more likely to stand in the intermediate category of NO concentration levels—where the HLV either may or may not be surpassed—and 1.39 p.p. more likely to be part of the category that largely exceeds the HLV of NO; and
- With the logit model, one unit increase in CO is associated with being 92.28 p.p. less likely to fall in the category of low NO concentration levels, 90.86 p.p. more likely to stand in the intermediate category of NO concentration levels and 1.43 p.p. more likely to be part of the category that largely exceeds the HLV of NO, which implies that the logit estimation sharpens even more the chance of falling into the low or intermediate categories of NO.
- With the probit model, one unit increase in CO is associated with being 84.88 p.p. less likely to fall in the category of low NO concentration levels and 84.88 p.p. more likely to stand in the intermediate category of NO concentration levels, while it has a null influence on the likelihood of belonging to the category that largely exceeds the HLV of NO; and
- With the logit model, one unit increase in CO is associated with being 75.99 p.p. less likely to fall in the category of low NO concentration levels and 75.98 p.p. more likely to stand in the intermediate category of NO concentration levels, while it has a null influence on the likelihood of belonging to the category that largely exceeds the HLV of NO; therefore, in opposition to what happens with the average marginal effects, the logit model smooths interchangeability between the low and intermediate category of NO in relation to the probit model.
3.1.6. Stochastic Frontier Analysis
3.1.7. Normative Improvements through Sample Segmentation
3.2. Prediction
3.2.1. Performance Evaluation
- 1.
- Adaptability and flexibility to other contexts (e.g., forecasting the evolution of inflation rate in the context of supply-side shocks); and
- 2.
- Establishes the rationale for a credible and effective connection between the field of Advanced Econometrics and Artificial Intelligence (AI) by providing a statistical ground for the inclusion of technical components such as the roll padding rather than falling into the fallacy associated with a pseudo-interpretation of weights observed in some AI studies.
3.2.2. Robustness Checks
- 1.
- A strong volatility; and
- 2.
- A negligible downward trend.
- Change in the number of recursive neurons composing the first LSTM layer from 64 to as observed in Figure 15a–d;
- Modification of the number of dense layers from 3 to as observed in Figure 16a–c;
- Change the batch size from 64 to as observed in Figure 17a–e;
- Change in the number of epochs from 100 to as observed in Figure 18a–d;
- Inclusion of dropout in the different layers, with as observed in Figure 19a–d;
- Modification of the activation function used in the dense layer for the output from Tanh to as observed in Figure 20a–c; and
- Change Adam to alternative set of optimizers as observed in Figure 21a–g, while bearing in mind that:
- -
- Adam is a stochastic gradient descent method that is based on the adaptive estimation of first (i.e., mean) and second order (i.e., variance and covariance) moments;
- -
- RMSPROP maintains a discounted moving average of the square of the gradients, dividing the gradient by the root of the mean;
- -
- Adadelta is a stochastic gradient descent method that relies on the adaptive learning rate per dimension to solve the disadvantage of continuous drop in the learning rate throughout training;
- -
- Adagrad is an optimizer with a parameterized learning rate, adapting to the frequency for which a parameter is updated during training (i.e., the more updates a parameter receives, the smaller the parameterizations);
- -
- SGD is a stochastic gradient descent method using the Nesterov momentum when the learning rate decreases;
- -
- Nadam is an optimizer derived from Adam including Nesterov momentum and with a correlation with the RMSPROP optimizer; and
- -
- FTRL is a method that uses a unique global base learning rate and can behave such as Adagrad with learning rate power or as a descending gradient with learning rate power close to zero.
4. Discussion
4.1. Classification
4.1.1. Main Contributions and Identification of Potential Market Failures
- 1.
- It demonstrates that CO is a determinant of NO’s HLV disruption and identifies a negative relationship between CO and satisfaction of NO’s HLV; and
- 2.
- It shows evidence of a neutrality principle between NO and mobility measured by the THTV.
- In mitigated fields, including projects to promote electric mobility;
- In the decarbonization of cities and industry; and
- In adaptation and cooperation on climate change, environmental education, water resources, circular economy, nature conservation and biodiversity.
4.1.2. Contemporaneous Governance Framework
- Recognizes the fundamental role of forests, biodiversity and ecosystem services in building a territory more cohesive and resilient to the effect of climate change; and
- Recognizes the need to protect and enhance the coast, promoting a sustainable economy, combating desertification and helping to face demographic challenges.
- 1.
- Decarbonization: through the energy transition, sustainable mobility, circular economy and enhancement of natural capital, territory and forests, promoting initiatives such as sustainable financing, green taxation and environmental education. Among several dimensions of this challenge, the energy transition is the one that will contribute most to the reduction of CO emissions in upcoming years. This should rely on the decarbonization of energy systems, with special emphasis on the end of electricity production from coal, focus on energy efficiency, promotion of energy from renewable sources, placing the citizen at the center of energy policies, ensuring a level playing field and a cohesive transition;
- 2.
- Circular economy, mobility and transport: it is essential to implement a circular economy model that contributes to an efficient management of resources, which allows the exploration of new economic opportunities and promotes efficient waste management. It is also necessary a renewed and competitive public transport network, as well as a sustainable–electric and active–mobility.
4.1.3. Theoretical Recommendations: Main Priorities
- Increase investment and employment through the development of new industries and services;
- Reinforce research and development (R&D) and higher education systems;
- Foster economic growth through a substitution model of imports; and
- Benefit consumers, who will have lower costs when compared to costs they would have in the case of maintaining fossil dependence.
- Energy efficiency;
- Reinforcement of the diversification of renewable energy sources;
- Increase in electrification;
- Modernization of network infrastructures;
- Development of interconnections, reconfiguration and digitization of the market;
- Encouraging scientific research and innovation;
- Promoting low carbon processes, products and services; and
- Ensuring a more active and informed consumer participation.
- Effective impact on the quality of transport services;
- Expectation of ensuring a strong contribution to the pursuit of public policies for a decarbonization of the transport sector; and
- The goal of boosting the sector’s energy transition to renewable sources in order to promote the reduction of greenhouse gas emissions and the incorporation of renewable energy in the transport sector.
4.1.4. Practical Recommendations: Main Measures
- Sectoral emission reduction targets;
- Targets for the incorporation of energy from renewable sources; and
- Targets for the reduction of energy consumption.
- An assessment of legislative impacts on climate action, where the transformation should involve different levels of the public administration, from parishes to the national territory as a whole;
- Promote the quality of regional roadmaps for carbon neutrality that translate the ambition set at the national level, which is expected to have a repercussion at the local level with the promotion of carbon neutral city pacts (e.g., gastronomy routes like Pestico);
- Promote the creation of sustainable network communities in articulation with municipalities to boost sustainability efforts (e.g., eco-neighborhood, national network of circular cities, network of municipalities for carbon neutrality);
- Promote initiatives to mobilize actors of the private sector for decarbonization and develop sectoral roadmaps for the decarbonization of different industries;
- Develop a territorial plan for a fair transition focusing on territories potentially most affected by the change to a carbon neutral economy;
- Monitoring by creating a platform that aggregates information and constitutes a decision support tool;
- Promote sustainable financing through the elaboration of a national strategy, which should include the identification of incentives and the creation of conditions to establish a Portuguese green bank;
- Definition of environmental criteria as a requirement for the allocation and articulation between different public funds in order to orient the public funding to investments that foster a resilient, circular and carbon neutral society; and
- Adopt a fiscal policy in line with energy transition and decarbonization goals, introducing more fiscal incentives and promoting more sustainable behavior.
- Development of plans to expand a sustainable network of collective transport systems in the Metropolitan Area of Porto (MAP), MAL and medium-sized cities;
- Continuous investment in the decarbonization of mobility, both in collective and individual transport modes (e.g., in terms of collective electric mobility, new programs should be launched to support the renewal of public transport fleets through the acquisition of clean vehicles; in terms of individual electric mobility, incentive programs should be reinforced for the acquisition of 100% electric light vehicles);
- Promote active mobility, while focusing on improving the quality of life in cities and the attractiveness of urban spaces;
- Increase the autonomy and capacity building process of sectoral transport authorities to ensure an increasingly efficient and more effective management of the country’s transport networks;
- Promote innovative and intelligent solutions for the mobility of goods and people, which requires to invest in new assets, improve interventions that promote intermodality with alternative transport modes (e.g., cycling) and increase the accessibility and connectivity between different regions of the country;
- Regarding the promotion of urban public transport, it is important to maintain the Public Transport Fare Reduction Support Program (PART) and the Program to Support the Densification and Reinforcement of the Public Transport Offer (PROTransP);
- The National Strategy for Active Cyclable Mobility (ENMAC), the National Strategy for Active Pedestrian Mobility (ENMAP) and the Portugal Cyclable 2030 Program should extend active mobility solutions for cities (e.g., construct new cycling routes and totally support the private acquisition of eco-friendly bicycles; implement solutions to sharpen the complementarity between private transportation use and public transport network);
- To promote greener cities, it is essential to conceptualize alternative solutions for urban spaces (e.g., new logistic applications to support decarbonization; increase the efficiency of mobility; optimize deliveries generated by e-commerce); and
- Increase training actions of the staff belonging to sectoral transport authorities and stakeholders in general.
4.2. Prediction
- 1.
- For the period before building DLNN models; and
- 2.
- For the period corresponding to the implementation of DLNN models together with the period after the development of DLNN models.
4.2.1. Recommendations before Building Deep Learning Neural Networks
- Either collect data from a reliable source [42] or guarantee the intellectual property right over them in the case of holding data collected by own means in order to be fairly compensated for such a work;
- Perform exploratory data analysis [43], which requires preliminary statistical tests considered valuable to explain weaknesses and strengths of the data;
- Differently from [47], sharing projects beforehand with other experts is not recommended because the market for ideas is scarce and classical industrial economics theory already identified 40 years ago the persistence of a negative relationship between horizontal differentiation and competitiveness [48]; instead, either find a credible research partner or work with someone who has notability in the scientific field where you are looking to acquire spillover effects;
- Survey the existing literature to confirm the reading of past studies and respect for the effort developed by peers in previous research [49], which means not incurring the fallacy of citing only those who have international reputation, but any researcher who produces valuable content regardless of the respective provenance or professional position;
- Think on new methodologies to combine knowledge from the field of Advanced Econometrics and AI; and
- In a market environment characterized by information asymmetry between editors, authors and readers, and where some authors are also editors and, consequently, any attempt to criticize a work coming from this side of the market, even in a healthy way, tends to be fruitless, it is mandatory to own international patents to protect the innovative character of methods developed by credible authors. This not only deters bad reviews of scientific work, but also fosters a good reputation within the scientific community. It also defines a clear and objective way of distinguishing, in the microeconomics terminology used by [50], lemons (i.e., low-quality research) from peaches (i.e., high-quality research) in a market environment characterized by asymmetry of information.
4.2.2. Recommendations during and after Building Deep Learning Neural Networks
4.3. Limitations and Future Research
5. Conclusions
6. Patents
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- WIPO. Multi-Convolutional Two-Dimensional Attention Unit for Analysis of a Multivariable Time Series Three-Dimensional Input Data. Patent WO/2021/255516, 23 December 2021. [Google Scholar]
- Alves, C.A.; Scotto, M.G.; Freitas, M.d.C. Air pollution and emergency admissions for cardiorespiratory diseases in Lisbon (Portugal). Química Nova 2010, 33, 337–344. [Google Scholar] [CrossRef] [Green Version]
- Borrego, C.; Monteiro, A.; Sá, E.; Carvalho, A.; Coelho, D.; Dias, D.; Miranda, A. Reducing NO2 pollution over urban areas: Air quality modelling as a fundamental management tool. Water Air Soil Pollut. 2012, 223, 5307–5320. [Google Scholar] [CrossRef]
- Russo, A.; Trigo, R.M.; Martins, H.; Mendes, M.T. NO2, PM10 and O3 urban concentrations and its association with circulation weather types in Portugal. Atmos. Environ. 2014, 89, 768–785. [Google Scholar] [CrossRef]
- Fernández-Guisuraga, J.M.; Castro, A.; Alves, C.; Calvo, A.; Alonso-Blanco, E.; Blanco-Alegre, C.; Rocha, A.; Fraile, R. Nitrogen oxides and ozone in Portugal: Trends and ozone estimation in an urban and a rural site. Environ. Sci. Pollut. Res. 2016, 23, 17171–17182. [Google Scholar] [CrossRef] [PubMed]
- Slezakova, K.; Castro, D.; Begonha, A.; Delerue-Matos, C.; da Conceição Alvim-Ferraz, M.; Morais, S.; do Carmo Pereira, M. Air pollution from traffic emissions in Oporto, Portugal: Health and environmental implications. Microchem. J. 2011, 99, 51–59. [Google Scholar] [CrossRef] [Green Version]
- Valente, J.; Pimentel, C.; Tavares, R.; Ferreira, J.; Borrego, C.; Carreiro-Martins, P.; Caires, I.; Neuparth, N.; Lopes, M. Individual exposure to air pollutants in a Portuguese urban industrialized area. J. Toxicol. Environ. Health Part A 2014, 77, 888–899. [Google Scholar] [CrossRef]
- Bernardo, A.; Gonçalves, L.L.; Zagalo, C.; Brito, J. Relationships between air pollutants and mortality in Portugal–an environmental health assessment. Ann. Med. 2019, 51, 69. [Google Scholar] [CrossRef] [Green Version]
- Silva, A.V.; Oliveira, C.M.; Canha, N.; Miranda, A.I.; Almeida, S.M. Long-term assessment of air quality and identification of aerosol sources at Setúbal, Portugal. Int. J. Environ. Res. Public Health 2020, 17, 5447. [Google Scholar] [CrossRef]
- Gabriel, M.F.; Paciencia, I.; Felgueiras, F.; Rufo, J.C.; Mendes, F.C.; Farraia, M.; Mourão, Z.; Moreira, A.; de Oliveira Fernandes, E. Environmental quality in primary schools and related health effects in children. An overview of assessments conducted in the Northern Portugal. Energy Build. 2021, 250, 111305. [Google Scholar] [CrossRef]
- Gamelas, C.; Abecasis, L.; Canha, N.; Almeida, S.M. The Impact of COVID-19 Confinement Measures on the Air Quality in an Urban-Industrial Area of Portugal. Atmosphere 2021, 12, 1097. [Google Scholar] [CrossRef]
- Slezakova, K.; Pereira, M.C. 2020 COVID-19 lockdown and the impacts on air quality with emphasis on urban, suburban and rural zones. Sci. Rep. 2021, 11, 21336. [Google Scholar] [CrossRef] [PubMed]
- Brito, J.; Bernardo, A.; Zagalo, C.; Gonçalves, L.L. Quantitative analysis of air pollution and mortality in Portugal: Current trends and links following proposed biological pathways. Sci. Total Environ. 2021, 755, 142473. [Google Scholar] [PubMed]
- Monteiro, A.; Menezes, R.; Silva, M.E. Modelling spatio-temporal data with multiple seasonalities: The NO2 Portuguese case. Spat. Stat. 2017, 22, 371–387. [Google Scholar] [CrossRef] [Green Version]
- Colette, A.; Rouïl, L. Air Quality Trends in Europe: 2000–2017. In Assessment for Surface SO2, NO2, Ozone, PM10 PM2; European Environment Agency, European Topic Centre on Air pollution, transport, noise and industrial pollution: Kjeller, Norway, 2020; Volume 5. [Google Scholar]
- APA. Ficha Temática ar e Ruído: Poluição Atmosférica por Dióxido de Azoto; Agência Portuguesa do Ambiente: Lisbon, Portugal, 2021. [Google Scholar]
- Hancock, A.A.; Bush, E.N.; Stanisic, D.; Kyncl, J.J.; Lin, C.T. Data normalization before statistical analysis: Keeping the horse before the cart. Trends Pharmacol. Sci. 1988, 9, 29–32. [Google Scholar] [CrossRef]
- Hamilton, J.D.; Waggoner, D.F.; Zha, T. Normalization in econometrics. Econom. Rev. 2007, 26, 221–252. [Google Scholar] [CrossRef] [Green Version]
- Deboeck, G.J. Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets; John Wiley & Sons: Hoboken, NJ, USA, 1994; Volume 39. [Google Scholar]
- White, H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econom. J. Econom. Soc. 1980, 40, 817–838. [Google Scholar] [CrossRef]
- Gujarati, D.N.; Porter, D.C. Basic Econometrics; McGrew Hill Book Co.: Singapore, 2003. [Google Scholar]
- Newey, W.K.; West, K.D. Hypothesis testing with efficient method of moments estimation. Int. Econ. Rev. 1987, 28, 777–787. [Google Scholar] [CrossRef]
- Cochrane, D.; Orcutt, G.H. Application of least squares regression to relationships containing auto-correlated error terms. J. Am. Stat. Assoc. 1949, 44, 32–61. [Google Scholar]
- McFadden, D. Frontiers in Econometrics, Chapter Conditional Logit Analysis of Qualitative Choice Behavior; Academic Press: New York, NY, USA, 1974. [Google Scholar]
- McFadden, D.; Tye, W.B.; Train, K. An Application of Diagnostic Tests for the Independence from Irrelevant Alternatives Property of the Multinomial Logit Model; Institute of Transportation Studies, University of California Berkeley: Berkeley, CA, USA, 1977. [Google Scholar]
- Belloni, A.; Chernozhukov, V.; Wei, Y. Post-selection inference for generalized linear models with many controls. J. Bus. Econ. Stat. 2016, 34, 606–619. [Google Scholar] [CrossRef] [Green Version]
- Ribeiro, V.M.; Bao, L. Professionalization of online gaming? theoretical and empirical analysis for a monopoly-holding platform. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 682–708. [Google Scholar] [CrossRef]
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2013; Volume 112. [Google Scholar]
- Hahn, E.D.; Soyer, R. Probit and logit models: Differences in the multivariate realm. J. R. Stat. Soc. Ser. B 2005, 67, 1–12. [Google Scholar]
- Gonçalves, R.; Ribeiro, V.M.; Pereira, F.L.; Rocha, A.P. Deep learning in exchange markets. Inf. Econ. Policy 2019, 47, 38–51. [Google Scholar] [CrossRef]
- van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. WaveNet: A Generative Model for Raw Audio. arXiv 2016, arXiv:1609.03499. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comp. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. arXiv 2015, arXiv:1506.04214. [Google Scholar]
- Greene, W. Fixed and random effects in stochastic frontier models. J. Product. Anal. 2005, 23, 7–32. [Google Scholar] [CrossRef] [Green Version]
- Battese, G.E.; Coelli, T.J. A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empir. Econ. 1995, 20, 325–332. [Google Scholar] [CrossRef] [Green Version]
- Lee, Y.H.; Schmidt, P. A production frontier model with flexible temporal variation in technical efficiency. Meas. Product. Effic. Tech. Appl. 1993, 237, 255. [Google Scholar]
- Battese, G.E.; Coelli, T.J. Frontier production functions, technical efficiency and panel data: With application to paddy farmers in India. J. Product. Anal. 1992, 3, 153–169. [Google Scholar] [CrossRef]
- Cornwell, C.; Schmidt, P.; Sickles, R.C. Production frontiers with cross-sectional and time-series variation in efficiency levels. J. Econom. 1990, 46, 185–200. [Google Scholar] [CrossRef]
- Paullada, A.; Raji, I.D.; Bender, E.M.; Denton, E.; Hanna, A. Data and its (dis) contents: A survey of dataset development and use in machine learning research. Patterns 2021, 2, 100336. [Google Scholar] [CrossRef]
- Cox, V. Exploratory data analysis. In Translating Statistics to Make Decisions; Springer: Berlin/Heidelberg, Germany, 2017; pp. 47–74. [Google Scholar]
- Wong, S.C.; Gatt, A.; Stamatescu, V.; McDonnell, M.D. Understanding data augmentation for classification: When to warp? In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, QLD, Australia, 30 November–2 December 2016; pp. 1–6. [Google Scholar]
- Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239. [Google Scholar] [CrossRef]
- Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48. [Google Scholar] [CrossRef]
- Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [Green Version]
- d’Aspremont, C.; Gabszewicz, J.J.; Thisse, J.F. On Hotelling’s “Stability in competition”. Econom. J. Econom. Soc. 1979, 47, 1145–1150. [Google Scholar] [CrossRef]
- Lones, M.A. How to avoid machine learning pitfalls: A guide for academic researchers. arXiv 2021, arXiv:2108.02497. [Google Scholar]
- Akerlof, G.A. The market for “lemons”: Quality uncertainty and the market mechanism. In Uncertainty in Economics; Elsevier: Amsterdam, The Netherlands, 1978; pp. 235–251. [Google Scholar]
- Carrasco, J.; García, S.; Rueda, M.; Das, S.; Herrera, F. Recent trends in the use of statistical tests for comparing swarm and evolutionary computing algorithms: Practical guidelines and a critical review. Swarm Evol. Comput. 2020, 54, 100665. [Google Scholar] [CrossRef] [Green Version]
- Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316. [Google Scholar] [CrossRef]
- He, X.; Zhao, K.; Chu, X. AutoML: A survey of the state-of-the-art. Knowl.-Based Syst. 2021, 212, 106622. [Google Scholar] [CrossRef]
- Cawley, G.C.; Talbot, N.L. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 2010, 11, 2079–2107. [Google Scholar]
- Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79. [Google Scholar] [CrossRef]
- Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258. [Google Scholar] [CrossRef]
- Betensky, R.A. The p-value requires context, not a threshold. Am. Stat. 2019, 73, 115–117. [Google Scholar] [CrossRef] [Green Version]
- Bower, J.; Broughton, G.; Stedman, J.; Williams, M. A winter NO2 smog episode in the UK. Atmos. Environ. 1994, 28, 461–475. [Google Scholar] [CrossRef]
- Carslaw, D.C.; Murrells, T.P.; Andersson, J.; Keenan, M. Have vehicle emissions of primary NO2 peaked? Faraday Discuss. 2016, 189, 439–454. [Google Scholar] [CrossRef]
Dimension | Variable | Description | Unit | Mean | Std. Dev. | Min | Max |
---|---|---|---|---|---|---|---|
Target | NO | Nitrogen dioxide for prediction | g/m | 82.728 | 64.421 | 0 | 341 |
NO | Nitrogen dioxide above/below the HLV for classification | Dummy | 0.963 | 0.189 | 0 | 1 | |
Air quality | O | Ozone | g/m | 759.954 | 43.411 | 0 | 309 |
PM | Particle matter with a diameter of less than 2.5 m | g/m | 7.569 | 5.732 | 0 | 105 | |
PM | Particle matter with a diameter of less than 10 m | g/m | 20.661 | 13.591 | 2 | 251 | |
CO | Carbon monoxide | mg/m | 0.587 | 0.151 | 0.122 | 2.379 | |
Noise | ECSL | Equivalent continuous sound level | dB(A) | 68.508 | 3.603 | 40 | 93 |
Road traffic | THTV | Total hourly traffic volume | Veihicles | 853.027 | 883.473 | 0 | 3087 |
Weather | AP | Atmospheric pressure | mbar | 1005.860 | 211.626 | 972 | 1127 |
RH | Relative humidity | % | 0.651 | 0.153 | 0.164 | 0.937 | |
T | Temperature | °C | 19.042 | 5.004 | 7.2 | 40.8 | |
P | Precipitation | mm | 0.003 | 0.046 | 0 | 2.900 | |
WI | Wind intensity | km/h | 11.885 | 6.207 | 0.400 | 55.900 | |
WD | Wind direction | ° | 213.352 | 127.323 | 0 | 359.900 | |
UVI | Ultraviolet index | I | 0.731 | 1.729 | 0 | 10 |
NO | O | PM | PM | CO | ECSL | THTV | AP | RH | T | P | WI | WD | UVI | VIF | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NO | 1 | ||||||||||||||
O | 0.503 *** | 1 | 1.14 | ||||||||||||
PM | 0.347 *** | 0.109 *** | 1 | 331.30 | |||||||||||
PM | 0.347 *** | 0.109 *** | 0.999 *** | 1 | 331.08 | ||||||||||
CO | 0.504 *** | 0.145 *** | 0.460 *** | 0.459 *** | 1 | 1.33 | |||||||||
ECSL | 0.047 *** | 0.032 *** | −0.013 | −0.013 | −0.021 | 1 | 1.02 | ||||||||
THTV | −0.286 *** | −0.071 *** | −0.031 *** | −0.031 *** | −0.112 *** | −0.053 *** | 1 | 1.06 | |||||||
AP | 0.134 *** | 0.047 *** | 0.095 *** | 0.095 *** | 0.082 *** | −0.046 *** | 0.050 *** | 1 | 1.10 | ||||||
RH | 0.196 *** | 0.151 *** | 0.114 *** | 0.115 *** | 0.143 *** | −0.109 *** | −0.0001 | 0.037 *** | 1 | 1.06 | |||||
T | −0.196 *** | −0.177 *** | −0.167 *** | −0.168 *** | −0.097 *** | −0.009 | −0.020 * | −0.265 *** | −0.073 *** | 1 | 1.17 | ||||
P | 0.043 *** | 0.008 | 0.020 * | 0.020 * | −0.001 | 0.018 | −0.033 *** | 0.034 *** | −0.012 | −0.033 *** | 1 | 1.00 | |||
WI | −0.182 *** | −0.037 *** | −0.089 *** | −0.088 *** | −0.055 *** | 0.009 | 0.111 *** | -−0.032 *** | −0.036 *** | −0.014 | −0.004 | 1 | 1.09 | ||
WD | −0.226 *** | −0.059 *** | −0.062 *** | −0.062 *** | −0.116 *** | −0.010 | 0.121 *** | −0.123 *** | −0.094 *** | 0.159 *** | −0.037 *** | 0.238 *** | 1 | 1.13 | |
UVI | −0.078 *** | 0.195 *** | −0.059 *** | −0.060 *** | −0.099 *** | −0.016 | 0.101 *** | −0.070 *** | −0.016 | 0.077 *** | −0.012 | 0.115 *** | 0.026 ** | 1 | 1.10 |
Test | Null Hypothesis (H0) | Value | Decision Applied to the Prediction Exercise |
---|---|---|---|
1. Normality | Normal errors’ distribution | ||
Jarque–Bera LM | 120 *** [2] | Rej H0: absence of data normality. | |
Doornik–Hansen LM | 4,850,000 *** [24] | Thus, use feature engineering | |
Henze–Zirkler | 7.287 *** [1] | in the prediction exercise. | |
2. Skewness | Zero skewness | ||
Mardia | 1840.521 *** [364] | Rej H0: presence of data skewness. | |
Cameron–Trivedi | |||
Skewness decomposition | 777.260 *** [11] | ||
3. Kurtosis | Zero excess kurtosis | ||
Mardia | 2563.017 *** [1] | Rej H0. | |
Cameron–Trivedi Skewness decomposition | 1.300 [1] | Do not rej H0. | |
4. Homoscedasticity | Disturbance term has a constant variance | ||
White IM | 1542.610 *** [77] | Rej H0: presence of | |
Breusch–Pagan | 331.470 *** [1] | heteroscedasticity. | |
5. White noise | Disturbance term follows a white noise | ||
Portmanteau Q | 189,100 *** [40] | ||
6. Autocorrelation | No serial correlation | ||
Durbin–Watson (12; 7824) | |||
Durbin’s alternative | 37,343.759 *** [1] | Rej H0: evidence of | |
Breusch–Godfrey LM | 6470.582 *** [1] | positive autocorrelation. | |
ARCH LM | 3964.158 *** [1] | ||
7. Number of NO lags to add as inputs | |||
LL: −33,048.900; LR: 14.009 *** [1]; FPE: 276.947; AIC: 8.462; HQIC: 8.465; BIC: 8.472 | |||
8. Stationary | Target has unit root | ||
Augmented Dickey–Fuller | |||
9. Specification | Model has no omitted variables | ||
Ramsey F(3; 7809) | 65.090 *** |
Test | Null Hypothesis (H0) | Value | Decision Applied to the Classification Exercise |
---|---|---|---|
1. Joint significance | Nonsignificant covariates | ||
Wald with logit (probit) applied to ECSL, AP, RH, T, P, WI and UVI | 6.360 (5.180) [7] | ||
LR with logit (probit) applied to ECSL, AP, RH, T, P, WI and UVI | 5.020 (3.740) [7] | ||
2. Individual significance | Nonsignificant covariate | ||
Wald with logit (probit) applied to ECSL | 0.000 (0.010) [1] | ||
Wald with logit (probit) applied to AP | 2.810 * (1.130) [1] | ||
Wald with logit (probit) applied to RH | 0.010 (0.140) [1] | ||
Wald with logit (probit) applied to T | 0.380 (0.030) [1] | ||
Wald with logit (probit) applied to P | 0.020 (0.020) [1] | ||
Wald with logit (probit) applied to WI | 1.330 (1.170) [1] | ||
Wald with logit (probit) applied to UVI | 0.210 (1.140) [1] | ||
Wald with logit (probit) applied to O | 159.200 *** (127.540 ***) [1] | ||
Wald with logit (probit) applied to CO | 195.960 *** (183.290 ***) [1] | ||
Wald with logit (probit) applied to THTV | 6.170 ** (4.710 **) [1] | ||
Wald with logit (probit) applied to WD | 20.490 *** (23.780 ***) [1] |
Regression Models for the Target NO | Discrete Choice Models for the Binary Dependent Variable NO | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
(1) | (2) | (3) | (4) | (5) | (6) | |||||
Coefficient | Coefficient | Coefficient | Coefficient/ | Coefficient | Marginal Effect | Coefficient | Marginal Effect | |||
Marginal Effect | Average | At the Mean | Average | At the Mean | ||||||
Input variable | ||||||||||
CO | ||||||||||
O | ||||||||||
THTV | ||||||||||
WD | ||||||||||
Independent term | ||||||||||
Performance metrics | ||||||||||
RMSE | ||||||||||
AIC | ||||||||||
BIC | ||||||||||
Log-likelihood | ||||||||||
Pseudo- | ||||||||||
Sensitivity | ||||||||||
Specificity | ||||||||||
Accuracy |
Technique | k-Fold Cross-Validation (Assumption: 5 Folds) | Rigorous Penalization | Information Criteria | |||||
---|---|---|---|---|---|---|---|---|
Criterion | LOPT ( * = 6.473) | LSE ( * = 61.756) | Either AIC, BIC or EBIC ( * = 8.994) | |||||
Model | Logistic LASSO | Post-Estimation Logit | Logistic LASSO | Post-Estimation Logit | Logistic LASSO | Post-Estimation Logit | Logistic LASSO | Post-Estimation Logit |
Input Variable | Coefficient | Coefficient | Coefficient | Coefficient | Coefficient | Coefficient | Coefficient | Coefficient |
O | −0.027 *** | −0.030 *** | −0.015 *** | −0.029 *** | −0.008 *** | −0.028 *** | −0.026 *** | −0.030 *** |
CO | −12.885 *** | −13.428 *** | −10.411 *** | −13.486 *** | −9.223 *** | −13.641 *** | −12.706 *** | −13.428 *** |
AP | −0.013 *** | −0.021 *** | −0.009 *** | −0.021 *** | ||||
WI | 0.013 *** | 0.022 *** | 0.01 *** | 0.022 *** | ||||
WD | 0.003 *** | 0.004 *** | 0.0005 *** | 0.004 *** | 0.003 *** | 0.004 *** | ||
THTV | 0.0002 *** | 0.0003 *** | 0.0002 *** | 0.0003 *** | ||||
Independent term | 26.577 *** | 35.414 *** | 11.559 *** | 14.781 *** | 10.219 *** | 15.420 *** | 22.952 *** | 35.414 *** |
Performance metrics(after converting the classification problem into a regression problem) | ||||||||
RMSE | 0.129 | 0.137 | 0.129 | |||||
MAE | 0.032 | 0.041 | 0.033 | |||||
MAPE | 0.016 | 0.020 | 0.016 | |||||
Theil’s U | 0.851 | 0.851 | 0.844 | |||||
67.394 *** | 66.897 *** | 67.321 *** |
Method | Probit | Logit | ||||||
---|---|---|---|---|---|---|---|---|
Coefficient | Coefficient | Coefficient | Marginal Effect | Coefficient | Marginal Effect | |||
Input Variable | Average | At the Mean | Average | At the Mean | ||||
O | ||||||||
CO | ||||||||
WD | ||||||||
THTV | ||||||||
Independent term | ||||||||
Performance metrics | ||||||||
0.969 | 0.967 | |||||||
RMSE | 0.033 | 0.033 | ||||||
AIC | 896.743 | 909.199 | ||||||
BIC | 931.567 | 944.024 | ||||||
Log pseudolikelihood | −443.371 | −449.599 |
Type of Ordered Multinomial Discrete Choice Model | Probit | ||||||
---|---|---|---|---|---|---|---|
Input Variable | Coefficient | Marginal Effect | |||||
Average | At the Mean | ||||||
CO | |||||||
O | |||||||
WD | |||||||
THTV | |||||||
Cut | |||||||
Cut | |||||||
Performance metrics | |||||||
AIC | 4041.474 | ||||||
BIC | 4083.263 | ||||||
Log pseudolikelihood | −2014.737 | ||||||
CO | |||||||
O | |||||||
WD | |||||||
THTV | |||||||
Cut | |||||||
Cut | |||||||
Performance metrics | |||||||
AIC | 4040.452 | ||||||
BIC | 4082.242 | ||||||
Log pseudolikelihood | −2014.226 |
Model | Tfe SFA | Tre SFA | MLrei SFA | ILSfe SFA | MLred SFA | LSDVfe SFA |
---|---|---|---|---|---|---|
Input Variable | Coefficient | Coefficient | Coefficient | Coefficient | Coefficient | Coefficient |
O | ||||||
CO | ||||||
WD | ||||||
THTV | ||||||
Independent term | ||||||
Performance metrics | ||||||
Log likelihood | −40,854.017 | −40,854.017 | −40,840.225 | −40,890.490 | −40,890.490 | −40,890.490 |
DLNN Model | RMSE | MSE | MAE |
---|---|---|---|
WaveNet | 0.10169 | 0.01034 | 0.06713 |
TCN | 0.10383 | 0.01078 | 0.06753 |
LSTM | 0.09859 | 0.00972 | 0.06691 |
LSTM with Standard Attention Mechanism | 0.09550 | 0.00912 | 0.06586 |
LSTM with Conv1D Attention | 0.09192 | 0.00845 | 0.06499 |
ConvLSTM2D | 0.09889 | 0.00978 | 0.06597 |
ConvLSTM2D with Conv2D Attention and roll padding (VSCA) | 0.08940 | 0.00799 | 0.06414 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ribeiro, V.M.; Gonçalves, R. Classification and Prediction of Nitrogen Dioxide in a Portuguese Air Quality Critical Zone. Atmosphere 2022, 13, 1672. https://doi.org/10.3390/atmos13101672
Ribeiro VM, Gonçalves R. Classification and Prediction of Nitrogen Dioxide in a Portuguese Air Quality Critical Zone. Atmosphere. 2022; 13(10):1672. https://doi.org/10.3390/atmos13101672
Chicago/Turabian StyleRibeiro, Vitor Miguel, and Rui Gonçalves. 2022. "Classification and Prediction of Nitrogen Dioxide in a Portuguese Air Quality Critical Zone" Atmosphere 13, no. 10: 1672. https://doi.org/10.3390/atmos13101672
APA StyleRibeiro, V. M., & Gonçalves, R. (2022). Classification and Prediction of Nitrogen Dioxide in a Portuguese Air Quality Critical Zone. Atmosphere, 13(10), 1672. https://doi.org/10.3390/atmos13101672