1. Introduction
In recent years, municipal authorities worldwide have shown growing interest in research on smart and sustainable cities. This concept combines human and social capital with the city’s infrastructure capital to achieve a sustainable, livable, and efficient city [1]. These cities strive to contribute to the Sustainable Development Goals [2] and attain a carbon-neutral future, as well as a better quality of life for their residents. To achieve these goals, cities are assessed in various areas to comply with national and international strategic plans for decarbonization and environmental sustainability. In this context, key performance indicators (KPIs) have been developed to measure the sustainability of cities, some of which are included in the list elaborated by the International Telecommunication Union in 2017 [3], as well as in various European projects conducted to evaluate smart cities [4]. Within the study of sustainable mobility in smart cities, particularly in terms of passenger and freight transport, aspects related to energy, environment, safety, and security are considered, taking into account the real-time socio-economic dimensions [5]. These aspects are part of the pillars included in the studies of KPIs for smart and sustainable cities, which should be used to evaluate defined objectives with high viability (i.e., data availability and accessibility) and high perceived importance [6].
Specifically, Barcelona is one of the most populous cities in the European Union and was ranked among the top five smart cities in 2022, according to a study conducted by Juniper Research: “Smart Cities: Key Technologies, Environmental Impact & Market Forecasts 2022–2026” [7]. Despite being considered a smart city, analyses of pollutant concentrations indicate that Barcelona remains well above the targets set by the European Parliament [8], both in terms of peak nitrogen dioxide (NO2) concentration values and the citywide average in µg/m3 [9]. Another issue with this indicator is whether the average values obtained in the city capture the true behavior of NO2 across its entire extent. Considering that Barcelona is the second most populous city in Spain, with a population density of 16,339 inhabitants per km2 and an area of 101.37 km2 [10], using only eight air quality stations may not provide sufficient information about the behavior of pollutants in the city. This study stems from the need to create a KPI that accurately represents NO2 concentrations throughout the city in a truthful and instantaneous manner [11].
Therefore, the study has three main objectives: (i) to study the behavior of NO2 concentrations, (ii) to predict NO2 concentrations considering the influence of meteorological variables, and (iii) to propose new distributions of air quality stations in the city to obtain information on real pollution levels. To achieve this, we employ a variety of Artificial Intelligence (AI) and Machine Learning (ML) models for analysis and prediction, alongside an optimization algorithm to relocate the air quality stations and provide an optimal distribution. This algorithm includes an original concept for estimating potential new stations based on the identification of sensitive points in the city and the effective area of action of each station. One of the limitations of our study lies in the limited number of air quality stations available in the city. This results in restricted coverage and can provide a biased view of the actual pollution situation in Barcelona, as the measurements are concentrated in a few locations and do not adequately reflect the spatial variability of pollution levels across the entire city. Our proposed optimization algorithm allows for the identification and prioritization of sensitive points in the city, where the presence of emission sources or the vulnerability of the population makes the information obtained more valuable. By using this algorithm, we aim to improve the representativeness of the data and provide a more accurate and comprehensive assessment of air quality in densely populated urban areas. This comprehensive and detailed approach will allow us to identify specific areas with higher concentration levels, elucidate the concentration behavior, and assess the effectiveness of current policies and initiatives to reduce pollution levels. Ultimately, this approach will provide valuable insights for designing specific and practical strategies to improve air quality and promote sustainable urban development in the city.
The remainder of this paper is structured as follows. Section 2 reviews the recent literature on the use of NO2 indicators in sustainable mobility and its role as an indicator of pollution in smart cities. Section 3 introduces the case study on Barcelona and the data sources used. Section 4 presents the methodology used to achieve the aforementioned objectives, while Section 5 presents the results obtained. Lastly, Section 6 outlines the main conclusions and suggests potential avenues for future research.
2. Overview of NO2 Concentrations
The growing concentration of urban populations has led to environmental challenges, such as increased air pollution, which affects health in urban settings. A major contributor to this pollution is NO2, emitted primarily by diesel engines, which poses serious environmental and human health risks. Subramaniam et al. [12] highlighted that NO2 significantly contributes to global warming, the greenhouse effect, and climate change. This pollutant is also a primary cause of acid rain, which harms aquatic and terrestrial ecosystems. Regarding human health, Zhu et al. [13] showed that exposure to elevated NO2 is linked to respiratory diseases and lung cancer, with significant mortality rates from these diseases in a studied population in Hefei, China. Women with respiratory diseases appeared more susceptible to air pollution than men. Additionally, Gurjar et al. [14] warned about health risks in megacities, noting that some cities face higher risks due to high levels of pollutants like NO2, especially in South Asia.
To address these issues, traffic reduction measures, such as pedestrian zones and Low-Emission Zones (LEZs), have been implemented [
15]. The effectiveness of these measures is often questioned, and this is where smart city tools play a crucial role by providing data to scientifically evaluate their impact. Various approaches have been proposed to evaluate smart city interventions. For instance, Ntafalias [
16] proposed a seven-step methodology for assessing the impact of these interventions, emphasizing the importance of a comprehensive analysis of the city’s long-term vision and cooperation among all stakeholders. Additionally, Lebrusán and Toutouh [
17] analyzed the effectiveness of an LEZ in Madrid, demonstrating its capability to significantly reduce air pollution and noise in the city. Analyzing Shared Mobility Systems (SMSs) is another critical approach to addressing urban transportation challenges. Golpayegani et al. [
18] emphasized the need to address traffic-related NO
2 concentrations and how SMS solutions can contribute to reducing this pollution in urban areas. To evaluate cities, Angelakoglou et al. [
19] developed a repository of 75 KPIs in six dimensions, including environmental aspects and concentrations of air pollutants. This repository can serve as a basis for assessing the impact of solutions aimed at improving air quality and reducing pollution in urban environments. The mentioned articles highlight the importance of addressing gas pollutant concentrations, such as NO
2, in the context of smart and sustainable cities. A holistic approach involving all stakeholders and interconnected entities is essential to evaluate the impact of smart city interventions and achieve greater efficiency in the urban mobility system.
The application of AI and ML techniques, as shown by Subramaniam et al. [
12], has been instrumental in developing effective pollution control strategies by predicting NO
2 concentrations more accurately. In particular, several works on atmospheric pollution and NO
2 concentrations in Barcelona aimed to understand the dynamics of key pollutants, such as NO, NO
2, and O
3. Malik and Tauler [
20] utilized the Multivariate Curve Resolution–Alternating Least Squares technique to analyze temporal variations and diurnal profiles of these pollutants. Basagaña et al. [
21] examined the impact of public transportation strikes on air quality, revealing increased NOx and black carbon levels during strikes. Gignac et al. [
22] investigated the short-term effects of NO
2 exposure on cognitive and mental health, while Pierangeli et al. [
23] estimated childhood asthma cases attributable to air pollution. Benavides et al. [
24] developed accurate urban air quality models using operational prediction systems and specific dispersion models. Rodriguez-Rey et al. [
25] evaluated traffic restriction measures in Barcelona, focusing on reducing NO
2 levels. Recently, Cican et al. [
26] applied two ML techniques to predict the air quality in the city of Bucharest. In particular, the authors used advanced recurrent neural networks, specifically Long Short-Term Memory (LSTM) and Gated Recurrent Unit models, which achieved improved performance over traditional methods. Wu et al. [
27] introduced a novel deep learning model that combines Residual Neural Network, Graph Convolutional Network, and bidirectional LSTM architectures to improve the short-term regional predictions of NO
2 and O
3 concentrations in Shanghai (China). Similarly, Tao et al. [
28] developed an ensemble ML model that incorporated deep learning to forecast NO
2 levels using data from 1609 air quality monitors in China. Lastly, El Mghouchi et al. [
29] explored multivariable air quality predictions using five hybrid ML models to analyze the relationships between meteorological factors and particulate matter concentrations in Craiova (Romania).
The growing body of research highlights the necessity of addressing air pollution, particularly NO2, in urban areas to safeguard public health and improve air quality. The combination of traffic reduction measures, the promotion of SMS, and the use of smart city tools are critical steps toward creating healthier urban environments. Active collaboration with local communities and policymakers is essential to ensure the successful implementation of these strategies, ultimately contributing to a more sustainable future for cities.
4. Methodology
This section describes the methodology employed to achieve the proposed objectives. Various tools are used to conduct the descriptive analysis, predict NO2 behavior, and optimize the distribution of stations. In particular, the descriptive analysis is performed using Python 3.10. The libraries NumPy [31], Pandas [32], Seaborn [33], and Matplotlib [34] are used for data analysis and visualization. The Folium library [35] is used to work with the map of Barcelona and generate heat maps. The Geocoder library is used to obtain coordinates for points of interest in Barcelona. The Scikit-learn library [36] is used for the clustering analysis of concentrations at the stations, and also to apply the ML models that predict concentrations at each station. The optimization algorithm is implemented in Python 3.10, with the NumPy and Pandas libraries supporting array manipulations and the management of the input datasets, respectively.
4.1. Descriptive Analysis
A detailed analysis of the measured NO
2 concentrations across the city is conducted to identify and observe patterns over different time frames, including daily and weekly behavior. Various graphical representations are created to explore the temporal dynamics of NO
2 concentrations [
37].
First, daily averages are analyzed to examine general trends in NO2 behavior, investigating daily and weekly variability and showing the influence of traffic emissions and weather conditions [38]. Additionally, weekly patterns are explored to identify potential cyclical behaviors influenced by factors such as urban mobility and industrial activity. Finally, a spatial analysis is conducted to distinguish the behavior of NO2 concentrations at individual monitoring stations. This approach examines the spatial and temporal representativeness of NO2 monitoring stations in urban settings, highlighting the importance of capturing local variations to obtain a complete picture of air quality [39]. The analysis is further complemented by meteorological variables obtained from stations distributed throughout the city. A wind rose is generated to visualize and understand the predominant wind directions and speeds in the city. This tool is essential for identifying local climate patterns and variations in different parts of the city [40]. This analysis helps in understanding how meteorological conditions contribute to the dispersion of atmospheric pollutants, highlighting the influence of urban climates on air quality [41]. Additionally, integrating meteorological data with air quality models is crucial to obtain a more accurate view of pollutant dispersion in urban areas [42].
As the final step of the descriptive analysis, a cluster analysis is performed to study the behavior of NO
2 concentrations at the monitoring stations. The hourly average values of NO
2 concentrations at each station during the study period are used to identify common patterns among the stations [
43]. For this analysis, the K-means algorithm is selected, which requires specifying the number of clusters in advance [
44]. To determine the optimal number of clusters, the elbow method is employed [
45]. This method involves plotting the number of clusters against inertia (the sum of squared distances within each cluster), and the point where a significant change in the inertia decrease occurs indicates the optimal number of clusters. In addition, agglomerative hierarchical clustering is applied, a method that does not require specifying the number of clusters initially [
46]. This algorithm treats each point as an individual cluster and successively merges the closest clusters, creating a hierarchical structure. The result is represented by a dendrogram, where the branches indicate the clusters, and the height of each union reflects the Euclidean distance [
47]. In our study, the Euclidean distance is used as the metric that captures the separation between points in an n-dimensional space, providing a clear representation of how NO
2 concentrations vary according to the location of the monitoring stations.
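The two clustering steps described above can be sketched as follows. This is a minimal illustration on synthetic hourly profiles, not the study's data; the Ward linkage, the random seed, and the helper names `elbow_inertias` and `hierarchical_labels` are assumptions made for the example, since the text only specifies K-means with the elbow method and agglomerative clustering with the Euclidean distance.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans


def elbow_inertias(profiles, k_values, seed=0):
    """Return the K-means inertia for each candidate number of clusters."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(profiles)
        inertias.append(float(model.inertia_))
    return inertias


def hierarchical_labels(profiles, max_clusters):
    """Build the merge tree (the input of a dendrogram) and cut it into clusters."""
    # Assumption: Ward linkage; the Euclidean metric follows the text.
    tree = linkage(profiles, method="ward", metric="euclidean")
    labels = fcluster(tree, t=max_clusters, criterion="maxclust")
    return tree, labels


# Synthetic stand-in: 8 stations x 24 hourly mean values (hypothetical data).
rng = np.random.default_rng(0)
base = 30 + 15 * np.sin(np.linspace(0, 2 * np.pi, 24))
scales = (0.4, 1.0, 1.05, 1.1, 1.5, 1.55, 1.6, 1.0)
profiles = np.vstack([base * s + rng.normal(0, 1, 24) for s in scales])

inertias = elbow_inertias(profiles, range(1, 6))
tree, labels = hierarchical_labels(profiles, max_clusters=3)
```

Plotting `inertias` against the number of clusters gives the elbow curve, and `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram.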
4.2. Behavior Prediction of NO2
Afterward, the NO
2 concentration predictions for each station are analyzed using different methodologies. The objective is to determine which methodology best approximates the data for each particular station. Drawing on several studies [
12,
48,
49], we decide to utilize the following methods:
K-Nearest Neighbors (KNN): This algorithm is based on the idea that data points with similar characteristics tend to have similar output values. It works by finding the K closest points in the training dataset and predicting the output value from those neighbors (by majority vote in classification and, in regression, typically by averaging their values) [50].
Decision Tree: This model uses a tree structure to make predictions. Each internal node represents a test on a feature, each branch represents a decision rule, and each leaf represents the prediction result [51].
Support Vector Regression (SVR): SVR is a regression technique based on support vectors that seeks to find an optimal regression function within a feature space. It uses a supervised learning approach to predict output values [52].
Random Forest: Random Forest is an ensemble of decision trees whose individual predictions are combined, by majority vote in classification and by averaging in regression. By combining multiple trees, the risk of overfitting is reduced, and the prediction accuracy is improved [53].
Artificial Neural Network (ANN): ANN is a model inspired by the structure and functioning of the human brain. It consists of a network of interconnected artificial neurons used for making predictions. The model learns from training data by adjusting the synaptic weights of neurons [54].
All the prediction models proposed in this study were validated with the holdout method, in which the data are split into a training subset and a held-out test subset. This division made it possible to evaluate the predictive capacity of the models [55]. By applying these methodologies to predict NO2 concentrations at various stations, we aim to identify the best-suited approach for each station based on its specific characteristics and patterns. This analysis will contribute to a better understanding of the performance of different prediction techniques in capturing the complexities and variations of NO2 concentrations across different locations. To compare the effectiveness of each method at the stations, we use evaluation metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R2). These metrics help determine which models offer the best approximation to the observed data at each station. For more details about these statistics, readers are referred to Hrust et al. [56].
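As a rough illustration of this comparison, the sketch below fits the five model families with Scikit-learn on synthetic data and reports MAE, RMSE, and R2 on a held-out test set. The 80/20 split, the default hyperparameters, the feature scaling for SVR and the ANN, and the helper name `evaluate_models` are assumptions made for the example and are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def evaluate_models(X, y, test_size=0.2, seed=0):
    """Holdout validation: fit each model on the training split, score on the test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    models = {
        "KNN": KNeighborsRegressor(n_neighbors=5),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "SVR": make_pipeline(StandardScaler(), SVR()),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "ANN": make_pipeline(
            StandardScaler(), MLPRegressor(max_iter=500, random_state=seed)
        ),
    }
    scores = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        scores[name] = {
            "MAE": float(mean_absolute_error(y_te, pred)),
            "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
            "R2": float(r2_score(y_te, pred)),
        }
    return scores


# Synthetic stand-in for one station: 4 meteorological features, NO2 as target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 40 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 2, 300)
scores = evaluate_models(X, y)
```

Running the same function once per station would give a per-station table of metrics from which the best-suited model can be selected.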
4.3. Optimal Location of Stations
Lastly, a new placement of stations at strategic locations is proposed to improve the NO2 pollution index in Barcelona and obtain a more representative average of NO2 concentrations across the city. This is supported by the analysis, which has determined that the current eight monitoring stations are insufficient, as they do not cover a large or representative area of the city. To determine the potential new locations, we rely on the report from the European Commission’s Mobility Observatory, using the Eltis method [57], which recommends selecting 40 sensitive locations in the city distributed as follows: (i) 5 locations near highways, (ii) 5 locations near ring roads, (iii) 10 locations near access roads to the city center, (iv) 10 locations near sensitive facilities (schools, hospitals, residences, etc.), (v) 5 locations in low-income neighborhoods, and (vi) 5 locations in recreational areas (sports facilities, parks, museums, etc.). The points chosen for this study are shown in Appendix A, Table A1. These strategically selected points cover the entire metropolitan area and are added to the existing stations to form an expanded set of air quality monitoring stations (Figure 2). This approach allows for a more detailed understanding of NO2 concentrations in the city, facilitating the implementation of effective measures to improve air quality in critical areas and throughout Barcelona. These results will support future research and environmental management actions, contributing to more effective policies to reduce NO2 pollution and its adverse impacts.
To optimize the placement of air quality monitoring stations in Barcelona, we consider the effective radius of each station, which varies according to traffic levels in the area [58]. We choose an average radius of 500 m due to the city’s high traffic intensity. This approach also aims to maximize city area coverage while minimizing the overlap between stations, to avoid redundant measurements and ensure no areas are left uncovered. However, the budget constrains the number of stations that can be installed, necessitating a balance between maximizing coverage and managing the costs of implementation and maintenance of the stations. This problem can be modeled as a Capacitated Dispersion Problem (CDP) [59], which aims to maximize the minimum distance between the selected elements. Various approaches have been proposed in the literature to solve the CDP, but it is common to employ heuristics and metaheuristics to solve large-scale instances in short computing times [60]. This study uses an adapted version of the approach proposed by [61], which has been proven to generate high-quality solutions within short computational times. Algorithm 1 outlines the main steps of our algorithm.
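Algorithm 1 Biased-randomized algorithm.
1: function BR-CDP(I, maximum execution time, maximum iterations without improvement, geometric distribution parameter)
2:   S ← DestructiveHeuristic(I)
3:   S ← LocalSearch(S)
4:   while the maximum execution time is not reached do
5:     new solution ← DestructiveHeuristic(I) in its biased-randomized version
6:     new solution ← LocalSearch(new solution)
7:     if the new solution has a lower objective function value than S then
8:       S ← new solution
9:     end if
10:  end while
11:  return S
12: end function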
The algorithm receives as input an instance comprising the stations and the distances between stations, denoted by I, as well as the maximum execution time and the maximum number of iterations without improvement. The algorithm generates a feasible initial solution S by applying a destructive heuristic followed by a local search operator. The destructive heuristic and local search procedures are presented next. At this point, this initial solution becomes the best-found solution so far. Next, the algorithm performs a multistart procedure to generate new solutions until the maximum execution time is reached. In each iteration, the algorithm generates a new solution using a biased-randomized version of the destructive heuristic combined with the local search operator. The biased-randomized heuristic introduces a slight modification in the greedy behavior, which provides a certain degree of randomness while maintaining the logic behind the heuristic. The biased-randomized version selects each element in the edge list with a probability that follows a geometric distribution with a single parameter, which controls the relative level of greediness present in the randomized behavior of the algorithm [62]. By employing a biased-randomized version of the heuristic, multiple alternative solutions can be generated without losing the logic behind the original heuristic. Next, the algorithm compares the newly generated solution to the best-known solution. If the new solution has a lower objective function value, the best-known solution is updated. Once the stopping criterion is met, the algorithm returns the best-found solution.
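The geometric selection step can be illustrated with a short sketch. It uses a quasi-geometric sampling formula that is common in the biased-randomization literature; the function name `biased_index`, the parameter name `beta`, and the wrap-around with the modulo operator are assumptions for this example and may differ from the implementation used in the study.

```python
import math
import random


def biased_index(n_items, beta, rng):
    """Pick a position in a sorted list; position 0 (the greedy choice) is the most likely.

    beta in (0, 1): values close to 1 behave almost greedily,
    values close to 0 behave almost uniformly at random.
    """
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is always defined
    return int(math.log(u) / math.log(1.0 - beta)) % n_items


# Draw many positions from a list of 10 sorted edges with beta = 0.3.
rng = random.Random(1)
counts = [0] * 10
for _ in range(5000):
    counts[biased_index(10, 0.3, rng)] += 1
```

With this scheme the first (shortest) edge is chosen most often, while every other edge keeps a non-zero probability, which is what allows the multistart loop to produce different solutions in each iteration.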
Algorithm 2 shows the destructive heuristic procedure. Initially, the heuristic assumes all stations are open. Then, the edges connecting the stations are sorted in ascending order of the distance between the stations they connect. Next, an iterative process begins in which stations are removed from the solution. At each iteration, an edge is selected from the list of edges in a greedy or biased-randomized manner. The station to be removed is chosen randomly from the two stations connected by the selected edge, and the edges connected to the removed station are also deleted from the list of edges. This procedure is repeated until the percentage of open stations falls below the required threshold. Then, the last station that was removed is reintroduced into the solution to preserve its feasibility, and the resulting solution is returned by the procedure.
The local search procedure is depicted in Algorithm 3. This procedure involves removing the oldest station from the solution and reconstructing the solution with a station not currently included. It is important to note that a removed station will not be considered for generating a new solution until all older stations (i.e., those added earlier) have been eliminated from the solution. This approach facilitates efficient space exploration while avoiding redundancy in the search process. The procedure continues until a maximum number of iterations without improvement is reached.
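Algorithm 2 Destructive heuristic procedure.
1: function destructiveHeuristic(I)
2:   S ← all stations open
3:   edges ← getEdges(I)
4:   sort(edges) in ascending order of distance
5:   while isFeasible(S) do
6:     edge ← selectEdge(edges), greedily or in a biased-randomized manner
7:     station ← selectNode(edge), chosen at random between its two endpoints
8:     drop(station) from S and its edges from the edge list
9:   end while
10:  add(last dropped station) to S
11:  return S
12: end function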
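A compact sketch of the destructive heuristic is given below, under stated assumptions: the instance is a plain distance matrix, feasibility is expressed as a minimum number of open stations (`min_open`), and the names `destructive_heuristic`, `min_open`, and `beta` are hypothetical. Passing `beta=None` gives the greedy variant; any value in (0, 1) gives a biased-randomized variant based on a quasi-geometric draw.

```python
import itertools
import math
import random


def destructive_heuristic(dist, min_open, beta=None, rng=None):
    """Start with all stations open and close stations joined by the shortest edges."""
    rng = rng or random.Random(0)
    open_nodes = set(range(len(dist)))
    # Edges sorted in ascending order of distance between their endpoints.
    edges = sorted(
        itertools.combinations(range(len(dist)), 2),
        key=lambda e: dist[e[0]][e[1]],
    )
    last_removed = None
    while len(open_nodes) >= min_open and edges:
        if beta is None:
            pos = 0  # greedy: always the shortest remaining edge
        else:
            u = 1.0 - rng.random()
            pos = int(math.log(u) / math.log(1.0 - beta)) % len(edges)
        removed = rng.choice(edges[pos])  # one of the two endpoints, at random
        open_nodes.discard(removed)
        edges = [e for e in edges if removed not in e]
        last_removed = removed
    # The last removal made the solution infeasible, so it is undone.
    if last_removed is not None and len(open_nodes) < min_open:
        open_nodes.add(last_removed)
    return open_nodes


# Hypothetical instance: six candidate sites on a line (positions in km).
positions = [0, 1, 2, 10, 20, 30]
dist = [[abs(a - b) for b in positions] for a in positions]
solution = destructive_heuristic(dist, min_open=4)
```

In this toy instance the three clustered sites at 0, 1, and 2 km compete with each other, so two of them are closed while the well-separated sites remain open.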
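Algorithm 3 Local search procedure.
1: function localSearch(S, maximum iterations without improvement)
2:   best solution ← S
3:   iterations without improvement ← 0
4:   while iterations without improvement < maximum do
5:     iterations without improvement ← iterations without improvement + 1
6:     station ← oldestSelectedNode(S)
7:     drop(station) from S
8:     candidate ← selectBestNode() among the stations not in S
9:     add(candidate) to S
10:    if S improves on the best solution then
11:      best solution ← S
12:      iterations without improvement ← 0
13:    end if
14:  end while
15:  return best solution
16: end function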
Given the budget constraints on the number of stations that can be installed, we also analyze the impact of the percentage of open stations, which is controlled by a parameter m. When m is set to 0.1, it indicates that only 10% of the stations are open; conversely, setting m to 0.9 means that 90% of the air quality stations are operational. Our objective is to identify the optimal combination of area coverage percentage and overlap among these stations, resulting in the most efficient air quality monitoring network. To achieve this, we utilize a Pareto frontier to evaluate the effects of the parameter m on both the percentage of area coverage and the percentage of overlap between stations. By generating the Pareto frontier, we can discern the trade-offs between the percentage of covered area and the percentage of overlap, enabling us to identify the best configurations that balance maximizing coverage with minimizing redundancy.
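The selection of non-dominated configurations can be sketched as follows. The helper name `pareto_front` and the numerical values are purely illustrative and are not results from the study; each configuration is a tuple (m, coverage percentage, overlap percentage), where coverage is maximized and overlap is minimized.

```python
def pareto_front(configs):
    """Keep the configurations that no other configuration dominates."""
    front = []
    for a in configs:
        dominated = any(
            b[1] >= a[1] and b[2] <= a[2] and (b[1] > a[1] or b[2] < a[2])
            for b in configs
        )
        if not dominated:
            front.append(a)
    return front


# Hypothetical (m, coverage %, overlap %) values for five settings of m.
configs = [
    (0.1, 10.0, 0.0),
    (0.3, 30.0, 0.0),
    (0.5, 45.0, 5.0),
    (0.7, 44.0, 12.0),
    (0.9, 60.0, 30.0),
]
front = pareto_front(configs)
```

A configuration is discarded only when another one is at least as good on both criteria and strictly better on one, so the remaining points describe the trade-off between coverage and redundancy.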
5. Computational Results
In this section, we present the computational results derived from the methodologies outlined in the previous section. The analysis encompasses the behavior of the measured NO2 concentrations, the evaluation of NO2 concentration predictions, and the optimization of the location of stations.
5.1. Behavior of Measured NO2 Concentrations
The obtained results highlight the importance of temporal and spatial variability in air quality analysis, demonstrating how local conditions and weekly traffic patterns can have a significant impact on NO2 concentrations.
Figure 3 shows the average value per year for each time slot at each station. Although most stations exhibit a similar pattern, with peak concentrations observed around 9:00 a.m. and another local peak in the late hours, the concentration ranges differ across stations. Similarly, Figure 4 shows the average value per year for each time slot by day of the week, since traffic patterns are known to depend on the day of the week, which directly affects NO2 concentrations [63]. In this case, the values from Monday to Friday are similar, while weekends show a different behavior, with significantly lower concentrations.
Except for station 58, which is located in a residential area on the city’s outskirts, the annual mean values in the remaining stations are much higher than the 10 µg/m3 threshold proposed by the WHO. Moreover, outliers with values significantly exceeding the 24 h limit of 25 µg/m3 are present, especially in stations 57 and 54. The behavior is consistent across different years, although higher annual mean concentration values are obtained for the year 2022 in all stations (Figure 5).
Considering the adaptation periods proposed by the WHO to reduce NO2 concentration levels (40 µg/m3 for adaptation level 1, 30 µg/m3 for adaptation level 2, and 20 µg/m3 for level 3), it can be concluded that the city of Barcelona currently falls within adaptation level 2 regarding the NO2 concentration limits established by the WHO. The WHO recommends a daily average NO2 concentration of 25 µg/m3 (level 3 in the air quality guidelines), which should not be exceeded on more than 3 to 4 days per year.
Figure 6 illustrates the daily average values during the study period alongside the recommended daily average limit (dashed red line).
As observed, the number of days surpassing the threshold per year exceeds the 3–4 days limit. Taking a more flexible approach and considering the adaptation period to reduce NO2 concentrations, the maximum daily concentration could be considered to be 120 µg/m3 (adaptation level 1) and 50 µg/m3 (adaptation level 2). If we consider the adaptation periods rather than the strict guidelines, it can be concluded that the city of Barcelona is at adaptation level 2 but still far from achieving compliance with the guidelines.
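Counting the days above the daily limit is a simple aggregation, sketched here with Pandas on a synthetic hourly series. The function name `exceedance_days_per_year` and the sample values are assumptions for illustration; the 25 µg/m3 default is the daily limit quoted in the text.

```python
import numpy as np
import pandas as pd


def exceedance_days_per_year(hourly, daily_limit=25.0):
    """Count, for each year, the days whose mean concentration exceeds the limit."""
    daily = hourly.resample("D").mean()
    exceeding = daily[daily > daily_limit]
    return exceeding.groupby(exceeding.index.year).size()


# Synthetic hourly NO2 series covering three days: 20, 30, and 40 µg/m3.
index = pd.date_range("2022-01-01", periods=72, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.repeat([20.0, 30.0, 40.0], 24), index=index)
days_over_limit = exceedance_days_per_year(hourly)
```

Passing a different `daily_limit` (for instance, the adaptation-level values) gives the corresponding counts for the more flexible thresholds.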
Figure 7 presents a wind rose diagram depicting the average wind direction trends for each station throughout the study period. The differences in altitude among the stations, combined with their proximity to the sea, result in varying wind rose diagrams, despite the stations being relatively close to one another. Station X4, located nearest to the sea, experiences winds from multiple directions. In conjunction with the other variables studied, a similar behavioral pattern is observed across the stations, with stations X4 and X8 exhibiting more comparable behavior than station D5, which is situated further inland.
The dendrogram illustrating the average concentrations at the air quality stations reveals interesting clustering patterns among the stations (
Figure 8a). Station 58 emerges as a distinct cluster, with a different behavior from any other station due to its location in the city’s peripheral areas. On the other hand, stations 44, 50, and 43 exhibit similar behavior, indicating their central location in the city’s high-traffic zone with increased concentrations. Likewise, stations 4 and 42 demonstrate identical patterns, while stations 57 and 54 share similar behavior. These two groups of stations are clustered together, forming a distinct cluster defined by their proximity to the city’s peripheral highways. For the analysis of the meteorological stations, the representation of wind speed along the x-axis and y-axis is utilized (
Figure 8b). This method demonstrates the existence of two distinct groups with different behaviors. Stations X4 and X8 form one cluster in the southern part of the city at an altitude of 47 m within a high-traffic area. The other cluster consists of station D5, located in the northern region at an altitude of 415 m and away from heavy traffic.
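The representation of wind along the x-axis and y-axis can be obtained by decomposing speed and direction into two components. The sketch below assumes the usual meteorological convention, in which the direction is the angle the wind blows from, measured clockwise from north; whether the study used this exact convention is an assumption, and the function name `wind_components` is hypothetical.

```python
import numpy as np


def wind_components(speed, direction_deg):
    """Return the eastward (x) and northward (y) components of the wind vector."""
    theta = np.deg2rad(direction_deg)
    x = -speed * np.sin(theta)
    y = -speed * np.cos(theta)
    return x, y


# A 1 m/s wind from the north and a 2 m/s wind from the west.
x, y = wind_components(np.array([1.0, 2.0]), np.array([0.0, 270.0]))
```

Averaging these components per station yields one (x, y) point per meteorological station, which can then be clustered in the same way as the concentration profiles.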
In the case study of Barcelona, there are fewer meteorological stations than air quality stations. Therefore, the KNN algorithm is employed to classify and predict meteorological variables at each station. The goal of applying this algorithm is to obtain meteorological variables for all air quality stations. The results are presented in
Table 3.
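One way to read this step is as a nearest-neighbor estimate based on station coordinates, sketched below with Scikit-learn. The coordinates, the variable values, the choice of a single neighbor, and the function name `assign_meteorology` are hypothetical; the features actually used in the study are not specified in the text.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def assign_meteorology(met_xy, met_values, air_xy, k=1):
    """Estimate meteorological variables at air quality stations from the k nearest
    meteorological stations (plain Euclidean distance on the coordinates)."""
    model = KNeighborsRegressor(n_neighbors=k).fit(met_xy, met_values)
    return model.predict(air_xy)


# Hypothetical planar coordinates (km) and two variables per meteorological station.
met_xy = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
met_values = np.array([[3.0, 18.0], [3.4, 18.2], [5.1, 15.0]])  # wind speed, temperature
air_xy = np.array([[0.2, 0.1], [4.8, 5.1]])

estimated = assign_meteorology(met_xy, met_values, air_xy, k=1)
```

With more than one neighbor, the estimate becomes the average of the values at the k closest meteorological stations.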
The descriptive study highlights the importance of the location of air quality stations, as NO2 concentrations largely depend on traffic in the area. It is crucial to distinguish whether the station is situated in a city center or a residential neighborhood, as these characteristics influence both the average and maximum concentrations recorded at each station. Additionally, traffic patterns not only affect concentration levels but also the dynamics of variations in NO2 concentrations. Furthermore, it has been shown that weather conditions vary depending on the location within the city, which can also influence the distribution and effectiveness of air quality stations. These factors underscore the importance of correctly locating each station to accurately represent the real pollution situation in the city.
5.2. Prediction of NO2 Concentration per Station
Next, we present the results obtained with the different prediction models for our dataset.
Table 4 demonstrates that Random Forest outperforms other methodologies regarding prediction accuracy for NO
2 concentrations at each station. The lower MAE and RMSE values indicate a closer approximation to the observed data, while the higher R
2 suggests a better fit to the data variability. These findings underscore the efficacy of Random Forest for predicting NO
2 concentrations in diverse locations, and they explain the specific characteristics influencing the predictive performance of each method.
When applying the Random Forest algorithm to predict NO2 concentrations at the different stations, the evaluation metrics show similar values across all stations. However, the question arises as to whether the model’s predictive effectiveness depends on the NO2 concentration level at each station. To address this question, a correlation analysis is conducted between the normalized evaluation metrics and the annual average concentration of NO2 at each station. The results indicate that in stations with lower NO2 concentrations, the MAE is lower than in those with higher NO2 concentrations (Figure 9a). This relationship between NO2 concentration and MAE yields a correlation coefficient of 0.82. To ensure robustness, station 50 is excluded from the analysis, as it exhibits poor fit and appears as an outlier (Figure 9b).
The study finds a strong positive correlation between NO2 concentration and prediction error (RMSE), indicating that stations with higher NO2 concentrations also experience higher prediction errors. Additionally, a moderate negative correlation is observed between the NO2 concentration and the coefficient of determination (R2), suggesting that the model is less effective in stations with high NO2 concentrations, where it fails to adequately explain the variability of the observed data. These findings reveal that the performance of the Random Forest model in predicting NO2 concentrations varies according to the concentration level at each station; it is more effective in stations with low NO2 concentrations and less accurate in stations with high concentrations.
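The correlation analysis can be sketched as follows, assuming the Pearson coefficient is used (the text does not name the coefficient) and that an outlying station is dropped before the coefficient is recomputed. Station labels and values are hypothetical. Note that the Pearson coefficient is unchanged by min-max or z-score normalization of either variable.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation(data, exclude=()):
    """Correlate annual mean NO2 with a per-station error metric.

    data: dict mapping station id -> (annual mean NO2, error metric).
    exclude: station ids to drop before computing the coefficient.
    """
    kept = [v for k, v in data.items() if k not in exclude]
    conc = [c for c, _ in kept]
    err = [e for _, e in kept]
    return pearson(conc, err)

# Hypothetical per-station values: id -> (annual mean NO2, MAE).
# Station "E" plays the role of a poorly fitted outlier.
stations = {
    "A": (20.0, 3.0),
    "B": (25.0, 3.4),
    "C": (30.0, 4.1),
    "D": (40.0, 5.0),
    "E": (22.0, 9.0),
}

r_all = correlation(stations)
r_without_outlier = correlation(stations, exclude=("E",))
```

In this toy data a single poorly fitted station is enough to mask an otherwise strong positive relationship, which is the motivation for reporting the coefficient with the outlier removed.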
5.3. Optimization of Location of Stations
Lastly, the results of the proposed optimization algorithm for the optimal location of air quality stations are presented. First, the Pareto frontier is shown in order to obtain the number of stations that maximizes the percentage of area coverage while minimizing the percentage of overlap between stations. In addition, a scenario with minimal changes is considered, in which a total of 10 stations are placed throughout the city, ensuring that they are as far apart from each other as possible.
Figure 10 illustrates the results of the Pareto frontier, depicting two key relationships concerning the number of open air quality stations. The left sub-plot displays the percentage of coverage achieved by each configuration against the number of open stations. As expected, increasing the number of open stations generally leads to improved coverage across the air quality monitoring network. The right sub-plot displays the percentage of overlap between stations against the number of open stations. As the number of open stations increases, so does the likelihood of overlap between their coverage areas. Initially, as the number of stations grows, the overlap percentage remains at 0%. However, beyond a certain threshold, the percentage of overlap escalates rapidly.
Observe that the Pareto frontier demonstrates the trade-off between increasing the number of open stations to enhance coverage and the consequent rise in overlap, which may lead to redundant monitoring and inefficient resource allocation. The optimal configuration is identified as the point beyond which adding stations at the chosen locations no longer increases coverage, owing to overlaps and the distribution of stations. At this point, the coverage plot indicates that 35 open stations offer the highest coverage without significantly increasing overlap. This suggests that the air quality monitoring network achieves a well-balanced configuration, maximizing coverage while minimizing redundancy.
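The coverage and overlap trade-off can be illustrated with a small sketch. It is not the optimization model used in the study: it assumes a rectangular grid of cells, a fixed circular coverage radius per station, overlap defined as the share of cells covered by two or more stations, and a simple greedy rule for opening candidates. All names, the grid size, and the candidate locations are hypothetical.

```python
import math

def coverage_and_overlap(stations, radius, width, height):
    """Return (% of grid cells covered by >= 1 station, % covered by >= 2)."""
    covered = overlapped = 0
    for x in range(width):
        for y in range(height):
            hits = sum(
                1 for sx, sy in stations if math.hypot(x - sx, y - sy) <= radius
            )
            covered += hits >= 1
            overlapped += hits >= 2
    total = width * height
    return 100.0 * covered / total, 100.0 * overlapped / total

def greedy_sweep(candidates, radius, width, height):
    """Open candidates one at a time, always adding the one that yields the
    highest coverage, and record (n_open, coverage %, overlap %) at each step."""
    opened, remaining, curve = [], list(candidates), []
    while remaining:
        best = max(
            remaining,
            key=lambda c: coverage_and_overlap(opened + [c], radius, width, height)[0],
        )
        opened.append(best)
        remaining.remove(best)
        cov, ov = coverage_and_overlap(opened, radius, width, height)
        curve.append((len(opened), cov, ov))
    return curve

# Hypothetical 10 x 10 grid with five candidate locations and radius 2.
candidates = [(2, 2), (7, 7), (2, 7), (7, 2), (4, 4)]
curve = greedy_sweep(candidates, radius=2, width=10, height=10)
```

Plotting the second and third entries of each tuple in `curve` against the first reproduces the qualitative shape described for the two sub-plots: coverage flattens as stations are added, while overlap stays at zero at first and then grows.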
Figure 11 presents the optimal scenario of open air quality stations obtained through the Pareto frontier analysis. It highlights the selected stations that maximize coverage while minimizing overlap. This optimal set includes both the initial air quality stations (denoted by red circles) and newly opened potential stations (blue circles). However, budget constraints limit the number of stations that can be installed, requiring a balance between maximizing coverage and managing the costs associated with implementation and maintenance. Thus, the goal is to demonstrate the optimal placement of the current stations, with the addition of two new stations for improved distribution, without incurring significant investment. This minimum scenario is shown in Figure 12. The stations that remain in the same location as in the initial arrangement are marked with red circles, while the new stations, selected at points of interest, are marked with blue circles.
To compare the NO2 concentration KPI calculated in both scenarios with the current situation, we use the annual mean NO2 concentration measured across all stations as the KPI. Each new station is assigned an average concentration value using the KNN algorithm applied earlier. In this case, the average value for each new station is determined as a distance-proportional mean of the values at the three nearest current stations. The values for the nearest current stations for the studied period are shown in Appendix A, Table A2. After applying the algorithm, the values obtained for each new station are included in Appendix A, Table A3.
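One plausible reading of this assignment is an inverse-distance-weighted mean over the three nearest current stations, sketched below. The weighting scheme (weights proportional to 1/distance), the planar coordinates, and all values are assumptions made for illustration; the exact weighting used in the study is not reproduced here.

```python
import math

def idw_estimate(new_xy, current, k=3):
    """Inverse-distance-weighted mean over the k nearest current stations.

    new_xy: (x, y) of the new station in planar coordinates.
    current: list of (x, y, value) tuples for existing stations.
    """
    ranked = sorted((math.dist(new_xy, (x, y)), v) for x, y, v in current)[:k]
    # A new station placed exactly on an existing one inherits its value.
    if ranked[0][0] == 0.0:
        return ranked[0][1]
    weights = [1.0 / d for d, _ in ranked]
    return sum(w * v for w, (_, v) in zip(weights, ranked)) / sum(weights)

# Hypothetical station positions (km) and annual mean NO2 values (µg/m3).
current_stations = [
    (1.0, 0.0, 30.0),
    (0.0, 2.0, 40.0),
    (-4.0, 0.0, 50.0),
    (10.0, 10.0, 100.0),  # too far away to be among the three nearest
]
estimate = idw_estimate((0.0, 0.0), current_stations)
```

Because the result is a weighted mean, it always lies between the lowest and highest of the three neighboring values, so this kind of assignment cannot produce a local peak or trough that the existing network does not already record.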
Figure 13 shows heat maps created to facilitate visual comparison, based on the annual mean concentration value at each station for 2023. The comparison of these maps reveals that the optimal scenario covers a larger portion of the city and provides a more representative depiction of the current situation. Although the values assigned by the algorithm do not account for traffic conditions or other urban characteristics, they are considered a reasonable approximation for comparing the scenarios.
Using the values assigned to the new stations, the NO2 concentration KPI for the study period is calculated for the different scenarios. The values obtained for each case are shown in Table 5. Throughout the study period, both the minimum and optimal scenarios yield higher KPI values than the current situation. For the minimum scenario, the KPI increases by 4% to 6%, while for the optimal scenario, the KPI increases by 6% to 9%. This suggests that the current scenario may be underestimating this indicator for the city and may not capture all the necessary information to accurately represent reality.
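The comparison between scenarios reduces to a mean across stations and a relative change, as in the short sketch below. The station values are hypothetical and do not correspond to Table 5.

```python
def kpi(station_means):
    """KPI as the mean of the annual mean NO2 values across stations."""
    return sum(station_means) / len(station_means)

def percent_change(new, base):
    """Relative change of new with respect to base, in percent."""
    return 100.0 * (new - base) / base

# Hypothetical annual means (µg/m3): current network, then two added stations.
current_network = [30.0, 40.0, 50.0]
extended_network = current_network + [44.0, 46.0]

kpi_current = kpi(current_network)
kpi_extended = kpi(extended_network)
change = percent_change(kpi_extended, kpi_current)
```

The KPI rises only when the added stations carry values above the current network mean, which is why the placement of new stations determines the direction of the change.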
6. Conclusions
This study evaluates the suitability of using the daily average concentration of NO2 as a KPI to assess air quality in a smart city like Barcelona. This evaluation builds on the availability of high-quality initial data that accurately reflect the actual concentration levels across the city as measured by monitoring stations. After analyzing the behavior of NO2 using the available data, important relationships are evident between the concentration measured at each station and its location. This is not only linked to traffic in the area but also to the city’s meteorological conditions.
Moreover, the study highlights the importance of station placement, as an inadequate distribution could result in a distorted KPI: overestimated if the stations are concentrated in high-traffic areas, or underestimated if they are mainly located in residential zones. For this reason, strategic points are identified in the city where measuring the NO2 concentration would provide significant added value. Given that the optimal solution for the distribution of air quality monitoring stations could be very costly, a more conservative alternative that minimizes investment is also proposed. This solution only requires the installation of two additional stations and the relocation of some existing ones to achieve more representative results. Furthermore, it is observed that when the stations are better distributed, the NO2 KPI value exceeds the thresholds set by the WHO. This suggests that Barcelona needs continuous and precise monitoring of NO2 levels to quantify the effects of the policies implemented in the city, enabling informed decision-making that improves air quality.
As a future line of research, it would be highly relevant to correlate real-time traffic data in the city with NO2 concentration data and consider the population density in each zone. This integration would allow for a better understanding of the relationship between vehicular flow, atmospheric pollutant concentrations, and the population exposed to this pollution, providing a more comprehensive perspective on the impact of traffic on air quality in Barcelona.