1. Introduction
The Moravian-Silesian Region in the Czech Republic, particularly the city of Ostrava, is recognized as a significant air pollution hot spot in Europe. This situation arises historically from a combination of industrial activity, high population density, and geographical factors that worsen air quality. The topography of the Moravian-Silesian Region contributes to the accumulation of pollutants, especially during winter months when temperature inversions are common. This leads to poor dispersion, resulting in elevated concentrations of air pollutants. Additionally, air pollution from neighboring areas, particularly from the Silesian Voivodeship (Poland), also plays a significant role in the given context [1, 2, 3].
Recently, “Polish smog” was proposed as a specific type of air pollution that occurs in Poland, particularly in the winter months. It is characterized by high concentrations of particulate matter (PM), such as PM10 and PM2.5, as well as polycyclic aromatic hydrocarbons like benzo(a)pyrene with adverse health effects [4]. This type of smog is particularly prevalent in Eastern Europe, where it arises from the burning of coal and other solid fuels for heating purposes. When compared with photochemical smog found in industrialized urban areas, which is driven mainly by volatile organic compounds (VOCs) and nitrogen oxides leading to high ozone levels, Polish smog is more closely linked to residential heating practices and industrial emissions [5].
Yet another type of smog event, which is considered throughout this paper, is caused by particulate matter originating from the Sahara Desert. Dust storms can transport particles over thousands of kilometers, affecting air quality far from their source. The transport of Saharan dust poses significant challenges for air quality management and public health in Europe. Although this phenomenon is more common in the Mediterranean region, it can occasionally cause a significant deterioration in air quality in Central European countries.
Low-cost sensor (LCS) networks have emerged as a promising solution for monitoring air pollution and providing smog alerts, as they can supplement data from regulatory-grade reference instruments [6]. By filling in spatial and temporal gaps in air quality monitoring, such information can provide a more comprehensive understanding of pollution patterns at both local and regional levels. Citizen science projects involving the public in sensor deployment can further expand the reach of these networks [7, 8, 9]. However, the proper calibration of LCSs is crucial to ensure data accuracy and reliability [10].
In principle, uncertainties of the factory-calibrated LCS response are studied by experiments in controlled (laboratory) or uncontrolled (field) ambient conditions [11, 12, 13]. Such a calibration procedure is followed by a selection of suitable numerical correction methods and an estimation of parameters using reference datasets for model training or testing purposes [14]. The co-location of the LCS node with reference instruments in a real outdoor atmosphere is usually performed over a period of several weeks in order to accumulate the large datasets required for reliable outputs of the calibration process. A prolonged period of co-location makes it possible to investigate the seasonal variability of LCS performance across a wide range of environmental conditions relevant for the target locality.
In the given context, our study focuses on three key objectives related to the performance and application of an LCS network in the Moravian-Silesian Region:
Aim 1 is to evaluate the accuracy and reliability of a prototype sensor node specifically designed for local conditions in the region, focusing on the winter and spring periods. This study analyzes the data obtained from low-cost sensors, particularly the Sensirion SPS30 and Alphasense CO-B4 sensors, and compares them with data from regulatory AQM stations to evaluate sensor accuracy in detecting PM and CO during air pollution episodes.
Aim 2 is to investigate whether and how the reliability of the SPS30 sensor can be enhanced under local conditions. While the SPS30 sensor has shown promise in monitoring particulate matter, it also presents limitations, particularly in its ability to accurately measure larger particles, such as PM10. This study explores potential calibration improvements, including corrections based on particle size distribution, to address the observed biases in measurements.
Aim 3 is to analyze the potential use of data from the Copernicus Atmosphere Monitoring Service (CAMS) as an alternative to increasing the complexity of the LCS node. The CAMS provides regional air quality forecasts, including PM and CO concentrations, which could potentially complement or replace additional sensors in the low-cost monitoring framework.
Overall, the goal of our research is to develop a new framework for data analysis and practical guidance for the calibration of low-cost sensors in urban air pollution environments by combining observations from AQM stations and CAMS model predictions. In particular, we aim to provide new insights into the reliability of the SPS30 sensor for further use in citizen science-based air quality monitoring initiatives, which has been critically discussed in the recent literature, e.g., [15].
2. Materials and Methods
Our prototype LCS node (described in more detail in Appendix A) was mounted on the roof of the AQM station (see Figure 1). This setup allows for a direct comparison between the outputs of the LCS node and data from reference-grade instruments.
2.1. Selection of Low-Cost Sensors
Following our goals, the LCS node is equipped with a set of low-cost sensors suitable for monitoring the primary air pollutants of the specific area, which are particulate matter (PM) and carbon monoxide (CO). The selection of the gas and PM sensors was mainly based on an extensive literature survey and the experience of previous investigators. The availability of the sensors (their distribution in the EU) was also taken into account, as well as the affordability of the entire LCS node setup and the level of complexity relevant to its integration and further development.
For the particulate matter (PM) measurement, the node utilizes a pair of commercially available sensors: Sensirion SPS30 and Alphasense OPC-N3. The SPS30 is a laser-based optical sensor well suited for measuring the mass concentration of fine particulate matter (especially PM2.5) based on the principle of light scattering. The OPC-N3 also uses optical particle counting but detects a wider range of particle sizes, from 0.35 µm to 40 µm, across 24 size bins. This enables the particle size distribution to be determined, which is critical for understanding the composition of atmospheric aerosols and for determining their origin in relation to source apportionment.
For carbon monoxide monitoring, the LCS node incorporates the Alphasense CO-B4 electrochemical sensor. This sensor can detect CO concentrations between 0 and 1000 ppm, with a resolution of 0.1 ppm. Its sensitivity ranges from 55 to 85 nA/ppm, providing an accurate detection of small changes in CO levels.
Knowledge of ambient temperature and relative humidity (RH) is essential when aiming at corrections of LCS response in various ambient conditions. For this purpose, a digital module including the Bosch Sensortec BME280 sensor was integrated into the LCS node. This sensor operates with a temperature accuracy of °C and a humidity accuracy of RH.
2.2. Co-Location Site and Reference Instruments
The co-location measurements of the LCS node were conducted at an air quality monitoring (AQM) station of the health institute located in the municipal area of the city of Ostrava, which is close to various industrial sites (e.g., metallurgical and chemical plants). The AQM station provides information on meteorological conditions, such as wind speed and direction, as well as reference air quality data in hourly intervals. Temperature and humidity are measured using the COMET T3113D sensor, and atmospheric pressure using the NXP Semiconductor MPX4115A. For CO measurements, a HORIBA APMA-370 analyzer is used. The station also monitors nitrogen dioxide (NO2) with the HORIBA APNA-370 analyzer and ozone (O3) with the HORIBA APOA-370 analyzer. A TEOM 1400 analyzer was used as a reference for PM10 during the entire winter evaluation period, including the S1 episode. At the end of March 2024, the TEOM 1400 was replaced by a Palas FIDAS 200 analyzer, which provides continuous real-time reference measurements of size distribution, allowing the quantification of PM1, PM2.5, and PM10. Detection principles for each measurement are detailed in Table 1.
2.3. Overview of Co-Location Measurement
Low-cost sensors are often collocated with reference instruments in laboratory or field conditions for a period of several weeks in order to improve their performance. In our case, an evaluation measurement campaign lasting for three months (from mid-November 2023 to mid-February 2024) was initially planned to be carried out in order to verify and validate the performance of individual LCSs and their variation for seasonal meteorological conditions typical in the Moravian-Silesian Region.
Additionally, smog alert events, which occurred in Ostrava during December 2023 and April 2024, were also recorded during an extended period of co-location. These datasets enabled us to focus on the performance of LCS and CAMS data versus reference measurements during the episodes of serious air quality deterioration (see Table 2).
In general, the differences in meteorological conditions between S1 and S2 are mostly due to the distinct seasonal conditions of Central Europe. Specifically, S1 is associated with high atmospheric stability and the inflow of cold and wet air masses typical for continental weather during the winter season, while S2 is characterized by a long-range (i.e., inter-continental) transport of dry and warm air from the south via the so-called “Moravian Gate”, associated with relatively higher wind speeds.
It is worth noting that atmospheric conditions for the S1 episode are typical for the above-mentioned “Polish smog”. In the given case, the highest concentrations of particulate matter are generally recorded at low temperatures, specifically between −10 °C and 0 °C. Additionally, higher atmospheric pressure correlates with increased PM concentrations, as stable air masses inhibit vertical mixing and allow pollutants to accumulate near the surface [5]. The evolution of the above-described meteorological situation can be identified from Figure 2 based on reference data from the AQM station.
2.4. Datasets and Preparatory Analysis
Following the aims of this study, we utilized three sets of time series to evaluate the LCS measurements and analyze the data. Their contents and temporal resolution are illustrated in Figure 3. These data, together with the interactive Python notebooks used for their processing and analysis, are fully available from the public repository mentioned in the Data Availability Statement.
Datasets extracted from the LCS node were converted into hourly time series using the resample method of pandas (Python library) and aligned with the reference and CAMS model data by their GMT timestamps.
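The resampling and alignment step can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook code from the repository: the column names, the 1 min raw sampling interval, and the choice of hourly means with an inner join are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw LCS readings at 1-minute resolution (synthetic values).
rng = np.random.default_rng(0)
raw_index = pd.date_range("2023-12-01 00:00", periods=180, freq="min", tz="UTC")
lcs_raw = pd.DataFrame(
    {"pm25_sps30": rng.uniform(10.0, 60.0, size=len(raw_index))},
    index=raw_index,
)

# Aggregate to hourly means; each row is labelled by the start of its hour.
lcs_hourly = lcs_raw.resample("60min").mean()

# Hypothetical hourly reference series on the same UTC (GMT) time base.
ref_index = pd.date_range("2023-12-01 00:00", periods=3, freq="60min", tz="UTC")
ref_hourly = pd.DataFrame({"pm10_ref": [35.0, 42.0, 51.0]}, index=ref_index)

# Align on timestamps; the inner join keeps only hours present in both series.
merged = lcs_hourly.join(ref_hourly, how="inner")
print(merged)
```

Keeping every series timezone-aware in UTC avoids silent one-hour shifts around daylight saving changes when local-time reference data are merged with model output.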
Reference AQM data were extracted from datasets provided by the health institute (the AQM station) after their verification involving the replacement of values of measurements below the limit of detection (
, see
Table 1) by the value equal to
.
Site-specific time series of concentrations for the selected pollutants (CO,
,
,
,
and Dust) based on the forecast of the CAMS ENSEMBLE model [16] with 11 km spatial resolution were downloaded in the form of comma-separated values (CSV) files from the Open-Meteo (https://open-meteo.com/) webpage.
The CAMS ENSEMBLE model provides daily high-resolution air quality analyses and forecasts for Europe. It utilizes an ensemble of eleven air quality forecasting systems, generating a median ensemble from individual model outputs to enhance predictive performance. This approach allows for better uncertainty estimation based on the variability among the models. Data assimilation techniques integrate model outputs with observations from the European Environment Agency (EEA), creating a comprehensive dataset. Forecasts are produced daily for the next four days and are available at hourly intervals and multiple vertical levels. The practical implementation of CAMS for urban air quality monitoring and more detailed information concerning the model predictions are described, e.g., in [17].
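The idea of a median ensemble with a spread-based uncertainty estimate can be illustrated with a short sketch. The five member forecasts below are invented numbers, and the interquartile range is only one possible spread measure; neither is taken from the CAMS documentation.

```python
import numpy as np

# Hypothetical hourly PM10 forecasts [ug/m3]; rows are member models, columns are hours.
members = np.array([
    [20.0, 25.0, 40.0],
    [22.0, 24.0, 55.0],
    [19.0, 30.0, 45.0],
    [21.0, 26.0, 70.0],
    [23.0, 27.0, 50.0],
])

# The median across members is robust to a single outlying model.
ensemble_median = np.median(members, axis=0)

# Interquartile range across members as a simple indicator of forecast uncertainty.
q25, q75 = np.percentile(members, [25, 75], axis=0)
spread = q75 - q25

print(ensemble_median)
print(spread)
```

In this toy case the third hour shows the widest spread, which is where the member models disagree most.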
Figure 2.
Temporal evolution of wind speed, wind direction, and reference particulate matter concentrations during S1 (a–c) and S2 (d–f) episodes, respectively. Wind direction in degrees indicates the origin of the wind (0° = 360° = north, 90° = east, 180° = south, 270° = west).
Figure 3.
Schematic representation of datasets and quantities used for exploratory and regression data analyses (in bold) with corresponding temporal resolution. These datasets as well as data processing tools (including interactive Python notebooks) are available at the Zenodo repository (see the Data Availability Statement).
The voltage output of the carbon monoxide LCS was converted into concentration by multiple linear regression (MLR) using the linear model from scikit-learn (Python library). The entire dataset of winter measurements, i.e., from November 2023 to February 2024, was assumed to be representative for this step. Temperature readings from the LCS node were converted into kelvins in order to avoid numerical issues related to negative values. The entire dataset was split into training and testing subsets, with 1232 and 2392 data points, respectively. Finally, the MLR model, with a high coefficient of determination, i.e.,
, and an acceptable mean absolute error (MAE < 50 μg/m³) for the predicted CO concentration, was estimated assuming only
T[K] and CO-B4 sensor voltage values as predictors with the presumed parametrization of Equation (
1)
where
is a working electrode voltage [V],
is an auxiliary electrode voltage [V] at a given Greenwich Mean Time (GMT)
and case-specific values of relevant MLR coefficients are as follows:
and
,
.
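The fitting workflow described above can be sketched with scikit-learn on synthetic data. The exact parametrization of Equation (1) is not reproduced here, so the linear form with separate working-electrode, auxiliary-electrode, and temperature terms, the coefficient values, the units, and the 50/50 split are all assumptions of this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
temp_k = rng.uniform(258.0, 288.0, n)   # ambient temperature [K]
v_we = rng.uniform(0.30, 0.80, n)       # working electrode voltage [V]
v_ae = rng.uniform(0.25, 0.35, n)       # auxiliary electrode voltage [V]

# Synthetic "reference" CO [ug/m3] built from made-up coefficients plus noise.
co_ref = 2500.0 * v_we - 1800.0 * v_ae + 1.5 * temp_k + rng.normal(0.0, 20.0, n)

# Predictors: temperature in kelvins and the two electrode voltages.
X = np.column_stack([temp_k, v_we, v_ae])
X_train, X_test, y_train, y_test = train_test_split(
    X, co_ref, test_size=0.5, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the held-out subset only.
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(model.coef_, model.intercept_, r2, mae)
```

Reporting the metrics on the testing subset, as done here, keeps the quoted coefficient of determination and MAE independent of the data used to estimate the coefficients.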
An exploratory data analysis including a correlation matrix and kernel density estimation (KDE) of dataset pairs was performed using seaborn (Python library). Subsequently, simple linear regression (SLR) was carried out and diurnal variations in air pollutant concentrations were plotted using the relevant methods implemented in atmospy (Python library). The particle size distribution measured by the Alphasense OPC-N3 sensor was analyzed using smps-py (Python library).
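The numerical core of these exploratory steps can be reproduced with pandas and NumPy alone, as in the sketch below. It deliberately does not use seaborn or atmospy, it runs on synthetic data, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# One week of synthetic hourly data with an evening concentration peak.
idx = pd.date_range("2024-01-01", periods=24 * 7, freq="60min", tz="UTC")
hours = idx.hour.to_numpy()
rng = np.random.default_rng(1)
ref = 30.0 + 10.0 * np.cos(2 * np.pi * (hours - 20) / 24) + rng.normal(0.0, 1.0, len(idx))
lcs = 0.8 * ref + 2.0 + rng.normal(0.0, 1.0, len(idx))
df = pd.DataFrame({"ref": ref, "lcs": lcs}, index=idx)

# Pearson correlation matrix between reference and sensor series.
corr = df.corr()

# Simple linear regression of the sensor response against the reference.
slope, intercept = np.polyfit(df["ref"], df["lcs"], deg=1)

# Diurnal profile: mean concentration for each hour of the day.
diurnal = df.groupby(df.index.hour).mean()

print(corr)
print(slope, intercept)
print(diurnal.head())
```

A slope below one with a positive intercept, as in this toy case, is the typical signature of a sensor that underestimates high concentrations while overestimating low ones.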
4. Discussion
We first discuss our results regarding the response of the CO-B4 sensor and compare them with the observations of previous researchers. In the work of Camprodon et al. [
9], a very high correlation (
) and low error (
ppm) of CO measurements were observed during more than two months of CO-B4 sensor deployment. The sensor was found to behave linearly with respect to the CO concentrations and its decrease during the co-location period was negligible, which is quite consistent with our measurements. Our data obtained during the winter evaluation period (3 months) show a slightly higher coefficient of determination
when re-calibration according to Equation (1) is evaluated and compared with the reference data. A similar performance of this sensor is reported by Han et al. [12], who evaluated an almost identical season with similar ranges of air pollutants, but with temperatures in the range 0–20 °C. Our co-location was carried out at much lower temperatures, and the temperature correction following Equation (1) appeared to be less effective at extremely low ambient temperatures (below −10 °C) and high CO levels (see Figure 11). Conversely, slightly overestimated values of [CO] were observed during the warmer days (with temperatures above 5 °C). In the given case, we can attribute the biased [CO] values to a direct temperature effect on the sensing mechanism, i.e., a reduced rate of (electro-)chemical reactions, and the corresponding non-linearities.
On the other hand, as far as the influence of temperature on the response of the SPS30 sensor in our local conditions is concerned, we rather anticipate an indirect effect: a change in particle size distribution due to the increased need for domestic and industrial heating at lower ambient temperatures.
This hypothesis is consistent with a number of previous publications, e.g., [19, 20]. In particular, Zareba et al. [20] show a negative correlation between ambient temperature and air pollution in an area close to our co-location site. Their study confirms that in moderate climate zones with coal burning as the primary source of air pollution, temperature is the most significant factor influencing monthly average
concentrations.
Figure 11.
Correlation of reference measurements and LCS response during the winter evaluation period and the effect of ambient temperature on sensor performance (shown by the color of the data point).
As in the case of our co-location site, many AQM stations in the region covered by this paper are not yet equipped with reference instruments for measuring fine PM fractions. Moreover, the air quality criteria recommended by the WHO, EU, or local authorities for declaring a smog alert situation usually take into account PM10 concentrations, or rarely PM2.5. Therefore, our main motivation was to find a solution to reliably determine PM10 values based on LCS data.
Below, we briefly summarize some of the findings from previous studies on the performance of SPS30 sensors, in particular on their reliability in measuring fine and coarse PM concentration.
In a study co-locating the SPS30 with regulatory methods, Roberts et al. [21] achieved an average bias-adjusted
for 24 h averages and 0.57 for 1 h averages, suggesting reasonable accuracy in real-time monitoring. The mean bias error was minimal, indicating that the SPS30 provided reliable data for
levels.
According to Kuula et al. [
22], the SPS30 sensor is suitable to be used for measuring
particles when
, indicating high accuracy and consistency. Whereas for
particles, this value was 0.83, for
particles, it was 0.12, indicating low measurement reliability and that the sensor is not suitable for larger particle sizes.
Vogt et al. [
23] also confirm that the SPS30 sensor is mostly accurate and reliable for
particles with
. For
particles, the
value was around 0.73. The results for
particles indicate a higher value (
) compared to the results of Kuula et al. [
22], yet the sensor is still not suitable for practical AQM applications.
Molina Rueda et al. [
15] confirmed the trend of the SPS30 sensor being able to measure
particles with a high accuracy of
. As the particle size increases, the accuracy of the sensor decreases, yielding
for
and
for
, respectively.
The physical explanation for the unreliable measurement of larger particles is related to the design of optically based LCSs and the principle of their operation (i.e., light scattering). Above all, shortened viewing angles, losses occurring during particle intake and also differences in particle shape and refractive index need to be taken into account as well as the effect of humidity and sensor aging when these LCSs are exposed to realistic outdoor conditions.
Considering these findings together with the results of our LCS node measurements against the reference data, we can conclude that the SPS30 provides a reliable response to fine dust particles, especially
, even under Saharan dust storm conditions. The
readings from the SPS30 sensor according to its original calibration (i.e., factory setting) are burdened with a systematic bias, whose trend (negative or positive) depends on the type of smog situation. Therefore, to conclude this discussion, let us take a closer look at the size-resolved histogram of the PM volume concentration distribution obtained from the OPC-N3 sensor on days with maximum
concentration in the case of S1 and S2 episodes (see
Figure 12). The difference between the two size-resolved distributions is noticeable, with both datasets showing significant bimodality. In the case of the Polish smog (S1), the total volume is clearly dominated by
. On the other hand, in the case of the Saharan dust storm (S2), particles with aerodynamic diameter
have the highest volume concentration from the total
found in the size-resolved distribution.
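How a fine-to-coarse volume fraction can be derived from a binned size distribution is sketched below. The bin edges and particle counts are invented for illustration and are not the OPC-N3 bin definitions or measured data. Spherical particles are assumed, each bin is represented by its geometric mean diameter, and optical diameter is treated as if it were aerodynamic diameter.

```python
import numpy as np

# Hypothetical bin edges [um], chosen so that 2.5 um and 10 um fall on bin boundaries.
bin_edges = np.array([0.35, 0.7, 1.3, 2.5, 4.0, 10.0, 20.0, 40.0])

# Hypothetical number concentrations per bin [particles/cm3] for two episode types.
smog_counts = np.array([300.0, 80.0, 10.0, 0.5, 0.05, 0.0, 0.0])
dust_counts = np.array([120.0, 40.0, 10.0, 2.0, 0.5, 0.05, 0.01])


def volume_per_bin(edges, counts):
    """Volume concentration per bin [um3/cm3], assuming spherical particles."""
    d_mid = np.sqrt(edges[:-1] * edges[1:])  # geometric mean diameter of each bin
    return counts * np.pi / 6.0 * d_mid**3


def fine_fraction(edges, counts, fine_cut=2.5, total_cut=10.0):
    """Share of the volume below total_cut that sits in bins below fine_cut."""
    vol = volume_per_bin(edges, counts)
    upper = edges[1:]
    return vol[upper <= fine_cut].sum() / vol[upper <= total_cut].sum()


ratio_smog = fine_fraction(bin_edges, smog_counts)
ratio_dust = fine_fraction(bin_edges, dust_counts)
print(ratio_smog, ratio_dust)
```

With these illustrative inputs the fine fraction is high for the combustion-type distribution and much lower for the dust-type one, which mirrors the qualitative contrast between S1 and S2 described above. If a constant particle density is assumed, the volume ratio equals the corresponding mass ratio.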
In an analogy to the recent work by Kaur and Kelly [
24], we propose a strategy to derive
concentrations from the biased PM-LCS response based on correction factors obtained from the OPC-N3 sensor working in concert. Further, we use Equations (
3) and (
4), which can be used to adjust slopes
and
, respectively, to the ideal value
. Then, we can use the inverse estimation in order to determine that the calibration of SPS30 is presumably carried out with the aerosol mixture having [
, which corresponds to common (traffic-related) air pollution in urban areas.
Therefore, the values measured by SPS30 are systematically biased if the actual [] values differ significantly from the [. In other words, it was proved that the biased SPS30 reading of could be roughly corrected using [ divided by a factor (). More precise corrections will only be possible after further analysis and experimentation.
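One possible reading of this rough correction is sketched below, under stated assumptions rather than as the formula used in the study: the fine-fraction reading of the SPS30 is taken as reliable, and the fine-to-coarse ratio is supplied by a co-located particle counter such as the OPC-N3. The function name, argument names, and the lower bound on the ratio are hypothetical.

```python
import numpy as np


def pm10_from_fine_fraction(pm25_lcs, fine_to_coarse_ratio, min_ratio=0.05):
    """Estimate PM10 as the fine-fraction reading divided by the PM2.5/PM10 ratio.

    The ratio is clipped to [min_ratio, 1.0] so that noisy ratio estimates
    cannot produce unbounded or physically implausible results.
    """
    pm25 = np.asarray(pm25_lcs, dtype=float)
    ratio = np.clip(np.asarray(fine_to_coarse_ratio, dtype=float), min_ratio, 1.0)
    return pm25 / ratio


# Illustrative values: a fine-dominated episode and a coarse-dominated episode.
corrected = pm10_from_fine_fraction([40.0, 20.0], [0.8, 0.25])
print(corrected)
```

The same fine-fraction reading therefore maps to very different coarse concentrations depending on the aerosol type, which is why a single factory calibration cannot be unbiased for both kinds of episode.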
4.1. Practical Applicability
This work represents a significant step towards strengthening the role of citizen science and democratizing environmental data in AQM, and it demonstrates the importance of academic support for these efforts as the current state of knowledge and technology is still rather prohibitive to the straightforward deployment of commercially available LCS systems in their default (factory-calibrated) setup. Therefore, a careful evaluation of LCS performance (in the form of co-location measurements) and a consideration of specific conditions of their deployment at local and regional levels before their practical application are inevitable.
In the framework of this work, we were able to explain the seasonal variability of the Sensirion SPS30 sensor response, and a correction method increasing the reliability of its response has been established. According to our findings, we can exploit the strengths of the SPS30 sensor and overcome its previously reported limitations. A correction of its biased response can be expressed based on the fine-to-coarse particle ratio, e.g., as evaluated from the OPC-N3 sensor. It was also found that an additional temperature correction needs to be estimated for the CO-B4 sensor to account for a biased response at extremely low temperatures.
Our future aim is to enhance the reliability of the regional AQM data by combining CAMS model predictions and LCS response by means of machine learning approaches employing parameterized (e.g., MLR or HDMR [25]) or non-parameterized methods [26].
4.2. Limitations
This study has several limitations, mainly due to its seasonal character and the influence of weather conditions relevant to the location and the winter season. It also focuses only on the response of selected LCS systems integrated into a prototype node that is still under development. Only a single unit of each selected LCS was tested and evaluated, so the influence of inter-unit variability is not included. Given the limited duration of the co-location measurement, LCS aging factors were neglected.