1. Introduction
In recent years, the emission of polluting gases into the atmosphere has increased considerably due to the growth of the motor vehicle fleet and of industrial activity. This pollution is concentrated mainly in large urban centers, which increases people’s exposure to substances considered toxic, such as carbon dioxide (CO2). For these reasons, monitoring air quality is important, especially the concentration of gases that pose a greater health risk. Ideally, monitoring should take place at strategic points in cities, especially in regions with a heavy flow of vehicles and people [1]. Atmospheric CO2 levels have been increasing gradually. According to data from the National Oceanic and Atmospheric Administration (NOAA, Washington, DC, USA), one of the most-respected laboratories in the USA, the CO2 concentration was 320 ppm in 1960; it surpassed 400 ppm for the first time in 2015 and reached close to 410 ppm in 2021. These values were obtained from sensors installed at an altitude of 3500 m, in a region far from direct sources of pollution [2]. In places where the burning of fossil fuels is more intense, particularly in urban centers, CO2 concentrations tend to be even higher because of the large flow of vehicles and the presence of industries.
According to previous studies, the ideal CO2 concentration in outdoor environments should be between 450 and 550 ppm [3], although this limit may vary according to regional legislation. Studies also show that the human organism begins to feel the first subtle toxic effects of CO2 from 600 ppm [4]. CO2 is considered less toxic than other gases. However, above safe limits it can endanger both the population’s health and the environment. In humans, it can cause headaches, breathing problems, and difficulty concentrating, compromise athletes’ performance, increase the risk of cancer, and even lead to death in extreme cases [1]. In addition, CO2 is considered one of the main gases responsible for the greenhouse effect, which is a worrying environmental problem. The number of air pollutants is large and, therefore, quantifying the concentration of all of them simultaneously is an arduous and expensive task. Each substance requires specific equipment and techniques for its correct monitoring. To alleviate this difficulty, it is possible to measure only CO2 levels, as CO2 is considered an excellent indicator of air quality. When the CO2 concentration is high in an environment, most of the other polluting gases tend to follow the same trend [5].
In recent years, several studies have proposed air-quality-monitoring mechanisms [6,7,8,9,10]. Most platforms are built on a wireless sensor network, which takes measurements at different points and can display the information in real time or store it for later consultation. One of the great advantages of this type of architecture is the possibility of distributing sensors throughout an environment and monitoring it completely. Overall, these projects aim to optimize certain hardware components, wireless sensor network communication, and even battery life. The reported systems show high communication efficiency and are able to monitor different types of pollutants in different environments (such as subways, other indoor environments, and open areas in cities). However, all the proposed systems fail to perform a systematic analysis of the collected data. Therefore, they can only generate graphs with the concentration of a particular pollutant. This information may not be useful to a common user who has no knowledge of environmental pollution. Furthermore, some limitations end up compromising the functioning and use of these systems [11,12]. On the other hand, other works do not address measurement through sensors and instead focus on developing algorithms and data analysis techniques to generate structured information [13,14,15,16,17]. These projects implement algorithms to predict the concentration of toxic gases and classifiers to categorize critical days and times, among other data analysis techniques. However, these systems can only apply the analysis tools to databases freely available on the internet or to data obtained from their own measurements using commercial equipment.
Our group has already reported an agile method of assessing air quality based on measurements of CO2 concentrations. That work presented a monitoring platform composed of CO2 sensors connected in a wireless network. The measurements performed by each sensor, spread across multiple points in the measurement environment, were sent to a central sensor, which was the only one capable of replicating all the data to a central database. Subsequently, the data were analyzed to generate knowledge through queries and data-mining techniques [18]. Despite enabling air quality analysis, the use of a wireless sensor network created some difficulties and limitations. The most common problems were interference and loss of communication between sensors due to physical obstacles, which caused data loss. Therefore, environments such as forests, woods, and urban centers (due to the buildings) can compromise the network’s functioning. Furthermore, the area covered by the network was limited, because the number of sensors required grows in proportion to the monitored area. A very large number of sensors increases management complexity and can overload the network [18].
For these reasons, a new version of the monitoring platform is proposed in this work. The architecture consists of autonomous and independent CO2 sensors. Each sensor is able to store data locally (in the absence of the internet) and replicate it to the central server. The central server then performs data analysis and mining to generate relevant air quality information on demand through a set of query methods. The objective was to develop a robust platform that can be used as a technical solution for the reliable monitoring of atmospheric CO2 in different environments (even in more remote places such as forests and agricultural areas). Furthermore, we aimed to produce a system that can analyze data and present structured information about air quality to the user. The proposed monitoring system was used to assess the outdoor CO2 concentration at various points in a Brazilian urban center, which served only as a test environment for the platform.
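The store-and-replicate behavior described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the platform's actual firmware: the `SensorNode` class, the `upload` callable, and the in-memory SQLite buffer are all hypothetical stand-ins for whatever storage and transport the real sensors use.

```python
import sqlite3
import time


class SensorNode:
    """Hypothetical autonomous CO2 sensor node with a local buffer."""

    def __init__(self, sensor_id, upload):
        self.sensor_id = sensor_id
        # upload(records) is assumed to raise ConnectionError when offline.
        self.upload = upload
        # In-memory database used as a stand-in for on-device storage.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE buffer (ts REAL NOT NULL, ppm REAL NOT NULL)")

    def record(self, ppm, ts=None):
        """Store the reading locally first, then try to replicate it."""
        ts = time.time() if ts is None else ts
        self.db.execute("INSERT INTO buffer (ts, ppm) VALUES (?, ?)", (ts, ppm))
        self.db.commit()
        self.sync()

    def sync(self):
        """Send all buffered readings; keep them locally if the upload fails."""
        rows = self.db.execute(
            "SELECT rowid, ts, ppm FROM buffer ORDER BY rowid"
        ).fetchall()
        if not rows:
            return 0
        records = [(self.sensor_id, ts, ppm) for _, ts, ppm in rows]
        try:
            self.upload(records)
        except ConnectionError:
            return 0  # offline: readings stay in the local buffer
        self.db.execute("DELETE FROM buffer WHERE rowid <= ?", (rows[-1][0],))
        self.db.commit()
        return len(records)

    def pending(self):
        """Number of readings not yet replicated."""
        return self.db.execute("SELECT COUNT(*) FROM buffer").fetchone()[0]


# Usage with a simulated link and a list standing in for the central database.
central = []
link = {"online": False}


def upload(records):
    if not link["online"]:
        raise ConnectionError("no internet signal")
    central.extend(records)


node = SensorNode(3, upload)
node.record(512.0, ts=1.0)   # buffered while offline
node.record(530.0, ts=2.0)   # buffered while offline
buffered = node.pending()
link["online"] = True
sent = node.sync()           # later synchronization
```

Writing to local storage before attempting the upload means a reading is never lost if the connection drops mid-transfer; the trade-off is that a failed acknowledgement could cause a duplicate upload, which a real deployment would need to handle on the server side.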
3. Results
The monitoring platform measured CO2 levels for 8 weeks in three different locations in a Brazilian city. The sensors generated 6120 records per day, for a total of 342,720 at the end of the eighth week. The disk space occupied was approximately 1.1 GB. Query performance degrades as the measurement table grows, which increases response time. To alleviate this problem, an indexing structure was implemented on the measurement storage table. With the index in place, air quality at the three test sites could be analyzed. Through the query interface, the highest and lowest CO2 concentrations over the 8 weeks of monitoring were retrieved for all sensors, together with the date and time at which each measurement was taken. Table 2 displays these data.
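As an illustration of the indexed storage table and the highest/lowest query described above, the sketch below uses SQLite from the Python standard library. The table name, column names, index definition, and sample rows are assumptions made for the example; the paper does not specify the database engine or schema, and the values are not the measured data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE measurement (
           sensor_id INTEGER NOT NULL,
           taken_at  TEXT    NOT NULL,
           co2_ppm   REAL    NOT NULL
       )"""
)
# Composite index so per-sensor min/max lookups need not scan the whole table.
conn.execute(
    "CREATE INDEX idx_measurement_sensor_ppm ON measurement (sensor_id, co2_ppm)"
)

# Made-up sample rows for illustration only (ISO 8601 timestamps).
rows = [
    (1, "2021-03-01T07:30:00", 640.0),
    (1, "2021-03-01T15:00:00", 520.0),
    (2, "2021-03-01T18:02:00", 710.0),
    (2, "2021-03-01T04:45:00", 515.0),
    (3, "2021-03-01T08:50:00", 605.0),
    (3, "2021-03-01T05:05:00", 505.0),
]
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", rows)


def extreme(conn, sensor_id, highest=True):
    """Return (co2_ppm, taken_at) for a sensor's highest or lowest reading."""
    order = "DESC" if highest else "ASC"
    return conn.execute(
        "SELECT co2_ppm, taken_at FROM measurement "
        f"WHERE sensor_id = ? ORDER BY co2_ppm {order} LIMIT 1",
        (sensor_id,),
    ).fetchone()


# Per sensor: (highest reading, lowest reading), each with its timestamp.
summary = {s: (extreme(conn, s, True), extreme(conn, s, False)) for s in (1, 2, 3)}
```

With an index whose leading column is the sensor and whose second column is the concentration, the engine can answer each extreme lookup by reading one end of the index range for that sensor instead of sorting every record.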
The data in Table 2 show that the city entrance (monitored by sensor 2) had higher CO2 concentration peaks than the other locations and reached the highest CO2 concentration, 829 ppm. The lowest CO2 levels were found on the highway (monitored by sensor 3), mainly around 5:00, when the CO2 level remained close to 500 ppm.
The query system also has tools for generating graphs that show the variation in CO2 levels. The user only needs to specify the time range (dates or times) to query, and the system dynamically generates a CO2 concentration curve over the selected period. As an example, Figure 3, Figure 4 and Figure 5 show curves with mean CO2 concentrations throughout the day for sensors 1, 2, and 3, respectively. Each point is the average of the values recorded at that time of day over the 8 weeks of monitoring. Only weekdays were considered. It was evident that CO2 levels equal to or greater than 600 ppm (inappropriate for humans) were reached at certain times of the day in all the places evaluated.
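The weekday averaging behind these curves can be sketched as below. This is a simplified illustration, not the platform's query code: the function name `weekday_hourly_means`, the one-hour bin width, and the sample readings are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def weekday_hourly_means(readings):
    """Average CO2 per hour of day, using weekday (Mon-Fri) readings only.

    readings: iterable of (datetime, ppm) pairs for a single sensor.
    Returns {hour: mean_ppm} for every hour that has at least one reading.
    """
    by_hour = defaultdict(list)
    for ts, ppm in readings:
        if ts.weekday() < 5:  # 0 = Monday ... 4 = Friday
            by_hour[ts.hour].append(ppm)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Made-up readings for illustration only.
sample = [
    (datetime(2021, 3, 1, 7, 30), 620.0),  # Monday
    (datetime(2021, 3, 2, 7, 45), 640.0),  # Tuesday
    (datetime(2021, 3, 2, 15, 0), 520.0),  # Tuesday
    (datetime(2021, 3, 6, 7, 30), 480.0),  # Saturday, ignored
]
curve = weekday_hourly_means(sample)
# Hours whose mean reaches the 600 ppm limit used in the text.
above_limit = [hour for hour, ppm in curve.items() if ppm >= 600]
```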
The data collected by the sensors over the 8 weeks were also analyzed using data-mining techniques. The C4.5 classification algorithm was configured to classify the CO2 measurements stored in the central database and generate a decision tree (Figure 6) showing the times on weekdays when the air quality is poor (CO2 levels above 600 ppm).
For downtown, there are three periods of the day in which the CO2 concentration was greater than recommended for outdoor environments: 07:13–08:41, 11:54–13:44, and 18:07–19:32. At both the city entrance and the highway, only two time ranges displayed compromised air quality: 07:17–09:36 and 18:07–19:32 at the city entrance, and 07:21–09:11 and 18:26–19:04 on the highway.
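The time ranges above can be turned into a simple lookup, for example to answer whether a given weekday time falls in a poor-air-quality window. The sketch below is hypothetical: the dictionary layout and the `is_poor_air` helper are assumptions for illustration, and only the time ranges themselves come from the results.

```python
from datetime import time

# Weekday windows with compromised air quality, as reported from the decision tree.
POOR_AIR_WINDOWS = {
    "downtown": [
        (time(7, 13), time(8, 41)),
        (time(11, 54), time(13, 44)),
        (time(18, 7), time(19, 32)),
    ],
    "city entrance": [
        (time(7, 17), time(9, 36)),
        (time(18, 7), time(19, 32)),
    ],
    "highway": [
        (time(7, 21), time(9, 11)),
        (time(18, 26), time(19, 4)),
    ],
}


def is_poor_air(location, at):
    """True if `at` falls inside any reported window for `location`."""
    return any(start <= at <= end for start, end in POOR_AIR_WINDOWS[location])


lunch_downtown = is_poor_air("downtown", time(12, 30))
lunch_highway = is_poor_air("highway", time(12, 30))
evening_entrance = is_poor_air("city entrance", time(18, 10))
evening_highway = is_poor_air("highway", time(18, 10))
```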
To ensure the reliability of the results presented by the decision tree, the data stored in the central database were divided into two sets: the first, corresponding to 70% of the records, was used to train the C4.5 algorithm; the second, corresponding to the remaining 30%, was used to test the results generated by the data mining process. The classification accuracy was satisfactory, reaching 86.7%. This means that only 13.3% of the tested records fell outside the learned pattern.
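To make the 70/30 procedure concrete, the sketch below splits a synthetic dataset, selects a single time-of-day threshold by gain ratio (the criterion C4.5 uses to rank candidate splits), and measures accuracy on both sets. It is a deliberately reduced, one-split illustration under stated assumptions, not the C4.5 implementation used in this work: the data are synthetic (minutes after midnight, labelled "poor" from 07:13 onward), the threshold is taken at the midpoint between neighbouring values, and all function names are hypothetical.

```python
import random
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_threshold(samples):
    """Pick the split 'value <= t' with the highest gain ratio.

    samples: list of (value, label). Returns (threshold, gain_ratio).
    """
    samples = sorted(samples)
    labels = [label for _, label in samples]
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(samples)):
        if samples[i - 1][0] == samples[i][0]:
            continue  # no boundary between equal values
        left, right = labels[:i], labels[i:]
        p_left = len(left) / len(labels)
        gain = base - p_left * entropy(left) - (1 - p_left) * entropy(right)
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        ratio = gain / split_info
        if ratio > best[1]:
            best = ((samples[i - 1][0] + samples[i][0]) / 2, ratio)
    return best


def majority(labels):
    """Most frequent label in a non-empty list."""
    return max(set(labels), key=labels.count)


# Synthetic morning data: minutes after midnight, "poor" from 07:13 (433 min).
data = [(m, "poor" if m >= 433 else "good") for m in range(360, 520, 2)]
random.Random(0).shuffle(data)
cut = len(data) * 7 // 10            # 70% for training, 30% for testing
train, test = data[:cut], data[cut:]

threshold, ratio = best_threshold(train)
left_label = majority([lab for val, lab in train if val <= threshold])
right_label = majority([lab for val, lab in train if val > threshold])


def predict(value):
    return left_label if value <= threshold else right_label


def accuracy(rows):
    return sum(predict(val) == lab for val, lab in rows) / len(rows)


train_accuracy = accuracy(train)
test_accuracy = accuracy(test)
```

Evaluating on records that were held out of training is what makes the reported accuracy an estimate of how the tree behaves on unseen measurements rather than a measure of how well it memorized the training set.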
4. Discussion
The maximum and minimum values found by the query interface (Table 2) are the first point to be analyzed. The highest CO2 level found in the central database was reached at the city entrance (829 ppm). This value is well above 600 ppm, the limit considered safe for humans. Overall, this location had the highest CO2 levels among the locations evaluated. This occurred because the city entrance has large traffic jams throughout the day. A high CO2 peak (788 ppm) was also recorded downtown. In both cases, these high values were measured close to 18:00, which corresponds to the period of the day with the highest vehicular traffic. On the highway, the highest value measured by the sensor was 694 ppm, around 8:53. This possibly occurred because vehicles start leaving the city for the highway at 8:00, which is the time commercial activities begin on weekdays. The highway registered the lowest CO2 levels among the evaluated locations. Few traffic jams are observed there, and the flow of vehicles is distributed throughout the day. The lowest CO2 concentration in the central database was recorded at this location (503 ppm). This value was acquired at 5:07, when vehicular traffic is very low.
The curves illustrated in Figure 3, Figure 4 and Figure 5 show more clearly how the average CO2 levels vary throughout the day during the 8 weeks of testing, considering only weekdays. Figure 3 represents the curve obtained for downtown and makes it clear that there are three CO2 concentration peaks where the limit concentration for human safety is exceeded. These high values occurred precisely at times with greater vehicle traffic, when traffic jams are normally observed. These are times when people in the city are in transit to go to work (beginning of the day), to lunch (midday), and home (end of the day).
For the entrance to the city, only two peaks were observed where the air quality was compromised, as shown in Figure 4. The average CO2 level at this location was remarkably high, with average values above 700 ppm at critical times. This location usually has extensive traffic jams every day. The city is the largest in the region and receives daily vehicular traffic from neighboring cities (people who are coming for work or for commercial reasons). The sensor was positioned at a roundabout that connects several highways, and traffic flows slowly during critical periods (start and end of the day).
The curve in Figure 5 clearly shows that the evaluated highway has better air quality than the other monitored locations. The CO2 concentration exceeds 600 ppm for only a few hours of the day, usually between 8:00 and 9:00. This highway is in the interior of the state, and major traffic jams are not common. There are only slight increases in vehicle flow when vehicles leave the city for external commercial activities.
The query system implemented in the monitoring platform allows a date range to be used as a parameter for generating graphs. This increases the flexibility of the data analysis: it is possible to generate daily, weekly, and monthly curves, among other possibilities. In the case of Figure 3, Figure 4 and Figure 5, the curves were generated based on the average CO2 concentration for each hour of the day, considering the 8 weeks of tests. However, the average value can introduce some distortions, because all data from the selected period are used to compute it. These data can include extremely high (or low) CO2 concentration peaks, generated, for example, by a sporadic increase due to truck traffic. Such a peak is an unusual situation within the dataset and can contribute to an erroneous impression of the local air quality.
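A small numerical example shows how a single sporadic peak can shift the average. The readings below are invented for illustration and are not taken from the dataset.

```python
from statistics import mean

# Made-up readings (ppm) for one hour slot across several days.
typical = [560.0, 570.0, 565.0, 575.0, 580.0]
with_spike = typical + [850.0]  # one sporadic peak, e.g. a passing truck

mean_typical = mean(typical)
mean_with_spike = mean(with_spike)

# A single outlier is enough to push the hourly average over the 600 ppm limit.
crosses_limit = mean_typical < 600 <= mean_with_spike
```

Here five typical readings average 570 ppm, while adding one 850 ppm reading raises the average to roughly 617 ppm, which would wrongly suggest that this hour usually exceeds the limit.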
The air quality profile can be obtained more reliably with the data mining technique implemented in the monitoring system. The C4.5 algorithm was used to generate a decision tree that determines the time patterns in which the CO2 concentration is above the safe limit (600 ppm), within an acceptable error rate. Because the tree is built from a broad analysis of the whole database, the influence of isolated extreme values on the result is reduced.
The decision tree generated in the data mining process (Figure 6) showed that, as a rule, the time ranges 07:13–08:41, 11:54–13:44, and 18:07–19:32 do not have good air quality downtown. Pollution rates were high in the 07:17–09:36 and 18:07–19:32 intervals at the city entrance. On the highway, the highest CO2 concentration values during monitoring were found in the 07:21–09:11 and 18:26–19:04 intervals. The 13.3% margin of error found in the decision tree means that, out of every 100 times the air quality is classified as poor, about 13 are erroneous decisions by the algorithm (the air quality is actually adequate).
Finally, it is important to highlight that sensors 1 and 2, which were located in the city, were able to record data directly in the central database. Sensor 3, which was located on a highway, recorded CO2 concentrations locally because there was no internet signal. Synchronization was performed later, and no problems were detected in this process.
5. Conclusions
This work demonstrated the development of an air-quality-monitoring platform that uses autonomous sensors to measure CO2 concentrations. These devices were able to record the collected data directly in a central database located in the cloud. In the absence of local internet, data could be stored for later synchronization. The platform was tested for 8 weeks in three different outdoor locations, one of which had no internet signal. This demonstrated the effectiveness of using autonomous sensors in the architecture. The platform can therefore operate in the most diverse environments, even remote ones such as agricultural regions and forests. After the monitoring period, the platform was able to generate numerous graphs and up-to-date information about air quality. Moreover, a decision tree was generated by the data mining process. The patterns detected allowed us to determine the periods when air pollution is above the levels considered ideal for humans.
Problems related to the security of the collected data and the limited monitoring area had been observed in an air-quality-monitoring platform (based on a wireless sensor network) previously reported by the group. These problems were overcome with the development of this new monitoring architecture.
In addition to the great flexibility in monitoring air quality, the platform offered a wide range of query and data analysis tools, as well as an automated module in the data mining process. This generates knowledge that can help engineers and environmentalists in their decision making.