1. Introduction
The growing number of motor vehicles and industrial activities in recent years has significantly increased the emission of toxic agents into the atmosphere. Most of this pollution occurs in cities, which consequently suffer from poorer air quality. Pollution increases people’s contact with substances that are harmful to their health [1]. Monitoring air quality has therefore become vital, especially in places with the highest concentration of people. Several studies have proposed air quality monitoring systems in recent years. The availability of programmable microcontrollers, such as Arduino, and of compatible low-cost sensors has helped popularize these devices. Most projects connect different types of environmental sensors to an Arduino and fix the devices at the locations of interest to measure pollutant concentrations. The results are made available on displays or via the internet [2,3,4,5]. Pollutant sensors adapted for mobile operation have also been reported. These devices were attached to cars, buses, and people’s bodies to estimate the amount of pollution a citizen is exposed to when moving through a large urban center [6,7,8].
Some reported monitoring systems are more technologically sophisticated. They implement data analysis with artificial intelligence algorithms to establish patterns and predict future values. Linear regression has been widely used when the objective is to scan a database and estimate the future curve of pollutants [9]. Association algorithms, in turn, have been used efficiently to establish associations or correlations in the data [10].
Association rules have been obtained with the Apriori algorithm, which is well known in data mining. It uses a breadth-first (level-wise) search and generates groups of items known as candidate item sets. These candidate item sets are associated with the object used as a parameter. Patterns considered infrequent, that is, those below a minimum support, are automatically excluded. The entire database is evaluated, and frequent item sets are obtained from the candidate item sets [11]. Apriori has been widely used in commercial applications to predict user interest in new products based on already purchased products [11]. It has also been used in monitoring systems to predict the concentration of pollutants in a given city using measurements taken in other cities with similar characteristics [10].
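The level-wise procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited works: the function name `apriori`, the `min_support` threshold, and the discretized labels such as `CO2_high` are assumptions introduced here for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1: every distinct item is a candidate itemset of size one.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Scan the whole database and count the transactions containing each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Infrequent candidates are discarded.
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical discretized readings; the labels are illustrative only.
readings = [
    {"CO2_high", "PM10_high", "PM2.5_high"},
    {"CO2_high", "PM10_high"},
    {"CO2_low", "PM10_low", "PM2.5_low"},
    {"CO2_high", "PM10_high", "PM2.5_high"},
]
frequent_itemsets = apriori(readings, min_support=0.5)
```

With these four hypothetical readings, the item set {CO2_high, PM10_high} appears in three of four transactions and survives, while every item set containing a "low" label falls below the assumed threshold and is dropped at the first level.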
Among the various pollutants monitored by the proposed devices, particulate matter (PM) is considered one of the most dangerous because it is easily inhaled. PM can reach deep into the respiratory system and cause severe damage to health, such as respiratory and cardiovascular diseases [12,13]. PM is a group of pollutants formed by dust, smoke, and all types of solid or liquid material that remain suspended in the air due to their small size. PM-emitting sources are categorized as natural or anthropogenic. The primary natural sources are volcanoes, wind-blown dust, forest fires, and marine aerosol.
On the other hand, the primary anthropogenic PM sources are burning fossil fuels, thermoelectric plant usage, and industrial activities [14]. PM is classified according to particle size. The two main types are PM2.5 and PM10, where the particles have mean aerodynamic diameters of less than 2.5 μm and within the 2.5–10 μm range, respectively. PM2.5 is the most harmful to humans because it remains suspended longer (due to its tiny size) and is more easily inhaled [15]. According to World Health Organization (WHO) data, PM, especially PM2.5, is responsible for 4.2 million annual deaths. Research indicates that particulates can cause heart and neurological problems as well as respiratory health problems, such as asthma, bronchitis, respiratory failure, and lung cancer, among many others [15].
It is a complex and costly task to quantify the concentration of all air pollutants simultaneously. Typically, specific equipment and techniques are used to monitor each substance. To reduce costs, air quality is usually determined from the CO2 concentration alone. CO2 is considered an excellent indicator: when the concentration of CO2 in an environment is high, the other pollutants (such as sulfur dioxide, nitrogen dioxide, and PM) are usually also at high levels [16,17]. This behavior is plausible because CO2 shares the same anthropogenic sources as these other pollutants (mainly fossil fuel burning and industrial activities). However, this approach estimates air quality only qualitatively; it does not allow the concentration of other pollutants to be determined, nor does it reveal which toxic agents are actually present. CO2-based air quality measurement therefore prevents the more detailed studies that could support decision-making or policies to constrain an alarming increase (above acceptable levels) of any specific pollutant. For PM, the WHO strongly recommends avoiding environments with atmospheric levels above 25 μg/m³.
Our group has already acquired know-how in building air quality monitoring platforms based on CO2 measurement and artificial intelligence tools [17,18,19,20,21,22,23]. Here we present significant advances on the previously proposed platform. PM sensors were added to the CO2 sensors of the previous design. Furthermore, an association algorithm was implemented in the data analysis module to expand the use of artificial intelligence. The algorithm’s function was to determine correlations between CO2 and PM concentrations. The platform’s new features were tested and evaluated by measuring CO2 and PM in a Brazilian city, and it was possible to determine a relationship between the pollutants. It was demonstrated that quantitative information about the concentration of both contaminants could be obtained by measuring only one of them. Additionally, the platform can be extended to correlate further pollutants.
3. Results
The monitoring system allowed CO2 and PM (PM10 and PM2.5) concentrations to be measured. A daily average of 6120 records was acquired, for a total of 171,360 records by the end of the fourth week. The disk space occupied by the MySQL database was approximately 23 MB.
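The reported totals can be checked with simple arithmetic. The short sketch below reproduces the record count from the daily average and derives a rough storage cost per record; the per-record figure is only an estimate, since it assumes 1 MB = 10^6 bytes and includes whatever index and table overhead the database carries.

```python
# Figures reported in the text.
RECORDS_PER_DAY = 6120
MONITORING_DAYS = 4 * 7          # four weeks
DATABASE_SIZE_MB = 23

# Total number of records over the monitoring period.
total_records = RECORDS_PER_DAY * MONITORING_DAYS

# Assumption: 1 MB = 10**6 bytes; the result includes index and table overhead.
bytes_per_record = DATABASE_SIZE_MB * 10**6 / total_records

print(total_records)
print(round(bytes_per_record))
```

The product matches the 171,360 records stated above, and the estimate comes to roughly 134 bytes per stored record under the stated assumption.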
Data analysis modules (C4.5 and association algorithms) and queries were implemented on the main server to work on the collected data. The query module has several interfaces that allow the user to access information in different ways, and it includes a tool that generates tables illustrating pollutant concentrations at a specific moment. For example, Table 1 displays the maximum and minimum values of CO2, PM10, and PM2.5 during the 4-week monitoring period. The date and time of the records are also displayed.
The WHO recommends that the PM concentration be less than 25 μg/m³, especially of PM2.5, which has more harmful effects on health. For CO2, problems for the human organism are observed at concentrations above 600 ppm. Furthermore, a high concentration of CO2 indicates the presence of dangerous concentrations of other pollutants in the air [19].
The data in Table 1 show that the city entrance (sensor node 2) recorded the highest peak concentrations of CO2, PM10, and PM2.5: 817 ppm, 26.4 μg/m³, and 28.7 μg/m³, respectively. In general, concentration peaks at all locations occurred around 18:00. On the other hand, the lowest levels of air pollution were recorded around 05:00 at sensor node 3 (highway).
The query module also has a tool for generating graphs, allowing users to analyze in detail the variation in pollutant concentrations over a period. The user specifies the period, and the system dynamically generates a graph of the average CO2 and PM concentration curves. Figure 3 and Figure 4 show the average variation in CO2 and PM (PM10 and PM2.5) concentration, respectively, measured by the sensors during the 4-week monitoring period. The query can also be performed for an individual sensor node.
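An average daily curve of this kind can be obtained by grouping the stored records by time of day and averaging each pollutant. The sketch below illustrates the idea with hourly buckets; the function name `hourly_profile`, the field names, the bucket width, and the sample values and dates are all hypothetical and are not taken from the platform.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def hourly_profile(records):
    """Average each pollutant by hour of day across all supplied records."""
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp, values in records:
        for pollutant, value in values.items():
            buckets[timestamp.hour][pollutant].append(value)
    # Return hours in ascending order so the result can be plotted directly.
    return {hour: {p: mean(v) for p, v in by_pollutant.items()}
            for hour, by_pollutant in sorted(buckets.items())}

# Hypothetical records: (timestamp, {pollutant: value}).
records = [
    (datetime(2022, 3, 7, 18, 5), {"co2_ppm": 640.0, "pm25_ugm3": 26.0}),
    (datetime(2022, 3, 8, 18, 40), {"co2_ppm": 660.0, "pm25_ugm3": 28.0}),
    (datetime(2022, 3, 8, 5, 10), {"co2_ppm": 410.0, "pm25_ugm3": 9.0}),
]
profile = hourly_profile(records)
```

Restricting `records` to a single sensor node before calling the function would give the per-node variant of the query.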
Figure 3 and Figure 4 show CO2 and PM concentrations increasing considerably in some periods of the day (three periods, as highlighted in the figures). Visual inspection of the curves does not allow precise determination of when the air quality was compromised (CO2 and PM concentrations above 600 ppm and 25 μg/m³, respectively). The data analysis module therefore used the C4.5 algorithm for this task.
Figure 5 shows a decision tree generated by the query module from the data analysis performed by the C4.5 algorithm. The tree indicates when air quality is compromised, that is, when the concentrations of CO2 and PM are high. Three periods of the day with poor air quality were observed for the city center (07:16–09:11, 11:32–13:54, and 18:02–19:49) and the city entrance (07:18–09:32, 12:07–13:16, and 17:57–19:25). On the highway, poor air quality was observed in only two periods: 07:42–08:58 and 18:13–19:24.
The records stored in the database over the 4 weeks were divided into two groups. The first set (75% of the records) was used to train the C4.5 algorithm, and the second (the remaining 25%) was used to test it. The percentage of correct answers was 78.2%; that is, 22.8% of the data fell outside the pattern found.
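C4.5 builds its decision tree by repeatedly choosing the attribute with the highest gain ratio, that is, the information gain normalized by the entropy of the split itself. The sketch below computes this criterion for categorical attributes only; the attribute names (`period`, `node`, `quality`) and the four sample rows are hypothetical, and the handling of continuous attributes such as time of day (which C4.5 splits by thresholds) is omitted.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by the split information."""
    labels = [r[target] for r in rows]
    # Partition the target labels by the value of the candidate attribute.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy([r[attribute] for r in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical labeled records (illustrative values only).
rows = [
    {"period": "rush", "node": "center", "quality": "poor"},
    {"period": "rush", "node": "entrance", "quality": "poor"},
    {"period": "off_peak", "node": "center", "quality": "good"},
    {"period": "off_peak", "node": "entrance", "quality": "good"},
]
ratio_period = gain_ratio(rows, "period", "quality")
ratio_node = gain_ratio(rows, "node", "quality")
```

In this toy data the period of the day separates the classes perfectly while the node carries no information, so a C4.5-style tree would split on `period` first.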
The monitoring platform also used the association algorithm known as Apriori. This algorithm was used to check whether there is a relationship between CO2 and PM concentrations. In practice, the system should determine whether, when CO2 is at high levels (above 600 ppm), PM2.5 and PM10 also exceed safe limits (25 μg/m³). For this analysis, 75% of the data was again used for training the algorithm and 25% for testing. The result is shown in Table 2.
The data in Table 2 show that, in 81.4% (for PM10) and 86.8% (for PM2.5) of the cases in which the CO2 concentration was within safe levels, PM was also within the appropriate limit. On the other hand, when CO2 exceeded 600 ppm, PM10 and PM2.5 also exceeded 25 μg/m³ 77.5% and 71.9% of the time, respectively.
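Percentages of this kind correspond to the confidence of an association rule: the fraction of records satisfying the antecedent that also satisfy the consequent. The sketch below computes it for the rules "CO2 high implies PM2.5 high" and "CO2 safe implies PM2.5 safe", using the 600 ppm and 25 μg/m³ limits from the text. The helper name `rule_confidence`, the field names, and the six sample records are hypothetical and do not reproduce the measured data.

```python
CO2_LIMIT_PPM = 600.0    # limit stated in the text
PM_LIMIT_UGM3 = 25.0     # limit stated in the text

def rule_confidence(records, antecedent, consequent):
    """confidence(A => B) = count(A and B) / count(A); None if A never holds."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return None
    return sum(1 for r in matching if consequent(r)) / len(matching)

def high_co2(r):
    return r["co2"] > CO2_LIMIT_PPM

def high_pm25(r):
    return r["pm25"] > PM_LIMIT_UGM3

# Hypothetical readings (illustrative values only).
records = [
    {"co2": 650.0, "pm25": 27.0},
    {"co2": 700.0, "pm25": 26.0},
    {"co2": 620.0, "pm25": 21.0},
    {"co2": 610.0, "pm25": 29.0},
    {"co2": 450.0, "pm25": 12.0},
    {"co2": 430.0, "pm25": 26.0},
]

# Rule 1: CO2 above the limit => PM2.5 above the limit.
conf_high = rule_confidence(records, high_co2, high_pm25)
# Rule 2: CO2 within the limit => PM2.5 within the limit.
conf_safe = rule_confidence(records,
                            lambda r: not high_co2(r),
                            lambda r: not high_pm25(r))
```

For the sample records, three of the four high-CO2 readings also have high PM2.5, and one of the two safe-CO2 readings has safe PM2.5.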
4. Discussion
The pollutants measured in the air originate mainly from vehicle traffic. Air quality decreases as the number of vehicles increases. The highest concentration peaks for CO2, PM10, and PM2.5 (Table 1) were observed for sensor node 2, located at the entrance to the city. Large and frequent traffic jams are recorded at this point, which explains the poor air quality in this region. Furthermore, concentration peaks were noticed for all sensor nodes at specific times: in the morning (07:00–09:00) and in the late afternoon (18:00–19:00). In these periods, the vehicle flow increases considerably in all parts of the city and on the highway because people are commuting between home and work. The monitoring system records corroborated this. Likewise, the lowest levels of pollutants in the air were observed in the early morning (around 05:00), before working hours. This time coincides with significantly reduced vehicular traffic (mainly on highways), as the population is still at home.
The curves in Figure 3 and Figure 4 show the profile of the pollutant concentration variation throughout the day. The monitored pollutants were always more concentrated at the same times of day, and this profile was the same for all monitored locations. These periods of higher pollutant incidence coincide with the times when people travel between home and work (at the beginning of the working day, at lunchtime, and at the end of the day). The increase in pollutant levels was always lower at lunchtime, as many people have their meals at work or in nearby places and do not use a vehicle. It is important to note that measurements taken on weekends were not considered, as they would distort the average values due to the typically low flow of vehicles throughout the weekend.
The C4.5 algorithm was used to determine the pattern of times when the air quality became inadequate because visual analysis of the curves in Figure 3 and Figure 4 does not allow this to be done precisely. The C4.5 classification algorithm is well suited to situations like this because it can generate a decision tree with the patterns found in a mass of data. The decision tree illustrated in Figure 5 showed that, as a rule, the air quality in the city center and at the entrance to the city is poor precisely in the three periods of the day with the highest vehicle flow. More specifically, it determined that during the time ranges 07:06–09:11, 11:32–13:54, and 18:02–19:49 it is not recommended for people to be downtown. At the entrance to the city, pollution rates are alarming at 07:18–09:32, 12:07–13:16, and 17:57–19:25. Finally, on the monitored road, the highest concentration of pollutants was verified in the intervals 07:42–08:58 and 18:13–19:24. The worsening of air quality expected for lunchtime was not observed for the highway, as it is a region outside the city. The margin of error for the decision trees was 22.8%.
Regarding Table 2, the association algorithm revealed highly relevant information. For the monitored environments, there is a significant relationship between increases in CO2 concentration and in PM concentration. In 77.5% and 71.9% of the cases in which the CO2 concentration exceeded 600 ppm, PM10 and PM2.5, respectively, were also above the recommended limit. It is important to emphasize that this result does not show that the high concentration of CO2 causes the increase in PM concentration. In any case, the result is plausible since the sources of CO2 and PM are essentially the same (vehicle flow). Artificial intelligence algorithms do not determine cause or effect; they only find patterns between variables that allow the behavior of one variable to be predicted from the value of another [17].
The results of the classification and association algorithms were within an acceptable margin of error. However, the relationship between CO2 and PM weakened on rainy days, which increased the error percentage. Rain accelerates the settling of PM and reduces its concentration in the air [24]. On the other hand, no significant change was observed in the CO2 concentration profile under the same conditions. Relative humidity measurements could therefore contribute to a better analysis of the behavior and correlations of air pollutants.
5. Conclusions
This work demonstrated the improvements implemented in an air quality monitoring platform that has been under development since 2018. The first novelty allowed the measurement of PM10 and PM2.5 levels in the air, in addition to the CO2 concentration measurement previously implemented. The platform proved versatile enough to monitor different types of environment, as it was used both in urban areas and on a highway that did not have an Internet signal. The query system was also efficient: it could quickly display all the information and historical data of the measurements performed by the sensors. The C4.5 classification algorithm was adapted to generate a decision tree containing the times when, as a rule, the levels of CO2, PM10, and PM2.5 were inadequate. A new artificial intelligence algorithm, Apriori, was added to the platform. It determined the association between CO2 and PM concentrations at the monitored sites. It was shown that, in 77.5% and 71.9% of the cases in which the CO2 concentration was at unsafe levels, PM10 and PM2.5, respectively, were also at elevated concentrations. However, it should be noted that the association between the pollutants decreases on rainy days, since rain has a more significant influence on PM10 and PM2.5 than on CO2. Including a humidity sensor would therefore be valuable for better correlation analysis. Finally, the results indicate that CO2 is a good indicator of air quality, making it possible to trace an association with PM. The monitoring platform was effective and efficient and can be a useful tool for researchers and engineers in the environmental area. The project will continue to be improved to offer new possibilities and more complete analyses of air quality.