1. Introduction
Vehicle exhaust contains hundreds of harmful substances and large amounts of greenhouse gases [1]. Strengthening the monitoring of vehicle exhaust pollutants is therefore important for environmental protection. The main vehicle exhaust pollutants include carbon monoxide (CO), nitrogen oxides (NOx), hydrocarbons (HC) and sulfur dioxide (SO2) [2].
At present, the main methods for detecting vehicle exhaust pollutants are remote sensing detection [3], vehicle equipment detection [4] and the traditional bench test [5]. However, these methods suffer from bulky equipment, high cost, complex procedures and long waiting times.
An electronic nose is an odor recognition system [6,7,8] composed of a sensor array, which simulates the working principle of mammalian olfactory organs. Electronic noses have been widely used in various fields, such as the food industry, the chemical industry and medicine [9,10,11,12,13]. Previous work has shown that gas sensors can identify CO, NOx, HC, SO2 and other exhaust pollutants and judge their concentration levels [14,15,16,17]. Therefore, a self-designed electronic nose with 12 gas sensors was used in this paper for real-time, rapid detection of vehicle exhaust pollutants.
For a vehicle-mounted electronic nose to detect vehicle exhaust pollutants rapidly and in real time, the sensor array must be simplified so that the electronic nose becomes smaller and cheaper. At the same time, because the sensors in the array are cross-sensitive, the array contains redundancy. Optimizing the sensor array can therefore not only reduce volume and cost [18,19,20], but also remove redundant information and improve the recognition rate of the electronic nose in identifying pollutants.
At present, many feature selection methods have been used to optimize electronic nose sensor arrays with good results. Recursive Feature Elimination (RFE) is a feature selection algorithm that searches for the optimal feature subset by repeatedly constructing models, and it has been widely used in sensor array optimization for electronic noses [21,22]. Genetic Algorithm (GA) is a randomized search method with global optimization capability that has also been used for this purpose [23,24]. Random Forest (RF) can not only solve classification and regression problems, but has also been applied to the optimization of electronic nose sensor arrays [25]. This paper used two popular feature selection methods: Recursive Feature Elimination with Cross Validation (RFECV) and Random Forest Feature Selector (RFFS). For contrast, the traditional Principal Component Analysis (PCA) method was also used to optimize the sensor array, and the optimization results of the popular feature selection methods were compared with those of the traditional method.
2. Materials and Methods
2.1. Structure of the Electronic Nose
The electronic nose system designed in this paper mainly includes a sampling unit and a detection unit. The sampling unit is mainly composed of a sampling pipe, a three-way valve, a flowmeter and a chamber. The detection unit mainly includes a sensor array, a regulating circuit board, an analog-to-digital converter (16-channel, 12-bit, Beijing Pop WS-5921/U60216), a computer, several connecting wires and a power supply (5 V DC). Each sensor converts odor information into an electrical signal through a change in its internal resistance. The signal passes through the regulating circuit board, is converted into a digital signal by the analog-to-digital converter, and is finally transmitted to the Vib’SYS signal acquisition, processing and analysis software on the computer.
As the main components of vehicle exhaust pollutants are carbon monoxide (CO), nitrogen oxides (NOx), hydrocarbons (HC), Pb compounds (Pb), carbon dioxide (CO2), particulate emissions (PM), sulfur dioxide (SO2) and other gases [2], this paper used 12 gas sensors that are sensitive to the above gases. The details of the selected gas sensors are shown in Table 1.
The sensor array chamber is composed of an external cavity column and an octagonal plate. The gas sensors are fixed through holes in the octagonal plate, all facing the same direction, and the detection surface of each gas sensor lies in the inner plane of the chamber. The 3D schematic diagram of the sensor cavity structure is shown in Figure 1.
2.2. Engine Bench Test
To summarize how the concentrations of the diesel engine’s exhaust pollutants vary, we first conducted an engine bench test. A CA4D28C5 diesel engine and a G01 gasoline engine were used in the test. The bench was run at a fixed rotary speed with different torques, and the exhaust pollutant concentrations were measured continuously over time. The engine test bench and devices are shown in Figure 2.
An AVL DICOM 4000 pollutant analyzer and a HORIBA MEXA-7100DEGR pollutant analyzer were used for vehicle exhaust pollutant detection: the former for the diesel engine and the latter for the gasoline engine. Both analyzers meet the standards for vehicle exhaust pollutant detection and offer a moderate measurement range, high measurement accuracy, stable readings and good anti-interference performance. The two pollutant analyzers are shown in Figure 3. The exhaust pollutant concentrations of the CA4D28C5 diesel engine, measured by the AVL DICOM 4000 pollutant analyzer at 1600 r/min and 2200 r/min with different torques (Nm), are shown in Figure 4.
Figure 4 shows that, at the same rotary speed, the concentrations of carbon monoxide (CO) and total hydrocarbons (THC) gradually decrease as torque increases, whereas the concentrations of nitric oxide (NO) and nitrogen oxides (NOx) gradually increase, and their ranges of change are similar. At the same torque but different rotary speeds, the trends of CO, THC, NO and NOx are similar, but the concentrations differ.
Given this concentration variation law of the diesel engine’s exhaust pollutants, the sensor array in the electronic nose needs to be able to accurately identify exhaust pollutants from different engines and at different concentration levels from the same engine. The test scheme developed from these bench test results is described in Section 2.3.
2.3. Experimental Setups
Based on the concentration variation law of the diesel engine’s exhaust pollutants in Section 2.2, we designed five groups of experiments in which the electronic nose detected vehicle exhaust pollutants. The first four groups used the CA4D28C5 diesel engine at the same rotary speed with different torques. That is, the types of exhaust pollutant gases were the same, but the concentration levels of each pollutant differed only slightly. The fifth group used the G01 gasoline engine and served as the control group, with different exhaust pollutant gas types and different concentration levels.
The total test time of each sample was 300 s, of which 90 s was data acquisition time at an acquisition frequency of 50 Hz, so each sample contains 12 × 90 × 50 = 54,000 data points. The remaining 210 s was used to clean the electronic nose chamber and reset the sensor resistances to their zero baseline.
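The sample size follows directly from the acquisition settings. A minimal Python check is given here, with constant names chosen purely for illustration:

```python
# Acquisition settings quoted in the text above.
N_SENSORS = 12          # gas sensors in the array
ACQ_SECONDS = 90        # data acquisition time per sample (s)
SAMPLE_RATE_HZ = 50     # acquisition frequency (Hz)
TOTAL_SECONDS = 300     # total test time per sample (s)

points_per_sensor = ACQ_SECONDS * SAMPLE_RATE_HZ      # readings from one sensor
points_per_sample = N_SENSORS * points_per_sensor     # readings from the whole array
cleaning_seconds = TOTAL_SECONDS - ACQ_SECONDS        # cleaning and zero-setting time

print(points_per_sensor)   # 4500
print(points_per_sample)   # 54000
print(cleaning_seconds)    # 210
```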
The actual experimental steps are as follows:
- (1) Check the connection tightness and safety of each component.
- (2) Power on, start the electronic nose detection system, warm up for 30 min, and expose the sensor to clean air.
- (3) The sample gas is introduced into the electronic nose’s chamber through the catheter connected with the three-way valve.
- (4) Collect and store the signal of the sample gas.
- (5) After the data collection of one group of samples is completed, clean the electronic nose with clean air for about 210 s.
- (6) Perform the next set of experiments and repeat steps (2–5).
A total of 225 samples were obtained in the experiments. The concentration levels of the main exhaust pollutants under the test conditions and the corresponding operating conditions of the samples are shown in Table 2.
2.4. Feature Extraction
After completing the data acquisition in Section 2.3, in order to reduce the data dimension and ensure the effectiveness of the subsequent pattern recognition algorithm [26], we extracted four features from each data sample: Maximum Value (MAX), Average Value (Mean), Integral Value (IV) and Wavelet Transform (WT). MAX reflects the steady-state information of the gas sensor response curve. Mean and IV summarize the information of the whole response curve. WT better reflects the transient information of the response curve. Each of the four features was extracted from the response of each of the 12 sensors. After feature extraction, each data sample is reduced from 54,000 data points to a feature sample of only 12 data points for a given feature. The feature samples extracted from the 225 samples were concatenated to obtain a feature matrix of 12 feature vectors, each containing 225 feature values.
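The reduction from 54,000 raw points to 12 values per feature can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not specify how the WT feature is computed, so the sketch assumes the energy of a three-level Haar approximation, and the function names (`extract_features`, `haar_approx_energy`) and the synthetic response curve are hypothetical.

```python
import numpy as np

FS_HZ = 50                # acquisition frequency stated in the text
N_SENSORS = 12            # sensors in the array
N_POINTS = 90 * FS_HZ     # 4500 points per sensor per sample


def haar_approx_energy(signal, levels=3):
    """Assumed WT feature: energy of the Haar approximation after `levels` steps."""
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if approx.size % 2:                      # pad odd lengths with the last value
            approx = np.append(approx, approx[-1])
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
    return float(np.sum(approx ** 2))


def extract_features(sample, fs_hz=FS_HZ):
    """Reduce one (n_sensors, n_points) sample to four per-sensor feature vectors."""
    sample = np.asarray(sample, dtype=float)
    dt = 1.0 / fs_hz
    return {
        "MAX": sample.max(axis=1),
        "Mean": sample.mean(axis=1),
        # Trapezoidal rule written out explicitly to avoid NumPy version differences.
        "IV": dt * (sample[:, 1:] + sample[:, :-1]).sum(axis=1) / 2.0,
        "WT": np.array([haar_approx_energy(row) for row in sample]),
    }


# Synthetic stand-in for one sample: a saturating response plus noise.
rng = np.random.default_rng(0)
t = np.arange(N_POINTS) / FS_HZ
sample = 1.0 - np.exp(-t / 10.0) + 0.01 * rng.normal(size=(N_SENSORS, N_POINTS))

features = extract_features(sample)
for name, values in features.items():
    print(name, values.shape)   # each feature yields one value per sensor
```

Stacking the per-sample vectors of one feature over all 225 samples would give the 225 × 12 matrix used in the later sections.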
2.5. Sensor Array Optimization
To enable the electronic nose to detect vehicle exhaust pollutants rapidly and in real time, it must be made smaller and cheaper. Sensor array optimization achieves both. In addition, because gas sensors are cross-sensitive, sensor array optimization can reduce the training time of classification models, improve their recognition rate and help avoid overfitting. Three sensor array optimization (i.e., feature selection) methods based on different principles and optimization strategies were used in this paper: Recursive Feature Elimination with Cross Validation (RFECV), a wrapper method; Random Forest Feature Selector (RFFS), an embedded method; and traditional Principal Component Analysis (PCA) for comparison. They are briefly introduced below.
2.5.1. Sensor Array Optimization Based on RFECV
Recursive Feature Elimination (RFE) was proposed by Guyon [27] and has been widely used in solving feature selection problems. RFECV performs recursive feature elimination inside a cross-validation loop [28], which automatically finds the feature subset with the optimal number of features. In this paper, Random Forest (RF) was used as the classification model during recursive feature elimination. In each iteration, the lowest-ranked feature is removed, the RF model is retrained with the retained features, and its performance is cross-validated, until only one feature is left. Finally, from the performance of the RF model for different numbers of features and different feature subsets, the optimal number of features and the optimal feature subset are obtained.
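The procedure above maps closely onto scikit-learn's `RFECV` class. The sketch below assumes scikit-learn (the text does not say which implementation was used) and runs on synthetic data in which only the first four of 12 columns carry class information. The data, the number of trees and the random seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for one feature matrix: 225 samples x 12 sensors, 5 classes.
rng = np.random.default_rng(0)
n_samples, n_sensors, n_classes = 225, 12, 5
y = np.repeat(np.arange(n_classes), n_samples // n_classes)
X = rng.normal(size=(n_samples, n_sensors))
X[:, :4] += y[:, None] * 1.5          # only the first four columns are informative

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=30, random_state=0),
    step=1,                            # drop one feature per iteration
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=1,          # eliminate down to a single feature
)
selector.fit(X, y)

print(selector.n_features_)                 # number of sensors retained
print(np.flatnonzero(selector.support_))    # indices of the retained sensors
```

Here `support_` plays the role of the optimized sensor subset, and `ranking_` gives the order in which the remaining sensors were eliminated.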
2.5.2. Sensor Array Optimization Based on RFFS
Random forests can not only deal with classification and regression problems, but can also be used to evaluate and select features because they can estimate the importance of features [29]. The principle of using random forests to evaluate and select features is based on the difference between the classification performance of the random forest on the original dataset and on the randomly extracted dataset. By calculating the classification performance difference of each decision tree in the random forest on different randomly extracted datasets, the importance of features can be estimated and the feature ranking can be obtained. The importance of the features is estimated by Equations (1) and (2):
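The importance of feature $X_j$ is estimated as:

$$I(X_j) = \frac{\bar{d}_j}{SE_{d_j}}, \qquad \bar{d}_j = \frac{1}{n}\sum_{i=1}^{n} d_{ij} \tag{1}$$

where $d_{ij}$ represents the performance difference of decision tree $i$ and $SE_{d_j}$ represents the standard error of all decision trees:

$$SE_{d_j} = \frac{s_{d_j}}{\sqrt{n}} \tag{2}$$

where $s_{d_j}$ is the standard deviation of $d_{ij}$ and $n$ is the number of elements in the dataset.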
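As a rough numerical illustration of this importance score (the mean per-tree performance difference divided by its standard error), the following sketch uses only the Python standard library. The function name and the per-tree difference values are hypothetical, and the number of elements is taken to be the number of per-tree differences, which is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def feature_importance(differences):
    """Mean performance difference divided by its standard error."""
    n = len(differences)                          # assumed: one difference per tree
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error


# Hypothetical per-tree performance differences for two features.
informative = [0.10, 0.12, 0.08, 0.11, 0.09]     # consistently hurts when perturbed
uninformative = [0.01, -0.02, 0.00, 0.02, -0.01]  # differences scatter around zero

print(round(feature_importance(informative), 4))    # 14.1421
print(round(feature_importance(uninformative), 4))  # 0.0
```

A feature whose perturbation degrades every tree by a similar amount scores high, while one whose differences average out to zero scores near zero and would rank last.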
2.5.3. Sensor Array Optimization Based on PCA
Principal Component Analysis (PCA) is a dimension reduction method that transforms the original variables in a high-dimensional space into a set of linearly independent comprehensive indexes in a low-dimensional space through an orthogonal transformation [30]. The eigenvalues obtained through PCA are sorted from large to small to measure the importance of features. Finally, features are selected according to their importance [18].
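Assume that $x_1, x_2, \ldots, x_{12}$ are the original variables (the features extracted from the 12 gas sensors) and that $z_1, z_2, \ldots, z_p$ ($p \le 12$) are the comprehensive indexes in the low-dimensional space. The transformation from the original variable matrix X to the comprehensive index matrix Z can be expressed as:

$$\begin{cases} z_1 = a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,12}x_{12} \\ z_2 = a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,12}x_{12} \\ \quad\vdots \\ z_p = a_{p,1}x_1 + a_{p,2}x_2 + \cdots + a_{p,12}x_{12} \end{cases} \tag{3}$$

where $a_{p,12}$ represents the 12th coefficient in the p-th comprehensive index.

For each feature, calculate the absolute value of the sum of the coefficients of the corresponding original variable over all comprehensive indexes. For example, the absolute value of the sum of the coefficients of the feature values extracted from the 12th sensor in all comprehensive indexes, $A_{12}$, is:

$$A_{12} = \left| a_{1,12} + a_{2,12} + \cdots + a_{p,12} \right| = \left| \sum_{i=1}^{p} a_{i,12} \right| \tag{4}$$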
The absolute value of the sum of the coefficients of each feature represents the contribution of that feature to the comprehensive indexes. The features are then ranked in descending order of contribution to obtain the feature selection.
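The coefficient-sum ranking described above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the principal axes are taken from an SVD of the centered (not standardized) data, and because the sign of each axis is mathematically arbitrary, the sketch fixes it so that the largest-magnitude coefficient of each component is positive. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def pca_sensor_contributions(X, n_components):
    """Return |sum of coefficients| per sensor over the first n_components."""
    Xc = X - X.mean(axis=0)                       # center each sensor's feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_components]                  # shape (p, n_sensors)
    # Assumed sign convention: largest-magnitude coefficient in each row is positive.
    rows = np.arange(loadings.shape[0])
    pivot = np.argmax(np.abs(loadings), axis=1)
    loadings = loadings * np.sign(loadings[rows, pivot])[:, None]
    return np.abs(loadings.sum(axis=0))


def select_sensors(X, n_components, n_keep):
    """Keep the n_keep sensors with the largest contribution."""
    contributions = pca_sensor_contributions(X, n_components)
    order = np.argsort(contributions)[::-1]       # descending contribution
    return np.sort(order[:n_keep])


# Synthetic stand-in for a 225 x 12 feature matrix with unequal column variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(225, 12)) * np.linspace(0.5, 3.0, 12)
print(select_sensors(X, n_components=3, n_keep=6))
```

Choosing `n_keep` directly is equivalent to setting a threshold on the contribution at the value that retains that many sensors.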
3. Results and Discussion
Different sensor array optimization methods produce different results, which differ in the number and combination of retained sensors. The experimental results of this paper were obtained by feeding the original data and the data after sensor array optimization into a Random Forest (RF) classifier. The classification recognition rate on the test set is the main index for evaluating the effectiveness of the original and optimized sensor arrays, so it was used to evaluate the results of sensor array optimization. To make the classification results more reliable, stratified 3-fold cross-validation was repeated 100 times, and the average of the 300 test-set classification recognition rates was taken as the final classification recognition rate. In the sensor array optimization stage, the original data set was divided into a 2/3 training set and a 1/3 test set; the test set was not used in the sensor array optimization stage.
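The evaluation protocol (repeated stratified 3-fold cross-validation with an RF classifier) could be reproduced along the following lines. The sketch assumes scikit-learn, uses synthetic data in place of the measured features, and repeats the 3-fold split only 5 times so that it runs quickly. Setting `N_REPEATS = 100` would give the 300 test-set scores described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for a feature matrix: 225 samples x 12 sensors, 5 classes.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(5), 45)
X = rng.normal(size=(225, 12)) + y[:, None]

N_REPEATS = 5   # the text uses 100; kept small so the sketch runs quickly
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=N_REPEATS, random_state=0)
clf = RandomForestClassifier(n_estimators=30, random_state=0)

# One accuracy value per test fold: 3 * N_REPEATS values in total.
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(len(scores))
print(f"mean recognition rate: {scores.mean():.4f}")
```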
The classification recognition rate obtained using RF without sensor array optimization is shown in Figure 5a. All four feature extraction methods achieved a high classification recognition rate with RF as the classifier. MAX achieved the highest average classification recognition rate, 99.92%, and even WT, which had the lowest, reached 98.16%. This shows that the original sensor array is effective and can accurately identify exhaust pollutants from different engines or at different concentration levels from the same engine.
The results of sensor array optimization for the four extracted features based on the RF model and the RFECV method are shown in Table 3. The optimized sensor array includes six gas sensors: MP135, TGS2600, TGS2610, TGS2611, TGS2612, and TGS2620. Sensor array optimization based on the RFECV method achieved good results. Figure 5b shows that Mean and IV, which have the lowest average classification recognition rate after sensor array optimization, still reached 97.94%, with almost no loss. This shows that the RFECV method is very effective for sensor array optimization.
The classification recognition rate after sensor array optimization using RFFS is shown in Figure 5c. Compared with the RFECV method, RFFS retains a total of eight gas sensors: the six selected by RFECV plus GSBT11 and TGS2602. A comparison of the average classification recognition rates in Figure 5b,c shows that RFFS with eight gas sensors is slightly better than RFECV with six gas sensors for Mean and IV, but worse for MAX, which has the highest classification recognition rate. RFFS also bears the additional cost of two more gas sensors.
Using PCA to optimize the sensor array requires setting a threshold on the absolute value of the sum of coefficients of each feature. In this paper, for comparison with RFECV and RFFS, the thresholds were set to the values at which six and eight gas sensors were retained. The sensor array optimization results and the corresponding RF classification recognition rates are shown in Table 4. When the number of retained sensors is fixed, the classification recognition rate of the sensor array optimized by PCA is worse than that of the other sensor array optimization methods. When eight sensors are retained, the classification recognition rate of MAX decreases by more than 2% compared with RFFS. When six sensors are retained, the classification recognition rates of MAX, Mean and IV decline compared with RFECV.
When PCA was used as the sensor array optimization method, TGS2600, TGS2603, TGS2610, TGS2611, TGS2612 and TGS2620 were retained for almost every feature extraction method, regardless of whether six or eight gas sensors were retained, which indicates that they respond well to vehicle exhaust pollutants. However, MP135 and MP901, which were often retained when eight gas sensors were kept, were rarely selected when six were kept. This may be because their target gases overlap with those of the six frequently selected gas sensors. With the same number of retained gas sensors, different feature extraction methods led to different selected sensors, and PCA was less effective than the other two sensor array optimization methods. Both observations may be explained by the fact that PCA is an unsupervised dimension reduction method that maximizes the variance along the projection directions, so the category information is not fully utilized.
When different optimization methods retain the same number of sensors, the sensor array with the higher classification recognition rate is better. When the classification recognition rates are the same, the sensor array with fewer sensors is better. A good sensor array achieves the highest recognition rate with as few sensors as possible. The recognition rate of the sensor array composed of six gas sensors is almost as high as that of the original array of 12 gas sensors, and the six-sensor array reduces volume and cost, making it possible for the vehicle-mounted electronic nose to detect vehicle exhaust pollutants rapidly in real time. We therefore consider six gas sensors to be the optimal number. With the number of sensors limited to six, the recognition rate of each sensor array optimization method is shown in Table 5.
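The selection rule stated above (prefer the higher recognition rate, and among comparable rates prefer fewer sensors) can be written as a small helper. The sketch below is only one possible reading of that rule: the `tolerance` parameter, which decides when two rates count as comparable, is an assumption introduced here and does not appear in the paper. The candidate values are the MAX recognition rates quoted in the text for the original array and for the RFECV and RFFS results with an unrestricted number of sensors.

```python
def pick_sensor_array(candidates, tolerance=0.2):
    """Pick from (name, n_sensors, rate_percent) tuples.

    Candidates within `tolerance` percentage points of the best rate are
    treated as comparable; among them the fewest sensors wins, and the
    higher rate breaks any remaining tie.
    """
    best_rate = max(rate for _, _, rate in candidates)
    near_best = [c for c in candidates if best_rate - c[2] <= tolerance]
    return min(near_best, key=lambda c: (c[1], -c[2]))


# MAX recognition rates quoted in the text.
candidates = [
    ("original", 12, 99.92),
    ("RFECV", 6, 99.77),
    ("RFFS", 8, 99.44),
]

print(pick_sensor_array(candidates))   # ('RFECV', 6, 99.77)
```

With `tolerance=0` the rule reduces to picking the single highest rate, which would keep all 12 sensors; any such threshold is a design choice that trades recognition rate against array size.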
In this case, the average RF classification recognition rate of MAX reached 99.77% with both RFECV and RFFS. The classification recognition rate of WT was higher with RFECV than with RFFS. This means that RFECV is the better sensor array optimization method and that MAX is a better feature extraction method than the other three. The sensor array optimized by the RFECV method with MAX as the feature extraction method includes the sensors MP135, TGS2600, TGS2610, TGS2611, TGS2612 and TGS2620.
The cost of using only the above six gas sensors is $32, which saves about 56% of the cost compared with the original 12-sensor array. It also greatly reduces the volume of the electronic nose, achieving miniaturization. With MAX as the feature extraction method after RFECV optimization, the average time to test a new real sample was 0.021 s (using Python 3.10.5 and Visual Studio Code 2022). The miniaturization and the short detection time make it possible for the vehicle-mounted electronic nose to rapidly detect vehicle exhaust pollutants in real time.
4. Conclusions
In this paper, a self-designed electronic nose composed of 12 gas sensors was used to detect vehicle exhaust pollutants from different engines or from the same engine at different concentration levels. First, we conducted an engine bench test to summarize the concentration variation law of vehicle exhaust pollutants. After analyzing the experimental data and extracting the features, the highest RF classification recognition rate reached 99.92% without optimizing the sensor array. To enable the vehicle-mounted electronic nose to quickly detect vehicle exhaust pollutants in real time while reducing volume, development cost and detection time, we used RFECV, RFFS and PCA to optimize the sensor array. When the number of sensors was not fixed, the classification recognition rate of MAX after using RFECV and RFFS reached 99.77% and 99.44%, respectively, while RFECV retained fewer sensors. When the number of sensors was limited to six, the classification recognition rate of MAX after using RFFS reached 99.77%, as high as that of RFECV. The classification recognition rate of WT was higher with RFECV than with RFFS, which means RFECV is the better sensor array optimization method in this case. The sensor array optimized by the RFECV method with MAX as the feature extraction method costs only $32 with almost no loss of recognition rate. In addition, its average detection time for a new sample was 0.021 s.
In summary, this research shows that it is feasible to use an electronic nose for real-time, rapid detection of vehicle exhaust pollutants, and that the electronic nose has good application prospects in this area. Combined with sensor array optimization methods, the electronic nose can be made smaller and cheaper, which makes real-time detection of vehicle exhaust pollutants possible. In the future, we will combine the vehicle-mounted electronic nose with emerging edge computing and cloud computing technologies to develop corresponding cloud computing platforms, so that vehicle exhaust pollutants can be monitored in real time and corresponding air quality predictions can be made.