1. Introduction
International agreements such as the Paris Agreement call for significant reductions in carbon emissions to mitigate global warming [1,2,3]. China aims to reach peak carbon emissions by 2030 and is striving to achieve carbon neutrality by 2060. The transportation industry plays an important role in this process. Heavy-duty vehicles (HDVs) are among the significant contributors to greenhouse gas emissions [4,5,6]. Although HDVs account for a small proportion of vehicle ownership, their nitrogen oxide and particulate matter emissions are disproportionately large, accounting for 83.5% and 90.1% of total vehicle emissions, respectively [7,8]. To reduce these harmful effects on the atmospheric environment and human health, heavy-duty diesel vehicles (HDDVs) mainly rely on selective catalytic reduction (SCR) technology, which adds urea to reduce nitrogen oxide emissions so that vehicles can meet the China V and China VI emission standards [9]. However, because of the material cost, some vehicle owners use various cheating methods to disable the nitrogen oxide control device or use inferior urea water solutions. Therefore, it is necessary to strengthen supervision measures and law enforcement methods in this regard.
Some gas stations engage in fuel sales without obtaining the legally required permits or the qualification for dangerous goods business, which may result in serious safety hazards. The fuel sold by these gas stations cannot be supervised effectively, and its uneven quality could lead to excessive vehicle emissions of nitrogen oxides and other pollutants. Additionally, the lack of fuel and gas recovery devices may result in fuel leakage during refueling and unloading, which could directly pollute the atmosphere, soil, and groundwater.
The current method for detecting fuel/urea refueling behavior mainly involves setting up vehicle identification and verification devices at gas stations and then using wireless communication to transmit vehicle fuel/urea refueling information to a remote server. Because of its high cost, including the cost of timely updates and of the data structures for reading, writing, and storing data, this method cannot provide useful information for detecting unregistered gas/urea stations and their related fuel or urea refueling behaviors.
Gas station recognition is essentially outlier identification based on group behaviors. Few studies have addressed this issue directly, so related work on outlier identification is reviewed here. Outlier identification is currently widely used in fault diagnosis, multimedia traffic detection, vehicle status detection, video and audio detection, and other fields. Muhammad et al. [10] applied an outlier detector based on Gosset's (Student's) t-test to distinguish a faulty condition and identify the nature of the fault for large-scale grid-tied photovoltaic power plants. Sahil et al. [11] proposed a hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in the context of social multimedia. The anomaly detection module was based on an improved restricted Boltzmann machine and a gradient descent-based support vector machine. Morteza et al. [12] developed an ARIMA-based anomaly detection framework to identify abnormal vehicle states from multiple-channel operating time-series data. The state anomaly is captured by the deviation of real-time values at different channels from the predictions. Rui-kai et al. [13] proposed an audio-based algorithm to detect pump faults in air-conditioning systems, which monitors the abnormal sound of pumps using the Fourier transform, a finite impulse response digital filter, and an autoregressive integrated moving average model.
In addition, with the rapid development of computer technology, researchers have proposed many new anomaly detection methods. Wen [14] presented the problem of detecting outliers in fixation gaze data through a novel mixed-integer optimization formulation and subsequently strengthened the formulation using two geometric arguments to provide enhanced bounds. Ji-hao [15] compared eight common anomaly detection methods based on the statistical distribution of data and features to detect anomalies in real-time body weight (BW) recorded by a precision feeding (PF) system. Ekin [16] proposed a new nonparametric outlier detection technique for the preprocessing stage of data analyses, which was based on the frequency-domain and Fourier transform definitions, called frequency-domain based outlier detection (FOD). Hee [17] proposed a new two-stage procedure for detecting multiple outliers when the dimension of the data is much larger than the available sample size.
However, the existing methods lack research on road infrastructure identification, and the anomaly detection algorithms are computationally intensive and have few engineering applications. Research based on spatiotemporal GPS trajectories focuses on the driving behavior of vehicles and lacks analysis of vehicle behavior during stays. In response to the current problems, namely the high cost of gas station information collection, the long update cycle, and the labor intensity of traditional oil analysis methods, this study fully utilizes vehicle trajectory data and makes possible data-driven gas station location identification, fuel/urea refill behavior recognition, and oil quality analysis.
With the development of Internet of Vehicles technology, a large amount of vehicle monitoring data has been collected and integrated. China has successfully built a Heavy-duty Vehicle Emission Service and Management Platform (HVESMP) based on the China VI Emission Standard, and the sampling period of vehicle data is 1 s. The data are wirelessly transmitted to the cloud platform via T-BOX and include key parameters such as vehicle speed, engine speed, and engine fuel flow. This paper implements a highly robust recognition method for the spatiotemporal characteristics of vehicle fuel/urea refueling behavior in mobile source big data scenarios. The proposed method uses the time-series change curve of the fuel tank or urea level detected by the vehicle-mounted T-BOX to detect fuel/urea refueling behavior, which ensures the robustness of the recognition. Additionally, the method not only reduces omitted and missing information but can also process large-scale data. At the same time, in the process of identifying fuel/urea refueling behavior, the classification and regression tree (CART) algorithm is introduced to reduce the interference of sensor noise, which improves the recognition accuracy. In addition, this paper uses accumulative mileage, fuel/urea consumption per 100 km, and the DBSCAN clustering algorithm to verify the accuracy of the recognition results.
The remainder of this paper is structured as follows: Section 2 introduces the system framework, including gas station recognition and oil quality analysis. Section 3 gives a brief introduction of the proposed method for identifying gas stations and evaluating oil quality. Section 4 provides the data collection method and shows the recognition results for Tangshan, China. Finally, the key conclusions are summarized in Section 5.
3. Gas Stations Recognition Model
3.1. Data Preprocessing
During platform data collection, there are many sources of interference, such as onboard sensors, transmission lines, and transmission network signals, so invalid and missing values are very common in the original data. Data preprocessing is therefore an important step in the entire data analysis process [20]. For example, if multiple frames share the same collection time, only one of them is kept, and when erroneous or missing fuel tank level data persist for more than 10 consecutive frames, the corresponding data frames are deleted. In addition, to smooth the noise introduced by the sensor and the data transmission process, this paper adopts a moving average filter with a window size of 10.
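The preprocessing rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the platform's actual pipeline: the `(timestamp, level)` frame layout, the function names, the use of `None` for an invalid or missing level, and the choice of a trailing (rather than centered) averaging window are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical frame layout: (timestamp in s, tank level or None if invalid/missing).
Frame = Tuple[int, Optional[float]]


def deduplicate(frames: List[Frame]) -> List[Frame]:
    """Keep only the first frame for each collection time."""
    seen = set()
    out = []
    for t, level in frames:
        if t not in seen:
            seen.add(t)
            out.append((t, level))
    return out


def drop_long_invalid_runs(frames: List[Frame], max_run: int = 10) -> List[Frame]:
    """Delete runs of consecutive invalid frames that are longer than max_run."""
    out: List[Frame] = []
    run: List[Frame] = []
    for frame in frames:
        if frame[1] is None:
            run.append(frame)
            continue
        if len(run) <= max_run:
            out.extend(run)  # short gaps are kept (assumption)
        run = []
        out.append(frame)
    if len(run) <= max_run:
        out.extend(run)
    return out


def moving_average(levels: List[Optional[float]], window: int = 10) -> List[Optional[float]]:
    """Trailing moving average that skips invalid (None) samples."""
    out = []
    for i in range(len(levels)):
        valid = [v for v in levels[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(valid) / len(valid) if valid else None)
    return out
```

How short invalid runs are treated is not stated in the text; here they are simply left in place and ignored by the filter.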
3.2. Refueling Feature Detection Window Selection
After data preprocessing and filtering, a full historical data set of the vehicle fuel tank level is available for analysis. The data volume is very large, and processing it all at once places extremely high demands on hardware. Therefore, to rapidly screen for the characteristics of refueling behavior and improve the computational efficiency of the algorithm, this paper divides the entire data set into several basic data analysis windows along the time series. The length of each window is 900 s (15 min). The full data of the vehicle fuel tank level is represented as:
$$U = [W_1, W_2, \ldots, W_n] \tag{1}$$
where $W_i$ is the $i$-th basic data analysis window, $i = 1, 2, \ldots, n$, and $n$ represents the total number of basic data analysis windows. Using $u_i^{\max}$ and $u_i^{\min}$ to represent the maximum and minimum fuel tank level of the $i$-th basic data analysis window, respectively, the fuel tank level extremum matrix can be expressed as:
$$E = \begin{bmatrix} u_1^{\max} & u_2^{\max} & \cdots & u_n^{\max} \\ u_1^{\min} & u_2^{\min} & \cdots & u_n^{\min} \end{bmatrix} \tag{2}$$
Furthermore, the matrix elements in (2) are sorted by time index to obtain the fuel tank level extremum series. Because the time order of the two extreme values within the same basic data analysis window is uncertain, and to improve the efficiency of subsequent calculations, basic windows that meet the following rule are merged. If the maximum (or minimum) values of two consecutive windows are adjacent in the extremum series, the corresponding two basic windows are merged, and the maximum (or minimum) value of the new window is updated accordingly. Finally, the number of updated basic data analysis windows is $x$, and the full data of the vehicle fuel tank level can also be expressed as:
$$U = [W'_1, W'_2, \ldots, W'_x] \tag{3}$$
where $W'_j$ is the $j$-th basic data analysis window after the windows are merged, and $j = 1, 2, \ldots, x$.
Then, the new fuel tank level extremum matrix after shape transformation and window merging can be expressed as:
$$E' = [e_1, e_2, \ldots, e_{2x}] \tag{4}$$
where $e_1, \ldots, e_{2x}$ are the window extrema arranged in time order. The difference between adjacent values in Matrix (4) is calculated, and the absolute value is taken. The result is as follows:
$$D = [d_1, d_2, \ldots, d_{2x-1}], \quad d_k = |e_{k+1} - e_k| \tag{5}$$
where $d_k$ is the $k$-th tank level difference, and $k = 1, 2, \ldots, 2x - 1$.
Based on the feature that the fuel tank level suddenly increases in the fueling data windows, the judgment threshold values $\varepsilon_1$ and $\varepsilon_2$ are set to construct the filtering conditions of the fueling feature detection window. The specific conditions are as follows:
$$d_k > \varepsilon_1, \quad d_{k-1} < \varepsilon_2, \quad d_{k+1} < \varepsilon_2 \tag{6}$$
If the above conditions are met, the fluctuation of the fuel tank level in the basic windows corresponding to the $(k-1)$-th and $(k+1)$-th differences is relatively stable, and the change in level in the basic window corresponding to the $k$-th difference is likely to be caused by the user's fueling behavior rather than by sensor noise. Therefore, each value of $k$ that meets the above conditions is recorded. Because the input data of the CART algorithm must contain enough data frames, the basic data analysis windows corresponding to the indices $k - 1$, $k$, and $k + 1$ are extracted together as the refueling feature detection window. The result can be expressed as:
$$W^{\mathrm{det}} = [W'_{k-1}, W'_k, W'_{k+1}] \tag{7}$$
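The screening step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the window-merging rule is omitted, the parameter names `eps_jump` and `eps_stable` are hypothetical stand-ins for the two judgment thresholds, and the comparison directions (a large middle difference flanked by two small ones) are inferred from the prose.

```python
from typing import List, Tuple


def window_extrema(levels: List[float], window_len: int = 900) -> List[Tuple[int, float]]:
    """Split a 1 Hz level series into fixed windows and return each window's
    minimum and maximum as (sample index, level), ordered by time."""
    series = []
    for start in range(0, len(levels), window_len):
        chunk = levels[start:start + window_len]
        i_max = max(range(len(chunk)), key=chunk.__getitem__)
        i_min = min(range(len(chunk)), key=chunk.__getitem__)
        # A perfectly flat window contributes a single point.
        for i in sorted({i_min, i_max}):
            series.append((start + i, chunk[i]))
    return series


def candidate_jumps(series: List[Tuple[int, float]],
                    eps_jump: float, eps_stable: float) -> List[int]:
    """Return indices k whose absolute difference is large while both
    neighbouring differences are small."""
    values = [v for _, v in series]
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    hits = []
    for k in range(1, len(diffs) - 1):
        if (diffs[k] > eps_jump
                and diffs[k - 1] < eps_stable
                and diffs[k + 1] < eps_stable):
            hits.append(k)
    return hits
```

For each returned index, the samples around `series[k]` and `series[k + 1]` would then be handed to the change-point step as the detection window.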
3.3. Gas Stations Recognition Method
Based on historical operating data, we constructed a CART classification model, which identifies whether each sequence point in the input window is a mutation threshold point. Each value in the level value sequence is used as an input to the CART model, and the output value is determined by the Gini coefficient. Using $u_y$ and $Y$ to represent the $y$-th value in the detection window and the length of the detection window matrix, respectively, the loss function of the CART algorithm can be expressed as:
$$L(y) = \sum_{i=1}^{y} (x_i - c_1)^2 + \sum_{j=y+1}^{Y} (x_j - c_2)^2 \tag{8}$$
where $c_1$ is the average of the first $y$ values, $c_2$ is the average of the remaining values, and $x_i$ and $x_j$ are the $i$-th and $j$-th liquid level values in the detection window.
Finally, the value of $y$ that minimizes the loss function is obtained. It is taken as the refueling point where the tank level changes suddenly, and the corresponding time stamp and latitude and longitude are recorded. At the same time, the tank level values in the window are sorted to obtain the maximum and minimum values, which are taken as the liquid level after and before refueling, respectively.
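The split search described above amounts to a single-split regression tree on the level sequence. A minimal sketch, assuming a squared-error loss around the mean of each segment (the function name `best_split` is hypothetical, and the brute-force scan is chosen for clarity rather than speed):

```python
from typing import List, Tuple


def best_split(values: List[float]) -> Tuple[int, float]:
    """Return (y, loss), where y is the number of leading samples in the
    'before' segment that minimises the summed squared error of both segments."""
    if len(values) < 2:
        raise ValueError("need at least two samples to split")
    best_y, best_loss = 0, float("inf")
    for y in range(1, len(values)):
        left, right = values[:y], values[y:]
        c1 = sum(left) / len(left)    # mean of the first y values
        c2 = sum(right) / len(right)  # mean of the remaining values
        loss = (sum((v - c1) ** 2 for v in left)
                + sum((v - c2) ** 2 for v in right))
        if loss < best_loss:
            best_y, best_loss = y, loss
    return best_y, best_loss


# Illustrative (made-up) level sequence with a jump after the fourth sample.
window = [30.0, 30.5, 29.5, 30.0, 80.0, 80.5, 79.5]
y, loss = best_split(window)
level_before, level_after = min(window), max(window)
```

With this input the split lands at `y = 4`, i.e. between the last low sample and the first high one; the time stamp and position of that sample would be recorded as the refueling point.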
Over a relatively long time scale, the repeated refueling behavior of diesel vehicles at a fixed gas station shows spatial aggregation. Therefore, Grid-Search tuning and the DBSCAN clustering algorithm are used to cluster the large number of refueling points identified within a certain period of time. The locations of the refueling behavior and the locations of the registered legal gas stations are used as input to the DBSCAN model. If there is a registered station near the refueling location, the behavior is recognized as refueling at a legal station; otherwise, it is recognized as an outlier, which indicates an illegal station. Then, the area of each cluster's bounding rectangle, which is determined by the maximum and minimum latitude and longitude of all points in the cluster, and the total amount of refueling within it are calculated. Based on these results, the optimal parameters are selected, and the resulting clusters are taken as gas station locations.
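To make the clustering step concrete, the following is a small pure-Python DBSCAN over latitude/longitude points. It is a sketch under assumptions: distances use the haversine formula, neighbours are found by brute force (quadratic time, so only suitable for small samples), only refueling points are clustered, and the grid search over `eps_m` and `min_pts` is left out. The function names and parameters are illustrative.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in metres between two points."""
    r = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def dbscan(points: List[Point], eps_m: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
                  for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # core point: keep expanding
    return [label if label is not None else -1 for label in labels]


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """(min_lat, max_lat, min_lon, max_lon) of a cluster's points."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return min(lats), max(lats), min(lons), max(lons)
```

Points labelled `-1` correspond to isolated refueling events; the bounding box of each remaining cluster gives the rectangle whose area is evaluated during parameter selection.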
The clustering results can be matched with the existing registered gas station data based on a distance threshold. If a cluster cannot be matched to any registered gas station, it may be an unregistered gas station, so this method provides an auxiliary basis for the investigation of unregistered gas stations.
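The matching rule above can be illustrated with a short sketch. The function names, the use of cluster centroids as station positions, and the 200 m threshold in the usage lines are assumptions for illustration; the text does not state the threshold actually used.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in metres between two points."""
    r = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def centroid(points: List[Point]) -> Point:
    """Arithmetic mean of a cluster's coordinates (adequate at station scale)."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def split_by_registration(centroids: List[Point], registered: List[Point],
                          max_dist_m: float) -> Tuple[List[Point], List[Point]]:
    """Separate cluster centroids into (matched, suspected unregistered)."""
    matched, suspected = [], []
    for c in centroids:
        nearest = min((haversine_m(c, s) for s in registered), default=float("inf"))
        (matched if nearest <= max_dist_m else suspected).append(c)
    return matched, suspected


# Made-up coordinates for illustration only.
clusters = [(39.6300, 118.1800), (39.7000, 118.3000)]
registry = [(39.6302, 118.1801)]
matched, suspected = split_by_registration(clusters, registry, max_dist_m=200.0)
```

Here the first cluster lies a few tens of metres from a registered station and is matched, while the second has no registered station nearby and is flagged for investigation.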
3.4. Oil Quality Analysis
The quality of oil sold at unregistered gas stations cannot be monitored, and substandard-quality fuel leads to increased vehicle fuel consumption and excessive vehicle emissions. Therefore, based on the collected data, vehicle fuel consumption and NOx emissions are calculated after each refueling act as a basis for judging the oil quality of the gas station and validating the gas station identification algorithm.
The first step is to preprocess the raw data to ensure the accuracy of the NOx emission calculation, extract the data suitable for the calculation, and exclude the emission exceedance caused by the abnormal operation of the SCR system. The rules for data screening are as follows:
(1) Urea level: >0;
(2) Atmospheric pressure: >76 kPa;
(3) SCR inlet temperature: >200 °C;
(4) Engine cooling temperature: >75 °C.
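The four screening rules translate directly into a frame-level filter. In the sketch below, the dictionary keys are hypothetical field names chosen for the example, not the platform's actual schema.

```python
from typing import Dict, List


def is_valid_for_nox(frame: Dict[str, float]) -> bool:
    """Apply screening rules (1)-(4) to a single data frame."""
    return (frame["urea_level"] > 0                # (1) urea level
            and frame["atm_pressure_kpa"] > 76     # (2) atmospheric pressure
            and frame["scr_inlet_temp_c"] > 200    # (3) SCR inlet temperature
            and frame["coolant_temp_c"] > 75)      # (4) engine cooling temperature


def screen(frames: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep only the frames suitable for the NOx emission calculation."""
    return [f for f in frames if is_valid_for_nox(f)]
```

Frames recorded during a cold start or with an empty urea tank are dropped, so that exceedances caused by abnormal SCR operation do not bias the estimate.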
After the data have been processed, the NOx emission rate of the vehicle can be obtained from Equation (9) [9], where $E_{\mathrm{NOx}}$ (g/h) is the NOx emission rate, $Q_{fv}$ (L/h) is the engine fuel flow rate, $Q_{am}$ (kg/h) is the mass air flow, $P_{\mathrm{NOx(down)}}$ (ppm) is the SCR downstream NOx sensor output, $\rho_a$ is the air density, $\rho_f$ is the diesel density, and $M$ is the molecular weight of NOx, taken as $M = 30.4$ according to NO:NO2 = 95:5.
In a realistic driving environment, drivers tend to refuel before they run out of fuel, and the refueling behavior is shown in Figure 2. When vehicle emissions are used as an oil quality assessment indicator, the emissions after a refueling act must therefore be weighted according to the share of the fuel in the tank that came from each gas station. The adjusted NOx emission rate of the gas station is calculated as shown in Equation (10), where $T_i$ is the NOx emission rate of vehicle $N_i$ in a single day at gas station $S_i$, $Q_i$ is the amount of fuel remaining in the tank before the $i$-th refueling, $V_i$ is the amount of fuel added in the $i$-th refueling, $E_i$ is the NOx emission rate between the $i$-th and the $(i+1)$-th refueling, and $n$ is the number of refueling acts of the vehicle after the $i$-th refueling act.
The NOx emission rate per vehicle at each gas station was calculated using the above algorithm, and the daily NOx emissions were further calculated assuming 10 h of driving per day. The results are shown in Figure 3. Each dot in the diagram represents a refueling act. The orange dots represent refueling at unregistered gas stations, and the purple dots represent refueling at legal gas stations. It can be concluded that there is a significant correlation between single-vehicle NOx emissions and the oil quality of gas stations. Vehicles that refueled at unregistered gas stations over a long period emitted a daily average of 82 g of NOx, much higher than the 22 g emitted after refueling at legal gas stations. There was a steep rise in NOx emissions after each unregistered refueling act, which caused serious pollution to the environment.
4. Case Study and Discussion
To verify the effectiveness of the method proposed in this paper, the historical operation data of all diesel vehicles in the Tangshan area from November 2019 to March 2020 were retrieved from the platform. The model calculation was performed according to the above-mentioned gas station identification method. The overall distribution of detected refueling points in the Tangshan area is shown in Figure 4.
In the identification results, a total of 5086 diesel vehicles were online during this time period, and a total of 71,087 refueling points were detected in Tangshan. After applying the DBSCAN clustering algorithm, 523 suspected gas station locations were obtained, of which 303 successfully matched registered legal gas stations. It can be inferred that the remaining 220 are suspected unregistered gas stations. Therefore, the method proposed in this paper can effectively identify the time and location of refueling points from vehicle operation big data, thereby realizing effective detection and precise positioning of fueling stations.
In addition, we selected a partially enlarged view of a certain area for further analysis, as shown in Figure 5. It can be seen that the extracted refueling points show obvious aggregation characteristics, and the spatial positions of the clusters formed by the aggregation match well with the registered gas stations. Some clusters cannot be matched to known gas stations but still show obvious aggregation characteristics, so there is a certain probability that these clusters are unregistered gas stations. About 8% of the refueling points did not match an appropriate cluster center. According to the data quality analysis of single refueling points, these unmatched points may be related to the time positioning error caused by missing data and the spatial positioning error caused by GPS drift. However, after a comparative analysis with the actual gas station situation, 96% of the suspected gas stations were found to be real illegal gas stations.
To further analyze the oil quality of the gas stations in the identification results for the Tangshan area, five registered gas stations and four unregistered gas stations were selected. Based on the identified refueling points, the operation data of heavy-duty diesel vehicles of the same model that refueled at these gas stations during June 2021 were extracted, as shown in Figure 6 and Figure 7. On this basis, the daily emissions of all vehicles refueling at these gas stations can be calculated from the NOx emission rate per minute of a single vehicle and the vehicle running time.
Figure 8 shows the NOx emissions of all refueling vehicles in June for the nine gas stations identified in Tangshan area, of which Nos. 1–5 are legal gas stations, and Nos. 6–9 are unregistered gas stations. It can be seen that there is a significant difference between unregistered and legal gas stations in terms of the amount of data for refueling vehicles and the NOx emissions from a single vehicle. Although the emissions from different vehicles at the same station vary greatly due to the different maintenance conditions and driving habits of drivers, the overall emissions from unregistered stations are much higher than those from legal stations. In addition, the daily emissions from legal gas stations are relatively stable, within the range of 0–40 g, while the daily emissions from unregistered gas stations fluctuate greatly, within the range of 0–300 g. This demonstrates the validity of single-vehicle NOx emissions as an assessment of oil quality at gas stations.
In addition, to analyze the overall oil quality of the nine gas stations in June, the total NOx emissions and the total number of refueling vehicles for each of the nine gas stations in June were obtained. The result is shown in Figure 9. The total number of refueling vehicles at the registered gas stations is generally higher than that at the unregistered gas stations, although the number of refueling vehicles at some unregistered gas stations is similar to that at the registered gas stations. In terms of total NOx emissions, the maximum emissions of the unregistered gas stations are 8005.72 g, and the minimum emissions are 1739.17 g. Overall, the average emissions of the four unregistered gas stations are 4373.41 g, and the average emissions of the five registered gas stations are 244.54 g, a difference of a factor of 17.88. It can be seen that the oil quality of the unregistered gas stations is extremely poor, which can easily cause serious environmental pollution problems.
Further, to analyze the stability of oil quality at the nine gas stations, the monthly average total NOx emissions per vehicle for these stations from January to June were calculated, as shown in Figure 9. There is a significant difference between the monthly NOx emissions from unregistered and legal gas stations. The average monthly emissions from unregistered gas stations fluctuate over a wide range and reach extremely large values, which indicates that the quality of the oil is extremely unstable. In contrast, the emissions from legal gas stations are below 400 g and remain largely stable over time. Based on this conclusion, it is possible to distinguish between legal and unregistered gas stations by setting a threshold value for total monthly NOx emissions. For this batch of data, a threshold of 400 g accurately distinguishes between unregistered and legal gas stations. In addition, the above analysis further verifies the validity and reliability of the unregistered gas station identification method proposed in this paper.
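The threshold rule described above can be written as a one-line classifier. In this sketch the station identifiers and emission values are made up for illustration and are not taken from the study; only the 400 g threshold comes from the text, and it applies to this batch of data rather than in general.

```python
from typing import Dict


def classify_stations(monthly_nox_g: Dict[str, float],
                      threshold_g: float = 400.0) -> Dict[str, str]:
    """Flag stations whose monthly NOx emissions per vehicle exceed the threshold."""
    return {
        station: ("suspected unregistered" if nox > threshold_g
                  else "consistent with legal")
        for station, nox in monthly_nox_g.items()
    }


# Hypothetical monthly averages in grams, for illustration only.
example = {"station_A": 250.0, "station_B": 3900.0}
labels = classify_stations(example)
```

A station exactly at the threshold is treated as legal here; how the boundary case should be handled is not specified in the text.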
In summary, the gas station identification method proposed in this paper was successfully applied to the Tangshan area. The oil quality analysis of the identified gas stations further verified the accuracy of the identification. Based on the full set of diesel vehicle operation data, a regional gas station supervision map was constructed, which effectively detected and located gas stations in the Tangshan area. In addition, the results provide a reliable and reasonable basis for law enforcement and supervision by the ecological protection department.