Prediction of Air Quality Combining Wavelet Transform, DCCA Correlation Analysis and LSTM Model
Round 1
Reviewer 1 Report
Dear Authors
I have now completed the review of the manuscript titled “Prediction of air quality combining wavelet transform, DCCA correlation analysis and LSTM model”. This study proposes a three-step method of data analysis, in which the t-SNE method was used to visualize and reduce the dimensionality of the data, and the k-median method was used to seek general relationships and relationships between groups of data. The topic is quite interesting and relevant. I have a few comments to improve the quality and clarity of the manuscript.
1. Flow is missing from the Abstract: the authors first discuss the impact of global climate change and then abruptly introduce LSTM in the second sentence.
- Lines 29-31: “The Air Quality Index (AQI) … of air pollution” should be supported by [1].
- Lines 70-73: The utility of LSTM should be supported by more natural/environmental monitoring or weather analysis and prediction studies [2-3]. 1: SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification; 2: CDLSTM: A Novel Model for Climate Change Forecasting; 3: Deep Learning Based Modeling of Groundwater Storage Change
- Section 2.4. Long short-term memory neural network: This section is very important; however, it reads as if LSTM itself were developed within this study. Citations to relevant works in the literature are missing.
- Sections 3.2.1 to 3.2.4: all the correlations in these separate subheadings should be represented in a single heatmap.
- I suggest adding NSE as an evaluation metric.
- The authors should clarify the optimization for the models.
- The authors should add the computational complexity of the model; see DNNBoT, PCCNN.
- Results and discussion need limitations and the future scope of the investigation.
Author Response
- Reply to Reviewer 1
Dear reviewers, Thank you for reading and sharing such insightful feedback.
- Flow is missing from the Abstract: the authors first discuss the impact of global climate change and then abruptly introduce LSTM in the second sentence.
Reply: A transitional statement on the extensive use of machine learning for time series prediction has been added to the abstract.
- Lines 29-31: “The Air Quality Index (AQI) … of air pollution” should be supported by [1].
Reply: Citations have been added at lines 33-35 of the article.
- Lines 70-73: The utility of LSTM should be supported by more natural/environmental monitoring or weather analysis and prediction studies [2-3]. 1: SMOTEDNN: A Novel Model for Air Pollution Forecasting and AQI Classification; 2: CDLSTM: A Novel Model for Climate Change Forecasting; 3: Deep Learning Based Modeling of Groundwater Storage Change
Reply: Additional environment- and nature-related LSTM descriptions and references have been added to the part introducing LSTM.
- Section 2.4. Long short-term memory neural network: This section is very important; however, it reads as if LSTM itself were developed within this study. Citations to relevant works in the literature are missing.
Reply: We've added a reference to section 2.4 per your excellent suggestion, dear commenter.
- Sections 3.2.1 to 3.2.4: all the correlations in these separate subheadings should be represented in a single heatmap.
Reply: Dear reviewers, Sections 3.2.1 to 3.2.4 explain the differently colored curves in Figure 3. Section 3.2.1 presents the DCCA cross-correlation coefficient between temperature and AQI. Section 3.2.2 presents the DCCA cross-correlation coefficient between humidity and AQI. Section 3.2.3 presents the DCCA cross-correlation coefficient between air pressure and AQI.
- I suggest adding NSE as an evaluation metric.
Reply: To the commenter: Because of the limitations of our computing equipment, we did not include NSE as an evaluation metric; if you require it, we will include it in the future. Thank you for the recommendation!
- The authors should clarify the optimization for the models.
Reply: Dear Reviewer, we have rewritten the conclusion section to illustrate the optimization of the model.
- The authors should add the computational complexity of the model; see DNNBoT, PCCNN.
Reply: Dear reviewer, regarding computational complexity, we read the PCCNN and DNNBoT articles. However, we only used a simple forward LSTM in the TensorFlow framework and did not include this analysis due to equipment and time constraints. We are willing to add the results of this analysis later if possible.
- Results and discussion need limitations and the future scope of the investigation.
Reply: Dear reviewer, future applications have been added to the discussion section.
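For readers unfamiliar with the NSE metric suggested above, the Nash-Sutcliffe efficiency is commonly defined as one minus the ratio of the squared prediction error to the squared spread of the observations around their mean. The following is a minimal Python sketch of that standard definition; it is not taken from the manuscript, and the function name and sample values are hypothetical.

```python
def nash_sutcliffe_efficiency(observed, predicted):
    """Return NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2).

    1.0 means a perfect fit; 0.0 means the predictions are no better
    than always predicting the mean of the observations.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - squared_error / spread


# Hypothetical AQI values, for illustration only.
aqi_observed = [50.0, 60.0, 70.0]
aqi_predicted = [60.0, 60.0, 60.0]  # always predicts the observed mean
nse_value = nash_sutcliffe_efficiency(aqi_observed, aqi_predicted)
```

Because NSE needs only the observed and predicted series, it could in principle be computed from the same outputs already used for MSE.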
Reviewer 2 Report
Please find attached file
Comments for author File: Comments.pdf
Author Response
- Reply to Reviewer 2
Dear reviewers, Thank you for reading and sharing such insightful feedback.
1. Please add more details to the abstract regarding air pollutants and meteorological elements
Reply: We have expanded and clarified the summary of air contaminants and meteorological parameters.
- briefly introduce DCCA
Reply: A brief introduction has been added to the section introducing DCCA.
- SO2, NO2: see previous comment
Reply: The symbols have been harmonized throughout the text.
- this is from the template!
Reply: We are sorry for this error and have removed this item
- provide units
Reply: Units have been added to Figure 2.
- incomplete sentence
Reply: The grammar of the article has been revised and corrected.
- How can you explain this? Wind speed is usually a very important parameter in air quality.
Reply: Dear reviewers, the DCCA coefficients in Table 2 for the 24-hour time window show that the A-W coefficient is approximately -0.125, which lies within the interval [-0.2, 0.2]. The association between wind speed and AQI is therefore very weak, so wind speed data are not used for AQI prediction. The underlying idea is to examine the DCCA coefficient of each pair of time series: pairs with low coefficients are classified as uncorrelated or weakly correlated and are excluded from the model's forecasting inputs.
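The screening rule described in this reply can be sketched in Python. The sketch below follows one common formulation of the DCCA coefficient (integrated profiles, overlapping boxes of n + 1 points, linear detrending in each box); whether the manuscript uses exactly this box convention is an assumption, and the function names are hypothetical. Only the [-0.2, 0.2] interval and the -0.125 value come from the reply above.

```python
import numpy as np


def dcca_coefficient(x, y, n):
    """DCCA cross-correlation coefficient of two series at window size n.

    Assumed formulation: overlapping boxes of n + 1 points with a
    linear local trend removed from each integrated profile.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be one-dimensional and equal in length")
    if not 2 <= n < len(x):
        raise ValueError("n must satisfy 2 <= n < len(x)")

    # Integrated (cumulative, mean-removed) profiles.
    profile_x = np.cumsum(x - x.mean())
    profile_y = np.cumsum(y - y.mean())

    t = np.arange(n + 1)
    cov_xy = var_x = var_y = 0.0
    for start in range(len(x) - n):
        box_x = profile_x[start:start + n + 1]
        box_y = profile_y[start:start + n + 1]
        # Residuals after removing a least-squares linear trend in the box.
        res_x = box_x - np.polyval(np.polyfit(t, box_x, 1), t)
        res_y = box_y - np.polyval(np.polyfit(t, box_y, 1), t)
        cov_xy += np.mean(res_x * res_y)
        var_x += np.mean(res_x * res_x)
        var_y += np.mean(res_y * res_y)

    # The ratio of detrended covariance to the two detrended
    # standard deviations always lies in [-1, 1].
    return cov_xy / np.sqrt(var_x * var_y)


def is_weakly_correlated(rho, threshold=0.2):
    """True when rho falls inside [-threshold, threshold]."""
    return abs(rho) <= threshold
```

Under this rule a coefficient of -0.125, as reported for A-W in the 24-hour window, satisfies `is_weakly_correlated`, so the corresponding series would be dropped from the model inputs.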
Reviewer 3 Report
The manuscript compares two different approaches to predicting the AQI for 4 window sizes (24, 48, 168 and 5000 hours). The authors state that combining DCCA with the LSTM model is better than LSTM alone, increasing performance and reducing running time (with wavelet preprocessing).
Specifically, in the conclusion section, it is said that the results show that combining DCCA with LSTM is cost-effective both in terms of performance (MSE) and in terms of running duration.
As far as performance is concerned, I believe that the two models are comparable (Table 3 shows that the increase is to three decimal places).
In addition, the running duration is already so short with the LSTM model that a reduction of a few seconds does not deserve attention. It should also not be forgotten that, for the DCCA-LSTM model, the execution time of the DCCA itself must be considered.
Although there are no particular benefits in this particular case study, readers could use this study as a model and evaluate it in their own region.
Comments:
1) In Figure 2 you wrote time(s). Does this mean that the time is measured in seconds? At line 103 you wrote hourly resolution.
2) Figure 2. When a figure is shown, it is desirable that the figure is cited and explained in the text.
3) In Table 1, it would help to show a comparison between the measured and reconstructed data.
I suppose that the SNR value shown in table 1 concerns only the reconstructed signal while the RMSE was calculated by comparing the two data series, the measured one and the reconstructed one. If it is correct, it would be interesting to estimate the SNR also for the measured data.
4) lines 195, 203, 211, 216: I suppose the referenced figure is 3, not 4.
5) line 198: could it be n<1000 instead of n1000?
6) line 186: You have never mentioned DPCC coefficients before. The plot in Figure 3 indicates the DCCA coefficient.
7) Lines 230-232: How is it possible to use AQI as input and output at the same time?
Could it be that the AQI in the output refers to future instants with respect to the AQI values in the input?
If it is correct, it would be better to clarify this.
8) Table 2. When a table is reported in the document, it is desirable that the table is cited and commented on in the text.
9) line 236: “the vertical coordinates represent the true AQI …” Could it be “represent the AQI values”?
10) The text does not refer to the comparison of the two panels (a) and (b) of figs 4, 5 and 6.
Small reports:
a) ABSTRACT: it is useful to specify that the window size is in hours
b) line 32: So2, No2 ………… SO2, NO2
c) line 43: al ……………… “al.”
d) line 57: detrended correlation analysis …………… detrended cross-correlation analysis
e) lines 66, 73, 76, 79, 81, 85, 88: I suggest moving the reference after “et al.”
f) I suggest defining all abbreviations used in the text, such as SVM (line 81), CNN-LSTM (line 74), …
g) line 104: So2, No2, Pm2,5 ………...SO2, NO2, PM2,5
h) line 158: there are 2 points more
i) figure 2: I suggest adding units of measurement to the plot.
j) line 214, 249, 252: AP….A-P (as reported in figure 3)
k) line 247: AW….A-W (as reported in figure 3)
l) line 285: point
Author Response
- Reply to Reviewer 3
Dear reviewers, Thank you for reading and sharing such insightful feedback.
- In Figure 2 you wrote time(s). Does this mean that the time is measured in seconds? At line 103 you wrote hourly resolution.
Reply: The unit error on the horizontal axis of Figure 2 has been corrected.
- Figure 2. When a figure is shown, it is desirable that the figure is cited and explained in the text.
Reply: Figure 2 is now cited and explained in the text below it.
- In Table 1, it would help to show a comparison between the measured and reconstructed data. I suppose that the SNR value shown in table 1 concerns only the reconstructed signal while the RMSE was calculated by comparing the two data series, the measured one and the reconstructed one. If it is correct, it would be interesting to estimate the SNR also for the measured data.
Reply: Dear Reviewer, The SNR in Table 1 compares the signal-to-noise ratio between the original time series signal and the time series signal after using the wavelet transform noise reduction technique.
- lines 195, 203, 211, 216: I suppose the referenced figure is 3, not 4.
Reply: This error has been corrected in the text.
- line 198: could it be n<1000 instead of n1000?
Reply: This error has been corrected in the text.
- line 186: You have never mentioned DPCC coefficients before. The plot in Figure 3 indicates the DCCA coefficient.
Reply: This error has been corrected in the text.
- Lines 230-232: How is it possible to use AQI as input and output at the same time? Could it be that the AQI in the output refers to future instants with respect to the AQI values in the input? If it is correct, it would be better to clarify this.
Reply: Dear commenter, the explanation is that the AQI, temperature, humidity, barometric pressure, and wind speed at the previous moment are used as inputs to calculate the AQI value at the next moment.
- Table 2. When a table is reported in the document, it is desirable that the table is cited and commented on in the text.
Reply: Citations and explanations of the figures and tables have been added to the text.
- line 236: “the vertical coordinates represent the true AQI …” Could it be “represent the AQI values”?
Reply: This error has been corrected in the text.
- The text does not refer to the comparison of the two panels (a) and (b) of figs 4, 5 and 6.
Reply: Figures 4 to 6 visualize the model prediction results, while Table 3 compares the models numerically.
- Small reports:
- a) ABSTRACT: it is useful to specify that the window size is in hours
- b) line 32: So2, No2 ………… SO2, NO2
- c) line 43: al ……………… “al.”
- d) line 57: detrended correlation analysis …………… detrended cross-correlation analysis
- e) lines 66, 73, 76, 79, 81, 85, 88: I suggest moving the reference after “et al.”
- f) I suggest defining all abbreviations used in the text, such as SVM (line 81), CNN-LSTM (line 74), …
- g) line 104: So2, No2, Pm2,5 ………...SO2, NO2, PM2,5
- h) line 158: there are 2 points more
- i) figure 2: I suggest adding units of measurement to the plot.
- j) line 214, 249, 252: AP….A-P (as reported in figure 3)
- k) line 247: AW….A-W (as reported in figure 3)
- l) line 285: point
Reply: The above suggestions have been amended in the text. Thank you very much for your careful reading.
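The reply to the comment on lines 230-232 above, where values at the previous moment are used to predict the AQI at the next moment, can be illustrated with a short Python sketch. It assumes a one-step lag and a row layout with AQI in the first column; the function name, column order and sample values are hypothetical, and the manuscript's actual window construction may differ.

```python
def make_one_step_pairs(rows, target_index=0):
    """Pair each hour's features with the AQI of the following hour.

    rows: list of per-hour feature lists, ordered in time.
    target_index: column holding the AQI (assumed to be column 0).
    Returns (inputs, targets) where targets[i] is the AQI one step
    after inputs[i], so AQI appears on both sides without leakage.
    """
    if len(rows) < 2:
        raise ValueError("at least two time steps are needed")
    inputs = [list(row) for row in rows[:-1]]
    targets = [row[target_index] for row in rows[1:]]
    return inputs, targets


# Hypothetical hourly records: [AQI, temperature, humidity].
hourly_rows = [
    [50.0, 20.0, 60.0],
    [55.0, 21.0, 58.0],
    [60.0, 19.5, 63.0],
]
model_inputs, model_targets = make_one_step_pairs(hourly_rows)
```

Here the AQI at hour t is an input feature and the AQI at hour t + 1 is the prediction target, which is how the same quantity can serve as both input and output.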
Reviewer 4 Report
Some spaces missing around literature references "Author[Nr]next Word"
Fig. 2: Suggestion: scaling in 1800 or 3600s intervals rather than 2500
Reference in Text should be given to Fig. 2 and Fig. 3
Fig.2 should include units (°C, %, hPa, m/s)
Where does the data in Fig.2 come from? Did you measure them at a weather station? No explanation is given of the dates on which the data were taken, etc.
Why does temperature rise to ~30°C at 2500s, then drop to 15°C at 7000-7500s and then increase back to ~30°C? Such short cycles are against experience (drop and increase by 15°C within 3-4h)
Why does RH drop with temperature? I would expect the opposite: when T drops, RH should increase
§3.2.1 to 3.2.4 reference Figure 4. Should it be Figure 3?
This section discusses the relationship between AQI and the weather conditions. However, no information on the AQI determined for the test data has been given. So you relate to something unknown to the reader.
Fig. 4 to Fig. 6 are barely legible
The added value of these plots is questionable, as it essentially shows somewhat larger blue spikes than orange ones.
Fig 7.: barely legible. Caption says 5000h, but plot only shows 1000h
The paper lacks a discussion quantifying the deviations and a statement on the importance of these deviations. Was the model compared against AQI data derived from measurement (not clear)? What is the benefit of this model, and how will it help to predict AQI and eventually trigger counter-measures if it is too poor? The introduction mentions quantities like PM, NO, etc. influencing the AQI, but how were these signals used for your comparison?
General comments: Figures lack introduction/reference/explanation in the text. Figures have poor resolution and do not follow a common style for the paper.
Author Response
- Reply to Reviewer 4
Dear reviewers, Thank you for reading and sharing such insightful feedback.
- Some spaces missing around literature references "Author[Nr]next Word"
Reply: Appropriate modifications have been made in the manuscript.
- Fig. 2: Suggestion: scaling in 1800 or 3600s intervals rather than 2500
Reply: Dear Reviewer, Figure 2 is a comparative visualization of the time series after wavelet-transform noise reduction. The total length of the time series is 19,432 points, so tick marks at intervals of 2500 give a clearer display.
- Reference in Text should be given to Fig. 2 and Fig. 3
Reply: Reference explanations for Figures 2 and 3 have been added to the text
- Fig.2 should include units (°C, %, hPa, m/s)
Reply: Units have been added to the vertical axes of Figure 2.
- Where does the data in Fig.2 come from? Did you measure them at a weather station? No explanation is given of the dates on which the data were taken, etc.
Reply: Dear Reviewer, this study utilizes data from the China Graduate Student Mathematical Modeling Competition, which has a 2-year time span (April 2019 to July 2021), hourly resolution, and a 10-dimensional dataset containing SO2, NO2, PM10, PM2.5, CO, O3, temperature, barometric pressure, humidity, and wind speed, with a total of 252,616 valid values.
- Why does temperature rise to ~30°C at 2500s, then drop to 15°C at 7000-7500s and then increase back to ~30°C? Such short cycles are against experience (drop and increase by 15°C within 3-4h)
Reply: Dear Reviewer, Thank you for your feedback. The temperature profile reflects a seasonal cycle, and as our data ranges from April 2019 to July 2021, the change in temperature reflects the seasonal change in the region precisely.
- Why does RH drop with temperature? I would expect the opposite: when T drops, RH should increase
Reply: Dear Reviewer, I am sorry, I did not fully understand the question about RH. If you are referring to the data in Figure 2, the visualization shown there is the time series over 19,432 hours, and the China Graduate Mathematics Competition is the source of the data. Figure 4 displays the DCCA cross-correlation coefficient curves between the air quality index (AQI) and the meteorological parameters (temperature, humidity, barometric pressure, and wind speed).
- §3.2.1 to 3.2.4 reference Figure 4. Should it be Figure 3?
Reply: Appropriate modifications have been made in the manuscript.
- This section discusses the relationship between AQI and the weather conditions. However, no information on the AQI determined for the test data has been given. So you relate to something unknown to the reader.
Reply: The AQI values in the text are derived from the pollutant data (SO2, NO2, PM10, PM2.5, O3, CO) according to the methodology in Section 2.2.
- Fig. 4 to Fig. 6 are barely legible
Reply: Thank you very much for your careful review. Figures 4 through 6 have been changed from 300 dpi to 600 dpi, within the limits of our present computer equipment. If deficiencies remain, we hope to continue improving the image quality in subsequent revisions.
Round 2
Reviewer 1 Report
Dear Authors
I have now completed the review of the revised manuscript, titled “Prediction of air quality combining wavelet transform, DCCA correlation analysis and LSTM model”. I have observed that the authors put in good efforts to address most of the comments satisfactorily.
Best wishes
Author Response
Dear reviewer, thank you for reprocessing the paper and taking the time to review our work. Wish you a happy life and all the best!
Reviewer 3 Report
This new version seems better to me.
Author Response
Dear reviewer, thank you for reprocessing the paper and taking the time to review our work. Wish you a happy life and all the best!
Reviewer 4 Report
Thank you for your revisions. I suggest that you include the clarifications you provided to me in the paper; the interpretation then becomes clearer. Otherwise, please see the comments in the file:
Reply: Dear Reviewer, this study utilizes data from the China Graduate Student Mathematical Modeling Competition, which has a 2-year time span (April 2019 to July 2021), hourly resolution, and a 10-dimensional dataset containing SO2, NO2, PM10, PM2.5, CO, O3, temperature, barometric pressure, humidity, and wind speed, with a total of 252,616 valid values.
Reply: Dear Reviewer, Thank you for your feedback. The temperature profile reflects a seasonal cycle, and as our data ranges from April 2019 to July 2021, the change in temperature reflects the seasonal change in the region precisely.
Comments for author File: Comments.pdf
Author Response
Dear reviewer, thank you for reprocessing the paper and taking the time to review our work. In response to your suggestions, we have edited the manuscript. Following is a detailed point-by-point response.
- I suggest that you include the clarifications you provided to me in the paper; the interpretation then becomes clearer.
Reply: Dear reviewer, we have added the clarifications from our responses to the description of the materials and to the explanation below Figure 2, in order to make our work clearer.
- Humidity in line 199.
Reply: Corrected to pressure.
- Are you sure? 1 year = 8760 h, so do the larger window sizes cover more than 2 years?
Reply: Dear reviewer, our data ranges from April 2019 to July 2021, with a data length of 19,432 hours, and we've clarified the errors in the text.
Wish you a happy life and all the best!