Application of Statistical Learning Algorithms in Thermal Stress Assessment in Comparison with the Expert Judgment Inherent to the Universal Thermal Climate Index (UTCI)
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Dear Editor and Authors:
The proposed investigation mainly used several machine learning algorithms to analyze and predict the UTCI. The study extended the application of machine learning algorithms. The detailed comments are presented as below:
1) In the common machine learning approaches, it’s highly recommended to do the pre-analysis of data, including data cleaning, independence analysis, and data preprocessing. Especially for data preprocessing, it can significantly affect the machine learning results. Please explain the reason for not using data preprocessing.
2) The training data is important to the performance of machine learning algorithms. Besides focusing on the total number of training data in the manuscript, it’s recommended to show the total distribution of the whole data. In addition, what are the sources of the data (from experiment, observation, or simulation)?
3) It’s not enough to evaluate the prediction performance by only using RMSE. It will be more convincing to show a combination of different performance metrics, such as R-squared and RMSE.
4) The regression results in Figure A1 are not fully analyzed. What are the contributions of the regression analysis in Figure A1 to the proposed investigation?
5) The title of this manuscript is “ ... compared the expert judgement ...”. However, the expert judgement is only used as the training data for the machine learning algorithms. The comparison between the machine learning and the expert judgement is not clear. It’s suggested to present the detailed comparison or to revise the title of the manuscript.
6) The reasons for the low accuracy of the PCA and t-SNE analysis are still not clear. This may attract more research interest. It’s suggested to explain the results further to attract potential readers.
7) The conclusions are highly suggested to be revised.
For example, the statement “the potential supportive role for AI and the utilized SL algorithms..” is not clear. How is AI used and discussed in the manuscript?
Besides, the conclusions do not fully cover the main research results in the manuscript.
Author Response
Reviewer 1
The proposed investigation mainly used several machine learning algorithms to analyze and predict the UTCI. The study extended the application of machine learning algorithms. The detailed comments are presented as below:
We are grateful to this reviewer for the positive attitude towards our study and for the constructive comments, which we have addressed as detailed below.
1) In the common machine learning approaches, it’s highly recommended to do the pre-analysis of data, including data cleaning, independence analysis, and data preprocessing. Especially for data preprocessing, it can significantly affect the machine learning results. Please explain the reason for not using data preprocessing.
Thank you for pointing us to the relevant topic on data preprocessing. In fact, we had made limited use of data preprocessing by applying log-transformation to one variable. We have added corresponding information to Figures 2 and A1, and a paragraph at the beginning of the “Data analysis” section 2.2, which reads as:
“As the data were simulated by the UTCI-Fiala model [27], data cleaning was not necessary. To restrict (data) expert input to the algorithms, we applied only limited data preprocessing by log-transforming the percentage skin blood flow data (VblSk).”
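The log-transformation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the base of the logarithm is not given in the response, so the natural logarithm is assumed, and the sample values and the function name `log_transform` are hypothetical.

```python
import math

def log_transform(values):
    """Return the natural logarithm of each value (base is an assumption)."""
    if any(v <= 0 for v in values):
        raise ValueError("log-transformation requires strictly positive values")
    return [math.log(v) for v in values]

# Hypothetical percentage skin blood flow values (VblSk) spanning a wide range.
vblsk_percent = [25.0, 100.0, 400.0, 1600.0]
vblsk_log = log_transform(vblsk_percent)

# Equal ratios on the original scale become equal steps on the log scale.
steps = [b - a for a, b in zip(vblsk_log, vblsk_log[1:])]
print(steps)
```

A transformation of this kind compresses a strongly right-skewed variable so that it relates more evenly to the other inputs.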
2) The training data is important to the performance of machine learning algorithms. Besides focusing on the total number of training data in the manuscript, it’s recommended to show the total distribution of the whole data. In addition, what are the sources of the data (from experiment, observation, or simulation)?
We agree with this comment concerning the importance of showing the data distribution and had therefore included Figure 2 showing the distribution of the training data and Figure A1 showing the distribution of the combined training (reference) and test (non-reference) data, respectively. We now include a corresponding text referring to the distribution in the caption and the manuscript at the beginning of section 3, which reads:
“Figure 2 illustrates the distribution of the training data, i.e., the 48-dimensional set of twelve physiological output variables at four points in time predicted by the UTCI-Fiala model [27] for the UTCI reference conditions with air temperature ranging from -55 °C to +55 °C.”
In addition, we state that the data arose as output from simulations performed using the UTCI-Fiala model, e.g., in section 2 ‘Methods and materials’:
“Our approach was to apply selected SL algorithms deemed representative for recent applications [5-11,35-41] to the multi-dimensional data simulated over a comprehensive grid of relevant climatic conditions by the UTCI-Fiala model [27] at stage 2 of the UTCI development (Figure 1).”
3) It’s not enough to evaluate the prediction performance by only using RMSE. It will be more convincing to show a combination of different performance metrics, such as R-squared and RMSE.
In response to this comment, we have added the new supplemental Figure A2 to Appendix A showing that utilizing the squared correlation (R²) and the mean absolute error (MAE) as alternative metrics confirmed the results obtained with RMSE as presented in Figure 3A. We refer to this new Figure expanding the results section 3.1 as follows:
“UTCI and the equivalent temperatures predicted by the diverse algorithms were highly correlated with squared correlation (R²) exceeding 0.95 for any dataset and with any method as shown in Appendix A by the supplemental Figure A2.
[…]
The results for RMSE were confirmed by the corresponding analyses utilizing the squared correlation coefficient (R²) and the mean absolute error (MAE) as alternative performance metrics, as shown in Figure A2.”
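For readers unfamiliar with the three metrics named in this response, the following minimal sketch computes RMSE, MAE and the squared correlation for a pair of value lists. The numbers are invented for illustration and are not taken from the manuscript; the function names are likewise hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def squared_correlation(y_true, y_pred):
    """Square of the Pearson correlation between the two lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)

# Invented reference values and predictions in °C (illustration only).
utci = [-30.0, -10.0, 5.0, 20.0, 35.0]
predicted = [-28.0, -11.0, 6.0, 19.0, 37.0]

print("RMSE:", rmse(utci, predicted))
print("MAE: ", mae(utci, predicted))
print("R2:  ", squared_correlation(utci, predicted))
```

RMSE penalizes large deviations more strongly than MAE, while the squared correlation is insensitive to a constant offset, which is one reason for reporting the metrics side by side.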
4) The regression results in Figure A1 are not fully analyzed. What are the contributions of the regression analysis in Figure A1 to the proposed investigation?
In response to this comment, we have expanded the explanation of Figure A1 in Appendix A, which now reads:
“Figure A1 illustrates the distribution and correlation of air temperature with the twelve physiological responses at four points in time predicted by the UTCI-Fiala model [27] for the combined reference and non-reference datasets, i.e. training and test data, including the individual regression lines. The latter were used by the KDM algorithm [58,59] for ensemble modelling of the equivalent temperature as averaged predictions from the inverted regression lines weighted by the squared correlation coefficients, as discussed in section 4.2.”
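The ensemble idea described in this caption (inverting each regression line and averaging the resulting temperature estimates with weights given by the squared correlation coefficients) might be sketched as below. This is a simplified reading of the caption, not the authors' implementation of the KDM algorithm; the two response variables, their coefficients and all function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus squared correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)

def equivalent_temperature(responses, fits):
    """Average the inverted regression lines, weighted by squared correlation."""
    weighted_sum = 0.0
    weight_total = 0.0
    for name, value in responses.items():
        intercept, slope, r2 = fits[name]
        weighted_sum += r2 * (value - intercept) / slope  # inverted line
        weight_total += r2
    return weighted_sum / weight_total

# Synthetic, exactly linear responses over an air temperature grid (assumption).
air_temp = [float(t) for t in range(-50, 51, 10)]
data = {
    "skin_temp": [30.0 + 0.1 * t for t in air_temp],
    "core_temp": [37.0 + 0.01 * t for t in air_temp],
}
fits = {name: fit_line(air_temp, values) for name, values in data.items()}

# Responses observed under some condition are mapped back to a temperature.
estimate = equivalent_temperature({"skin_temp": 32.0, "core_temp": 37.2}, fits)
print(estimate)
```

Because each response is regressed on air temperature directly, this approach needs no separately specified reference condition, which matches the advantage the authors attribute to it in section 4.2.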
5) The title of this manuscript is “ ... compared the expert judgement ...”. However, the expert judgement is only used as the training data for the machine learning algorithms. The comparison between the machine learning and the expert judgement is not clear. It’s suggested to present the detailed comparison or to revise the title of the manuscript.
In response to this comment (and to a related comment from reviewer 2 concerning novelty aspects), we have re-phrased the second para of the introduction referring to ‘expert judgement’, as follows:
“The rising number of SL applications triggers a demand for quantitatively assessing the skills of statistical learning in comparison to results involving expert judgement. Our study aimed to contribute to such an assessment utilizing as a testbed the development of the Universal Thermal Climate Index (UTCI)…”
6) The reasons for the low accuracy of the PCA and t-SNE analysis are still not clear. This may attract more research interest. It’s suggested to explain the results further to attract potential readers.
Following this hint, we have expanded the corresponding paragraph in the discussion, which stresses the different approaches of clustering algorithms looking for patterns in the data vs. limit values derived from external sources involving expert judgement in thermo-physiology and ergonomics, and now reads:
“The dimensionality-reduction results by PCA and t-SNE in Figure 4A both suggested a one-dimensional structure of thermo-physiological strain concurring with the original analyses presented as stage 4 of Figure 1 [25]. In addition, the clustering algorithms were able to identify a trend from extreme heat to extreme cold. However, they did not reliably discriminate between the intermediate categories, with only marginal differences between the diverse algorithms. This demonstrates the discrepancy between clustering algorithms searching for patterns in the data and forming groups of data of comparable size on one hand [57] and the definition of UTCI stress categories based on thermo-physiological knowledge and ergonomics reasoning on the other hand [25,34].”
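The discrepancy described in this paragraph can be illustrated with a deliberately simplified toy example: a one-dimensional k-means on evenly spaced synthetic temperatures, compared with categories defined by fixed limits of unequal width. It does not reproduce the manuscript's PCA, t-SNE or clustering analyses on the 48-dimensional data; the category limits are assumed values for illustration and should be checked against the UTCI documentation, and all names are hypothetical.

```python
from bisect import bisect_right

# Assumed category limits in °C (illustrative only, unequal widths).
EXPERT_LIMITS = [-40.0, -27.0, -13.0, 0.0, 9.0, 26.0, 32.0, 38.0, 46.0]

def expert_category(t):
    """Index of the category containing t, from coldest (0) to hottest."""
    return bisect_right(EXPERT_LIMITS, t)

def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's k-means on scalars with evenly spaced starting centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Synthetic equivalent temperatures on a regular grid from -55 °C to +55 °C.
temps = [-55.0 + 0.5 * i for i in range(221)]
k = len(EXPERT_LIMITS) + 1
cluster_labels = kmeans_1d(temps, k)
expert_labels = [expert_category(t) for t in temps]

# Centers start in ascending order, so cluster indices follow the temperature axis.
agreement = sum(c == e for c, e in zip(cluster_labels, expert_labels)) / len(temps)
print(f"share of values with matching category: {agreement:.2f}")
```

On evenly spread data the clusters come out with similar widths, whereas the externally defined limits do not, so the two labelings agree at the extremes more readily than in the intermediate range.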
In addition, we have added a subsection including an outlook, which may trigger further research:
“4.3 Limitations and outlook
As a limitation concerning generalizability, the database underlying the UTCI development had been generated by a deterministic model of human thermoregulation. Thus, it lacks random variation induced by, e.g., inter-individual or day-to-day variability of the human physiological responses, which might have complicated the analyses of this study. However, the best performing SL algorithms for equivalent temperature prediction, i.e., concerning stage 4 in Figure 1, like random forests and k-nearest neighbors had already been suggested and successfully applied in other fields pertaining to climatic change and thermal stress [35-41].
While the KDM approach as outlined in section 4.2 might facilitate the development of equivalent temperature indices by avoiding the necessity for specifying reference conditions (stage 3 in Figure 1), assessment scales (stage 5 in Figure 1) automatically derived by clustering algorithms may still require expert knowledge for their refinement. Future enhancements at this stage may be possible by including further artificial intelligence tools such as expert systems or large language models working with knowledge databases [62,63].”
7) The conclusions are highly suggested to be revised. For example, the statement “the potential supportive role for AI and the utilized SL algorithms.” is not clear. How is AI used and discussed in the manuscript?
Besides, the conclusions do not fully cover the main research results in the manuscript.
We are grateful to this reviewer for pointing out that the manuscript deals with statistical learning (SL) as one pillar of AI application, and not with AI per se. Thus, in response to this comment, we no longer refer to ‘AI’ in the conclusion. In addition, we now recapitulate the major outcomes in the conclusion section, which reads as:
“In summary, a potential supportive role for the utilized SL algorithms when analyzing high dimensional input in thermal index development can be concluded from their adequate performance in equivalent temperature prediction. On the other hand, the low agreement of the assessment scale defined by the UTCI expert group with clustering-based thermal stress categories suggested that statistical learning algorithms will not (yet) fully replace the knowledgeable expert in biometeorological and inter-disciplinary research and thermal risk assessment.”
Please also refer to our response to the similar last comment of reviewer 2.
In addition, please refer to the detailed information in the track changes attached to our responses.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The work is devoted to the topic of modeling and statistical assessment of weather conditions. This topic may be of interest to a wide range of researchers.
Figure 1 – Scheme is very interesting but can hardly be read. Please, rework the size. Also, highlighted stages can’t be seen clearly.
Abstract should be reworked. First sentence is not suitable.
Also, this sentence is duplicated in line 37-28 and it does not sound well. Please, reformulate it.
What is the novelty of the work? Please, highlight novelty in the text in the Introduction section.
Conclusion section is too short. It must be obtained. It is necessary to clearly summarize all the observations established in the work. It is important to write whether the tasks of the work have been solved and to what extent, as well as whether there are prospects for this topic or it has been exhausted.
In general, the work seems to be holistic and interesting. Results are presented well. The material is presented sequentially. Perhaps the work needs a more detailed discussion of the limitations and disadvantages of the method proposed by the authors.
Author Response
Reviewer 2
The work is devoted to the topic of modeling and statistical assessment of weather conditions. This topic may be of interest to a wide range of researchers.
We very much appreciate that this reviewer recognizes the value of our study, and gratefully acknowledge the detailed constructive comments, which we have addressed as outlined below.
Figure 1 – Scheme is very interesting but can hardly be read. Please, rework the size. Also, highlighted stages can’t be seen clearly.
Following this hint, we have reworked Figure 1 by enlarging the figure, increasing the text size and changing the color used for highlighting; and we hope that this will increase the legibility of Figure 1.
Abstract should be reworked. First sentence is not suitable. Also, this sentence is duplicated in line 37-28 and it does not sound well. Please, reformulate it.
In response to this comment, we have rephrased the corresponding sentence and re-structured the abstract, which now reads:
“The Universal Thermal Climate Index (UTCI), a complex tool for the assessment of outdoor thermal stress created by an international expert group, is an equivalent temperature index based on the 48-dimensional output of an advanced model of human thermoregulation formed by 12 variables at four consecutive 30-minute intervals, which were calculated for 105,642 thermal conditions from extreme cold to extreme heat. This study assessed the skills of statistical learning (SL) in supplementing or even replacing the expert knowledge inherent to UTCI by comparing the performance of SL algorithms to the results accomplished by an international endeavor involving more than 40 experts from 23 countries. We found that random forests and k-nearest neighbors closely predicted UTCI values, but that clustering applied after dimension reduction algorithms (principal component analysis and t-distributed stochastic neighbor embedding) was inadequate for risk assessment in relation to the UTCI stress categories. This indicates a potential supportive role for SL, as it will not (yet) fully replace the bio-meteorological expert knowledge.”
We have also re-phrased the second para of the introduction, as outlined below in our response to the next comment.
What is the novelty of the work? Please, highlight novelty in the text in the Introduction section.
In response to this comment, we have re-phrased the second paragraph of the introduction focusing on the assessment of SL skills, as announced by the title. The text now reads:
“The rising number of SL applications triggers a demand for quantitatively assessing the skills of statistical learning in comparison to results involving expert judgement. Our study aimed to contribute to such an assessment utilizing as a testbed the development of the Universal Thermal Climate Index (UTCI)…”
Conclusion section is too short. It must be obtained. It is necessary to clearly summarize all the observations established in the work. It is important to write whether the tasks of the work have been solved and to what extent, as well as whether there are prospects for this topic or it has been exhausted.
We appreciate this comment, which is similar to comment (7) of reviewer 1. Thus, we have not only updated the conclusion section accordingly, but have also included a new subsection 4.3 “Limitations and outlook”, which now covers the prospective utility of our findings. The whole section reads as follows:
“4.3 Limitations and outlook
As a limitation concerning generalizability, the database underlying the UTCI development had been generated by a deterministic model of human thermoregulation. Thus, it lacks random variation induced by, e.g., inter-individual or day-to-day variability of the human physiological responses, which might have complicated the analyses of this study. However, the best performing SL algorithms for equivalent temperature prediction, i.e., concerning stage 4 in Figure 1, like random forests and k-nearest neighbors had already been suggested and successfully applied in other fields pertaining to climatic change and thermal stress [35-41].
While the KDM approach as outlined in section 4.2 might facilitate the development of equivalent temperature indices by avoiding the necessity for specifying reference conditions (stage 3 in Figure 1), assessment scales (stage 5 in Figure 1) automatically derived by clustering algorithms may still require expert knowledge for their refinement. Future enhancements at this stage may be possible by including further artificial intelligence tools such as expert systems or large language models working with knowledge databases [62,63].
- Conclusions
In summary, a potential supportive role for the utilized SL algorithms when analyzing high dimensional input in thermal index development can be concluded from their adequate performance in equivalent temperature prediction. On the other hand, the low agreement of the assessment scale defined by the UTCI expert group with clustering-based thermal stress categories suggested that statistical learning algorithms will not (yet) fully replace the knowledgeable expert in biometeorological and inter-disciplinary research and thermal risk assessment.”
In addition, please refer to the detailed information in the track changes attached to our responses.
Author Response File: Author Response.pdf