Assessment of Thermal Comfort in an Electric Bus Based on Machine Learning Classification
Round 1
Reviewer 1 Report
To start with, I would like to thank the authors for their work on an interesting topic and for a well-written article.
The paper under review is devoted to applying different machine learning approaches to a binary classification problem: predicting the thermal sensation group of bus passengers. The authors collected a dataset with 12 features and 284 objects, fitted 7 ML models (Artificial Neural Network, Random Forest, Adaptive Boosting, k-nearest neighbors, and Support Vector Machine with linear, polynomial, and radial basis function kernels), checked the F1 metric, and compared the results with the Predicted Mean Vote (PMV) model with thresholds at 0.5 and 1.5.
Positive sides of paper:
· The theme of the article is within the scope of the journal “Applied Sciences”
· The research is carried out in the way accepted in the scientific community
· All essential sections are present; the methodology is well written and the authors' work is easy to follow.
· The abstract is adequate to the article content
· The conclusions give the main findings
· References are sufficient and up-to-date
· The material is cohesive and coherent
· Figures are clear
· The English used is at a high level; the text is easy to read.
In general, I liked the paper from the point of presentation of the material and how all the stages were described. However, I have some concerns:
1. What is the contribution to the field? Is field implementation possible? I suspect that binary classification alone will not be enough.
2. Can other types of buses be considered for your approach? I think such attention to “electrical bus” is excessive. None of the features used is unique to an “electrical bus”. Please change the title.
3. It would be useful to provide an exploratory data analysis (EDA): show histograms of the feature values in the context of the target and conduct a correlation analysis.
4. Lines 264-267 “To achieve a better estimation and discard the possibility of obtaining better results based on the data selected for training, the train-test split and fitting was randomly repeated 100 times. Additionally, random seeds are used to ensure that the same train-test splits are employed for all the classifiers.” What does “100 times” mean? Why was it necessary to repeat the train-test split 100 times?
5. In addition to the F1 score, an error matrix for the best model(s) would be a good way to present model quality.
6. Figure 4. Please confirm the correctness of the caption text. It does not seem consistent with the description in line 296.
7. The hyperparameters used for the models are not described. Choosing them correctly has the largest impact on the quality of prediction.
8. Lines 356-359 “Within the better performing models it can be noticed that in every case the models had the least accuracy when using the parameter set S3. This might be counter intuitive as one would expect that using more features would lead to more passenger information to use in order to make a better prediction, which is exactly the opposite of what is observed”. To support your result, conduct EDA and correlation analysis, and check for multicollinearity. It is also a good idea to study feature importance to see which parameters have the most influence on the prediction. All ML models have such attributes.
9. Finally, please reduce inappropriate self-citation.
Author Response
- What is the contribution to the field? Is field implementation possible? I suspect that binary classification alone will not be enough.
The aim of this paper is to provide a first step in the development of ML models for thermal comfort prediction. The contribution to the field lies in the fact that we were able to show that, even using a reduced dataset, a better performance was achieved than with the PMV-PPD model. This is very promising, as the approach can be further developed when larger datasets including winter measurements are available. With these, predictions on the whole ASHRAE scale will be possible.
- Can other types of buses be considered for your approach? I think such attention to “electrical bus” is excessive. None of the features used is unique to an “electrical bus”. Please change the title.
You are right that this research is also relevant to conventional buses. However, as explained in the introduction, our research is of particular importance for e-buses. Moreover, this research has been conducted within the funded e-bus project E-MetroBus. Therefore, we decided to leave the title unchanged.
- It would be useful to provide an exploratory data analysis (EDA): show histograms of the feature values in the context of the target and conduct a correlation analysis.
Thank you very much for your valuable comment. An EDA, histograms, correlation and multicollinearity analysis showed great improvement in our results. In the supplementary material you can find all of the aforementioned topics explained in detail.
- Lines 264-267 “To achieve a better estimation and discard the possibility of obtaining better results based on the data selected for training, the train-test split and fitting was randomly repeated 100 times. Additionally, random seeds are used to ensure that the same train-test splits are employed for all the classifiers.” What does “100 times” mean? Why was it necessary to repeat the train-test split 100 times?
We have rephrased this section. See lines 271-281.
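The repeated, seeded splitting described in the quoted passage can be sketched as follows. This is an illustrative sketch only, not the authors' code: the helper name `seeded_split`, the 25% test fraction, and the use of Python's standard `random` module are assumptions; only the 284 samples and the 100 repetitions come from the text above.

```python
import random

N_SAMPLES = 284       # dataset size stated in the review
N_REPEATS = 100       # number of repetitions quoted from the manuscript
TEST_FRACTION = 0.25  # assumption: the test share is not given in this record


def seeded_split(n_samples, test_fraction, seed):
    """Return (train_idx, test_idx) for one shuffled split, fully determined by seed."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# One seed per repetition: every classifier that reuses seed k sees the same split k.
splits = [seeded_split(N_SAMPLES, TEST_FRACTION, seed) for seed in range(N_REPEATS)]
```

Averaging a metric over all repetitions makes the estimate less dependent on one particularly favourable or unfavourable partition, while the fixed seeds keep the comparison between classifiers fair.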
- In addition to the F1 score, an error matrix for the best model(s) would be a good way to present model quality.
The error matrix for the SVMRbf using the subset S2 (the model with the best performance) is now provided in Figure 7.
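For readers less familiar with the error (confusion) matrix, a minimal sketch of how its four cells and the F1 score relate is given here. The function names, the choice of label 1 as the positive class, and the example labels are assumptions made for illustration; they are not data or code from the study.

```python
def confusion_matrix(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels, with 1 as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def f1_score(tn, fp, fn, tp):
    """F1 is the harmonic mean of precision and recall: 2*tp / (2*tp + fp + fn)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
f1 = f1_score(tn, fp, fn, tp)  # 0.75 for these labels
```

Unlike the single F1 value, the four counts show which class is predominantly misclassified.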
- Figure 4. Please confirm the correctness of the caption text. It does not seem consistent with the description in line 296.
The caption was improved.
- The hyperparameters used for the models are not described. Choosing them correctly has the largest impact on the quality of prediction.
We altered the following paragraph to mention that we use the default hyperparameters from the scikit-learn library. We agree on the importance of hyperparameter tuning, which is why it has been included as an important topic in the future development of the project. We believe that with correct hyperparameter tuning even better performance compared to the PMV-PPD model will be achieved. See line 286.
- Lines 356-359 “Within the better performing models it can be noticed that in every case the models had the least accuracy when using the parameter set S3. This might be counter intuitive as one would expect that using more features would lead to more passenger information to use in order to make a better prediction, which is exactly the opposite of what is observed”. To support your result, conduct EDA and correlation analysis, and check for multicollinearity. It is also a good idea to study feature importance to see which parameters have the most influence on the prediction. All ML models have such attributes.
Thank you very much for your valuable comment. An EDA, histograms, correlation and multicollinearity analysis showed great improvement in our results. In the supplementary material you can find all of the aforementioned topics explained in detail.
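As a rough illustration of the kind of correlation and multicollinearity check requested, the sketch below computes a Pearson correlation and the variance inflation factor for the special case of two features. The function names and the height and weight values are hypothetical and chosen only for illustration; the actual analysis is in the supplementary material and may differ.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_features(r):
    """Variance inflation factor when a feature is regressed on one other feature."""
    return 1.0 / (1.0 - r ** 2)


# Made-up values for illustration only (not measurements from the study).
height_cm = [160.0, 170.0, 180.0, 190.0]
weight_kg = [55.0, 68.0, 77.0, 92.0]
r = pearson_r(height_cm, weight_kg)
vif = vif_two_features(r)
```

A correlation close to 1 and a large variance inflation factor would indicate that the two features carry largely redundant information, which is one possible reason why adding features does not improve a classifier.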
- Finally, please reduce inappropriate self-citation.
One German paper has been removed.
Reviewer 2 Report
This study presents seven different ML models for the prediction of thermal sensation (TS) in an electric urban bus. The goal is to maintain the thermal comfort of passengers while minimizing energy consumption. For an accurate prediction of TS, the authors propose using three different sets of parameters: the first set comprises five parameters similar to the PMV-PPD model, the second uses only two, and the third uses all parameters available. The measurements were made in summer, at ambient temperatures between 14.7 °C and 32 °C, in an electric bus in Berlin, Germany, and the thermal comfort assessments of 284 passengers were obtained via surveys. The proposed algorithm sorts the passengers into two groups representing mild and warm TS. To identify the best suited model, seven ML classifiers based on different algorithms are designed, trained, tested and compared, using the three aforementioned input parameter sets. To test the accuracy of the proposed method, a comparison is made with the PMV-PPD model. The simulation results obtained show good accuracy.
The method is well constructed and presented. Also the simulation results seem promising.
Comments to authors:
1. Please provide a list of abbreviations at the beginning of the study.
2. The temperature inside the bus is not specified, nor is the time the ventilation system needs to reach these values, depending on the outside temperature.
3. The Abstract mentions “…to identify improved thermal settings to minimize the energy consumption, while guaranteeing good thermal comfort.” So, what is the reduction in energy consumed by the electric bus when using the proposed prediction method?
Author Response
- Please provide a list of abbreviations at the beginning of the study.
A list of abbreviations has been added.
- The temperature inside the bus is not specified, nor is the time the ventilation system needs to reach these values, depending on the outside temperature.
The measured parameter air temperature Ta is equal to the temperature inside the bus.
- The Abstract mentions “…to identify improved thermal settings to minimize the energy consumption, while guaranteeing good thermal comfort.” So, what is the reduction in energy consumed by the electric bus when using the proposed prediction method?
This paper focuses on the accurate prediction of the passengers' thermal sensation based on measurements and surveys. Accurate thermal sensation prediction will enable us, in the next step, to find improved settings.
Reviewer 3 Report
The manuscript applsci-2581835 is a study assessing the suitability of several ML standard algorithms to predict passenger thermal comfort in electric buses.
The idea is interesting and original and the research methods and data collection are correct.
There are several points of criticism that the authors must address:
1. It is recommended to avoid undergraduate textbook information, such as the paragraph on page 8, lines 278-283. Definitions of precision, recall, and f1-score are well known and it is not necessary to be included. Please check this throughout the manuscript.
2. The authors observed correctly that the S3 test has the worst performance. S1 and S2 do not include at all any physiological features of the subjects. It is obvious that this is not correct since it is well known that subjects report different levels of thermal comfort depending on their age, height, sex, etc. That is why in theory, S3 should produce the best results, which does not happen in reality. It is no surprise though, since feature engineering on S3 is probably not correct (in fact there is no mention of this important matter other than MinMax scaling). It is not discussed how the features in S3 were treated: for example, the gender variable should have been treated as a categorical variable and encoded accordingly; the age variable is not expected to influence considerably the thermal comfort sensation so it is pointless to treat age as an integer (or number). It is recommended to define several age intervals over the min-max age range (e.g. 16-20, 21-25, 26-30 or even larger intervals, such as 10 years). This will convert the number (integer, float, etc.) variable into a categorical variable, which can be one-hot encoded. It is worth testing what happens if BMI is used instead of height and weight (BMI value is probably not relevant, so a similar treatment with the one described for the age variable might be required).
3. There is a significant amount of class imbalance (182/102). Was stratification maintained when the train and test set were randomly drawn? This could be a cause of the low f1-score. A confusion matrix (for any of the 100 train/test pairs) for each algorithm would reveal what is the class that is predominantly misclassified and could suggest actions to fix it.
4. It would be interesting to apply oversampling (e.g. SMOTE) to understand if the class imbalance has a significant effect.
5. Please consider including ROC curves and reporting AUC. It could be very interesting also to present some P-R curves.
Author Response
- It is recommended to avoid undergraduate textbook information, such as the paragraph on page 8, lines 278-283. Definitions of precision, recall, and f1-score are well known and it is not necessary to be included. Please check this throughout the manuscript.
We believe that the readers of this paper, interested in thermal comfort in buses, are not all experts in machine learning. Therefore, we think it is helpful to give more basics in the paper.
- The authors observed correctly that the S3 test has the worst performance. S1 and S2 do not include at all any physiological features of the subjects. It is obvious that this is not correct since it is well known that subjects report different levels of thermal comfort depending on their age, height, sex, etc. That is why in theory, S3 should produce the best results, which does not happen in reality. It is no surprise though, since feature engineering on S3 is probably not correct (in fact there is no mention of this important matter other than MinMax scaling). It is not discussed how the features in S3 were treated: for example, the gender variable should have been treated as a categorical variable and encoded accordingly; the age variable is not expected to influence considerably the thermal comfort sensation so it is pointless to treat age as an integer (or number). It is recommended to define several age intervals over the min-max age range (e.g. 16-20, 21-25, 26-30 or even larger intervals, such as 10 years). This will convert the number (integer, float, etc.) variable into a categorical variable, which can be one-hot encoded. It is worth testing what happens if BMI is used instead of height and weight (BMI value is probably not relevant, so a similar treatment with the one described for the age variable might be required).
Thank you for your valuable comment. In the supplementary analysis you can find a detailed explanation where we follow your recommendation and indeed achieve a better performance for the subset S3.
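The feature treatment recommended by the reviewer (age intervals, one-hot encoding, BMI in place of height and weight) could look roughly like the sketch below. The function names, the 10-year interval width, and the interval boundaries are assumptions chosen for illustration and do not reproduce the authors' supplementary analysis.

```python
def age_bin(age, start=10, width=10):
    """Map an age to an interval label such as '20-29' (width in years)."""
    low = start + ((age - start) // width) * width
    return f"{low}-{low + width - 1}"


def one_hot(value, categories):
    """Encode value as a 0/1 list with one position per category."""
    return [1 if value == category else 0 for category in categories]


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by squared height in metres."""
    return weight_kg / height_m ** 2


# Hypothetical interval labels; the real age range of the survey is not given here.
AGE_BINS = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]
encoded = one_hot(age_bin(34), AGE_BINS)  # [0, 0, 1, 0, 0, 0, 0]
```

The same binning and encoding could be applied to BMI values if, as the reviewer suspects, the category matters more than the exact number.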
- There is a significant amount of class imbalance (182/102). Was stratification maintained when the train and test set were randomly drawn? This could be a cause of the low f1-score. A confusion matrix (for any of the 100 train/test pairs) for each algorithm would reveal what is the class that is predominantly misclassified and could suggest actions to fix it.
Stratification was not maintained in the original paper. It has been added in this revised submission, which also improved the performance, particularly by reducing the IQR. We thank you for this valuable suggestion.
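A stratified split keeps the class ratio (here 182 to 102) approximately the same in the training and test sets. The sketch below shows the idea with the standard library only; the helper name `stratified_split`, the 25% test fraction, and the assignment of labels 0 and 1 to the two groups are assumptions. In scikit-learn, the `stratify` argument of `train_test_split` serves the same purpose.

```python
import random


def stratified_split(labels, test_fraction, seed):
    """Split indices so each class keeps roughly its share in train and test."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Class sizes from the review (182 vs. 102); which label is which is an assumption.
labels = [0] * 182 + [1] * 102
train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=0)
```

Without stratification, an individual random split can over- or under-represent the minority class, which widens the spread of the scores across repetitions.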
- It would be interesting to apply oversampling (e.g. SMOTE) to understand if the class imbalance has a significant effect.
In the supplementary material we show the results of applying SMOTE, which made no relevant difference apart from negatively affecting the ANN.
- Please consider including ROC curves and reporting AUC. It could be very interesting also to present some P-R curves.
All the mentioned curves and performance metrics can be found in the supplementary material.
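As a reminder of what the reported AUC measures, the sketch below computes it directly from its pairwise interpretation: the share of (positive, negative) pairs in which the positive sample receives the higher score. The function name and the example scores are illustrative assumptions, not results from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores for illustration only (not results from the study).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two thermal sensation groups.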
Round 2
Reviewer 1 Report
Dear authors, thank you for the revision. I see you did a lot, but next time, please leave the changes in the manuscript highlighted.
In any case, as I have already said, the paper satisfied me in terms of the presentation of the material and how all the stages were described. Although I do think that the EDA should be in the main manuscript and that the hyperparameters should be stated in this manuscript, I do not see other serious obstacles to publication. Good luck with your E-MetroBus project.
Author Response
Dear reviewer, thanks again for your valuable support. We included EDA in the main manuscript.
Reviewer 3 Report
The authors followed the recommendations and improved the manuscript.
Author Response
Dear reviewer, thanks again for your valuable support.