Article

Assessment of Thermal Comfort in an Electric Bus Based on Machine Learning Classification

Methods for Product Development and Mechatronics, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(20), 11190; https://doi.org/10.3390/app132011190
Submission received: 12 August 2023 / Revised: 27 September 2023 / Accepted: 2 October 2023 / Published: 11 October 2023

Abstract

In electric buses, heating, ventilation and air conditioning are responsible for up to 50% of the energy consumption. It is therefore necessary to identify improved thermal settings that minimize the energy consumption while guaranteeing good thermal comfort. This requires an accurate prediction of the passengers’ thermal sensation (TS). One of the most widely used models for TS prediction is the PMV-PPD model, which has been shown to provide reliable results in uniform, steady-state climatic conditions. Since such conditions are not present in an urban bus, the accuracy of the PMV-PPD model diminishes. Additionally, some of the required parameters are difficult to obtain (e.g., clothing insulation). This paper presents seven different machine learning (ML) models for the prediction of TS using three different sets of parameters. The first set comprises five parameters similar to the PMV-PPD model, the second uses only two, and the third uses all parameters available. To obtain the necessary data, climatic measurements were made in an electric bus in Berlin, Germany. These measurements were performed in summer for ambient temperatures between 14.7 °C and 32.0 °C. Person-related information as well as the thermal comfort assessment were obtained via surveys. Despite the relatively small data set, four of our seven ML models performed well, with median accuracies between 69.4% and 70.3%. This could also be observed when using only two parameters. Hence, the effort required to gain experimental data can be reduced significantly. For the PMV-PPD model, a median shift of +1 was observed for mild and warm TS. The median accuracy rises from 48.8% without shift to 68.8% with shift.

1. Introduction

In electric vehicles (EVs), the heating, ventilation and air conditioning (HVAC) system is responsible for a significant proportion of the overall energy consumption. For example, a recent study in an EV showed that up to 18% of the battery capacity was allocated to cooling and heating [1]. This consumption can be even larger in electric buses (e-buses), where the operation of the HVAC system can require up to 50% of the available energy [2,3]. This typically results in a sharp reduction of the available travel range [3], which can greatly affect the autonomy of e-buses and lead to large operational costs due to the increased charging frequency.
Additionally, this high energy consumption does not necessarily result in improved passenger satisfaction, as the HVAC system is often operated at higher levels than necessary for the thermal comfort of the passengers. This is shown by recent on-road studies in e-buses, which were performed in Amsterdam [4] and Berlin [5] for mean external temperatures of 13.4 °C and 5.3–7.8 °C, respectively. In both cases, the results of the climatic measurements and of the passengers’ surveys indicate that the thermal settings of the HVAC system can be decreased by up to 1.4 K [4] and 2.5 K [5], respectively, with respect to the standard set temperature without significantly affecting the thermal well-being. These findings show the need for a revision of these thermal settings, especially in e-buses, with the goal of improving the overall energy efficiency while guaranteeing good thermal comfort for the passengers. This can be achieved by modeling the dependency of human thermal sensation on the climatic conditions inside the bus cabin, thereby enabling a prediction of passengers’ thermal comfort.
Thermophysiological models [6] are widely employed to assess human thermal comfort in closed environments, such as buildings [7] and vehicles [8,9]. Most of these models operate by computing the heat exchange between the human body and its surroundings based on climatic (e.g., air temperature, relative humidity and velocity) and personal parameters (e.g., metabolic rate, height, weight and thermal insulation of clothing) [6]. Among the several thermal comfort models developed since the early 1970s, the most widely employed is the PMV-PPD model [6,10,11]. This model enables us to compute the comfort of a group of subjects by means of two parameters, namely the predicted mean vote (PMV) and the predicted percentage of dissatisfied (PPD). Whereas the first parameter predicts the average thermal sensation based on the ASHRAE scale for the evaluation of passengers’ thermal sensation (TS), the second estimates the percentage of test subjects who experience the environment as thermally unpleasant. The PMV-PPD model is known to provide reliable results in uniform, steady-state climatic conditions (such as those encountered in a climatic chamber). However, the prediction accuracy can decrease significantly in more realistic environments characterized by dynamic conditions. For example, an analysis performed by applying the PMV-PPD model to the ASHRAE Global Thermal Comfort Database II has shown an accuracy of only 34% and a mean absolute error of one unit on the thermal sensation scale [12]. The prediction performance may be even lower for highly dynamic environments such as those encountered in buses. In this case, the climatic conditions can vary greatly depending on, e.g., the proximity to entrance doors or the height above the ground. Additional variations can ensue with time due to changes in outer temperature, solar radiation through the windows and external air flow through the doors [3,5,13]. 
To better capture these phenomena, dynamic thermophysiological models, such as the two-nodes model [3,8,9,14] or local models [15,16], are sometimes used. However, the accuracy of such models for bus cabins is unknown due to the limited validation by comparison with empirical results from, e.g., passengers’ surveys.
Further disadvantages of thermophysiological models include the high computational complexity, which increases when time dependency or local effects are considered. Finally, large sets of climatic and personal parameters are typically required as an input for the computation [10,14]. This may constitute a significant challenge in cases where only a limited set of data is available from measurements.
Because of the aforementioned limitations of thermophysiological models, recent studies have focused on the development of data-driven models based on machine learning (ML) to achieve a more accurate prediction of thermal comfort [17]. These models have demonstrated high performance for the assessment of occupants’ thermal well-being in buildings, showing up to 74% higher accuracy when compared to the PMV-PPD model [18]. An additional advantage of ML is its ability to consider any set of input parameters for modeling, thereby enabling us to also include parameters not considered by conventional thermophysiological models (e.g., sex, age and external air temperature) or to use fewer parameters than the thermophysiological models. Because of these advantages, ML has recently been attracting attention as a means to assess thermal comfort in vehicles, such as cars and buses. In [19], for example, three distinct ML models were tested to analyze thermal comfort in passenger cars, showing a high accuracy of up to 96%. However, in this case, the measured body temperature was adopted as an indicator for thermal comfort, thereby ignoring other individual factors and personal preferences that can affect the perception of the surrounding climatic conditions. Velt and Daanen employed passengers’ surveys to empirically evaluate the thermal well-being of passengers in a bus cabin [4]. The survey data were then combined with measured climatic parameters and employed in a multiple regression analysis for the assessment of thermal comfort. However, the modeling performance remains unknown, as no information was given about the accuracy. To overcome the limitations of the aforementioned studies, ML is used in this paper to assess passengers’ thermal comfort in an e-bus. For this purpose, a data set obtained from climatic measurements in an e-bus in Berlin, Germany [5], containing both climatic parameters and personal information of 278 passengers, is used for modeling.
Because of the limited number of input data and the discrete nature of the declared TS, passengers were arranged in two groups representing mild and warm thermal sensation. Classification models [20] are then used to assign passengers to these groups. To identify the best suited model, seven ML classifiers based on different algorithms are designed, trained, tested and compared, using three different input parameter sets of the Berlin data set. Additionally, a further comparison is made with the PMV-PPD model to test the accuracy with respect to thermophysiological models. This paper is organized as follows: in Section 2, the climatic measurements, the data sets and the models used are described. The results of the classifiers and the PMV model are presented in Section 3 and discussed in Section 4. The conclusions are then presented in Section 5.

2. Materials and Methods

In this section, the climatic measurements are first described in Section 2.1. In Section 2.2, the data obtained from the measurements is described and grouped. The baseline PMV-PPD model is explained in Section 2.3. In Section 2.4, it is explained how the PMV value is compared to thermal sensation. The seven ML models implemented in this paper are presented in Section 2.5.

2.1. Climatic Measurements

This section presents the climatic measurements and the data sets that were the basis for the following assessment. The measurements were performed in a Solaris Urbino 18 electric bus (Solaris Bus & Coach, Bolechowo, Poland). This is an 18 m long, single-decker articulated bus with two wagons connected by a pivoting joint, as shown by the layout in Figure 1.
Both the climatic conditions in the bus cabin and the person-related information of the passengers were investigated during the measurements. Whereas the climatic conditions were measured by eight measurement stations installed in various positions inside the bus cabin (numbered squares in Figure 1), the person-related information was gathered via surveys, which were directly submitted to the passengers during the bus ride. Additionally, the ambient air temperature was measured by dedicated sensors integrated directly in the bus and recorded via the CAN BUS in an in-built data logger (ViriCiti, ChargePoint Germany GmbH, Munich, Germany). The resulting data set is shown in Table 1. Notice that the position of the passengers in the bus cabin was determined by dividing the bus into six sectors, as shown in Figure 1. This was done in order to account for air temperature variations within the bus (see also [5]). Additionally, the passengers assessed their own thermal comfort via the TS [6]. This parameter is used to express the feeling of heat or cold and is quantified via the 7-point ASHRAE scale [6] shown in Table 2.
Overall, three measurement sets (MS) with climatic parameters and personal information were performed on three different days on the BVG bus line 200 in the city of Berlin, Germany (see Table 3). The measurements were all performed in summer with mean ambient air temperatures between 14.7 °C and 32.0 °C.
Additional details about the measurement methodology can be found in [5].

2.2. Data Set

Overall, 329 personal surveys were collected during the climatic measurements. Figure 2 shows the number of surveys for each value of TS and measurement set. Notice that for the majority of the passengers, TS lay between 0 and 2, as very few passengers perceived the environment as “cold” (TS = −3) to “slightly cool” (TS = −1) or “hot” (TS = 3). This results in a low number of data points for some values of TS, which can reduce the modeling accuracy via ML [21]. To overcome this challenge, only the 284 surveys with TS between 0 and 3 have been considered in this paper. After data cleansing, the remaining 278 surveys were grouped based on the value of TS [22]. Two groups were created as follows:
  • Group 1: Mild thermal sensation.
    •    Value of TS is 0 or 1 (178 surveys)
  • Group 2: Warm thermal sensation.
    •    Value of TS is 2 or 3 (100 surveys)
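The grouping above can be sketched in a few lines of Python. This is only an illustration of the mapping described in the text; the function name `ts_to_group` and the example votes are hypothetical and not taken from the paper's data or code.

```python
def ts_to_group(ts):
    """Map a thermal sensation vote on the ASHRAE scale to group 1 or 2.

    Group 1 (mild): TS is 0 or 1. Group 2 (warm): TS is 2 or 3.
    Votes below 0 were not considered in the paper, so they are rejected here.
    """
    if ts in (0, 1):
        return 1
    if ts in (2, 3):
        return 2
    raise ValueError(f"TS value {ts} is outside the range 0..3 used for grouping")


# Hypothetical example votes (not measured data).
votes = [0, 1, 1, 2, 3, 0, 2]
groups = [ts_to_group(v) for v in votes]
```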
For modeling, both personal parameters from the surveys and climatic parameters obtained from the measurement stations were considered. To include both types of information in the data set, the mean values of $T_a$, $RH$, $V_a$ and $T_r$ were obtained for each measurement and each measurement station. Then, these values were associated with the surveys based on the position of the passengers in the bus. Additional details about the surveys can be found in [5].
To obtain a better understanding of the resulting data set, an exploratory data analysis (EDA) was carried out on all the parameters found in Table 1, with the exception of $pos$, because $pos$ had already been used to associate climatic parameters and personal information. To begin with, the new parameter body mass index (BMI) was calculated from the weight and height using the formula $BMI = \frac{weight}{height^2}$. Height and weight were then replaced by the BMI. Afterwards, infeasible data were eliminated. We proceeded with a correlation and multicollinearity analysis. The resulting correlation matrix (Figure 3) reveals a strong correlation ($|r| > 0.7$) between the parameters $T_a$, $RH$, $T_r$ and $T_{a,out}$.
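As a minimal sketch of the two calculations mentioned here, the BMI formula and a Pearson correlation check against the 0.7 limit can be written as follows. The units (kilograms and meters) are an assumption, since the text does not state them, and the helper names and sample values are hypothetical.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index, assuming weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical readings (not measured data): air temperature and humidity.
air_temperature = [22.0, 24.5, 26.0, 28.5, 30.0]
relative_humidity = [60.0, 55.0, 50.0, 42.0, 38.0]

r = pearson(air_temperature, relative_humidity)
strongly_correlated = abs(r) > 0.7  # limit used in the correlation analysis
```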
To analyze the multicollinearity, we used the variance inflation factor (VIF), which can be seen in Table 4. The parameters $T_a$, $T_r$ and $T_{a,out}$ showed VIF values higher than the recommended limit of 5. Relying on the assumption that the average temperature surrounding a passenger is more closely related to their thermal comfort than the other mentioned parameters, we decided to keep the average temperature $T_a$ and discard the other parameters with a VIF higher than 5. Even though $RH$ has a high correlation with $T_a$ (−0.87; see Figure 3), we decided to keep it, as its VIF is below 5 and domain knowledge has shown that humidity plays an important role in thermal sensation. The resulting VIF values are found in Table 4, where it can be seen that all values lie below the VIF threshold of 5. A pairplot of the resulting data set is shown in Figure 4.
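For reference, the standard definition of the VIF of parameter $j$ is given below, where $R_j^2$ is the coefficient of determination obtained when parameter $j$ is regressed on all remaining parameters. The symbol $R_j^2$ is introduced here for illustration and does not appear in the original text.

$$VIF_j = \frac{1}{1 - R_j^2}$$

Under this definition, the limit of 5 corresponds to $R_j^2 = 0.8$. As a purely illustrative special case, if only two parameters with correlation $r$ were present, then $R_j^2 = r^2$, and $r = -0.87$ would give $VIF = 1/(1 - 0.87^2) \approx 4.1$. This two-parameter value is not the value reported in Table 4, which is computed from the full parameter set.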
In order to compare the PMV-PPD model with the ML models, three different parameter sets were employed. The parameter set $S_1$ was used to provide a fair comparison to the PMV-PPD model [10,23,24]. To investigate the flexibility of the employed ML classifiers, two further parameter sets $S_2$ and $S_3$ were also created. $S_2$ aims to describe the predictive capabilities of the ML models using measurements that are easy to obtain, whereas $S_3$ uses all the parameters that were shown to be relevant after the EDA.
  • Parameter set $S_1$. This parameter set only includes the parameters required for modeling using the PMV-PPD model [10,23,24]. This set was chosen to ensure comparability between the PMV-PPD model and the ML models, and is defined as follows:
    $S_1 = [T_a, RH, V_a, T_r, I_{cl}]$
  • Parameter set $S_2$. This parameter set includes only the air temperature and humidity, which are easy to obtain within a measurement campaign:
    $S_2 = [T_a, RH]$
  • Parameter set $S_3$. This parameter set is the result of the EDA. Different encodings of $age$ and $BMI$ were tested, in particular one-hot encoding and ordinal encoding. The best accuracy was achieved by one-hot encoding the $sex$ parameter into $Male$ and $Female$ and by leaving $age$ and $BMI$ at their original numerical values. For further details, see Supplementary Materials.
    $S_3 = [T_a, RH, V_a, I_{cl}, age, sex, BMI]$
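The following sketch shows how the three parameter sets and the one-hot encoding of the sex parameter could be represented. The dictionary keys, the string coding of sex ("male"/"female") and the sample record are assumptions made for illustration; the survey's actual coding is not given in the text.

```python
# Column names per parameter set; in S3 the sex parameter is expanded
# into the two one-hot columns Male and Female.
PARAMETER_SETS = {
    "S1": ["T_a", "RH", "V_a", "T_r", "I_cl"],
    "S2": ["T_a", "RH"],
    "S3": ["T_a", "RH", "V_a", "I_cl", "age", "Male", "Female", "BMI"],
}


def one_hot_sex(record):
    """Return a copy of the record with 'sex' replaced by Male/Female flags."""
    encoded = dict(record)
    sex = encoded.pop("sex")
    encoded["Male"] = 1.0 if sex == "male" else 0.0
    encoded["Female"] = 1.0 if sex == "female" else 0.0
    return encoded


def feature_vector(record, set_name):
    """Select the columns of one parameter set; age and BMI stay numerical."""
    encoded = one_hot_sex(record)
    return [float(encoded[name]) for name in PARAMETER_SETS[set_name]]


# Hypothetical survey record (not measured data).
record = {"T_a": 24.5, "RH": 48.0, "V_a": 0.1, "T_r": 25.0,
          "I_cl": 0.5, "age": 34, "sex": "female", "BMI": 22.9}
```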

2.3. PMV-PPD Model

The PMV-PPD model developed by Povl Ole Fanger, which is also commonly referred to as the Fanger model and from which the standard ISO 7730 [11] was developed, states that the thermal comfort of the body depends on its heat balance with the environment. Therefore, with knowledge of a person’s physical activity, clothing insulation and environmental parameters such as air temperature, mean radiant temperature, air velocity and relative air humidity, the thermal comfort can be calculated. The PMV-PPD model uses this information to calculate the predicted mean vote (PMV) of a large group, which can be obtained as follows:
$$PMV = f(T_a, RH, V_a, T_r, I_{cl}) \qquad (1)$$
A detailed description of the PMV calculation can be found in ISO 7730 [11]. In this paper, the Python package pythermalcomfort [24] was used to calculate the PMV values from the data described in Section 2.2. The metabolic rate was set to 1.0 met, as the passengers are assumed to be in a “relaxed state” [11].

2.4. Comparison of PMV-PPD Model and Thermal Sensation

To compare the ML classifiers with the PMV-PPD model, the PMV is employed as the comparison parameter. This is possible because PMV predicts the thermal sensation on the ASHRAE scale (see Table 2). The difference to TS is that PMV is a real number, whereas TS is an integer. Since only the PMV value is used, the PMV-PPD model is referred to as the PMV model in the following. In order to assign a passenger to one of the groups defined in Section 2.2 based on the PMV value, a threshold needs to be defined. Based on that grouping, one could use 1.5 as a threshold, as it lies exactly between the two TS values (1 and 2) that separate the groups. Passengers with a PMV value equal to or above 1.5 are rounded up to 2 and placed in group 2, while the remaining passengers are placed in group 1.
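The threshold rule can be expressed as a one-line function. This is a sketch of the rule as described in the text; the function name and the example PMV values are hypothetical.

```python
def pmv_to_group(pmv, threshold=1.5):
    """Assign group 2 (warm) if PMV reaches the threshold, otherwise group 1 (mild)."""
    return 2 if pmv >= threshold else 1


# Hypothetical PMV values (not computed from measured data).
example_pmv = [0.2, 1.49, 1.5, 2.3]
example_groups = [pmv_to_group(p) for p in example_pmv]
```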
According to [12], “PMV had a mean absolute error of one unit on the thermal sensation scale” when applied in more realistic environments characterized by dynamic conditions. To check whether a similar deviation occurs here, Figure 5 shows the distribution of TS given by the passengers in the survey and compares it with the distribution of the calculated PMV values. A clear leftward shift can be seen. This means that the thermal sensation predicted by the PMV model is generally colder than the actual thermal sensation from the surveys obtained in warm conditions. Hence, in order to match the TS given by the passengers, a shift needs to be added to the calculated PMV.
In order to better understand this shift behavior, the shift of each passenger was analyzed and calculated as
$$shift = TS - PMV \qquad (2)$$
Equation (2) is equivalent to
$$TS = PMV + shift \qquad (3)$$
The analysis of the shift behavior helps identify the threshold in order to better match passengers’ TS and calculated PMV.
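A minimal sketch of the per-passenger shift calculation, with shift defined as TS minus PMV, is given below. The vote and PMV values are made up for illustration and are not the measured data; only the definition of the shift comes from the text.

```python
from statistics import mean, median


def shifts(ts_votes, pmv_values):
    """Per-passenger shift, defined as TS - PMV."""
    return [ts - pmv for ts, pmv in zip(ts_votes, pmv_values)]


# Hypothetical survey votes and calculated PMV values (not measured data).
ts_votes = [1, 2, 1, 3, 2]
pmv_values = [0.2, 0.9, -0.1, 1.8, 1.2]

passenger_shifts = shifts(ts_votes, pmv_values)
mean_shift = mean(passenger_shifts)
median_shift = median(passenger_shifts)
```

Adding each passenger's shift back to the PMV value reproduces the declared TS, which is the equivalence stated in the text.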

2.5. Machine Learning Model Design

In this section, the seven ML classifiers and the evaluation metrics are presented.

2.5.1. Machine Learning Classifiers

In this paper, ML classification models are employed to assign passengers to either group 1 or group 2 based on the given parameter set. To identify the best suited model for the application, some of the most commonly used classifiers for thermal comfort are compared [17]. These ML models are briefly described below.

Artificial Neural Networks (ANNs)

ANNs consist of multiple layers of processing units, typically known as neurons, which are connected to each other through weighted channels [25]. Each neuron uses an activation function to determine the output depending on the signal received from the neurons in the previous layer. ANNs are commonly employed in several applications, ranging from facial recognition [26] to weather forecasting [27] and prediction of thermal comfort [28,29].
In this paper, an ANN classifier is developed. Grid search on the benchmark parameter set $S_1$ was used to determine the optimal hidden size and number of layers, which are 10 and 3, respectively. A Rectified Linear Unit [30] was used as the activation function for each layer of the ANN, with the exception of the last one, which used a sigmoid function. The ANN is trained for 100 epochs using an Adam optimizer with a learning rate of 0.01.

Ensemble Learning (ENL)

ENL is an approach that relies on the integrated use of multiple low-accuracy learning models for class prediction [31]. This allows us to improve the performance and robustness with respect to the individual constituent models.
Among the most commonly used ENL models are Random Forest (RF) [32] and Adaptive Boosting (AdaBoost) [33]. Both RF [34] and AdaBoost [35] are widely used for the prediction of thermal comfort and are therefore applied in this paper:
  • RF operates by fitting several decision tree classifiers on various distinct sub-samples of the data set. The majority vote of the decision trees is then used to compute the final classifications, thereby enabling us to greatly reduce both the variance and the sensitivity to the training data.
  • AdaBoost is based on Boosting, i.e., the iterative creation of models that rectify the mistakes of the previous ones [36]. AdaBoost operates by first assigning equal weights to all data points in the employed data set. Subsequently, the misclassified data points are assigned a higher weight and a second model is trained. The weight adjustment is then repeated. The whole process is performed iteratively and all the resulting models are used for the final classification.
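To make the weight adjustment concrete, the sketch below performs one round of the textbook discrete AdaBoost update for binary labels coded as −1 and +1. It is an illustration only and may differ in detail from the scikit-learn implementation used in the paper; the function name and the toy labels are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One weight update of textbook discrete AdaBoost (labels are -1 or +1).

    Assumes the weights sum to 1. Returns the model weight alpha and the
    renormalized data point weights.
    """
    # Weighted error of the current weak model.
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < error < 0.5:
        raise ValueError("weak model must be better than chance and not perfect")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified points (t * p == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Toy example: five points with equal initial weights and one mistake (index 1).
initial_weights = [1 / 5] * 5
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]
alpha, new_weights = adaboost_round(initial_weights, y_true, y_pred)
```

After the update, the single misclassified point carries more weight than any correctly classified one, so the next weak model is pushed to classify it correctly.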

k-Nearest Neighbors (kNN)

kNN classification operates by comparing and assigning a data point to the same class as the majority of its k nearest neighboring points in the data set [37]. Typical applications of kNN include finance (e.g., forecasting of stock market movements [38]), text processing [39] and thermal comfort modeling [40].
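The majority vote among the k nearest neighbors can be sketched as follows. The Euclidean distance, the value k = 3 and the toy points are assumptions chosen for illustration; they are not the settings or data of the paper.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify a query point by majority vote among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy points in a normalized (T_a, RH) plane; labels are group 1 or group 2.
train_points = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7),
                (0.8, 0.2), (0.9, 0.1), (0.7, 0.3)]
train_labels = [1, 1, 1, 2, 2, 2]

cool_humid = knn_predict(train_points, train_labels, (0.15, 0.85))
warm_dry = knn_predict(train_points, train_labels, (0.85, 0.15))
```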

Support Vector Machine (SVM)

SVM operates by identifying the hyperplane that best divides the parameter space into two classes [41]. This identification is performed by maximizing the distance (known as margin) between the data points contained in each class and the hyperplane itself. The data points closest to the hyperplane are known as support vectors.
For some data sets, the data points cannot be linearly separated within the parameter space. In this case, a nonlinear transformation can be used to project the data points into a higher-dimensional space, where they can be separated more easily with a hyperplane. As computing these transformations explicitly can require very high computational power, specific functions known as kernels are typically used instead; the choice of kernel determines the shape of the resulting decision boundary [42].
Due to its high performance and simplicity of use, SVM finds application in several fields, such as image processing, text recognition [41] and modeling of thermal sensation in buildings [43]. In this paper, an SVM classifier is implemented using three different kernels: Linear, Polynomial (Poly) and Radial Basis Function (Rbf).
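For orientation, the three kernels are commonly written in the following standard forms for two data points $x$ and $x'$. The symbols $\gamma$ (scale), $r$ (offset) and $d$ (polynomial degree) are kernel hyperparameters introduced here for illustration; their values are not restated in the text, which only says that library defaults were used.

$$K_{Linear}(x, x') = x^\top x'$$

$$K_{Poly}(x, x') = (\gamma \, x^\top x' + r)^d$$

$$K_{Rbf}(x, x') = \exp\left(-\gamma \, \lVert x - x' \rVert^2\right)$$

The Linear kernel yields a flat separating hyperplane in the original parameter space, whereas the Poly and Rbf kernels allow curved decision boundaries.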

2.5.2. Implementation and Accuracy Metrics

To evaluate the classifiers mentioned in Section 2.5.1, repeated random sub-sampling was used: 70% of the data set was randomly selected as the train set and the remaining 30% was used as the test set, while keeping stratification. Whereas the train set is used to build the models, the test set is employed to evaluate their accuracy on unseen data. For each classifier, the train set was used to fit a Min-Max Scaler [44], which was then employed to normalize all parameters to the same scale for both the train and the test set. Repeated random sub-sampling repeats this random train–test split procedure, in our case 100 times, which provides 100 scores for each model; these are then used to measure the overall performance of the model. This is particularly useful for a small data set such as ours: it provides insight into the variability of the scores, and it reduces bias, because a single high or low score resulting from a lucky or unlucky split is not strongly reflected in the overall performance metric. Additionally, random seeds are used to ensure that the same train–test splits are employed for all the classifiers.
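The evaluation procedure can be sketched with the standard library alone. This is an illustrative reimplementation under stated assumptions, not the authors' code (which used scikit-learn): the function names are hypothetical, the rounding of the per-class test size may differ from scikit-learn, and `evaluate` stands for any user-supplied function that trains a classifier and returns a score.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into train and test sets while preserving class proportions."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def fit_min_max(rows):
    """Learn per-column (min, max) bounds from the train rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, bounds):
    """Scale rows with previously fitted bounds; test values may fall outside [0, 1]."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


def repeated_subsampling(features, labels, evaluate, n_repeats=100):
    """Collect one score per random stratified 70/30 split."""
    scores = []
    for seed in range(n_repeats):  # same seeds give the same splits for every model
        train_idx, test_idx = stratified_split(labels, 0.3, seed)
        train_rows = [features[i] for i in train_idx]
        test_rows = [features[i] for i in test_idx]
        bounds = fit_min_max(train_rows)  # the scaler never sees the test set
        scores.append(evaluate(
            apply_min_max(train_rows, bounds), [labels[i] for i in train_idx],
            apply_min_max(test_rows, bounds), [labels[i] for i in test_idx],
        ))
    return scores
```

Fitting the scaler on the train rows only keeps information from the test set out of the model, which is why `fit_min_max` is called inside the loop after each split.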
The ANN was implemented using the Python library PyTorch, whereas all the other classifiers were trained using the Python library scikit-learn with its default hyperparameters. All of the methods used balanced class weights to help counteract the effect that the imbalance of the data set might have on the performance, with the exception of AdaBoost and kNN, whose optimizations are not compatible with the balanced class weight strategy.
To evaluate the accuracy of the results, the $F_1$ score [45] is employed. The $F_1$ score was chosen over traditional accuracy (percentage of correctly classified data points) because it is a better metric for imbalanced data sets, such as the present one. Its main advantage is that it considers both Precision and Recall, which is important when not only correct classifications but also misclassifications are of interest. Precision-recall curves and receiver operating characteristic curves can be found in the Supplementary Materials.
For the given data set, positively classified data points are understood as data points that were classified in group 2 by the ML model, while negative classification refers to the data points classified in group 1. Accordingly, True and False classification refers to the data points that were correctly or incorrectly classified. As an example, a False Positive classification describes a data point that is wrongly (false) classified as belonging to group 2 (positive).
Precision is defined as the percentage of positive classifications that are actually correct, relative to all positive classifications (regardless of whether they are true or false). This is computed as
$$Precision = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \qquad (4)$$
Recall (also known as Sensitivity) is the percentage of true positives compared to all the actual data points belonging to group 2 (positive) and is computed as
$$Recall = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \qquad (5)$$
The $F_1$ score is calculated as the harmonic mean of the Precision and Recall as shown in Equation (6):
$$F_1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \qquad (6)$$
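The three metrics can be computed directly from the true and predicted group labels, with group 2 treated as the positive class as defined above. The sketch below is illustrative; the function name, the toy labels and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=2):
    """Precision, Recall and F1 score with group 2 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels (not measured data): 2 true positives, 1 false positive, 2 false negatives.
y_true = [2, 2, 2, 1, 1, 1, 1, 2]
y_pred = [2, 2, 1, 2, 1, 1, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```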

3. Results

This section shows the results of the shift analysis and the accuracy of the different ML models using parameter sets $S_1$, $S_2$ and $S_3$.

3.1. Shift Analysis

Based on Equation (2), the shift between passengers’ TS and PMV was calculated. Since this paper focuses on TS greater than 0, Figure 6 shows the shift analysis for these values.
A mean shift value of +1 is seen. In this case, we have to add 1 to the PMV value in order to match with the TS values given by the passengers. This also means that, based on the measured climatic conditions and clothing insulation of passengers, the PMV model predicts values closer to zero, underestimating passengers experiencing mild to warm sensations.
The shift analysis reveals that, in order to sort the passengers into group 1 or group 2, the mean shift of +1 needs to be considered. This results in an adjusted threshold for the PMV model of 0.5 instead of the 1.5 threshold proposed in Section 2.4.
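The adjusted threshold follows from substituting the shifted PMV into the grouping rule. With $TS = PMV + shift$ and the group boundary at $TS = 1.5$, a passenger is assigned to group 2 when

$$PMV + shift \geq 1.5 \quad \Longleftrightarrow \quad PMV \geq 1.5 - shift$$

so that the mean shift of $+1$ gives

$$PMV \geq 1.5 - 1 = 0.5$$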

3.2. Results of Parameter Sets

Figure 7, Figure 8 and Figure 9 show the results of each model using the parameter sets $S_1$, $S_2$ and $S_3$, respectively. The results are displayed as box plots representing the 100 $F_1$ scores (accuracies) obtained from the 100 tests of each model. The interquartile range (IQR) is the difference between the 75th and 25th percentiles and represents the middle 50% of the accuracies obtained from the 100 tests.
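The quantities shown in such a box plot can be computed from a list of scores as sketched below. The quartile interpolation method ("inclusive") is an assumption; the plotting library used for the figures may apply a slightly different convention. The function name and the sample scores are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(scores):
    """Median, quartiles and interquartile range of a list of scores."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}


# Hypothetical scores (not results from the paper).
summary = box_plot_summary([1.0, 2.0, 3.0, 4.0, 5.0])
```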

3.2.1. Parameter Set $S_1$

The box plots in Figure 7 show the accuracy of the seven ML models and of the PMV model with a threshold at 0.5 and 1.5. The results of PMV 0.5 and PMV 1.5 show clearly that a shift needs to be taken into account. With a median shift of +1, the median accuracy rises from 48.89% (PMV 1.5) to 68.75% (PMV 0.5).
The IQRs are similar across the models, ranging from a maximum of 9.48% for PMV 1.5 to a minimum of 5.76% for SVM with Poly Kernel. Compared to the PMV 0.5 model, all SVM models and the ANN had a similar or slightly better performance. SVM with Linear Kernel showed the highest median accuracy of 69.44%, which is 0.69 percentage points higher than the one achieved by the PMV 0.5 model.

3.2.2. Parameter Set $S_2$

Figure 8 shows that parameter set $S_2$, with only two input parameters, gave IQRs and median accuracies similar to those in Figure 7. The IQRs of the models range from 9.67% for AdaBoost to 5.41% for the ANN. The scores obtained with $S_2$ were relatively high: all SVM models scored at least as high as the 68.75% of the PMV 0.5 model. SVM with Rbf Kernel had the highest score overall with 70.42% (the confusion matrix of the 100 test sets can be seen in Figure 10), followed by the ANN with 70.27% and SVM with Poly Kernel with 70.08%. Lastly, SVM with Linear Kernel had the same score as the PMV 0.5 model.

3.2.3. Parameter Set $S_3$

As with the other parameter sets, the parameter set $S_3$ gave a similar range of IQRs across the models and similar median accuracies. This can be seen in Figure 9. The maximal IQR is 9.52% for AdaBoost and the minimal is 5.24% for SVM with Poly Kernel.
Even though the parameter set $S_3$ contained the most passenger information, it appeared to deliver the lowest accuracy of the three parameter sets, with the best model, SVM with Linear Kernel, achieving a median accuracy of 69.31%.

4. Discussion

4.1. Shift Analysis

In Section 2.2, the reasoning and implementation behind the mapping of TS to groups 1 and 2 were described. In order to compare the performances of the different ML algorithms with the baseline, the PMV model, it was necessary to also map the prediction of PMV to one of the groups. This was described in Section 2.4. With the help of the shift analysis in Section 3.1, it was shown that for mild and warm thermal sensations a mean shift of 1 needs to be considered. Hence, data points with PMV values greater than or equal to the threshold value of 0.5 were assigned to group 2 and the rest to group 1. This already gave positive results, as seen in Figure 7 and explained in Section 3. A reason for this may be that PMV is better suited for static environments where people have been for longer periods of time [11]. This would mean that in buses the surveyed passengers did not have enough time to acclimatize to their surroundings. This could explain why 0.5 is a better threshold to map PMV into group predictions than the more mathematically logical threshold value of 1.5.

4.2. Comparison of Parameter Sets

In Figure 11, the median score of every model is shown as well as the parameter set used. As a baseline, a red dashed line represents the median accuracy obtained by the PMV model with a threshold at 0.5 and a red dotted line for the PMV model with a threshold at 1.5. It can clearly be seen that with machine learning the ANN and the SVM models showed a similar accuracy compared to the PMV 0.5 model.
Within the better performing models, it can be clearly noticed that the subset S 2 had the best accuracy, achieving up to an improvement of 1.67%. This might be counter intuitive as one would expect that using more parameters would lead to more passenger information to use in order to make a better prediction, which is exactly the opposite of what is observed.
This could be explained by the so-called curse of dimensionality, which is particularly relevant for a small dataset such as ours. The idea behind the curse of dimensionality is that with an increasing number of parameters, the volume of the parameter space grows exponentially, which makes the available data sparser within this higher-dimensional space [46]. In addition, increasing the dimension of the parameter space increases the degrees of freedom with which a classifier can set its boundaries, and without enough data this could lead to over-fitting [46].
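A rough back-of-the-envelope calculation illustrates how quickly the data become sparse. The sketch below assumes each parameter is split into a fixed number of bins (the value 4 is a hypothetical choice for illustration) and computes how many of the 278 data points fall into each cell of the resulting grid on average, for two parameters (as in $S_2$) and five parameters (as in $S_1$).

```python
def mean_samples_per_cell(n_samples: int, bins_per_feature: int, n_features: int) -> float:
    """Average number of samples per grid cell when every feature is binned."""
    n_cells = bins_per_feature ** n_features  # grows exponentially with n_features
    return n_samples / n_cells


N_SAMPLES = 278  # number of surveys used in the assessment
BINS = 4         # hypothetical number of bins per parameter

for n_features in (2, 5):
    density = mean_samples_per_cell(N_SAMPLES, BINS, n_features)
    print(f"{n_features} parameters: {BINS ** n_features} cells, "
          f"{density:.2f} samples per cell on average")
```

Under these assumptions, the two-parameter grid holds more than 17 samples per cell on average, while the five-parameter grid holds fewer than one, so most cells of the larger space contain no data at all.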
The fact that high accuracy can be achieved, even without hyperparameter tuning, using the parameter set with the fewest parameters is of major significance. It means that in order to predict the TS of a passenger it would suffice to know the air temperature $T_a$ and the relative humidity $RH$ in the cabin, which are easier and more reliable measurements to take, especially when compared to the additional radiant temperature $T_r$, air velocity $V_a$ and clothing insulation $I_{cl}$ needed for the PMV model. This would also drastically simplify the data collection needed to create the larger datasets required to improve future ML models. Another reason for our positive results could be that we simplified the task into a binary classification problem because of our small dataset. It is possible that subsets with higher dimensions perform better when multi-label classification is performed, for which larger datasets with representative TS across the whole ASHRAE scale would be required.

5. Conclusions and Next Steps

As cities start to electrify their public transport systems as a measure to tackle climate change and air pollution, operating these systems more efficiently becomes of major importance. The HVAC system is highly relevant here, as it may use up to 50% of the battery capacity. This raises the question of how to condition the bus cabin efficiently without compromising the comfort of the passengers. Until recently, the most widely used model to predict thermal comfort was the PMV-PPD model, which has not only been shown to predict thermal comfort with low accuracy, but also requires measurements that might not be easy to obtain, such as clothing insulation. In this paper, 329 personal surveys were collected during climatic measurements on an electric bus in Berlin, Germany, and 278 of them were used for this assessment. Seven of the ML models most commonly used for thermal comfort prediction were trained and tested 100 times with repeated random sub-sampling, using three different parameter sets of the dataset. The $F_1$ score was used as an accuracy metric. The results were then compared to the PMV-PPD model.
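The evaluation protocol summarized above (repeated random sub-sampling scored with $F_1$) can be sketched as follows. This is only an illustration under stated assumptions: the test fraction of 0.2, the synthetic data, and the nearest-centroid classifier are all hypothetical stand-ins. The nearest-centroid classifier is not one of the seven models from the paper and is used here only to keep the example dependency-free.

```python
import random
from statistics import median


def f1_score(y_true, y_pred, positive=2):
    """F1 score for the class `positive` (here group 2)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_nearest_centroid(X_train, y_train):
    """Hypothetical stand-in classifier: predict the label of the closest class mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return predict


def repeated_random_subsampling(X, y, fit, n_repeats=100, test_fraction=0.2, seed=0):
    """Draw a fresh random train/test split in every repetition and collect F1 scores."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, round(test_fraction * len(X)))  # test_fraction is an assumption
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(f1_score([y[i] for i in test_idx], y_pred))
    return scores


# Synthetic (air temperature, relative humidity) samples; not the measured data.
data_rng = random.Random(42)
X = [(data_rng.uniform(18.0, 32.0), data_rng.uniform(40.0, 60.0)) for _ in range(278)]
y = [2 if t_a + data_rng.gauss(0.0, 1.0) > 26.0 else 1 for t_a, _ in X]

scores = repeated_random_subsampling(X, y, fit_nearest_centroid)
print(f"median F1 over {len(scores)} splits: {median(scores):.3f}")
```

Reporting the median over all repetitions, rather than the score of a single split, makes the result less sensitive to one lucky or unlucky partition, which matters for a dataset of this size.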
The comparison showed that machine learning can achieve results similar to or better than those of the PMV-PPD model, even when the PMV value is adjusted by +1. It is even possible to obtain better results with only two input parameters ($T_a$ and $RH$) instead of the five needed for the PMV-PPD model, despite a very small input set of 278 data points. This would drastically simplify the measurement of thermal comfort in the bus cabin. Confirming this requires a crucial next step: the collection of a larger dataset that includes measurements throughout the year (in cold and warm environments), from which a multi-label classification over the whole ASHRAE scale could be achieved.
From the shift analysis in Section 3.1, it can be concluded that the PMV-PPD model tends to underestimate extreme TS and instead predicts values closer to neutral TS. Further measurements will also be needed to determine whether the PMV shift toward neutral values found in the data set is related to the time the passengers spent in the bus. For this, two surveys could be filled out, one shortly after entering the bus and another after spending some time in it. If the shift is reduced in the second survey, it could be concluded that time and acclimatization are of importance when calculating PMV values.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/app132011190/s1: $S_3$ encoding selection, precision-recall curves and receiver operating characteristic curves, as requested by the reviewers.

Author Contributions

Conceptualization, T.-A.F., F.C. and A.S.A.; methodology, T.-A.F., F.C. and D.G.; software, A.S.A. and F.C.; validation, T.-A.F., F.C. and A.S.A.; formal Analysis, T.-A.F. and A.S.A.; investigation, A.S.A.; resources, D.G.; data curation, F.C. and A.S.A.; writing—original draft, A.S.A., T.-A.F. and F.C.; writing—review and editing, T.-A.F., A.S.A., F.C. and D.G.; visualization, A.S.A.; supervision, T.-A.F.; project administration, T.-A.F. and F.C.; funding acquisition, D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Federal Ministry of Transportation and Digital Infrastructure, Project “E-Metro-Bus”, grant number 3EMF0105B. https://e-metrobus.berlin/ (accessed on 2 October 2023). We acknowledge support by the German Research Foundation and the Open Access Publication Fund of TU Berlin.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

The authors wish to thank the partners from the Berlin public transport operator (Berliner Verkehrsbetriebe, BVG) and Konvekta AG for the inspiring discussions and their valuable support.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

| Abbreviation | Definition |
|---|---|
| AdaBoost | Adaptive Boosting |
| AI | Artificial Intelligence |
| ANN | Artificial Neural Network |
| BMI | Body Mass Index |
| e-buses | Electric Buses |
| ENL | Ensemble Learning |
| EV | Electric Vehicle |
| HVAC | Heating, Ventilation and Air Conditioning |
| IQR | Interquartile Range |
| kNN | K-Nearest Neighbors |
| ML | Machine Learning |
| MS | Measurement Set |
| PMV | Predicted Mean Vote |
| Poly | Polynomial |
| PPD | Predicted Percentage of Dissatisfied |
| Rbf | Radial Basis Function |
| SVM | Support Vector Machine |
| TS | Thermal Sensation |
| VIF | Variance Inflation Factor |

References

  1. Doyle, A.; Muneer, T. Energy consumption and modelling of the climate control system in the electric vehicle. Energy Explor. Exploit. 2018, 37, 014459871880645.
  2. Göhlich, D.; Fay, T.A.; Jefferies, D.; Lauth, E.; Kunith, A.; Zhang, X. Design of urban electric bus systems. Des. Sci. 2018, 4, e15.
  3. Cigarini, F.; Fay, T.A.; Artemenko, N.; Göhlich, D. Modeling and Experimental Investigation of Thermal Comfort and Energy Consumption in a Battery Electric Bus. World Electr. Veh. J. 2021, 12, 7.
  4. Velt, K.B.; Daanen, H.A.M. Optimal bus temperature for thermal comfort during a cool day. Appl. Ergon. 2017, 62, 72–76.
  5. Cigarini, F.; Schminkel, P.; Sonnekalb, M.; Best, P.; Göhlich, D. Determination of improved climatic conditions for thermal comfort and energy efficiency in electric buses. Appl. Ergon. 2022, 105, 103856.
  6. American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). ASHRAE Handbook Fundamentals; American Society of Heating, Refrigerating and Air–Conditioning Engineers, Inc.: Atlanta, GA, USA, 2017.
  7. Zhao, Q.; Lian, Z.; Lai, D. Thermal comfort models and their developments: A review. Energy Built Environ. 2021, 2, 21–33.
  8. Kaynakli, O.; Pulat, E.; Kilic, M. Thermal comfort during heating and cooling periods in an automobile. Heat Mass Transf. 2005, 41, 449–458.
  9. Pala, U.; Oz, H.R. An investigation of thermal comfort inside a bus during heating period within a climatic chamber. Appl. Ergon. 2015, 48, 164–176.
  10. Fanger, P.O. Thermal Comfort: Analysis and Applications in Environmental Engineering; R.E. Krieger Pub. Co.: Malabar, FL, USA, 1982.
  11. ISO 7730; Ergonomics of the Thermal Environment: Analytical Determination and Interpretation of Thermal Comfort Using Calculation of the PMV and PPD Indices and Local Thermal Comfort Criteria. ISO: Geneva, Switzerland, 2005.
  12. Cheung, T.; Schiavon, S.; Parkinson, T.; Li, P.; Brager, G. Analysis of the accuracy on PMV–PPD model using the ASHRAE Global Thermal Comfort Database II. Build. Environ. 2019, 153, 205–217.
  13. Jefferies, D.; Ly, T.-A.; Kunith, A.; Göhlich, D. Energiebedarf verschiedener Klimatisierungssysteme für Elektro-Linienbusse. In Proceedings of the Deutsche Kälte und Klimatagung 2015; Deutscher Kälte- und Klimatechnischer Verein e.V.: Dresden, Germany, 2015.
  14. Gagge, A.P.; Fobelets, A.P.; Berglund, L.G. A Standard Predictive Index of Human Response to the Thermal Environment. ASHRAE Trans. 1986, 92 Pt 2, 709–731.
  15. Zhang, H.; Arens, E.; Huizenga, C.; Han, T. Thermal sensation and comfort models for non-uniform and transient environments: Part I: Local sensation of individual body parts. Build. Environ. 2010, 45, 380–388.
  16. Abou Jaoude, R.; Thiagalingam, I.; El Khoury, R.; Crehan, G. Berkeley thermal comfort models: Comparison to people votes and indications for user-centric HVAC strategies in car cabins. Build. Environ. 2020, 180, 107093.
  17. Qavidel Fard, Z.; Zomorodian, Z.S.; Korsavi, S.S. Application of machine learning in thermal comfort studies: A review of methods, performance and challenges. Energy Build. 2022, 256, 111771.
  18. Guenther, J.; Sawodny, O. Feature selection and Gaussian Process regression for personalized thermal comfort prediction. Build. Environ. 2019, 148, 448–458.
  19. Ju, Y.J.; Lim, J.R.; Jeon, E.S. Prediction of AI-Based Personal Thermal Comfort in a Car Using Machine-Learning Algorithm. Electronics 2022, 11, 340.
  20. Kotsiantis, S.B.; Zaharakis, I.D.; Pintelas, P.E. Machine learning: A review of classification and combining techniques. Artif. Intell. Rev. 2006, 26, 159–190.
  21. Althnian, A.; AlSaeed, D.; Al-Baity, H.; Samha, A.; Dris, A.B.; Alzakari, N.; Abou Elwafa, A.; Kurdi, H. Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain. Appl. Sci. 2021, 11, 796.
  22. Chaudhuri, T.; Soh, Y.C.; Li, H.; Xie, L. Machine learning based prediction of thermal comfort in buildings of equatorial Singapore. In Proceedings of the 2017 IEEE International Conference on Smart Grid and Smart Cities (ICSGSC), Singapore, 23–26 July 2017; pp. 72–77.
  23. ASHRAE Standard—55; Thermal Environmental Conditions for Human Occupancy. ASHRAE: Peachtree Corners, GA, USA, 2020.
  24. Tartarini, F.; Schiavon, S. pythermalcomfort: A Python package for thermal comfort research. SoftwareX 2020, 12, 100578.
  25. Wang, S.C. Artificial Neural Network. In Interdisciplinary Computing in Java Programming; Springer: Boston, MA, USA, 2003; pp. 81–100.
  26. Kasar, M.M.; Bhattacharyya, D.; Kim, T.H. Face Recognition Using Neural Network: A Review. Int. J. Secur. Its Appl. 2016, 10, 81–100.
  27. Hasan, M.T.; Fattahul Islam, K.M.; Rahman, M.S.; Li, S. Weather Forecasting Using Artificial Neural Network. In Proceedings of the Artificial Intelligence and Security; Sun, X., Pan, Z., Bertino, E., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 171–180.
  28. Deng, Z.; Chen, Q. Artificial neural network models using thermal sensations and occupants’ behavior for predicting thermal comfort. Energy Build. 2018, 174, 587–602.
  29. Dyvia, H.A.; Arif, C. Analysis of thermal comfort with predicted mean vote (PMV) index using artificial neural network. IOP Conf. Ser. Earth Environ. Sci. 2021, 622, 012019.
  30. Jahan, I.; Ahmed, M.F.; Ali, M.O.; Jang, Y.M. Self-gated rectified linear unit for performance improvement of deep neural networks. ICT Express 2022, 9, 320–325.
  31. Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39.
  32. Breiman, L. Random Forest. Mach. Learn. 2001, 45, 5–32.
  33. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
  34. Chaudhuri, T.; Zhai, D.; Soh, Y.C.; Li, H.; Xie, L. Random forest based thermal comfort prediction from gender-specific physiological parameters using wearable sensing technology. Energy Build. 2018, 166, 391–406.
  35. Alsaleem, F.; Tesfay, M.; Rafaie, M.; Sinkar, K.; Besarla, D.; Arunasalam, P. An IoT Framework for Modeling and Controlling Thermal Comfort in Buildings. Front. Built Environ. 2020, 6, 87.
  36. Schapire, R.E. The Boosting Approach to Machine Learning—An Overview. In Nonlinear Estimation and Classification; Springer: New York, NY, USA, 2022; Available online: https://www.aivc.org/sites/default/files/airbase_2522.pdf (accessed on 1 October 2023).
  37. Cunningham, P.; Delany, S.J. k-Nearest Neighbour Classifiers—A Tutorial. ACM Comput. Surv. 2022, 54, 1–25.
  38. Latha, R.; Sreekanth, G.; Suganthe, R.; Geetha, M.; Selvaraj, R.E.; Balaji, S.; Harini, K.; Ponnusamy, P.P. Stock Movement Prediction using KNN Machine Learning Algorithm. In Proceedings of the 2022 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 25–27 January 2022; pp. 1–5.
  39. Trstenjak, B.; Mikac, S.; Donko, D. KNN with TF-IDF based Framework for Text Categorization. Procedia Eng. 2014, 69, 1356–1364.
  40. Xiong, L.; Yao, Y. Study on an adaptive thermal comfort model with K-nearest-neighbors (KNN) algorithm. Build. Environ. 2021, 202, 108026.
  41. Ma, Y.; Guo, G. (Eds.) Support Vector Machines Applications; Springer: Berlin/Heidelberg, Germany, 2014.
  42. Patle, A.; Chouhan, D.S. SVM kernel functions for classification. In Proceedings of the 2013 International Conference on Advances in Technology and Engineering (ICATE), Mumbai, India, 23–25 January 2013; pp. 1–9.
  43. Zhou, X.; Xu, L.; Zhang, J.; Niu, B.; Luo, M.; Zhou, G.; Zhang, X. Data-driven thermal comfort model via support vector machine algorithms: Insights from ASHRAE RP-884 database. Energy Build. 2020, 211, 109795.
  44. Patro, S.K.; Sahu, K.K. Normalization: A Prepocessing Stage. Int. Adv. Res. J. Sci. Eng. Technol. 2015, 2, 20–22.
  45. Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. In Australasian Joint Conference on Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4304, pp. 1015–1021.
  46. Chen, L. Curse of Dimensionality. In Encyclopedia of Database Systems; Liu, L., Özsu, M.T., Eds.; Springer: Boston, MA, USA, 2009; pp. 545–546.
Figure 1. Bus layout showing the division in sectors and the position of the measurement stations (numbered squares 1–8).
Figure 2. Number of surveys for measurement set and thermal sensation value.
Figure 3. Correlation matrix.
Figure 4. Parameters pairplot, where blue represents the passengers belonging to group 1 and orange those belonging to group 2, as defined in Section 2.2.
Figure 5. Comparison of the reported TS with the calculated PMV.
Figure 6. Shift distributions for groups 1 and 2.
Figure 7. Accuracy box plot for ML models and PMV-PPD using parameter set $S_1$.
Figure 8. Accuracy box plot for ML models using parameter set $S_2$.
Figure 9. Accuracy box plot for models using parameter set $S_3$.
Figure 10. Confusion matrix for the $SVM_{Rbf}$ model using parameter set $S_2$.
Figure 11. Median accuracy of ML models using the 3 different parameter sets compared to PMV accuracy using parameter set $S_1$.
Table 1. Entries of the data set, including the climatic and personal parameters.

| Parameter | Symbol | Unit |
|---|---|---|
| Air temperature | $T_a$ | °C |
| Relative air humidity | $RH$ | % |
| Air velocity | $V_a$ | m/s |
| Mean radiant temperature | $T_r$ | °C |
| External air temperature | $T_{a,out}$ | °C |
| Age | $age$ | years |
| Sex | $sex$ | Male/Female |
| Height | $h$ | m |
| Weight | $m$ | kg |
| Clothing insulation | $I_{cl}$ | clo |
| Thermal sensation | $TS$ | ASHRAE scale |
| Position in the bus cabin | $pos$ | Sector 1 to 6 |

Table 2. ASHRAE scale for the evaluation of passengers’ TS.

| Value | Thermal Sensation (TS) |
|---|---|
| −3 | Cold |
| −2 | Cool |
| −1 | Slightly cool |
| 0 | Neutral |
| 1 | Slightly warm |
| 2 | Warm |
| 3 | Hot |

Table 3. Measurement sets (MS).

| MS | Date | Number of Surveys | Considered Surveys | Mean External Air Temperature |
|---|---|---|---|---|
| 1 | 26 August 2021 | 132 | 106 | 14.7–20.3 °C |
| 2 | 17 June 2022 | 83 | 65 | 17.7–26.5 °C |
| 3 | 25 June 2022 | 114 | 107 | 23.5–32.0 °C |

Table 4. Variance inflation factors (VIF) before and after feature engineering.

| Parameter | Symbol | VIF (Before) | VIF (After) |
|---|---|---|---|
| Air temperature | $T_a$ | 63.45 | 4.93 |
| Relative air humidity | $RH$ | 4.42 | 4.02 |
| Air velocity | $V_a$ | 1.21 | 1.16 |
| Mean radiant temperature | $T_r$ | 60.78 | |
| External air temperature | $T_{a,out}$ | 11.63 | |
| Age | $age$ | 1.07 | 1.07 |
| Sex | $sex$ | 1.06 | 1.06 |
| Clothing insulation | $I_{cl}$ | 1.64 | 1.57 |
| Body mass index | $BMI$ | 1.14 | 1.13 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Santoyo Alum, A.; Fay, T.-A.; Cigarini, F.; Göhlich, D. Assessment of Thermal Comfort in an Electric Bus Based on Machine Learning Classification. Appl. Sci. 2023, 13, 11190. https://doi.org/10.3390/app132011190
