Appendix A. Conversion of Continuous Data into Categorical Data
In this section, the step-by-step procedure used to build the regression tree model is presented. For this purpose, a sample dataset consisting of a few samples from the original dataset is shown in Table A1.
Table A1.
Sample data to build the regression tree.
L(T-24) | L(T-48) | DAY | SEASON | TEMP | HUMIDITY | L(T)
---|---|---|---|---|---|---
2176 | 412 | 1 | 1 | 67 | 88 | 432
2354 | 1829 | 0 | 1 | 68 | 88 | 2260
2777 | 2647 | 0 | 2 | 70 | 83 | 2681
3112 | 3203 | 1 | 2 | 75 | 67 | 3343
1663 | 1549 | 1 | 0 | 75 | 73 | 1579
1010 | 1027 | 0 | 0 | 71 | 93 | 1018
To convert continuous features into categorical features, multiple subtables were formed from Table A1, each consisting of one input feature and the target feature L(T). Each subtable was sorted in ascending order of its input feature, and the average of every two consecutive input feature values was calculated. The continuous input feature was then converted into a categorical feature at each of these average values, the mean squared error (MSE) of the resulting split was calculated, and the average value that gave the lowest MSE was treated as the threshold for that feature.
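For illustration, this threshold-search procedure can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in the paper; the helper names (candidate_thresholds, split_mse, best_threshold) are illustrative, and the midpoints are kept unrounded, whereas the text below rounds them to whole numbers.

```python
# Minimal sketch of the threshold-search procedure (illustrative helper names).
def candidate_thresholds(values):
    """Averages of every two consecutive sorted feature values."""
    ordered = sorted(values)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

def split_mse(feature, target, threshold):
    """MSE when each side of the threshold is predicted by its own mean L(T)."""
    sse = 0.0
    for side in (lambda f: f < threshold, lambda f: f >= threshold):
        group = [t for f, t in zip(feature, target) if side(f)]
        if group:
            mean = sum(group) / len(group)
            sse += sum((t - mean) ** 2 for t in group)
    return sse / len(target)

def best_threshold(feature, target):
    """Candidate threshold with the lowest MSE for this feature."""
    return min(candidate_thresholds(feature),
               key=lambda th: split_mse(feature, target, th))

# Example with the L(T-24) and L(T) columns of Table A1:
lt24 = [2176, 2354, 2777, 3112, 1663, 1010]
lt = [432, 2260, 2681, 3343, 1579, 1018]
print(best_threshold(lt24, lt))  # 2265.0, the L(T-24) split used in the text
```

The same helpers apply unchanged to the other continuous features treated later in this appendix.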
Prepare the subtable for input feature L(T-24) and output feature L(T) as shown in
Table A2.
Table A2.
Sorted subtable—L(T-24) vs. L(T).
L(T-24) | L(T)
---|---
1010 | 1018
1663 | 1579
2176 | 432
2354 | 2260
2777 | 2681
3112 | 3343
Calculate the average of every two consecutive L(T-24) values; the resulting average values are [1336, 1919, 2265, 2566, 2945].
Convert the continuous input feature L(T-24) into a categorical feature based on the average value 1336 and
Table A2, as shown in
Table A3. The predicted value against each category of input feature is the average of all output variables for that category.
Table A3.
Categorical subtable—L(T-24) vs. L(T).
L(T-24) | L(T) | Predicted L(T)
---|---|---
<1336 | 1018 | 1018
≥1336 | 1579 | 2059
≥1336 | 432 | 2059
≥1336 | 2260 | 2059
≥1336 | 2681 | 2059
≥1336 | 3343 | 2059
Calculate the mean squared error (MSE), i.e., the average of the squared differences between the actual and predicted load values shown in Table A3: MSE = 825,622.
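The construction of a categorical subtable such as Table A3 can be sketched in the same way; categorical_subtable below is a hypothetical helper that labels each row by its side of the threshold, predicts each row by the mean L(T) of its category, and returns the resulting rows together with their MSE. Small differences from the MSE values quoted in the text can arise from rounding of the intermediate means.

```python
# Sketch of how a categorical subtable (e.g., Table A3) is formed for one threshold.
def categorical_subtable(feature, target, threshold):
    labels = ["<" + str(threshold) if f < threshold else "≥" + str(threshold)
              for f in feature]
    # Prediction for a row = mean of L(T) over all rows sharing its category.
    means = {lab: sum(t for l, t in zip(labels, target) if l == lab) / labels.count(lab)
             for lab in set(labels)}
    rows = [(lab, t, means[lab]) for lab, t in zip(labels, target)]
    mse = sum((t - p) ** 2 for _, t, p in rows) / len(rows)
    return rows, mse

# Sorted L(T-24) and L(T) values from Table A2, threshold 1336:
rows_a3, mse_a3 = categorical_subtable(
    [1010, 1663, 2176, 2354, 2777, 3112],
    [1018, 1579, 432, 2260, 2681, 3343],
    1336,
)
```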
Convert the continuous input feature L(T-24) into a categorical feature based on the average value 1919 and
Table A2, as shown in
Table A4. The predicted value against each category of input feature is the average of all output variables for that category.
Table A4.
Categorical subtable—L(T-24) vs. L(T) with average value 1919.
L(T-24) | L(T) | Predicted L(T)
---|---|---
<1919 | 1018 | 1298
<1919 | 1579 | 1298
≥1919 | 432 | 2179
≥1919 | 2260 | 2179
≥1919 | 2681 | 2179
≥1919 | 3343 | 2179
Calculate the mean squared error based on the actual and predicted load values shown in Table A4: MSE = 803,903.
Convert the continuous input feature L(T-24) into a categorical feature based on the average value 2265 and
Table A2, as shown in
Table A5. The predicted value against each category of input feature is the average of all output variables for that category.
Table A5.
Categorical subtable—L(T-24) vs. L(T) with average value 2265.
L(T-24) | L(T) | Predicted L(T)
---|---|---
<2265 | 1018 | 1010
<2265 | 1579 | 1010
<2265 | 432 | 1010
≥2265 | 2260 | 2761
≥2265 | 2681 | 2761
≥2265 | 3343 | 2761
Calculate the mean squared error based on the actual and predicted load values shown in Table A5: MSE = 209,064.
Convert the continuous input feature L(T-24) into a categorical feature based on the average value 2566 and
Table A2, as shown in
Table A6. The predicted value against each category of input feature is the average of all output variables for that category.
Table A6.
Categorical subtable—L(T-24) vs. L(T) with average value 2566.
L(T-24) | L(T) | Predicted L(T)
---|---|---
<2566 | 1018 | 1322
<2566 | 1579 | 1322
<2566 | 432 | 1322
<2566 | 2260 | 1322
≥2566 | 2681 | 3012
≥2566 | 3343 | 3012
Calculate the mean squared error based on the actual and predicted load values shown in Table A6: MSE = 341,675.
Convert the continuous input feature L(T-24) into a categorical feature based on the average value 2945 and
Table A2, as shown in
Table A7. The predicted value against each category of input feature is the average of all output variables for that category.
Table A7.
Categorical subtable—L(T-24) vs. L(T) with average value 2945.
L(T-24) | L(T) | Predicted L(T)
---|---|---
<2945 | 1018 | 1594
<2945 | 1579 | 1594
<2945 | 432 | 1594
<2945 | 2260 | 1594
<2945 | 2681 | 1594
≥2945 | 3343 | 3343
Calculate the mean squared error based on the actual and predicted load values shown in Table A7: MSE = 551,591.
Prepare the subtable for input feature L(T-48) and output feature L(T), as shown in
Table A8.
Table A8.
Sorted subtable—L(T-48) vs. L(T).
L(T-48) | L(T)
---|---
412 | 432
1027 | 1018
1549 | 1579
1829 | 2260
2647 | 2681
3203 | 3343
Calculate the average of every two consecutive L(T-48) values; the resulting average values are [720, 1288, 1689, 2238, 2925].
Convert the continuous input feature L(T-48) into a categorical feature based on the average value 720 and
Table A8, as shown in
Table A9. The predicted value against each category of input feature is the average of all output variables for that category.
Table A9.
Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | Predicted L(T)
---|---|---
<720 | 432 | 432
≥720 | 1018 | 2176.2
≥720 | 1579 | 2176.2
≥720 | 2260 | 2176.2
≥720 | 2681 | 2176.2
≥720 | 3343 | 2176.2
Calculate the mean squared error based on the actual and predicted load values shown in Table A9: MSE = 553,557.1333.
Convert the continuous input feature L(T-48) into a categorical feature based on the average value 1288 and
Table A8, shown in
Table A10. The predicted value against each category of input feature is the average of all output variables for that category.
Table A10.
Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | Predicted L(T)
---|---|---
<1288 | 432 | 725
<1288 | 1018 | 725
≥1288 | 1579 | 2465.75
≥1288 | 2260 | 2465.75
≥1288 | 2681 | 2465.75
≥1288 | 3343 | 2465.75
Calculate the mean squared error based on the actual and predicted load values shown in Table A10: MSE = 302,709.4583.
Convert the continuous input feature L(T-48) into a categorical feature based on the average value 1689 and
Table A8, as shown in
Table A11. The predicted value against each category of input feature is the average of all output variables for that category.
Table A11.
Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | Predicted L(T)
---|---|---
<1689 | 432 | 1009.666667
<1689 | 1018 | 1009.666667
<1689 | 1579 | 1009.666667
≥1689 | 2260 | 2761.333333
≥1689 | 2681 | 2761.333333
≥1689 | 3343 | 2761.333333
Calculate the mean squared error based on the actual and predicted load values shown in Table A11: MSE = 209,005.5556.
Convert the continuous input feature L(T-48) into a categorical feature based on the average value 2238 and
Table A8, as shown in
Table A12. The predicted value against each category of input feature is the average of all output variables for that category.
Table A12.
Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | Predicted L(T)
---|---|---
<2238 | 432 | 1322.25
<2238 | 1018 | 1322.25
<2238 | 1579 | 1322.25
<2238 | 2260 | 1322.25
≥2238 | 2681 | 3012
≥2238 | 3343 | 3012
Calculate the mean squared error based on the actual and predicted load values shown in Table A12: MSE = 341,588.4583.
Convert the continuous input feature L(T-48) into a categorical feature based on the average value 2925 and
Table A8, as shown in
Table A13. The predicted value against each category of input feature is the average of all output variables for that category.
Table A13.
Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | Predicted L(T)
---|---|---
<2925 | 432 | 1594
<2925 | 1018 | 1594
<2925 | 1579 | 1594
<2925 | 2260 | 1594
<2925 | 2681 | 1594
≥2925 | 3343 | 3343
Calculate the mean squared error based on the actual and predicted load values shown in Table A13: MSE = 551,228.3333.
Prepare the subtable for input feature L(TEMP) and output feature L(T), as shown in Table A14.
Table A14.
Sorted subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T)
---|---
67 | 432
68 | 2260
70 | 2681
71 | 1018
75 | 3343
75 | 1579
Calculate the average of every two consecutive L(TEMP) values; the resulting average values are [67.5, 69, 70.5, 73, 75].
Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 67.5 and
Table A14, as shown in
Table A15. The predicted value against each category of input feature is the average of all output variables for that category.
Table A15.
Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | Predicted L(T)
---|---|---
<67.5 | 432 | 432
≥67.5 | 2260 | 2176.2
≥67.5 | 2681 | 2176.2
≥67.5 | 1018 | 2176.2
≥67.5 | 3343 | 2176.2
≥67.5 | 1579 | 2176.2
Calculate the mean squared error based on the actual and predicted load values shown in Table A15: MSE = 553,557.
Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 69 and
Table A14, as shown in
Table A16. The predicted value against each category of input feature is the average of all output variables for that category.
Table A16.
Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | Predicted L(T)
---|---|---
<69 | 432 | 1346
<69 | 2260 | 1346
≥69 | 2681 | 2155.25
≥69 | 1018 | 2155.25
≥69 | 3343 | 2155.25
≥69 | 1579 | 2155.25
Calculate the mean squared error based on the actual and predicted load values shown in Table A16: MSE = 830,559.
Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 70.5 and
Table A14, as shown in
Table A17. The predicted value against each category of input feature is the average of all output variables for that category.
Table A17.
Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | Predicted L(T)
---|---|---
<70.5 | 432 | 1791
<70.5 | 2260 | 1791
<70.5 | 2681 | 1791
≥70.5 | 1018 | 1980
≥70.5 | 3343 | 1980
≥70.5 | 1579 | 1980
Calculate the mean squared error based on the actual and predicted load values shown in Table A17: MSE = 967,159.
Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 73 and
Table A14, as shown in
Table A18. The predicted value against each category of input feature is the average of all output variables for that category.
Table A18.
Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | Predicted L(T)
---|---|---
<73 | 432 | 1597.75
<73 | 2260 | 1597.75
<73 | 2681 | 1597.75
<73 | 1018 | 1597.75
≥73 | 3343 | 2461
≥73 | 1579 | 2461
Calculate the mean squared error based on the actual and predicted load values shown in Table A18: MSE = 810,489.
Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 75 and
Table A14, as shown in
Table A19. The predicted value against each category of input feature is the average of all output variables for that category.
Table A19.
Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | Predicted L(T)
---|---|---
<75 | 432 | 1946.8
<75 | 2260 | 1946.8
<75 | 2681 | 1946.8
<75 | 1018 | 1946.8
<75 | 3343 | 1946.8
≥75 | 1579 | 1579
Calculate the mean squared error based on the actual and predicted load values shown in Table A19: MSE = 957,301.
Prepare the subtable for input feature Humidity and output feature L(T), as shown in Table A20.
Table A20.
Sorted subtable—Humidity vs. L(T).
Humidity | L(T)
---|---
67 | 3343
73 | 1579
83 | 2681
88 | 432
88 | 2260
93 | 1018
Calculate the average of every two consecutive Humidity values; the resulting average values are [70, 78, 85.5, 88, 90.5].
Convert the continuous input feature Humidity into a categorical feature based on the average value 70 and
Table A20, as shown in
Table A21. The predicted value against each category of input feature is the average of all output variables for that category.
Table A21.
Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | Predicted L(T)
---|---|---
<70 | 3343 | 3343
≥70 | 1579 | 1594
≥70 | 2681 | 1594
≥70 | 432 | 1594
≥70 | 2260 | 1594
≥70 | 1018 | 1594
Calculate the mean squared error based on the actual and predicted load values shown in Table A21: MSE = 551,228.
Convert the continuous input feature Humidity into a categorical feature based on the average value 78 and
Table A20, as shown in
Table A22. The predicted value against each category of input feature is the average of all output variables for that category.
Table A22.
Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | Predicted L(T)
---|---|---
<78 | 3343 | 2461
<78 | 1579 | 2461
≥78 | 2681 | 1597.75
≥78 | 432 | 1597.75
≥78 | 2260 | 1597.75
≥78 | 1018 | 1597.75
Calculate the mean squared error based on the actual and predicted load values shown in Table A22: MSE = 810,489.
Convert the continuous input feature Humidity into a categorical feature based on the average value 85.5 and
Table A20, as shown in
Table A23. The predicted value against each category of input feature is the average of all output variables for that category.
Table A23.
Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | Predicted L(T)
---|---|---
<85.5 | 3343 | 2534.33
<85.5 | 1579 | 2534.33
<85.5 | 2681 | 2534.33
≥85.5 | 432 | 1236.67
≥85.5 | 2260 | 1236.67
≥85.5 | 1018 | 1236.67
Calculate the mean squared error based on the actual and predicted load values shown in Table A23: MSE = 555,105.
Convert the continuous input feature Humidity into a categorical feature based on the average value 88 and
Table A20, as shown in
Table A24. The predicted value against each category of input feature is the average of all output variables for that category.
Table A24.
Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | Predicted L(T)
---|---|---
<88 | 3343 | 2008.75
<88 | 1579 | 2008.75
<88 | 2681 | 2008.75
<88 | 432 | 2008.75
≥88 | 2260 | 1639
≥88 | 1018 | 1639
Calculate the mean squared error based on the actual and predicted load values shown in Table A24: MSE = 945,708.
Convert the continuous input feature Humidity into a categorical feature based on the average value 90.5 and
Table A20, as shown in
Table A25. The predicted value against each category of input feature is the average of all output variables for that category.
Table A25.
Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | Predicted L(T)
---|---|---
<90.5 | 3343 | 2059
<90.5 | 1579 | 2059
<90.5 | 2681 | 2059
<90.5 | 432 | 2059
<90.5 | 2260 | 2059
≥90.5 | 1018 | 1018
Calculate the mean squared error based on the actual and predicted load values shown in Table A25: MSE = 825,578.
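The per-feature scans above can be condensed into one loop. Reusing the candidate_thresholds and split_mse helpers sketched near the start of this appendix (so this fragment assumes they are already defined), the loop below recovers the same thresholds that are selected next (2265, 1689, 67.5, and 70); the printed MSE values may differ slightly from the quoted ones because no intermediate rounding is applied.

```python
# Columns of Table A1 (continuous features and the target L(T)).
features = {
    "L(T-24)": [2176, 2354, 2777, 3112, 1663, 1010],
    "L(T-48)": [412, 1829, 2647, 3203, 1549, 1027],
    "TEMP": [67, 68, 70, 75, 75, 71],
    "HUMIDITY": [88, 88, 83, 67, 73, 93],
}
lt = [432, 2260, 2681, 3343, 1579, 1018]

# Best threshold (lowest MSE) for every continuous feature.
for name, column in features.items():
    scores = {th: split_mse(column, lt, th) for th in candidate_thresholds(column)}
    best = min(scores, key=scores.get)
    print(name, best, round(scores[best]))
```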
From all the above calculations, the minimum MSE value for the feature L(T-24) is 209,064, obtained for the split at 2265; the minimum MSE value for L(T-48) is 209,006, for the split at 1689; the minimum MSE value for temperature is 553,557, for the split at 67.5; and the minimum MSE value for humidity is 551,228, for the split at 70. Hence, these splits were used to convert the continuous data shown in Table A1 into the categorical data shown in Table A26. Furthermore, the MSE value for day, with categories (1 and 0), is 965,922, and the MSE value for season, with categories (0, 1, and 2), is 747,157. All the MSE values are presented in the last row of Table A26.
Table A26.
Sample categorical data to build regression tree.
L(T-24) | L(T-48) | DAY | SEASON | TEMP | HUMIDITY | L(T)
---|---|---|---|---|---|---
<2265 | <1689 | 1 | 1 | <67.5 | ≥70 | 432
≥2265 | ≥1689 | 0 | 1 | ≥67.5 | ≥70 | 2260
≥2265 | ≥1689 | 0 | 2 | ≥67.5 | ≥70 | 2681
≥2265 | ≥1689 | 1 | 2 | ≥67.5 | <70 | 3343
<2265 | <1689 | 1 | 0 | ≥67.5 | ≥70 | 1579
<2265 | <1689 | 0 | 0 | ≥67.5 | ≥70 | 1018
209,064 | 209,006 | 965,922 | 747,157 | 553,557 | 551,228 | –
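As a sketch of this conversion step (not the paper's code), each continuous value of Table A1 is simply replaced by its side of the winning threshold, while DAY, SEASON, and L(T) are left unchanged; the resulting categorical list reproduces the categories of Table A26.

```python
# Winning thresholds from the calculations above.
thresholds = {"L(T-24)": 2265, "L(T-48)": 1689, "TEMP": 67.5, "HUMIDITY": 70}

# Rows of Table A1 as dictionaries.
rows = [
    {"L(T-24)": 2176, "L(T-48)": 412, "DAY": 1, "SEASON": 1, "TEMP": 67, "HUMIDITY": 88, "L(T)": 432},
    {"L(T-24)": 2354, "L(T-48)": 1829, "DAY": 0, "SEASON": 1, "TEMP": 68, "HUMIDITY": 88, "L(T)": 2260},
    {"L(T-24)": 2777, "L(T-48)": 2647, "DAY": 0, "SEASON": 2, "TEMP": 70, "HUMIDITY": 83, "L(T)": 2681},
    {"L(T-24)": 3112, "L(T-48)": 3203, "DAY": 1, "SEASON": 2, "TEMP": 75, "HUMIDITY": 67, "L(T)": 3343},
    {"L(T-24)": 1663, "L(T-48)": 1549, "DAY": 1, "SEASON": 0, "TEMP": 75, "HUMIDITY": 73, "L(T)": 1579},
    {"L(T-24)": 1010, "L(T-48)": 1027, "DAY": 0, "SEASON": 0, "TEMP": 71, "HUMIDITY": 93, "L(T)": 1018},
]

# Replace each continuous value by its category; DAY, SEASON and L(T) stay as-is.
categorical = []
for row in rows:
    new_row = dict(row)
    for col, th in thresholds.items():
        new_row[col] = f"<{th}" if row[col] < th else f"≥{th}"
    categorical.append(new_row)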
Appendix B. Regression Tree Model Formulation
From Table A26, we observe that L(T-48) has the minimum MSE value, i.e., 209,006, compared with all the remaining features. Hence, the input feature L(T-48) is taken as the root node of the regression tree, and that node has two branches, ≥1689 and <1689. In order to identify the decision node under each branch, Table A26 is divided into two subtables, presented in Table A27 and Table A28.
Table A27.
Subtable: L(T-48) < 1689.
L(T-24) | DAY | SEASON | TEMP | HUMIDITY | L(T)
---|---|---|---|---|---
<2265 | 1 | 1 | <67.5 | ≥70 | 432
<2265 | 1 | 0 | ≥67.5 | ≥70 | 1579
<2265 | 0 | 0 | ≥67.5 | ≥70 | 1018
Table A28.
Subtable: L(T-48) ≥ 1689.
L(T-24) | DAY | SEASON | TEMP | HUMIDITY | L(T)
---|---|---|---|---|---
≥2265 | 0 | 1 | ≥67.5 | ≥70 | 2260
≥2265 | 0 | 2 | ≥67.5 | ≥70 | 2681
≥2265 | 1 | 2 | ≥67.5 | <70 | 3343
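A sketch of the root-node choice and of the split into the two subtables is given below; feature_mse collects the per-feature minimum MSE values from the last row of Table A26, and rows is assumed to be the list of Table A1 dictionaries from the previous sketch.

```python
# Per-feature minimum MSE values from the last row of Table A26.
feature_mse = {"L(T-24)": 209064, "L(T-48)": 209006, "DAY": 965922,
               "SEASON": 747157, "TEMP": 553557, "HUMIDITY": 551228}

root = min(feature_mse, key=feature_mse.get)        # -> "L(T-48)"
left = [r for r in rows if r["L(T-48)"] < 1689]     # Table A27 (3 rows)
right = [r for r in rows if r["L(T-48)"] >= 1689]   # Table A28 (3 rows)
```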
In order to identify the decision node among L(T-24), day, season, temperature, and humidity under the branch L(T-48) < 1689, Table A27 is further divided into multiple subtables based on each input feature.
A subtable based on input feature L(T-24) and target variable L(T) is presented in
Table A29. From
Table A29, it is observed that input feature L(T-24) has an MSE value of 219,303.
Table A29.
L(T-24) vs. L(T) for L(T-48) < 1689.
L(T-24) | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
<2265 | 432 | 1010 | 333,699 |
<2265 | 1579 | 1010 | 324,140 | 219,303
<2265 | 1018 | 1010 | 69 |
A subtable based on input feature day and target variable L(T) is presented in
Table A30. From
Table A30, it is observed that input feature day has an MSE value of 219,268.
Table A30.
Day vs. L(T) for L(T-48) < 1689.
Day | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
1 | 432 | 1005.5 | 328,902.25 |
1 | 1579 | 1005.5 | 328,902.25 | 219,268
0 | 1018 | 1018 | 0 |
A subtable based on input feature season and target variable L(T) is presented in
Table A31. From
Table A31, it is observed that input feature season has an MSE value of 52,454.
Table A31.
Season vs. L(T) for L(T-48) < 1689.
Season | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
1 | 432 | 432 | 0 |
0 | 1579 | 1298.5 | 78,680.25 | 52,454
0 | 1018 | 1298.5 | 78,680.25 |
A subtable based on input feature temperature and target variable L(T) is presented in
Table A32. From
Table A32, it is observed that input feature temperature has an MSE value of 52,454.
Table A32.
Temperature vs. L(T) for L(T-48) < 1689.
Temperature | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
<67.5 | 432 | 432 | 0 |
≥67.5 | 1579 | 1298.5 | 78,680.25 | 52,454
≥67.5 | 1018 | 1298.5 | 78,680.25 |
A subtable based on input feature humidity and target variable L(T) is presented in
Table A33. From
Table A33, it is observed that input feature humidity has an MSE value of 219,303.
Table A33.
Humidity vs. L(T) for L(T-48) < 1689.
Humidity | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥70 | 432 | 1009.67 | 333,698.78 |
≥70 | 1579 | 1009.67 | 324,140.44 | 219,303
≥70 | 1018 | 1009.67 | 69.44 |
It is observed from the above calculations that season and temperature both have the minimum MSE, i.e., 52,454. Here, season is chosen as the decision node under the branch L(T-48) < 1689, and this node has two branches, season “1” and season “0”. In order to identify the decision/leaf node under each branch, Table A27 is divided into two subtables, presented in Table A34 and Table A35. From Table A34, it is observed that the branch corresponding to season “1” is a leaf node with value 432.
Table A34.
Subtable: L(T-48) < 1689 and season = “1”.
L(T-24) | DAY | TEMP | HUMIDITY | L(T)
---|---|---|---|---
<2265 | 1 | <67.5 | ≥70 | 432
Table A35.
Subtable: L(T-48) < 1689 and season = “0”.
L(T-24) | DAY | TEMP | HUMIDITY | L(T)
---|---|---|---|---
<2265 | 1 | ≥67.5 | ≥70 | 1579
<2265 | 0 | ≥67.5 | ≥70 | 1018
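The node-by-node procedure followed in this appendix can be summarised by the recursive sketch below, which operates on the categorical rows built in the sketch after Table A26 (the categorical list). It is a minimal illustration rather than the paper's implementation: at each node, the feature whose category-wise grouping gives the lowest MSE becomes the decision node, and a branch becomes a leaf once it contains a single row or a single target value. On this small sample, L(T-24)/L(T-48), season/temperature, and day/humidity tie, so ties are broken here by the order of the feature list, which is chosen to match the choices made in the text.

```python
def grouping_mse(rows, feature, target="L(T)"):
    """Group rows by the feature's categories; MSE when each group is predicted by its mean."""
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r)
    sse = 0.0
    for sub in groups.values():
        vals = [r[target] for r in sub]
        mean = sum(vals) / len(vals)
        sse += sum((v - mean) ** 2 for v in vals)
    return sse / len(rows), groups

def build_tree(rows, features, target="L(T)"):
    vals = [r[target] for r in rows]
    if len(rows) == 1 or len(set(vals)) == 1 or not features:
        return {"leaf": sum(vals) / len(vals)}
    scored = {f: grouping_mse(rows, f, target) for f in features}
    best = min(scored, key=lambda f: scored[f][0])          # lowest-MSE feature
    remaining = [f for f in features if f != best]
    return {"feature": best,
            "branches": {cat: build_tree(sub, remaining, target)
                         for cat, sub in scored[best][1].items()}}

tree = build_tree(categorical,
                  ["L(T-48)", "L(T-24)", "DAY", "SEASON", "TEMP", "HUMIDITY"])
```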
In order to identify the decision node among L(T-24), day, temperature, and humidity under branch season “0”,
Table A35 is divided into multiple subtables with respect to each feature.
A subtable based on input feature L(T-24) and target variable L(T) is presented in
Table A36. From
Table A36, it is observed that input feature L(T-24) has an MSE value of 78,680.
Table A36.
L(T-24) vs. L(T) for L(T-48) < 1689 and season “0”.
L(T-24) | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
<2265 | 1579 | 1298.5 | 78,680 | 78,680
<2265 | 1018 | 1298.5 | 78,680 |
A subtable based on input feature day and target variable L(T) is presented in
Table A37. From
Table A37, it is observed that input feature day has an MSE value of 0.
Table A37.
Day vs. L(T) for L(T-48) < 1689 and season “0”.
Day | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
1 | 1579 | 1579 | 0 | 0
0 | 1018 | 1018 | 0 |
A subtable based on input feature temperature and target variable L(T) is presented in
Table A38. From
Table A38, it is observed that input feature temperature has an MSE value of 78,680.
Table A38.
Temperature vs. L(T) for L(T-48) < 1689 and season “0”.
Temperature | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥67.5 | 1579 | 1298.5 | 78,680 | 78,680
≥67.5 | 1018 | 1298.5 | 78,680 |
A subtable based on input feature humidity and target variable L(T) is presented in
Table A39. From
Table A39, it is observed that input feature humidity has an MSE value of 78,680.
Table A39.
Humidity vs. L(T) for L(T-48) < 1689 and season “0”.
Humidity | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥70 | 1579 | 1298.5 | 78,680 | 78,680
≥70 | 1018 | 1298.5 | 78,680 |
It is observed from the above calculations that the feature “Day” has the minimum MSE, i.e., 0. Here, “Day” is chosen as the decision node under the season “0” branch, and this node has two branches, day “1” and day “0”, as presented in Table A37. From Table A37, it is observed that the branch corresponding to day “1” is a leaf node with value 1579 and day “0” is a leaf node with value 1018.
In order to identify the decision node among L(T-24), day, season, temperature, and humidity under the branch L(T-48) ≥ 1689, Table A28 is further divided into multiple subtables based on each input feature.
A subtable based on input feature L(T-24) and target variable L(T) is presented in
Table A40. From
Table A40, it is observed that input feature L(T-24) has an MSE value of 198,708.
Table A40.
L(T-24) vs. L(T) for L(T-48) ≥ 1689.
L(T-24) | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥2265 | 2260 | 2761.33 | 251,335 |
≥2265 | 2681 | 2761.33 | 6453 | 198,708
≥2265 | 3343 | 2761.33 | 338,336 |
A subtable based on input feature day and target variable L(T) is presented in
Table A41. From
Table A41, it is observed that input feature day has an MSE value of 29,540.
Table A41.
Day vs. L(T) for L(T-48) ≥ 1689.
Day | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
0 | 2260 | 2470.5 | 44,310 |
0 | 2681 | 2470.5 | 44,310 | 29,540
1 | 3343 | 3343 | 0 |
A subtable based on input feature season and target variable L(T) is presented in
Table A42. From
Table A42, it is observed that input feature season has an MSE value of 73,041.
Table A42.
Season vs. L(T) for L(T-48) ≥ 1689.
Season | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
1 | 2260 | 2260 | 0 |
2 | 2681 | 3012 | 109,561 | 73,041
2 | 3343 | 3012 | 109,561 |
A subtable based on input feature temperature and target variable L(T) is presented in
Table A43. From
Table A43, it is observed that input feature temperature has an MSE value of 198,708.
Table A43.
Temperature vs. L(T) for L(T-48) ≥ 1689.
Temperature | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥67.5 | 2260 | 2761.33 | 251,335 |
≥67.5 | 2681 | 2761.33 | 6453 | 198,708
≥67.5 | 3343 | 2761.33 | 338,336 |
A subtable based on input feature humidity and target variable L(T) is presented in
Table A44. From
Table A44, it is observed that input feature humidity has an MSE value of 29,540.
Table A44.
Humidity vs. L(T) for L(T-48) ≥ 1689.
Humidity | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥70 | 2260 | 2470.5 | 44,310 |
≥70 | 2681 | 2470.5 | 44,310 | 29,540
<70 | 3343 | 3343 | 0 |
It is observed from the above calculations that day and humidity both have the minimum MSE, i.e., 29,540. Here, day is chosen as the decision node under the branch L(T-48) ≥ 1689, and this node has two branches, day “1” and day “0”. In order to identify the decision/leaf node under each branch, Table A28 is divided into two subtables, as presented in Table A45 and Table A46. From Table A45, it is observed that the branch corresponding to day “1” is a leaf node with value 3343.
Table A45.
Subtable: L(T-48) ≥ 1689 and day = “1”.
L(T-24) | SEASON | TEMP | HUMIDITY | L(T)
---|---|---|---|---
≥2265 | 2 | ≥67.5 | <70 | 3343
Table A46.
Subtable: L(T-48) ≥ 1689 and Day = “0”.
L(T-24) | SEASON | TEMP | HUMIDITY | L(T)
---|---|---|---|---
≥2265 | 1 | ≥67.5 | ≥70 | 2260
≥2265 | 2 | ≥67.5 | ≥70 | 2681
In order to identify the decision node among L(T-24), season, temperature, and humidity under the branch day “0”,
Table A46 is divided into multiple subtables with respect to each feature.
A subtable based on input feature L(T-24) and target variable L(T) is presented in
Table A47. From
Table A47, it is observed that input feature L(T-24) has an MSE value of 44,310.25.
Table A47.
L(T-24) vs. L(T) for L(T-48) ≥ 1689 and day “0”.
L(T-24) | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥2265 | 2260 | 2470.5 | 44,310.25 | 44,310.25
≥2265 | 2681 | 2470.5 | 44,310.25 |
A subtable based on input feature season and target variable L(T) is presented in
Table A48. From
Table A48, it is observed that input feature season has an MSE value of 0.
Table A48.
Season vs. L(T) for L(T-48) ≥ 1689 and day “0”.
Season | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
1 | 2260 | 2260 | 0 | 0
2 | 2681 | 2681 | 0 |
A subtable based on input feature temperature and target variable L(T) is presented in
Table A49. From
Table A49, it is observed that input feature temperature has an MSE value of 44,310.25.
Table A49.
Temperature vs. L(T) for L(T-48) ≥ 1689 and day “0”.
Temperature | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥67.5 | 2260 | 2470.5 | 44,310.25 | 44,310.25
≥67.5 | 2681 | 2470.5 | 44,310.25 |
A subtable based on input feature humidity and target variable L(T) is presented in
Table A50. From
Table A50, it is observed that input feature humidity has an MSE value of 44,310.25.
Table A50.
Humidity vs. L(T) for L(T-48) ≥ 1689 and day “0”.
Humidity | L(T) | Prediction | Squared Error | MSE
---|---|---|---|---
≥70 | 2260 | 2470.5 | 44,310.25 | 44,310.25
≥70 | 2681 | 2470.5 | 44,310.25 |
It is observed from the above calculations that the feature “Season” has the minimum MSE, i.e., 0. Here, “Season” is chosen as the decision node under the branch day “0”, and this node has two branches, season “1” and season “2”, as presented in Table A48. From Table A48, it is observed that the branch corresponding to season “1” is a leaf node with value 2260 and season “2” is a leaf node with value 2681. Finally, the complete regression tree for predicting the load L(T) from the features L(T-24), L(T-48), day, season, temperature, and humidity is shown in Figure A1. The tree in Figure A1 was used to predict the load for the samples in Table A1, and the predicted load is shown in Table A51. From Table A51, it is observed that the actual and predicted load values are identical.
Figure A1.
Regression tree architecture with sample data.
Table A51.
Predicted load from sample data using regression tree.
L(T-24) | L(T-48) | DAY | SEASON | TEMP | HUMIDITY | L(T) | Predicted L(T)
---|---|---|---|---|---|---|---
2176 | 412 | 1 | 1 | 67 | 88 | 432 | 432
2354 | 1829 | 0 | 1 | 68 | 88 | 2260 | 2260
2777 | 2647 | 0 | 2 | 70 | 83 | 2681 | 2681
3112 | 3203 | 1 | 2 | 75 | 67 | 3343 | 3343
1663 | 1549 | 1 | 0 | 75 | 73 | 1579 | 1579
1010 | 1027 | 0 | 0 | 71 | 93 | 1018 | 1018
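As a final check, the tree of Figure A1 can be written out as nested conditions and applied to the sample data; the sketch below (with a hypothetical predict function) reproduces the predictions of Table A51. Only L(T-48), DAY, and SEASON are needed, because the remaining features never appear on a decision node of the final tree.

```python
def predict(row):
    """Regression tree of Figure A1 expressed as nested conditions."""
    if row["L(T-48)"] < 1689:
        if row["SEASON"] == 1:
            return 432
        return 1579 if row["DAY"] == 1 else 1018
    if row["DAY"] == 1:
        return 3343
    return 2260 if row["SEASON"] == 1 else 2681

# Rows of Table A1: (L(T-48), DAY, SEASON, actual L(T)).
samples = [(412, 1, 1, 432), (1829, 0, 1, 2260), (2647, 0, 2, 2681),
           (3203, 1, 2, 3343), (1549, 1, 0, 1579), (1027, 0, 0, 1018)]

for lt48, day, season, actual in samples:
    assert predict({"L(T-48)": lt48, "DAY": day, "SEASON": season}) == actual
# Every prediction matches the corresponding entry of Table A51.
```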