Article

A Novel Outlier-Robust Accuracy Measure for Machine Learning Regression Using a Non-Convex Distance Metric

by Ahmad B. Hassanat 1,*, Mohammad Khaled Alqaralleh 1, Ahmad S. Tarawneh 1,*, Khalid Almohammadi 2, Maha Alamri 3, Abdulkareem Alzahrani 4, Ghada A. Altarawneh 5 and Rania Alhalaseh 1
1 Faculty of Information Technology, Mutah University, Karak 61710, Jordan
2 Department of Computer Science, Applied College, University of Tabuk, Tabuk 47512, Saudi Arabia
3 Department of Systems and Networking, Faculty of Computing and Information, Al-Baha University, Al-Baha 65779, Saudi Arabia
4 Department of Computer Science, Faculty of Computing and Information, Al-Baha University, Al-Baha 65779, Saudi Arabia
5 Faculty of Business, Mutah University, Karak 61710, Jordan
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(22), 3623; https://doi.org/10.3390/math12223623
Submission received: 24 October 2024 / Revised: 16 November 2024 / Accepted: 16 November 2024 / Published: 20 November 2024
(This article belongs to the Special Issue Novel Approaches in Fuzzy Sets and Metric Spaces)

Abstract

Regression, a supervised machine learning approach, establishes relationships between independent variables and a continuous dependent variable. It is widely applied in areas like price prediction and time series forecasting. The performance of regression models is typically assessed using error metrics such as the Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). However, these metrics present challenges including sensitivity to outliers (notably MSE and RMSE) and scale dependency, which complicates comparisons across different models. Additionally, traditional metrics sometimes yield values that are difficult to interpret across various problems. Consequently, there is a need for a metric that consistently reflects regression model performance, independent of the problem domain, data scale, and outlier presence. To overcome these shortcomings, this paper introduces a new regression accuracy measure based on the Hassanat distance, a non-convex distance metric. This measure is not only invariant to outliers but also easy to interpret as it provides an accuracy-like value that ranges from 0 to 1 (or 0–100%). We validate the proposed metric against traditional measures across multiple benchmarks, demonstrating its robustness under various model scenarios and data types. Hence, we suggest it as a new standard for assessing regression models’ accuracy.

1. Introduction

Unlike classification models, which aim to find the relationship between independent variables and a categorical dependent variable, regression models predict a continuous dependent variable given one or more independent variables. Although regression analysis is widely used in machine learning research, there is no universal agreement on a single, unified standard metric for assessing regression outcomes [1,2,3].
Commonly used metrics include the mean squared error (MSE), its square root variant (RMSE), the mean absolute error (MAE), and its percentage variant (MAPE). While these metrics provide valuable insights, they share a common limitation: their values can range from zero to infinity, and a single value often fails to adequately reflect the effectiveness of a regression model relative to the distribution of the ground truth data. The limitations of these popular regression metrics, as outlined in Table 1 below, highlight the complexities involved in accurately evaluating regression models [1,4].
As one can see from Table 1, the most commonly used methods have limitations that affect their explainability and assessment of regression-based machine learning models.
The basic mathematical definition of regression is to find a function that maps the input features to the target variable. Let us call the input features X and the target variable Y. The purpose is to identify a function f such that $f(X) \approx Y$. The link between X and Y is frequently described as follows:
$$Y = f(X) + e \quad (1)$$
where e is the error or noise in the relationship. In assessing regression model performance, it is desirable to have a value λ that encapsulates the model’s accuracy in a simple, bounded format (0–100%), where a higher λ indicates lower error and better performance, and vice versa. Commonly used metrics such as R² and adjusted R² do provide bounded values between 0 and 1 that can be interpreted as percentages; however, they remain influenced by the number of features, the scale of the data, and the presence of outliers. Thus, while these metrics are useful, they do not measure performance universally and independently of these factors. Although insightful, metrics such as MSE, RMSE, and MAPE are not bounded between 0 and 1, vary with data scale, and are sensitive to outliers, limiting their applicability and comparability across different datasets or models. Therefore, this study proposes a new bounded performance metric for assessing regression models based on the Hassanat distance.
In summary, the proposed metric is:
  • Outlier-robust: Outliers do not significantly affect the performance assessment.
  • Bounded: The result lies between 0 and 1, which makes it highly explainable as a measure of model performance.
  • Interpretable: It is based on the well-known Hassanat distance metric [6,7,8].
  • Standalone: The performance of a regression model can be evaluated and assessed without the need to compare it to other methods.

2. Related Work

The application of regression-based machine learning models is widely recognized across various fields due to their versatility. These models are frequently discussed in the literature, not only in terms of the different types of regression models available but also with respect to the diverse metrics used to assess their performance. For example, Putri and colleagues [9] used various metrics, such as R², RMSE, and cross-validation, to measure the accuracy of these models.
Also, Sreehari et al. [10] proposed a system for climate prediction. They claim that their model helps in selecting the appropriate decisions on crop yielding. Their model is assessed using the RMSE metric.
Narloch and others [11] used different RMSE values to evaluate their deep learning system, which aims to predict the compressive strength of cement samples based on microscopic images.
A similar approach was used in [12] to evaluate a deep learning system that tests cement-stabilized soils based on ultrasonic waves. RMSE and MHD were used to evaluate their model.
Even though utilizing these metrics to evaluate regression models is popular, the debate about using them for specific applications remains ongoing in the literature [13,14]. Some research papers suggest that R² is more informative than other metrics [15], while others argue that R² is not efficient in evaluating some models [16]. Also, in multi-regression problems, R² lacks interpretability [17].
Further, MAPE’s bias towards predictions that are too low makes it unsuitable for evaluating tasks with large prediction errors [18,19]. As a consequence, the symmetric mean absolute percentage error (SMAPE) [20] was proposed as an alternative to address some of the drawbacks associated with MAPE. Despite the lack of agreement on its best mathematical expression in publications, SMAPE is growing in popularity in the machine learning community due to its unique characteristics [21].
Moreno et al. showed that MAPE does not meet the validity criterion due to the distribution of the absolute percentage errors, which is usually skewed to the right, particularly in the presence of outliers. Therefore, they proposed an alternative measure based on the calculation of the Huber M-estimator to overcome such problems [22].
Plevris et al. [23] studied 14 regression-related metrics, including the most widely used, such as RMSE, MAE, and R², among others, providing their mathematical foundations and discussing their characteristics, pros, cons, and limitations. However, none of these metrics proved to be a perfect choice: some of the metrics showed poor performance, while others exhibited good performance. Table 2 shows the investigated regression metrics.
As can be seen from the literature, there is a plethora of regression metrics; however, each has its own strengths and weaknesses. Most of these metrics do not have an upper bound, which makes judgment on the performance of a regression method challenging. Furthermore, most of them are affected by outliers, i.e., one extreme false prediction can drift the regression method assessed to a lower rank. Therefore, evaluation of the quality of uncertainty estimates in machine learning regression leaves a lot of room for improvement [24].
As an improvement, we propose a regression metric based on the Hassanat distance, which is bounded between 0 and 1 and, therefore, does not allow any extremely false prediction to dominate the assessment of the regression model. Here, we revisit the mean Hassanat distance (MHD), which measures performance on a scale from 0 to 1, with 0 indicating the highest performance and 1 indicating the lowest performance [12,25]. To obtain an accuracy-like measure, we propose the use of the mean Hassanat similarity percentage, which is bounded in the range [0, 100%].

3. Proposed Metric

The Hassanat distance metric is widely used for different machine learning applications [26,27], such as epilepsy diagnosis [28], alert systems [29], computing expectiles [30], content-based image retrieval [31,32], face detection/recognition [33], biometrics [34,35], cancer classification [36], fraudulent detection [37], validation of oversampling approaches [38,39], etc.
Moreover, many researchers have found that the Hassanat distance is very useful for various machine learning tasks, including but not limited to those mentioned in references [40,41,42,43,44,45,46,47,48,49,50,51].
The proposed regression measure is derived from the Hassanat distance, which is a non-convex distance metric given by Equation (2):
$$\mathrm{HasD}(X, Y) = \sum_{i=1}^{m} D(x_i, y_i) \quad (2)$$
where $D(x_i, y_i)$ is expressed in Equation (3):
$$D(x_i, y_i) = \begin{cases} 1 - \dfrac{1 + \min(x_i, y_i)}{1 + \max(x_i, y_i)} & \text{if } \min(x_i, y_i) \ge 0, \\[6pt] 1 - \dfrac{1 + \min(x_i, y_i) + |\min(x_i, y_i)|}{1 + \max(x_i, y_i) + |\min(x_i, y_i)|} & \text{if } \min(x_i, y_i) < 0. \end{cases} \quad (3)$$
As one can see from Equation (3), HasD is bounded by [0, 1]. It reaches 1 when the maximum value approaches ∞ (assuming the minimum is finite), or when the minimum value approaches −∞ (assuming the maximum is finite) [6,7,8]. HasD between 0 and values in the range [−10, 10] is visualized in Figure 1.
Also, HasD can be simplified mathematically to the expression in Equation (4):
$$D(x_i, y_i) = \begin{cases} \dfrac{|x_i - y_i|}{1 + \max(x_i, y_i)} & \text{if } \min(x_i, y_i) \ge 0, \\[6pt] \dfrac{|x_i - y_i|}{1 + \max(x_i, y_i) + |\min(x_i, y_i)|} & \text{if } \min(x_i, y_i) < 0. \end{cases} \quad (4)$$
Based on Equations (2)–(4), MHD between two vectors, i.e., actual and predicted, can be derived as presented in Equation (5):
$$\mathrm{MHD} = \frac{1}{n} \sum_{i=0}^{n-1} \left( 1 - \frac{\min(x_i, y_i) + 1}{\max(x_i, y_i) + 1} \right) \quad (5)$$
where n is the number of observations or the size of the test dataset, x is the vector of actual values, and y is the vector of predicted values provided by a regressor.
This distance has been proved to be a metric whose per-component value lies in the range 0 to 1, regardless of the difference between the two values. The total distance accumulates over the dimensionality of the tested vectors (actual and predicted); therefore, an increase in dimensions increases the distance at most linearly in the worst case.
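To make these definitions concrete, the following Python sketch implements the per-component distance of Equation (4) and the MHD of Equation (5). The function names are ours, chosen for illustration, and equal-length numeric vectors are assumed:

```python
import numpy as np

def hassanat_component(x, y):
    """Per-component Hassanat distance D(x_i, y_i), Equation (4)."""
    lo, hi = min(x, y), max(x, y)
    if lo >= 0:
        return abs(x - y) / (1 + hi)
    # Negative case: both values are shifted by |min| so the smaller becomes 0.
    return abs(x - y) / (1 + hi + abs(lo))

def mhd(actual, predicted):
    """Mean Hassanat distance, Equation (5): the per-component distances
    averaged over the n observations."""
    pairs = zip(np.asarray(actual, float), np.asarray(predicted, float))
    return float(np.mean([hassanat_component(a, p) for a, p in pairs]))
```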
The Hassanat similarity, denoted as HasS(x,y), quantifies the similarity between two vectors by comparing their respective components. This similarity measure is distinct in its consideration of both non-negative and negative values within the vectors, adapting the calculation based on the sign and magnitude of the components. The Hassanat similarity formula, as derived from HasD, is outlined in Equation (6), which encompasses two scenarios: Case 1 pertains to instances when the minimum of both measured values is zero or positive, while Case 2 addresses instances where the minimum of both measured values is negative.
$$\mathrm{HasS}(x, y) = \sum_{i=1}^{n} \begin{cases} \dfrac{1 + \min(x_i, y_i)}{1 + \max(x_i, y_i)} & \text{if } \min(x_i, y_i) \ge 0, \\[6pt] \dfrac{1 + \min(x_i, y_i) + |\min(x_i, y_i)|}{1 + \max(x_i, y_i) + |\min(x_i, y_i)|} & \text{if } \min(x_i, y_i) < 0. \end{cases} \quad (6)$$
A classification accuracy-like value can be obtained by applying Equation (7), the Mean Hassanat Similarity Percentage (MHSP).
$$\mathrm{MHSP} = \frac{\mathrm{HasS}(x, y)}{n} \times 100\% \quad (7)$$
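A minimal sketch of Equations (6) and (7), continuing the code above (again, the names are ours):

```python
def hassanat_similarity(x, y):
    """One term of HasS(x, y), Equation (6)."""
    lo, hi = min(x, y), max(x, y)
    if lo >= 0:
        return (1 + lo) / (1 + hi)
    return (1 + lo + abs(lo)) / (1 + hi + abs(lo))

def mhsp(actual, predicted):
    """Mean Hassanat Similarity Percentage, Equation (7)."""
    sims = [hassanat_similarity(a, p) for a, p in zip(actual, predicted)]
    return 100.0 * sum(sims) / len(sims)
```

Since each similarity term equals one minus the corresponding distance term, MHSP = (1 − MHD) × 100%.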
A toy example illustrating the predictions of two hypothetical models on synthetic actual targets is given in Table 3 and Table 4.
As illustrated in Table 3, Model 1 achieved nearly accurate predictions except for instance 3, where it failed to predict a reasonable value (actual = 1, predicted = 30). Such discrepancies are common in machine learning and may be attributed to outliers in the feature values of instance 3; this prediction can be considered an outlier in the results.
In comparison, Model 2 performed poorly relative to Model 1. However, it is worth mentioning that most common measures ranked Model 2 as the best-performing regressor, indicating their susceptibility to outliers like the one in instance 3 (predicted = 30). On the other hand, the Hassanat distance and its derivatives, MHD and the proposed MHSP, favored Model 1. This is the more reasonable decision, as these measures are not dominated by extreme values like the outlying prediction of 30. HasD returns values between 0 and 1 regardless of the difference between predicted and actual values, providing a more insightful evaluation of both models’ true performance.
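The comparison in Table 4 can be reproduced with the sketches above, using the values of Table 3:

```python
actual = [1, 1.1, 1, 2, 2.3, 1.6, 3, 1.6, 2.5, 1]
model1 = [1.5, 1, 30, 2, 2, 2.1, 3.2, 1.2, 3, 1.4]  # instance 3 is the outlier
model2 = [4, 3, 5, 3, 6, 2.1, 3.2, 1.2, 3, 1.4]

for name, pred in (("Model 1", model1), ("Model 2", model2)):
    errors = [abs(a - p) for a, p in zip(actual, pred)]
    print(name, "MAE:", sum(errors) / len(errors), "MHSP:", mhsp(actual, pred))
# MAE ranks Model 2 first (1.56 vs. 3.19), whereas MHSP favors Model 1
# (80.72% vs. 68.25%), in line with Table 4.
```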

4. Experimental Setup

To conduct our experiments and comparisons, we employed several regressors, namely, Ridge, Huber, Quantile, Theil–Sen, XGBoost, Random Forest (RF), and KNN. The experiments were conducted using six datasets publicly available on the UCI Machine Learning Repository.
The proposed metric is compared to several well-known evaluation metrics.

4.1. Common Metrics

The following well-known metrics are used as benchmarks to evaluate the effectiveness of our proposed metric:

4.1.1. Mean Squared Error (MSE)

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{y}_i)^2 \quad (8)$$

4.1.2. Root Mean Squared Error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{y}_i)^2} \quad (9)$$

4.1.3. Mean Absolute Error (MAE)

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \hat{y}_i| \quad (10)$$

4.1.4. Mean Absolute Percentage Error (MAPE)

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{x_i - \hat{y}_i}{x_i} \right| \times 100\% \quad (11)$$

4.1.5. Explained Variance (EVS)

$$\mathrm{EVS} = \frac{\sigma_e^2}{\sigma_t^2} \quad (12)$$
where $\sigma_e^2$ is the variance explained by the regression model and $\sigma_t^2$ is the total variance in the data.

4.1.6. Coefficient of Determination R 2

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \quad (13)$$
where
$$\mathrm{TSS} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad (14)$$
and
$$\mathrm{RSS} = \sum_{i=1}^{n} (x_i - \hat{y}_i)^2 \quad (15)$$
In Formulas (8)–(15), $x$ is the vector of actual values, $\bar{x}$ is the mean of the actual values, and $\hat{y}$ is the vector of predicted values.
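For reference, these benchmark metrics can be computed as follows; the sketch assumes scikit-learn (version 0.24 or later, where mean_absolute_percentage_error is available):

```python
import numpy as np
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

def common_metrics(actual, predicted):
    """The benchmark metrics of Formulas (8)-(15)."""
    mse = mean_squared_error(actual, predicted)
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": mean_absolute_error(actual, predicted),
        "MAPE": 100 * mean_absolute_percentage_error(actual, predicted),
        "EVS": explained_variance_score(actual, predicted),
        "R2": r2_score(actual, predicted),
    }
```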

5. Datasets

The proposed metric was tested using six publicly available datasets on the UCI Machine Learning Repository [52]. Table 5 shows the used dataset characteristics.
Each dataset has its own characteristics and distribution of feature values. However, our concern is focused on the dependent variable, as it is what counts when evaluating the performance of a regression method. Figure 2 shows the distribution of the dependent variables of all datasets.
A box and whisker plot for the dependent variable in all datasets is presented in Figure 3.

6. Results and Discussion

For clarity, this section is broken into three parts: the most significant findings appear first, followed by a separate explanation of the remaining observations, and a discussion subsection concludes.

6.1. Key Results

In order to investigate the use of the proposed metric for regression accuracy, we used the datasets shown in Table 5 to evaluate a number of common machine learning regression methods, namely, Ridge Regression (RG), Huber Regression (HR), Theil–Sen Regression (TSR), Quantile Regression (QR), RF, XGBoost (XGB), and K-nearest neighbor (KNN). We randomly split each dataset into a training set containing 80% of the examples and a test set containing the remaining 20%.
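A sketch of this pipeline for a single dataset is given below. The paper does not fix implementations or hyperparameters, so the scikit-learn and XGBoost defaults, the fixed random seed, and the pre-loaded feature matrix X and target vector y are our assumptions (QuantileRegressor requires scikit-learn 1.0 or later):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import (HuberRegressor, QuantileRegressor, Ridge,
                                  TheilSenRegressor)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor

regressors = {
    "RG": Ridge(), "HR": HuberRegressor(), "TSR": TheilSenRegressor(),
    "QR": QuantileRegressor(), "RF": RandomForestRegressor(),
    "XGB": XGBRegressor(), "KNN": KNeighborsRegressor(),
}

# X, y: features and target of one dataset from Table 5; random 80/20 split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
for name, model in regressors.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    scores = common_metrics(y_test, y_pred)  # sketch from Section 4.1
    scores["MHD"], scores["MHSP %"] = mhd(y_test, y_pred), mhsp(y_test, y_pred)
    print(name, scores)
```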
Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11 show the regression results for each dataset, namely, Real Estate Valuation, Average Localization Error (ALE), Concrete Compressive Strength, Forest Fires, Combined Cycle Power Plant, and Abalone, respectively. The visualizations of the regression performance are depicted in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9.

6.2. Additional Observations

After a careful look at Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11, we can observe the following:
  • There is no universal measure that can be used to find the best performer, since different measures disagree on the single best performer among the regressors tested. This is evident from Table 7 when using ALE data, Table 9 when using Forest Fires data, and Table 11 when using Abalone data.
  • Some datasets were easy for regression, such as the Real Estate, Concrete, and Combined Cycle Power Plant data, where Random Forest and XGBoost were the best performers, achieving 90%, 91%, and 99.5% MHSP, respectively. This is evident from Table 6, Table 8 and Table 10, where all the measures used voted for one regressor. At the same time, there was no significant difference between the performance of the different regressors. This indicates that these datasets are easy for regression tasks, as their target variables are well distributed and lack skewness and outliers, as can be seen in Figure 2 and Figure 3.
  • Some measures yielded unconvincingly low values, such as R² on the Forest Fires data for all regressors, or very high values, such as MAPE on the same dataset, which makes it hard for such measures to distinguish the best regressor.
  • Full agreement on the best regressor by all measures occurred three times: Random Forest on Real Estate Valuation data, XGBoost on Concrete data, and XGBoost on Combined Cycle Power Plant data. This includes the proposed MHSP, which not only voted for the same best regressor but also provides an accuracy-like, stable, and solid measure.
  • In contrast to widely recognized measures such as MAE, MAPE, RMSE, and MSE, the proposed MHSP is bounded within the range (0, 1], offering a more structured evaluation approach.
  • The proposed MHSP aligns with at least one common measure when determining the top regressor. Even in cases where it agrees with only a single measure, MHSP delivers a consistent accuracy of approximately 87% for all regressors supported by other measures. This demonstrates that MHSP is a reliable metric and can be utilized as a viable regression measure.
  • When designating the best performer as the candidate supported by the most metrics, MHSP played a crucial role in selecting this top-ranked model in four out of the six datasets under examination.
  • The concept of interpretability is illustrated through the outcomes displayed in all tables. It is often challenging to understand the significance of typical regression metrics, such as what specific value denotes superior, inferior, or average performance from a regressor. There is no unanimous consensus on defining these values because traditional measures are unable to provide clear definitions. However, the MHSP metric offers a classification accuracy-like assessment, with 100% indicating a flawless regressor devoid of any errors and values around 0% signifying an inaccurate one. The intermediate percentages attempt to depict the actual performance of a regressor. Consequently, it is no longer necessary to compare a regressor against another for interpreting MHSP results. Even when evaluating a single-regression system, MHSP can be employed without having to make comparisons with other regressors, particularly when the percentage achieved is high and satisfactory.
  • Some measures provide unexpected values, such as EVS = 0, R² with very small values for all regressors, or MAPE with very high values; see Table 9. This makes such measures incapable of voting for the best regressor, while MHSP and MHD are more stable and provide a distinctive difference suggesting the best performer.
  • Both MHD and MHSP vote for the same best performer in all cases; this is because both measures share the same properties of the Hassanat distance metric, as both are derived from the same source.

6.3. Discussion

The common regression measures demonstrated inconsistencies in identifying the most effective regressor on three specific datasets (ALE, Forest Fires, and Abalone). Upon examining Figure 2, it is evident that these datasets possess a left-skewed dependent variable, implying smaller values, which could potentially impact the reliability of these metrics. Moreover, Figure 3 reveals the presence of numerous outliers in these datasets, another factor that might affect traditional regression measures.
In contrast, the proposed MHSP is unaffected by variations in value sizes and is also immune to the influence of outliers, owing to its foundation in the Hassanat distance. This characteristic of the Hassanat distance is manifested in the computation of MHSP and MHD, ensuring that these factors do not impact both measures.
Although it is not a common practice to consider the best regressor as the one that best fits the line with slope = 1 in the actual vs. predicted plot, we used this approach just for visualization purposes, as demonstrated in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9.
Notably, the proposed MHSP voted for the top performer three times out of six, which can be observed in Figure 4, Figure 5 and Figure 8. It is also important to mention that even when it did not vote for the best regressor in Figure 6, Figure 7 and Figure 9, the MHSP still selected a reasonable regressor, as evidenced by the scatter points representing XGBoost, Quantile Regression, and Random Forest.
It is worth noting that all the previous measures are capable of estimating only a single-valued prediction; uncertainty measures, such as Negative Log-Likelihood (NLL) and the Continuous Ranked Probability Score (CRPS), are commonly used to evaluate the confidence of the prediction [53,54].
The average of NLL across all observations can be expressed as follows [55]:
$$\mathrm{NLL} = -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i \mid \theta)$$
where $x_i$ represents the i-th data point, n is the total number of data points, and $\theta$ denotes the model parameters. The equation computes the negative log-likelihood by summing, over all data points, the natural logarithm of the likelihood $p(x_i \mid \theta)$. The likelihood measures how well a given model fits the observed data; a smaller value of the NLL indicates a better fit between the model and the data. Table 12 shows the results of NLL for all regressors on all datasets tested [55,56].
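As an illustration, the average NLL of a point regressor can be computed once a distributional form for the errors is assumed; the Gaussian choice below, with its variance set to the maximum-likelihood estimate from the residuals, is our assumption, since the equation itself does not fix the likelihood model:

```python
import numpy as np

def gaussian_nll(actual, predicted):
    """Average NLL under the assumption p(x_i | theta) = N(x_i; predicted_i, sigma^2)."""
    resid = np.asarray(actual, float) - np.asarray(predicted, float)
    sigma2 = float(np.var(resid))  # maximum-likelihood variance of the residuals
    return 0.5 * np.log(2 * np.pi * sigma2) + float(np.mean(resid ** 2)) / (2 * sigma2)
```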
Examination of Table 12 reveals that the proposed MHSP identifies three top-performing models consistent with the NLL criterion, while the other measures (R², MSE, RMSE, EVS, and MAPE) identify four. Interestingly, R², MSE, RMSE, and EVS voted for the same regressor on every dataset examined; this is because they share almost the same characteristics, as they all depend on the variance between predicted and actual data. Nonetheless, the last row of the table shows that there is no substantial disparity between the MHSP accuracy of the best performer according to NLL and the highest MHSP accuracy for each dataset. This suggests that the MHSP provides a reliable measure when assessed via the NLL uncertainty metric. The Forest Fires dataset presents an exception due to its complexity, presence of zero actual values, outliers, and left-skewed distribution of small and nearly identical values.
The Continuous Ranked Probability Score (CRPS) is a performance measure used for probabilistic forecasts. It can be expressed as follows:
$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \big( F(x) - H(x) \big)^2 \, dx$$
where $F(x)$ is the cumulative distribution function (CDF) of the forecast, and $H(x)$ represents the Heaviside step function, which equals 1 for $x \ge 0$ and 0 otherwise. This equation integrates the squared difference between the forecast CDF and the Heaviside step function over the entire range of possible values x. A smaller value of CRPS indicates better forecast performance, as it measures how close the predicted distribution is to the actually observed data. Table 13 shows the results of CRPS for all regressors on all datasets tested [57].
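As a concrete case, CRPS admits a well-known closed form when the forecast CDF F is Gaussian; the sketch below uses that closed form, and the Gaussian predictive distribution is our assumption for illustration:

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(observed, mu, sigma):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) at the observed values."""
    z = (np.asarray(observed, float) - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z)
                    - 1 / np.sqrt(np.pi))
```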
Similar to the previous analysis, examination of Table 13 demonstrates that the proposed MHSP identifies two top-performing models consistent with the CRPS criterion, while the other measures (R², MSE, RMSE, and EVS) identify three; again, these measures agree on the same regressor in all tested cases. Nonetheless, the last row of the table shows no significant divergence between the MHSP accuracy of the best performer according to CRPS and the highest MHSP accuracy for each dataset. This implies that the MHSP offers a reliable measure when assessed via the CRPS uncertainty metric. As before, the Forest Fires dataset serves as an exception due to its complexity, presence of zero actual values, outliers, and left-skewed distribution of small and nearly identical values, which negatively impact all regressors’ performance.
Consistent with the prior analysis and graphical representations, we postulate that the proposed MHSP regression measure furnishes a dependable, stable, outlier-insensitive, and well-defined measure, bounded by (0%, 100%], that mimics classification accuracy for diverse machine learning regression techniques. Similar sentiments apply to the MHD, as it stems from the same source; however, that metric is less interpretable, since its lowest value denotes the best performance, unlike the clarity provided by the percentage-based accuracy offered by the MHSP. It is worth mentioning that the range interval of MHSP is left open because its value approaches 0 only when a predicted or actual value approaches infinity, as explained in Equations (4)–(6).
A final note on MHSP being similar to the classification accuracy measure: While common regression measures like MAE, MSE, RMSE, etc. are frequently referred to as accuracy measures for machine learning regression models, they actually quantify errors in these models and do not fully capture the true accuracy of a tested model. On the other hand, the accuracy of a classification model is precisely defined as the ratio of correctly classified instances to the total number of instances in the test dataset as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{n}$$
where TP represents true positives and TN represents true negatives, i.e., together, TP and TN constitute the total number of correctly classified examples, and n is the number of observations, or the size of the test dataset.
Recall Equation (7), which builds on HasS. This measure provides the number of correctly predicted examples, either completely, when there are no errors, or proportionally, based on the similarity between predicted and actual values. To obtain the MHSP, we divide this resulting number (HasS) by n (the total number of instances in the test dataset). Hence, the MHSP serves as an analog of classification accuracy, which is defined as the ratio of correctly classified instances to the total number of instances in the test dataset. Therefore, understanding and applying appropriate evaluation metrics like MHSP for regression problems helps ensure accurate model assessment and selection based on specific problem requirements.

7. Conclusions

This study examined existing machine learning regression measures, which are known to possess certain drawbacks, such as being influenced by outliers, being difficult to interpret, depending on scale, exhibiting sensitivity to near-zero values that yields infinite percentage errors, lacking a specific range, and not having a universally accepted definition for evaluating the performance of a regressor. As a result, this research explored MHD and MHSP as potential alternatives for assessing the strengths and weaknesses of a regression model.
In order to assess various regression methods, we employed multiple public real-world datasets commonly utilized for regression analysis. These datasets were then evaluated using well-known regression techniques. To compare and contrast the results, we examined several common measures, including the MHD and the proposed MHSP. Additionally, we assessed MHSP against two other uncertainty metrics.
Our experimental findings reveal that the MHSP serves as a dependable, stable, interpretable, outlier-insensitive, and well-defined regression measure with values in the range of ( 0 % , 100 % ] . Importantly, this measure demonstrates similar characteristics to the well-known classification accuracy for a diverse array of machine learning regression techniques.
These findings highlight the potential of MHSP as a promising alternative evaluation metric for machine learning regression models. Building upon these encouraging results and relevant features, we propose that MHSP has the potential to serve effectively not only as a metric for evaluating regression problems but also for forecasting time series issues. However, it is essential to emphasize that while these findings are promising, further research and comprehensive studies are necessary to fully explore and validate this potential application of MHSP in time series forecasting. This will help establish its effectiveness as an evaluation tool and contribute to the ongoing advancements in machine learning methodologies for tackling various types of problems, including forecasting.

Author Contributions

Conceptualization, A.B.H., M.K.A., A.S.T. and G.A.A.; methodology, A.B.H., M.K.A., A.S.T. and R.A.; software, M.K.A., A.S.T., K.A., M.A. and A.A.; validation, M.K.A., A.S.T., K.A., M.A. and A.A.; formal analysis, K.A., M.A., A.A., G.A.A. and R.A.; investigation, A.B.H., M.K.A. and A.S.T.; resources, K.A., M.A., A.A., G.A.A. and R.A.; data curation, K.A., M.A., A.A., G.A.A. and R.A.; writing—original draft preparation, A.B.H., M.K.A., A.S.T., K.A., M.A., A.A., G.A.A. and R.A.; writing—review and editing, A.B.H., M.K.A., A.S.T., K.A., M.A., A.A., G.A.A. and R.A.; visualization, K.A., M.A. and A.A.; supervision, A.B.H.; project administration, A.B.H. and A.S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available in the UCI Machine Learning Repository at https://archive.ics.uci.edu/ (accessed on 24 October 2024).

Acknowledgments

We genuinely appreciate the reviewers’ voluntary efforts and are grateful for their valuable insights.

Conflicts of Interest

The authors have no conflicts of interest to declare. All co-authors have seen and agree with the manuscript’s contents and there is no financial interest to report. We certify that the submission is original work and is not under review at any other publication.

References

  1. Shcherbakov, M.V.; Brebels, A.; Shcherbakova, N.L.; Tyukov, A.P.; Janovsky, T.A.; Kamaev, V.A. A survey of forecast error measures. World Appl. Sci. J. 2013, 24, 171–176. [Google Scholar]
  2. Mentaschi, L.; Besio, G.; Cassola, F.; Mazzino, A. Problems in RMSE-based wave model validations. Ocean. Model. 2013, 72, 53–58. [Google Scholar] [CrossRef]
  3. Davydenko, A.; Fildes, R. Forecast error measures: Critical review and practical recommendations. Bus. Forecast. Pract. Probl. Solut. 2016, 34, 1–12. [Google Scholar]
  4. Tanni, S.E.; Patino, C.M.; Ferreira, J.C. Correlation vs. regression in association studies. J. Bras. Pneumol. 2020, 46, e20200030. [Google Scholar] [CrossRef]
  5. He, C.; Ma, M.; Wang, P. Extract interpretability-accuracy balanced rules from artificial neural networks: A review. Neurocomputing 2020, 387, 346–358. [Google Scholar] [CrossRef]
  6. Hassanat, A.B. Dimensionality invariant similarity measure. arXiv 2014, arXiv:1409.0923. [Google Scholar]
  7. Abu Alfeilat, H.A.; Hassanat, A.B.; Lasassmeh, O.; Tarawneh, A.S.; Alhasanat, M.B.; Eyal Salman, H.S.; Prasath, V.S. Effects of distance measure choice on k-nearest neighbor classifier performance: A review. Big Data 2019, 7, 221–248. [Google Scholar] [CrossRef]
  8. Hassanat, A.; Alkafaween, E.; Tarawneh, A.S.; Elmougy, S. Applications review of hassanat distance metric. In Proceedings of the 2022 International Conference on Emerging Trends in Computing and Engineering Applications (ETCEA), Karak, Jordan, 23–24 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
  9. Putri, M.R.; Wijaya, I.G.P.S.; Praja, F.P.A.; Hadi, A.; Hamami, F. The Comparison Study of Regression Models (Multiple Linear Regression, Ridge, Lasso, Random Forest, and Polynomial Regression) for House Price Prediction in West Nusa Tenggara. In Proceedings of the 2023 International Conference on Advancement in Data Science, E-learning and Information System (ICADEIS), Bali, Indonesia, 2–3 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  10. Sreehari, E.; Srivastava, S. Prediction of climate variable using multiple linear regression. In Proceedings of the 2018 4th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India, 14–15 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–4. [Google Scholar]
  11. Narloch, P.; Hassanat, A.; Tarawneh, A.S.; Anysz, H.; Kotowski, J.; Almohammadi, K. Predicting compressive strength of cement-stabilized rammed earth based on SEM images using computer vision and deep learning. Appl. Sci. 2019, 9, 5131. [Google Scholar] [CrossRef]
  12. Kozubal, J.V.; Kania, T.; Tarawneh, A.S.; Hassanat, A.; Lawal, R. Ultrasonic assessment of cement-stabilized soils: Deep learning experimental results. Measurement 2023, 223, 113793. [Google Scholar] [CrossRef]
  13. Chai, T.; Draxler, R.R. Root mean squared error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250. [Google Scholar] [CrossRef]
  14. Hodson, T.O. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss. 2022, 2022, 1–10. [Google Scholar] [CrossRef]
  15. Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. Peerj Comput. Sci. 2021, 7, e623. [Google Scholar] [CrossRef]
  16. Nakagawa, S.; Johnson, P.C.; Schielzeth, H. The coefficient of determination R 2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 2017, 14, 20170213. [Google Scholar] [CrossRef]
  17. Schielzeth, H. Simple means to improve the interpretability of regression coefficients. Methods Ecol. Evol. 2010, 1, 103–113. [Google Scholar] [CrossRef]
  18. De Myttenaere, A.; Golden, B.; Le Grand, B.; Rossi, F. Mean absolute percentage error for regression models. Neurocomputing 2016, 192, 38–48. [Google Scholar] [CrossRef]
  19. Hyndman, R. Measuring forecast accuracy. In Business Forecasting: Practical Problems and Solutions; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
  20. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. 2006, 22, 679–688. [Google Scholar] [CrossRef]
  21. Kreinovich, V.; Nguyen, H.T.; Ouncharoen, R. How to Estimate Forecasting Quality: A System-Motivated Derivation of Symmetric Mean Absolute Percentage Error (SMAPE) and Other Similar Characteristics; Technical Report UTEP-CS-14-53; The University of Texas: El Paso, TX, USA, 2014. [Google Scholar]
  22. Moreno, J.J.M.; Pol, A.P.; Abad, A.S.; Blasco, B.C. Using the R-MAPE index as a resistant measure of forecast accuracy. Psicothema 2013, 25, 500–506. [Google Scholar] [CrossRef] [PubMed]
  23. Plevris, V.; Solorzano, G.; Bakas, N.P.; Ben Seghier, M.E.A. Investigation of performance metrics in regression analysis and machine learning-based prediction models. In Proceedings of the 8th European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS Congress 2022), Oslo, Norway, 5–9 June 2022; European Community on Computational Methods in Applied Sciences: Barcelona, Spain, 2022. [Google Scholar]
  24. Sluijterman, L.; Cator, E.; Heskes, T. How to evaluate uncertainty estimates in machine learning for regression? Neural Netw. 2024, 173, 106203. [Google Scholar]
  25. Cao, C.; Bao, Y.; Shi, Q.; Shen, Q. Dynamic Spatiotemporal Correlation Graph Convolutional Network for Traffic Speed Prediction. Symmetry 2024, 16, 308. [Google Scholar] [CrossRef]
  26. Karabulut, B.; Arslan, G.; Ünver, H.M. A weighted similarity measure for k-nearest neighbors algorithm. Celal Bayar Univ. J. Sci. 2019, 15, 393–400. [Google Scholar] [CrossRef]
  27. Kim, M.; Kim, Y.; Kim, H.; Piao, W.; Kim, C. Evaluation of the k-nearest neighbor method for forecasting the influent characteristics of wastewater treatment plant. Front. Environ. Sci. Eng. 2016, 10, 299–310. [Google Scholar] [CrossRef]
  28. Na, J.; Wang, Z.; Lv, S.; Xu, Z. An extended K nearest neighbors-based classifier for epilepsy diagnosis. IEEE Access 2021, 9, 73910–73923. [Google Scholar] [CrossRef]
  29. Veerachamy, R.; Ramar, R. Agricultural Irrigation Recommendation and Alert (AIRA) system using optimization and machine learning in Hadoop for sustainable agriculture. Environ. Sci. Pollut. Res. 2022, 29, 19955–19974. [Google Scholar] [CrossRef] [PubMed]
  30. Farooq, M.; Sarfraz, S.; Chesneau, C.; Ul Hassan, M.; Raza, M.A.; Sherwani, R.A.K.; Jamal, F. Computing expectiles using k-nearest neighbours approach. Symmetry 2021, 13, 645. [Google Scholar] [CrossRef]
  31. Tarawneh, A.S.; Celik, C.; Hassanat, A.B.; Chetverikov, D. Detailed investigation of deep features with sparse representation and dimensionality reduction in cbir: A comparative study. Intell. Data Anal. 2020, 24, 47–68. [Google Scholar] [CrossRef]
  32. Biswas, R.; Roy, S.; Biswas, A. Triplet Contents based Medical Image Retrieval System for Lung Nodules CT Images Retrieval and Recognition Application. Int. J. Eng. Adv. Technol. (IJEAT) 2019, 8, 3132–3143. [Google Scholar] [CrossRef]
  33. Nasiri, E.; Milanova, M.; Nasiri, A. Masked Face Detection Using Artificial Intelligent Techniques. In Proceedings of the New Approaches for Multidimensional Signal Processing: Proceedings of International Workshop, NAMSP 2021, Sofia, Bulgaria, 8–10 July 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 3–34. [Google Scholar]
  34. Hassanat, A.B.A.; Btoush, E.; Abbadi, M.A.; Al-Mahadeen, B.M.; Al-Awadi, M.; Mseidein, K.I.A.; Almseden, A.M.; Tarawneh, A.S.; Alhasanat, M.B.; Prasath, V.B.S.; et al. Victory Sign Biometric for Terrorists Identification: Preliminary Results, Presentation. In Proceedings of the 2017 8th International Conference on Information and Communication Systems, Irbid, Jordan, 4–6 April 2017. [Google Scholar]
  35. Hassanat, A.B. On identifying terrorists using their victory signs. Data Sci. J. 2018, 17, 27. [Google Scholar] [CrossRef]
  36. Ehsani, R.; Drabløs, F. Robust distance measures for kNN classification of cancer data. Cancer Inform. 2020, 19, 1176935120965542. [Google Scholar] [CrossRef]
  37. Stout, A. Fine-Tuning a k-Nearest Neighbors Machine Learning Model for the Detection of Insurance Fraud. Honors Thesis. 2022. Available online: https://aquila.usm.edu/honors_theses/863/ (accessed on 20 June 2024).
  38. Rezvani, S.; Wang, X. A broad review on class imbalance learning techniques. Appl. Soft Comput. 2023, 143, 110415. [Google Scholar] [CrossRef]
  39. Hassanat, A.; Altarawneh, G.; Alkhawaldeh, I.M.; Alabdallat, Y.J.; Atiya, A.F.; Abujaber, A.; Tarawneh, A.S. The jeopardy of learning from over-sampled class-imbalanced medical datasets. In Proceedings of the 2023 IEEE Symposium on Computers and Communications (ISCC), Gammarth, Tunisia, 9–12 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7. [Google Scholar]
  40. Anwar, M.; Hellwich, O. An Embedded Neural Network Approach for Reinforcing Deep Learning: Advancing Hand Gesture Recognition. J. Univ. Comput. Sci. 2024, 30, 957. [Google Scholar]
  41. Al-Nuaimi, D.H.; Isa, N.A.M.; Akbar, M.F.; Abidin, I.S.Z. Amc2-pyramid: Intelligent pyramidal feature engineering and multi-distance decision making for automatic multi-carrier modulation classification. IEEE Access 2021, 9, 137560–137583. [Google Scholar] [CrossRef]
  42. Kancharla, C.R.; Vankeirsbilck, J.; Vanoost, D.; Boydens, J.; Hallez, H. Latent dimensions of auto-encoder as robust features for inter-conditional bearing fault diagnosis. Appl. Sci. 2022, 12, 965. [Google Scholar] [CrossRef]
  43. Özarı, Ç.; Can, E.N.; Alıcı, A. Forecasting sustainable development level of selected Asian countries using M-EDAS and k-NN algorithm. Int. J. Soc. Sci. Educ. Res. 2023, 9, 101–112. [Google Scholar] [CrossRef]
  44. Kartal, E.; Çalışkan, F.; Eskişehirli, B.B.; Özen, Z. p-adic distance and k-Nearest Neighbor classification. Neurocomputing 2024, 578, 127400. [Google Scholar] [CrossRef]
  45. Nasiri, E.; Milanova, M.; Nasiri, A. Video Surveillance Framework Based on Real-Time Face Mask Detection and Recognition. In Proceedings of the 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Kocaeli, Turkey, 25–27 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–7. [Google Scholar]
  46. Begovic, M.; Causevic, S.; Memic, B.; Haskovic, A. AI-aided traffic differentiated QoS routing and dynamic offloading in distributed fragmentation optimized SDN-IoT. Int. J. Eng. Res. Technol. 2020, 13, 1880–1895. [Google Scholar] [CrossRef]
  47. Alkanhel, R.; Chaaf, A.; Samee, N.A.; Alohali, M.A.; Muthanna, M.S.A.; Poluektov, D.; Muthanna, A. Dedg: Cluster-based delay and energy-aware data gathering in 3d-uwsn with optimal movement of multi-auv. Drones 2022, 6, 283. [Google Scholar] [CrossRef]
  48. Hase, V.J.; Bhalerao, Y.J.; Verma, S.; Wakchaure, V.; Vikhe, G. Intelligent threshold prediction in hybrid mesh segmentation using machine learning classifiers. Int. J. Manag. Technol. Eng. 2018, 8, 1426–1442. [Google Scholar]
  49. Uddin, S.; Haque, I.; Lu, H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep. 2022, 12, 6256. [Google Scholar] [CrossRef]
  50. Jiřina, M.; Krayem, S. The Distance Function Optimization for the Near Neighbors-Based Classifiers. ACM Trans. Knowl. Discov. Data (TKDD) 2022, 16, 1–21. [Google Scholar] [CrossRef]
  51. Hofer, E.; v. Mohrenschildt, M. Locally-Scaled Kernels and Confidence Voting. Mach. Learn. Knowl. Extr. 2024, 6, 1126–1144. [Google Scholar] [CrossRef]
  52. Kelly, M.; Longjohn, R.; Nottingham, K. The UCI Machine Learning Repository. 2024. Available online: https://archive.ics.uci.edu/ (accessed on 24 October 2024).
  53. Gebetsberger, M.; Messner, J.W.; Mayr, G.J.; Zeileis, A. Estimation methods for nonhomogeneous regression models: Minimum continuous ranked probability score versus maximum likelihood. Mon. Weather. Rev. 2018, 146, 4323–4338. [Google Scholar] [CrossRef]
  54. Gouttes, A.; Rasul, K.; Koren, M.; Stephan, J.; Naghibi, T. Probabilistic time series forecasting with implicit quantile networks. arXiv 2021, arXiv:2107.03743. [Google Scholar]
  55. Boyko, J.D.; O’Meara, B.C. Dentist: Quantifying uncertainty by sampling points around maximum likelihood estimates. Methods Ecol. Evol. 2024, 15, 628–638. [Google Scholar] [CrossRef]
  56. Maddox, W.J.; Izmailov, P.; Garipov, T.; Vetrov, D.P.; Wilson, A.G. A simple baseline for bayesian uncertainty in deep learning. Adv. Neural Inf. Process. Syst. 2019, 32, 13153–13164. [Google Scholar]
  57. Hersbach, H. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather. Forecast. 2000, 15, 559–570. [Google Scholar] [CrossRef]
Figure 1. HasD visualization, D(0, Y) for Y in the range [−10, 10].
Figure 2. The distribution of the dependent variable in all datasets.
Figure 3. A box and whisker plot for the dependent variable in all datasets.
Figure 4. Regression performance on Real Estate Valuation data, visualizing actual vs. predicted values and highlighting the regressor that best fits the line with slope = 1.
Figure 5. Regression performance on ALE data, visualizing actual vs. predicted values and highlighting the regressor that best fits the line with slope = 1.
Figure 6. Regression performance on Concrete Compressive Strength data, visualizing actual vs. predicted values and highlighting the regressor that best fits the line with slope = 1.
Figure 7. Regression performance on Forest Fires data, visualizing actual vs. predicted values and highlighting the regressor that best fits the line with slope = 1.
Figure 8. Regression performance on Combined Cycle Power Plant data, visualizing actual vs. predicted values and highlighting the regressor that best fits the line with slope = 1.
Figure 9. Regression performance on Abalone data, visualizing actual vs. predicted values and highlighting the regressor that best fits the line with slope = 1.
Table 1. The most popular regression metrics and some of their limitations.
Regression Metric | Limitations
Mean Squared Error (MSE) | Sensitive to outliers; lack of interpretability; scale dependence.
Mean Absolute Error (MAE) | Not differentiable at zero, which can be a restriction in some derivative-based optimization strategies; scale dependence [1].
Root Mean Squared Error (RMSE) | Sensitive to outliers; scale dependence [2].
Mean Absolute Percentage Error (MAPE) | Sensitive to zero values: when the actual values are zero or near zero, MAPE can produce undefined or endless percentage errors; scale dependence [3].
R-squared (R²) | Sensitive to outliers; affected by the sample size and might give misleading results; cannot distinguish between linear and non-linear relationships; no universal definition for the strength of a correlation [4,5].
Table 2. Regression metrics, their range, and perfect match (target) value [23].
Regression Metric | Range | Perfect Match
Mean Bias | [−∞, +∞] | 0
Mean Absolute Gross Error | [0, +∞] | 0
Root Mean Squared Error | [0, +∞] | 0
Centered Root Mean Squared Difference | [0, +∞] | 0
Mean Normalized Bias | [−1, +∞] | 0
Mean Normalized Gross Error | [0, +∞] | 0
Normalized Mean Bias | [−1, +∞] | 0
Normalized Mean Error | [0, +∞] | 0
Fractional Bias | [−2, 2] | 0
Fractional Gross Error | [0, 2] | 0
Theil’s UI | [0, 1] | 0
Index of Agreement | [0, 1] | 1
Pearson Correlation Coefficient R² | [−1, 1] | 1
Variance Accounted For | [−∞, 1] | 1
Table 3. Hypothetical data actual and prediction results of two imaginary models.
Instance | MODEL1 Actual Y | MODEL1 Predicted X | MODEL2 Actual Y | MODEL2 Predicted X
1 | 1 | 1.5 | 1 | 4
2 | 1.1 | 1 | 1.1 | 3
3 | 1 | 30 | 1 | 5
4 | 2 | 2 | 2 | 3
5 | 2.3 | 2 | 2.3 | 6
6 | 1.6 | 2.1 | 1.6 | 2.1
7 | 3 | 3.2 | 3 | 3.2
8 | 1.6 | 1.2 | 1.6 | 1.2
9 | 2.5 | 3 | 2.5 | 3
10 | 1 | 1.4 | 1 | 1.4
Table 4. Results of different metrics on the data in Table 3.
Measure | MODEL1 | MODEL2
Mean actual Y | 1.71 | 1.71
Mean predicted X | 4.74 | 3.19
HasD | 1.928434 | 3.1746603
MAE | 3.19 | 1.56
MAPE | 3.095051 | 1.2065135
SMAPE | 3.225 | 2.45
RMSE | 9.1772 | 2.1014281
MSE | 84.221 | 4.416
R² | −180.942 | −8.5398574
MHD | 0.192843 | 0.317466
MHSP | 80.72% | 68.25%
Table 5. Dataset names and characteristics.
Dataset | # of Observations | # of Features | Target
Real Estate Valuation | 414 | 6 | Price
Average Localization Error (ALE) | 107 | 4 | ALE
Concrete Compressive Strength | 1030 | 8 | Compr. Strength
Forest Fires | 517 | 12 | Area
Combined Cycle Power Plant | 9568 | 4 | EP
Abalone | 4177 | 8 | Rings
Table 6. Regression performance on Real Estate Valuation data. Bold font indicates the best performer.
Regression | R² | MAE | MSE | RMSE | MAPE | EVS | MHD | MHSP %
Ridge Regression | 0.656 | 5.584 | 57.673 | 7.594 | 0.185 | 0.656 | 0.149 | 85.061
Huber Regression | 0.638 | 5.577 | 60.663 | 7.789 | 0.185 | 0.639 | 0.150 | 85.039
Theil–Sen Regression | 0.606 | 5.740 | 66.082 | 8.129 | 0.208 | 0.632 | 0.176 | 82.438
Quantile Regression | 0.558 | 6.558 | 74.204 | 8.614 | 0.223 | 0.558 | 0.171 | 82.934
Random Forest | 0.810 | 3.867 | 31.859 | 5.644 | 0.120 | 0.811 | 0.099 | 90.103
XGBoost | 0.796 | 3.947 | 34.170 | 5.845 | 0.124 | 0.796 | 0.103 | 89.698
KNN | 0.611 | 6.103 | 65.179 | 8.073 | 0.186 | 0.631 | 0.146 | 85.415
Table 7. Regression performance on ALE data. Bold font indicates the best performer.
Regression | R² | MAE | MSE | RMSE | MAPE | EVS | MHD | MHSP %
Ridge Regression | 0.568 | 0.106 | 0.023 | 0.153 | 2.033 | 0.568 | 0.071 | 92.895
Huber Regression | 0.553 | 0.098 | 0.024 | 0.155 | 1.050 | 0.556 | 0.065 | 93.550
Theil–Sen Regression | 0.559 | 0.104 | 0.024 | 0.154 | 0.892 | 0.559 | 0.069 | 93.149
Quantile Regression | 0.337 | 0.115 | 0.036 | 0.189 | 1.113 | 0.355 | 0.075 | 92.549
Random Forest | 0.480 | 0.123 | 0.028 | 0.167 | 1.460 | 0.493 | 0.081 | 91.877
XGBoost | 0.418 | 0.101 | 0.031 | 0.177 | 1.249 | 0.437 | 0.063 | 93.697
KNN | 0.520 | 0.102 | 0.026 | 0.161 | 1.304 | 0.540 | 0.067 | 93.277
Table 8. Regression performance on Concrete Compressive Strength data. Bold font indicates the best performer.
Regression | R² | MAE | MSE | RMSE | MAPE | EVS | MHD | MHSP %
Ridge Regression | 0.628 | 7.746 | 95.971 | 9.796 | 0.293 | 0.628 | 0.196 | 80.423
Huber Regression | 0.559 | 7.808 | 113.670 | 10.662 | 0.279 | 0.562 | 0.188 | 81.228
Theil–Sen Regression | −0.120 | 9.561 | 288.580 | 16.988 | 0.301 | −0.068 | 0.193 | 80.725
Quantile Regression | 0.570 | 7.970 | 110.900 | 10.531 | 0.307 | 0.573 | 0.197 | 80.308
Random Forest | 0.884 | 3.736 | 29.854 | 5.464 | 0.123 | 0.887 | 0.103 | 89.662
XGBoost | 0.918 | 2.996 | 21.218 | 4.606 | 0.100 | 0.919 | 0.086 | 91.393
KNN | 0.737 | 6.442 | 67.762 | 8.232 | 0.238 | 0.743 | 0.171 | 82.907
Table 9. Regression performance on Forest Fires data. Bold font indicates the best performer.
Regression | R² | MAE | MSE | RMSE | MAPE | EVS | MHD | MHSP %
Ridge Regression | 0.004 | 24.399 | 11,740.330 | 108.353 | 2.39 × 10^16 | 0.011 | 0.721 | 27.884
Huber Regression | −0.026 | 19.659 | 12,089.042 | 109.950 | 3.96 × 10^15 | 0.001 | 0.592 | 40.769
Theil–Sen Regression | −0.017 | 20.407 | 11,983.918 | 109.471 | 9.19 × 10^15 | 0.003 | 0.657 | 34.316
Quantile Regression | −0.030 | 19.621 | 12,145.666 | 110.207 | 1.30 × 10^15 | 0.000 | 0.509 | 49.084
Random Forest | −0.015 | 27.011 | 11,960.393 | 109.364 | 3.00 × 10^16 | −0.012 | 0.743 | 25.699
XGBoost | −0.065 | 26.846 | 12,554.664 | 112.048 | 3.35 × 10^16 | −0.060 | 0.702 | 29.763
KNN | −0.008 | 26.136 | 11,886.223 | 109.024 | 2.56 × 10^16 | −0.003 | 0.694 | 30.640
Table 10. Regression performance on Combined Cycle Power Plant data. Bold font indicates the best performer.
Regression | R² | MAE | MSE | RMSE | MAPE | EVS | MHD | MHSP %
Ridge Regression | 0.931 | 3.543 | 19.608 | 4.428 | 0.008 | 0.932 | 0.008 | 99.225
Huber Regression | 0.915 | 3.924 | 24.190 | 4.918 | 0.009 | 0.916 | 0.009 | 99.145
Theil–Sen Regression | 0.932 | 3.527 | 19.507 | 4.417 | 0.008 | 0.932 | 0.008 | 99.228
Quantile Regression | 0.902 | 4.249 | 28.171 | 5.308 | 0.009 | 0.902 | 0.009 | 99.075
Random Forest | 0.964 | 2.260 | 10.159 | 3.187 | 0.005 | 0.965 | 0.005 | 99.506
XGBoost | 0.967 | 2.204 | 9.386 | 3.064 | 0.005 | 0.967 | 0.005 | 99.517
KNN | 0.947 | 2.877 | 15.115 | 3.888 | 0.006 | 0.947 | 0.006 | 99.371
Table 11. Regression performance on Abalone data. Bold font indicates the best performer.
Regression | R² | MAE | MSE | RMSE | MAPE | EVS | MHD | MHSP %
Ridge Regression | 0.539 | 1.611 | 4.990 | 2.234 | 0.163 | 0.539 | 0.129 | 87.120
Huber Regression | 0.533 | 1.566 | 5.051 | 2.248 | 0.152 | 0.542 | 0.124 | 87.573
Theil–Sen Regression | 0.536 | 1.605 | 5.026 | 2.242 | 0.161 | 0.538 | 0.129 | 87.132
Quantile Regression | −0.073 | 2.378 | 11.617 | 3.408 | 0.247 | 0.000 | 0.183 | 81.656
Random Forest | 0.531 | 1.581 | 5.073 | 2.252 | 0.155 | 0.532 | 0.123 | 87.665
XGBoost | 0.471 | 1.655 | 5.721 | 2.392 | 0.163 | 0.472 | 0.128 | 87.168
KNN | 0.526 | 1.556 | 5.130 | 2.265 | 0.151 | 0.528 | 0.122 | 87.841
Table 12. Regression uncertainty evaluation results using NLL for all regressors on all datasets. Bold font indicates the best performer.
Dataset | Abalone | Forest Fires | Real Estate | ALE | CCPP | Concrete
Ridge Regression | 1.666 | 3.607 | 2.276 | 1.149 | 1.992 | 2.407
Huber Regression | 1.728 | 3.610 | 2.297 | 1.335 | 2.050 | 2.439
Theil–Sen Regression | 1.695 | 3.610 | 2.327 | 0.950 | 1.984 | 2.665
Quantile Regression | 1.943 | 3.611 | 2.342 | 1.436 | 2.094 | 2.433
Random Forest | 1.659 | 3.613 | 2.118 | 1.914 | 1.837 | 2.138
XGBoost | 1.693 | 3.624 | 2.149 | 1.626 | 1.817 | 2.057
KNN | 1.700 | 3.611 | 2.267 | 1.150 | 1.936 | 2.332
MHSP Voted for | KNN | QR ✓ | RF | XGB ✓ | XGB ✓ | XGB
Others Voted for | none | R², MSE, RMSE, EVS | all | MAPE | all | all
MHSP of Best % | 88/88 | 28/49 | 90/90 | 93/94 | 99/99 | 91/91
Table 13. Regression uncertainty evaluation results using CRPS for all regressors on all datasets. Bold font indicates the best performer.
Dataset | Abalone | Forest Fires | Real Estate | ALE | CCPP | Concrete
Ridge Regression | 0.019 | 2.766 | −0.050 | −0.024 | 0.258 | 0.484
Huber Regression | 0.101 | 5.390 | −0.157 | −0.010 | 0.181 | 0.441
Theil–Sen Regression | 0.060 | 4.170 | 1.197 | −0.016 | 0.295 | 0.537
Quantile Regression | 0.455 | 7.841 | 0.393 | −0.006 | 0.523 | 0.548
Random Forest | 0.009 | 1.627 | −0.125 | −0.013 | 0.018 | 0.091
XGBoost | 0.068 | 12.284 | −0.090 | 0.007 | 0.006 | 0.087
KNN | 0.087 | 2.474 | 0.166 | −0.018 | 0.028 | 0.201
MHSP Voted for | KNN | QR | RF | XGB ✓ | XGB ✓ | XGB
Others Voted for | none | none | none | R², MSE, RMSE, EVS | all | all
MHSP of Best % | 88/88 | 26/49 | 85/90 | 93/94 | 99/99 | 91/91
