Improving the Robustness of the Theil-Sen Estimator Using a Simple Heuristic-Based Modification
Abstract
1. Introduction
2. The Theil-Sen Estimator
- Directly: in this case the value is calculated based only on the values of the data points
- Hierarchically: using the value of , calculated earlier from (5), that is,
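The two variants listed above can be sketched in Python as follows. This is only an illustration, not the paper's code. It assumes that the direct variant takes the median of the intercepts of the lines through each pair of points, and that the hierarchical variant takes the median of y_i - a*x_i using the already estimated slope a. The function name `theil_sen` and the sample data are illustrative.

```python
from itertools import combinations
from statistics import median


def theil_sen(points):
    """Return (slope, direct intercept, hierarchical intercept) for 2-D points."""
    slopes, intercepts = [], []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical pair: slope undefined, skip it
        s = (y2 - y1) / (x2 - x1)
        slopes.append(s)
        # Intercept of the line through this pair of points (assumed direct variant).
        intercepts.append(y1 - s * x1)
    a = median(slopes)
    b_direct = median(intercepts)
    # Assumed hierarchical variant: reuse the slope estimate a.
    b_hier = median(y - a * x for x, y in points)
    return a, b_direct, b_hier


# Points on y = 2x + 1, with the last y replaced by a gross outlier.
data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 100)]
a, b_direct, b_hier = theil_sen(data)
print(a, b_direct, b_hier)
```

With one outlier among five points, all three estimates still match the underlying line, because fewer than half of the pairwise estimates are affected.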
3. The Robustified Theil-Sen Estimator
3.1. Causes of the Limited Robustness of the TS Estimator
3.2. General Idea of the RTS Estimator
- A selection of subsets of correct estimates and from the sets of all initial estimates, and , respectively,
- A search for the values of the optimal estimates and only within the sets of correct estimates and , respectively.
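The select-then-search structure described above can be sketched as follows. The selection rule used here (keep the initial slope estimates within k median absolute deviations of their median) is a stand-in assumption chosen only to make the sketch runnable; it is not the RTS heuristic, whose definition is not reproduced in this section. The names `two_stage_slope` and `k` are hypothetical.

```python
from itertools import combinations
from statistics import median


def two_stage_slope(points, k=3.0):
    """Filter the initial pairwise slopes, then take the median of the kept ones."""
    # Initial estimates: one slope per pair of points with distinct x.
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2) if x1 != x2]
    centre = median(slopes)
    mad = median(abs(s - centre) for s in slopes)
    # Selection step (placeholder rule, NOT the RTS heuristic):
    # keep the estimates that lie within k MADs of the median.
    kept = [s for s in slopes if abs(s - centre) <= k * mad]
    # Search step: the final estimate is sought only among the kept estimates.
    return median(kept)


data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 100)]
slope_estimate = two_stage_slope(data)
print(slope_estimate)
```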
3.3. Description of the RTS Estimator Algorithm
3.3.1. Calculation of in RTSd and RTSh Estimators
3.3.2. Calculation of Using the RTSd Estimator
3.3.3. Calculation of Using the RTSh Estimator
4. Numerical Experiments
4.1. General Assumptions
- Exp-A—evaluation of the dispersion of the estimation results,
- Exp-B—preliminary determination of the breakdown point value of the RTS estimator based on the comparison of the RTS estimation results with the results of the TS and RM estimators for data sets containing a different number o of outliers.
4.2. Experiment A
- Draw the values of the coordinates according to a uniform distribution, ,
- Draw, according to a uniform distribution, a list of indices i of data points that will be outliers,
- For correctly measured (correct, non-outlier) data points, their values are simulated with a normally distributed measurement error as follows
- Outlier values are calculated according to the formula
4.3. Experiment B
- The creation of each set begins by drawing the x coordinates of the points in the range using a uniform distribution. The resulting vector is sorted in ascending order so that it satisfies the condition . Then, the vector of y coordinates of the correct data points is created according to the formula
- Next, the vector of y coordinates of the outliers is drawn using the normal distribution , where , .
- The final step is to create the data set for each analyzed value of o; this is achieved as follows
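The data generation steps above can be sketched as follows. All numeric values (line coefficients, x range, noise and outlier distribution parameters, seed) are placeholder assumptions, since the original values are not reproduced in this section. The model for correct points (a line plus Gaussian noise) and the way outliers are inserted (replacing the y values of o randomly chosen points, nested across values of o) are also assumptions. The function name `make_exp_b_sets` is hypothetical.

```python
import random


def make_exp_b_sets(n, o_values, a=2.0, b=1.0, x_range=(0.0, 10.0),
                    noise_sd=0.1, out_mean=50.0, out_sd=5.0, seed=0):
    """Return {o: list of (x, y)} with o outliers in each data set."""
    rng = random.Random(seed)
    # x coordinates: uniform draw, then sorted in ascending order.
    x = sorted(rng.uniform(*x_range) for _ in range(n))
    # y coordinates of correct points (assumed model: line plus Gaussian noise).
    y_correct = [a * xi + b + rng.gauss(0.0, noise_sd) for xi in x]
    # y coordinates of outliers, drawn from a normal distribution.
    y_outlier = [rng.gauss(out_mean, out_sd) for _ in range(n)]
    # Fixed random order in which points become outliers, so the set for a
    # larger o contains every outlier of the set for a smaller o (assumption).
    order = rng.sample(range(n), n)
    sets = {}
    for o in o_values:
        y = list(y_correct)
        for i in order[:o]:
            y[i] = y_outlier[i]
        sets[o] = list(zip(x, y))
    return sets


sets = make_exp_b_sets(n=20, o_values=[0, 5, 10])
```

Reusing the same x vector and the same correct y values for every o means that differences between the estimates for different o come only from the outliers.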
4.4. Discussion of the Results
- The breakdown point value for the RTS estimator is higher than the breakdown point value of the TS estimator, formally
- The breakdown point value for the RTS estimator is equal to the breakdown point value of the RM estimator, hence formally
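The difference in breakdown points between the two reference estimators, TS and RM, can be illustrated with a small deterministic check. The RTS estimator itself is not implemented here. The contamination scheme (the last o of n equally spaced points moved to y = -1000) and the helper names are assumptions made for this sketch.

```python
from itertools import combinations
from statistics import median


def slope(p, q):
    """Slope of the line through points p and q (distinct x assumed)."""
    return (q[1] - p[1]) / (q[0] - p[0])


def ts_slope(points):
    """Theil-Sen: median of the slopes over all pairs of points."""
    return median(slope(p, q) for p, q in combinations(points, 2))


def rm_slope(points):
    """Repeated median: median over i of the median over j != i of the slopes."""
    return median(
        median(slope(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def contaminated(n, o):
    """n points on y = 2x + 1; the last o points get y = -1000 instead."""
    return [(float(x), -1000.0 if x >= n - o else 2.0 * x + 1.0)
            for x in range(n)]


n = 20
ts_ok = [o for o in range(n // 2 + 1) if ts_slope(contaminated(n, o)) == 2.0]
rm_ok = [o for o in range(n // 2 + 1) if rm_slope(contaminated(n, o)) == 2.0]
print("TS recovers slope 2 for o =", ts_ok)
print("RM recovers slope 2 for o =", rm_ok)
```

For this construction with n = 20, the TS slope stays at 2 up to o = 5 outliers, while the RM slope stays at 2 up to o = 9. This is in line with the known asymptotic breakdown points of about 29.3% for TS and 50% for RM.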
5. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
OLS | Ordinary Least Squares estimator |
OLSc | Ordinary Least Squares estimator calculated only for the correct data points in data set |
RM | Repeated Median estimator |
RMd | Repeated Median estimator, directly calculated estimate |
RMh | Repeated Median estimator, hierarchically calculated estimate |
RTS | Robustified Theil-Sen estimator |
RTSd | Robustified Theil-Sen estimator, directly calculated estimate |
RTSh | Robustified Theil-Sen estimator, hierarchically calculated estimate |
TS | Theil-Sen estimator |
TSd | Theil-Sen estimator, directly calculated estimate |
TSh | Theil-Sen estimator, hierarchically calculated estimate |
Appendix A. Effect of Data Set Size on τ Coefficient Value
Appendix B. Comparison of Computational Times of the Compared Robust Estimators
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Bal, A. Improving the Robustness of the Theil-Sen Estimator Using a Simple Heuristic-Based Modification. Symmetry 2024, 16, 698. https://doi.org/10.3390/sym16060698