Statistical Machine Learning Regression Models for Salary Prediction Featuring Economy Wide Activities and Occupations
Abstract
1. Introduction
2. Background
2.1. How Salary Scales Are Developed
2.1.1. Job Analysis and Evaluation
2.1.2. Salary Surveys and Benchmarking
2.2. Salary Predictive Models
2.3. Statistical Machine Learning (ML)
3. Methodology
3.1. Sets of Economic Activities and Occupations
3.1.1. Finite List of Economic Activities
3.1.2. Finite List of Occupations in an Economy
3.2. Prediction Models
3.3. Prediction Model for Saudi Labor Market Salary
3.3.1. Overview of Saudi Labor Market
3.3.2. Training Data
3.3.3. Mean Annual Salaries across Economic Activities
3.3.4. Mean Annual Salaries across Occupation Groups
4. Results and Discussions
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
GPR | Gaussian Process Regression |
MAE | Mean absolute error |
ML | Machine Learning |
R² | Coefficient of determination |
RMSE | Root-mean-square error |
SVR | Support Vector Regression |
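The three error metrics abbreviated above have standard textbook definitions. The following is a minimal sketch of those definitions in plain Python; the function names and the sample salaries are illustrative assumptions and are not taken from the paper's data or code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical annual salaries (illustrative numbers only, not from the paper).
actual = [40000.0, 55000.0, 70000.0, 95000.0]
predicted = [42000.0, 54000.0, 73000.0, 91000.0]

print(mae(actual, predicted))
print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```

RMSE and MAE are expressed in the same unit as the salary itself, whereas the coefficient of determination is unitless, so it is the only one of the three that can be compared across data sets with different salary scales.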
References
- International Labour Office. International Standard Classification of Occupations (ISCO-08); International Labour Office: Geneva, Switzerland, 2012.
- Department of Economic and Social Affairs, Statistics Division. International Standard Industrial Classification of All Economic Activities (Revision 4); Department of Economic and Social Affairs, Statistics Division: New York, NY, USA, 2008.
- Ingster, B. Job Analysis, Documentation, and Job Evaluation. In The Compensation Handbook: A State of the Art Guide to Compensation Strategy and Design; Berger, L.A., Berger, D.R., Eds.; McGraw Hill: New York, NY, USA, 2008.
- Benge, E.J.; Burk, S.L.; Hay, E.N. Manual of Job Evaluation: Procedures of Job Analysis and Appraisal; Harper and Brothers: New York, NY, USA, 1941.
- Heneman, R.L. Job and Work Evaluation: A Literature Review. Public Pers. Manag. 2003, 32, 47–71.
- Griffenhagen, E.O. Classification and Compensation Plans as Tools in Personnel Administration: Notes on Definitions, Principles, and Developments; Reprinted in Office Management Series; Office Management: New York, NY, USA, 1926.
- Armstrong, M.; Cummins, A.; Hastings, S.; Wood, W. Job Evaluation: A Guide to Achieving Equal Pay; Kogan Page Limited: London, UK, 2005.
- Lott, M.R. Wage Scales and Job Evaluation; Ronald Press Company: New York, NY, USA, 1926.
- Hay, E.; Purves, D. A New Method of Job Evaluation: The Guide–Profile Method. Pers. Am. Manag. Assoc. 1954, 31, 72–80.
- York, D.; Brown, T. Salary Surveys. In The Compensation Handbook: A State of the Art Guide to Compensation Strategy and Design; Berger, L.A., Berger, D.R., Eds.; McGraw Hill: New York, NY, USA, 2008.
- Dyl, E.A. Corporate control and management compensation: Evidence on the agency problem. Manag. Decis. Econ. 1988, 9, 21–25.
- Roth, K.; O’Donnell, S. Foreign Subsidiary Compensation Strategy: An Agency Theory Perspective. Acad. Manag. J. 1996, 39, 678–703.
- Bebchuk, L.A.; Fried, J. Executive Compensation as an Agency Problem; Technical report; National Bureau of Economic Research: Cambridge, MA, USA, 2003.
- Chen, C.X.; Lu, H.; Sougiannis, T. The Agency Problem, Corporate Governance, and the Asymmetrical Behavior of Selling, General, and Administrative Costs. Contemp. Account. Res. 2012, 29, 252–282.
- Fitzpatrick, I.; McMullen, T.D. Benchmarking. In The Compensation Handbook: A State of the Art Guide to Compensation Strategy and Design; Berger, L.A., Berger, D.R., Eds.; McGraw Hill: New York, NY, USA, 2008.
- Wiler, J.L.; Rounds, K.; McGowan, B.; Baird, J.; Choo, E.K. Continuation of Gender Disparities in Pay Among Academic Emergency Medicine Physicians. Acad. Emerg. Med. 2019, 26, 286–292.
- Gutiérrez-Martínez, I.; Saifuddin, S.M.; Haq, R. The United Nations Gender Inequality Index; Edward Elgar Publishing: Cheltenham, UK, 2021.
- Bloyer, K. Gender Inequity in the Tech Industry Workplace. In Proceedings of the Perspectives on Critical Issues; Saint Mary’s University: Halifax, NS, Canada, 2021; Volume 4, pp. 215–224.
- Acheson, J.; Collins, M. The gender pay gap in Revenue. Administration 2021, 69, 45–75.
- Singh, P.; Pattanaik, F. Unequal Reward for Equal Work? Understanding Women’s Work and Wage Discrimination in India Through the Meniscus of Social Hierarchy. Contemp. Voice Dalit 2020, 12, 19–36.
- Johnson, C.B.; Riggs, M.L.; Downey, R.G. Fun with Numbers: Alternative Models for Predicting Salary Levels. Res. High. Educ. 1987, 27, 349–362.
- Meng, Q.; Zhu, H.; Xiao, K.; Xiong, H. Intelligent Salary Benchmarking for Talent Recruitment: A Holistic Matrix Factorization Approach. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 337–346.
- Pfeffer, J.; Davis-Blake, A. Understanding Organizational Wage Structures: A Resource Dependence Approach. Acad. Manag. J. 1987, 30, 437–455.
- Carroll, G.R.; Mayer, K.U. Job-Shift Patterns in the Federal Republic of Germany: The Effects of Social Class, Industrial Sector, and Organizational Size. Am. Sociol. Rev. 1986, 51, 323–341.
- Mincer, J. The Distribution of Labor Incomes: A Survey With Special Reference to the Human Capital Approach. J. Econ. Lit. 1970, 8, 1–26.
- Kubat, M. An Introduction to Machine Learning, 2nd ed.; Springer: New York, NY, USA, 2017.
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning with Applications in R, 2nd ed.; Springer: New York, NY, USA, 2021.
- Krollner, B.; Vanstone, B.; Finnie, G. Financial time series forecasting with machine learning techniques: A survey. In Proceedings of the 18th European Symposium on Artificial Neural Networks (ESANN 2010), Bruges, Belgium, 28–30 April 2010; pp. 25–30.
- Obthong, M.; Tantisantiwong, N.; Jeamwatthanachai, W.; Wills, G. A survey on machine learning for stock price prediction: Algorithms and techniques. In Proceedings of the 2nd International Conference on Finance, Economics, Management and IT Business, Online, 5–6 May 2020; pp. 63–71.
- Leung, C.K.S.; MacKinnon, R.K.; Wang, Y. A Machine Learning Approach for Stock Price Prediction. In Proceedings of the 18th International Database Engineering & Applications Symposium, Porto, Portugal, 7–9 July 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 274–277.
- Kavitha, S.; Varuna, S.; Ramya, R. A comparative analysis on linear regression and support vector regression. In Proceedings of the 2016 Online International Conference on Green Engineering and Technologies (IC-GET), Coimbatore, India, 19 November 2016; pp. 1–5.
- Shen, R.; Zhang, B.-W. The research of regression model in machine learning field. MATEC Web Conf. 2018, 176, 01033.
- Maulud, D.; Abdulazeez, A.M. A review on linear regression comprehensive in machine learning. J. Appl. Sci. Technol. Trends 2020, 1, 140–147.
- Specht, D.F. A general regression neural network. IEEE Trans. Neural Netw. 1991, 2, 568–576.
- Nghiep, N.; Al, C. Predicting housing value: A comparison of multiple regression analysis and artificial neural networks. J. Real Estate Res. 2001, 22, 313–336.
- Mitchell, T. Machine Learning; McGraw-Hill: New York, NY, USA, 1997.
- Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
- Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support Vector Regression Machines. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 2–5 December 1996; Mozer, M., Jordan, M., Petsche, T., Eds.; MIT Press: Cambridge, MA, USA, 1996; Volume 9.
- Awad, M.; Khanna, R. Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers; Apress: Berkeley, CA, USA, 2015; pp. 67–80.
- Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
- Rasmussen, C.E.; Nickisch, H. Gaussian processes for machine learning (GPML) toolbox. J. Mach. Learn. Res. 2010, 11, 3011–3015.
- The World Bank. Personal Remittances Paid in USD; The World Bank: Washington, DC, USA, 2020.
- General Authority for Statistics Saudi Arabia (GaStat). Annual Economic Surveys, Business Establishments (2003–2017); General Authority for Statistics Saudi Arabia (GaStat): Riyadh, Saudi Arabia, 2020.
Section | Description | Divisions | Groups | Classes |
---|---|---|---|---|
A | Agriculture, forestry and fishing | 3 | 13 | 38 |
B | Mining and quarrying | 5 | 10 | 14 |
C | Manufacturing | 24 | 69 | 137 |
D | Electricity, gas, steam and air conditioning supply | 1 | 3 | 3 |
E | Water supply; sewerage, waste management and remediation activities | 4 | 6 | 8 |
F | Construction | 3 | 8 | 11 |
G | Wholesale and retail trade; repair of motor vehicles and motorcycles | 3 | 20 | 43 |
H | Transportation and storage | 5 | 11 | 20 |
I | Accommodation and food service activities | 2 | 6 | 7 |
J | Information and communication | 6 | 12 | 23 |
K | Financial and insurance activities | 3 | 10 | 18 |
L | Real estate activities | 1 | 2 | 2 |
M | Professional, scientific and technical activities | 7 | 14 | 14 |
N | Administrative and support service activities | 6 | 19 | 26 |
O | Public administration and defense; compulsory social security | 1 | 3 | 7 |
P | Education | 1 | 5 | 8 |
Q | Human health and social work activities | 3 | 9 | 12 |
R | Arts, entertainment and recreation | 4 | 4 | 10 |
S | Other service activities | 3 | 5 | 17 |
T | Activities of households as employers | 2 | 3 | 3 |
U | Activities of extraterritorial organizations and bodies | 1 | 1 | 1 |
Major Groups | Skill Level | Sub-Major Groups | Minor Groups | Unit Groups |
---|---|---|---|---|
1 Managers | 3 + 4 | 4 | 11 | 31 |
2 Professionals | 4 | 6 | 27 | 92 |
3 Technicians and Associate Professionals | 3 | 5 | 20 | 84 |
4 Clerical Support Workers | 2 | 4 | 8 | 29 |
5 Services and Sales Workers | 2 | 4 | 13 | 40 |
6 Skilled Agricultural, Forestry, and Fishery Workers | 2 | 3 | 9 | 18 |
7 Craft and Related Trades Workers | 2 | 5 | 14 | 66 |
8 Plant and Machine Operators, and Assemblers | 2 | 3 | 14 | 40 |
9 Elementary Occupations | 1 | 6 | 11 | 33 |
0 Armed Forces Occupations | 1 + 2 + 4 | 3 | 3 | 3 |
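The four columns of the table above form a nested hierarchy, and in ISCO-08 that hierarchy is encoded directly in the four-digit unit group code: the first digit identifies the major group, the first two the sub-major group, and the first three the minor group. The sketch below illustrates this prefix structure; the function name `isco08_levels` and the example codes are illustrative assumptions, not part of the paper.

```python
# Major group names as listed in the table above, keyed by their leading digit.
MAJOR_GROUPS = {
    "1": "Managers",
    "2": "Professionals",
    "3": "Technicians and Associate Professionals",
    "4": "Clerical Support Workers",
    "5": "Services and Sales Workers",
    "6": "Skilled Agricultural, Forestry, and Fishery Workers",
    "7": "Craft and Related Trades Workers",
    "8": "Plant and Machine Operators, and Assemblers",
    "9": "Elementary Occupations",
    "0": "Armed Forces Occupations",
}


def isco08_levels(unit_code: str) -> dict:
    """Split a four-digit ISCO-08 unit group code into its hierarchy levels."""
    if len(unit_code) != 4 or not unit_code.isdigit():
        raise ValueError(f"expected a four-digit code, got {unit_code!r}")
    return {
        "major_group": unit_code[:1],
        "major_group_name": MAJOR_GROUPS[unit_code[:1]],
        "sub_major_group": unit_code[:2],
        "minor_group": unit_code[:3],
        "unit_group": unit_code,
    }


# Codes are kept as strings so that the leading zero of major group 0 survives.
levels = isco08_levels("2512")
print(levels)
```

Because each level is a prefix of the next, a model can be trained at any granularity (10 major groups up to 436 unit groups) simply by truncating the code.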
# | Survey Data | | Model Mapping for Training Data | Type of Variable |
---|---|---|---|---|
01 | Incumbent job title | ⟹ | ISCO-08 [1]: major occupational groups → sub-major groups → minor groups → unit groups | Discrete set of occupations |
02 | Demographic information about the incumbent | ⟹ | Attributes such as gender, ethnic group, age, education level, and experience | Ignored; not included in the prediction model |
03 | Name of company/organization | ⟹ | ISIC4 [2]: economic sections → divisions | Discrete set of economic activities |
04 | Description of product/services provided | ⟹ | ISIC4 [2]: economic sections → divisions | Discrete set of economic activities |
05 | Total number of employees | ⟹ | Size of the company | |
06 | Total annual salary | ⟹ | Total annual salary is the output for the training data; the salary mean, median, and first and third quartiles are calculated as inputs | Continuous variable |
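One way to read the mapping in the table above is as a feature-construction step: the occupation and the economic activity become categorical inputs, company size and salary summary statistics become numeric inputs, and the annual salary is the target. The following sketch shows that reading under several assumptions that are not stated in the source: one-hot encoding of the categories, grouping the summary statistics by (ISIC section, ISCO major group), and the record field names `isic_section`, `isco_major`, `n_employees`, and `annual_salary`, all of which are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

ISIC_SECTIONS = "ABCDEFGHIJKLMNOPQRSTU"  # the 21 ISIC Rev. 4 sections
ISCO_MAJOR_GROUPS = "1234567890"         # the 10 ISCO-08 major groups


def one_hot(value, categories):
    """Encode a discrete value as a 0/1 vector over its category list."""
    return [1.0 if value == c else 0.0 for c in categories]


def build_training_rows(records):
    """Turn survey records into (feature vector, target salary) pairs."""
    # Collect salaries per (economic activity, occupation) cell.
    by_cell = defaultdict(list)
    for r in records:
        by_cell[(r["isic_section"], r["isco_major"])].append(r["annual_salary"])

    rows = []
    for r in records:
        salaries = by_cell[(r["isic_section"], r["isco_major"])]
        if len(salaries) >= 2:
            q1, _, q3 = quantiles(salaries, n=4, method="inclusive")
        else:
            q1 = q3 = salaries[0]  # quartiles are undefined for one sample
        features = (
            one_hot(r["isic_section"], ISIC_SECTIONS)
            + one_hot(r["isco_major"], ISCO_MAJOR_GROUPS)
            + [float(r["n_employees"]), mean(salaries), median(salaries), q1, q3]
        )
        rows.append((features, float(r["annual_salary"])))
    return rows


# Hypothetical survey records (illustrative values only).
records = [
    {"isic_section": "C", "isco_major": "2", "n_employees": 120, "annual_salary": 60000},
    {"isic_section": "C", "isco_major": "2", "n_employees": 450, "annual_salary": 80000},
    {"isic_section": "P", "isco_major": "2", "n_employees": 35, "annual_salary": 50000},
]
rows = build_training_rows(records)
print(len(rows), len(rows[0][0]))
```

Demographic fields are deliberately absent from the feature vector, matching row 02 of the table. In practice the summary statistics would be computed on the training split only, so that held-out salaries do not leak into the inputs.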
Regression Model | Validation R² | Validation RMSE | Validation MAE | Test R² | Test RMSE | Test MAE | Model Hyperparameters |
---|---|---|---|---|---|---|---|
MLR | 0.50 | 8706 | 3999 | 0.55 | 8799 | 4397 | Least squares method to estimate coefficients |
ANN | 0.97 | 1964 | 627 | 0.99 | 1392 | 560 | Three layers with 10 nodes each |
TBR | 0.97 | 2134 | 585 | 0.97 | 2088 | 578 | Minimum leaf size of 4 |
SVM | 0.93 | 3209 | 641 | 0.99 | 1437 | 538 | Cubic kernel function |
Bayesian-based GPR | 0.98 | 1882 | 481 | 0.99 | 1496 | 526 | Squared exponential isotropic kernel |
Regression Model | Validation R² | Validation RMSE | Validation MAE | Test R² | Test RMSE | Test MAE | Model Hyperparameters |
---|---|---|---|---|---|---|---|
MLR | 0.62 | 7710 | 5056 | 0.62 | 6575 | 3493 | Least squares method to estimate coefficients |
ANN | 0.94 | 3133 | 2036 | 0.91 | 3169 | 2311 | One layer with 100 nodes |
TBR | 0.47 | 9125 | 6203 | 0.12 | 9989 | 6439 | Minimum leaf size of 12 |
SVM | 0.64 | 7442 | 4062 | 0.75 | 5382 | 3571 | Gaussian kernel function |
Bayesian-based GPR | 0.87 | 4509 | 2299 | 0.91 | 3172 | 1927 | Rational quadratic kernel function |
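The hyperparameter columns of the two tables above name several kernel functions (cubic and Gaussian for the SVM, squared exponential and rational quadratic for the GPR). The sketch below writes out the common textbook forms of these kernels in plain Python. The parameter names and default values are assumptions for illustration; the exact parametrization used by the authors' software is not stated here.

```python
import math


def sq_dist(x, z):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def squared_exponential(x, z, length_scale=1.0, signal_std=1.0):
    """Isotropic squared exponential kernel: one length scale for all inputs.

    The Gaussian (RBF) kernel used with SVMs has the same functional form,
    usually written as exp(-gamma * ||x - z||^2).
    """
    return signal_std ** 2 * math.exp(-sq_dist(x, z) / (2 * length_scale ** 2))


def rational_quadratic(x, z, length_scale=1.0, alpha=1.0, signal_std=1.0):
    """Rational quadratic kernel: a scale mixture of squared exponentials."""
    base = 1.0 + sq_dist(x, z) / (2 * alpha * length_scale ** 2)
    return signal_std ** 2 * base ** (-alpha)


def cubic(x, z, scale=1.0, offset=1.0):
    """Cubic kernel: polynomial kernel of degree three."""
    dot = sum(a * b for a, b in zip(x, z))
    return (scale * dot + offset) ** 3


print(squared_exponential([0.0], [1.0]))
print(rational_quadratic([0.0], [1.0]))
print(cubic([1.0, 2.0], [3.0, 4.0]))
```

The rational quadratic kernel decays more slowly with distance than the squared exponential for small `alpha` and approaches it as `alpha` grows, which makes it the more flexible of the two when the data vary on several length scales.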
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Matbouli, Y.T.; Alghamdi, S.M. Statistical Machine Learning Regression Models for Salary Prediction Featuring Economy Wide Activities and Occupations. Information 2022, 13, 495. https://doi.org/10.3390/info13100495