From Data to Insight: Transforming Online Job Postings into Labor-Market Intelligence
Abstract
1. Introduction
2. Literature Review
3. Methodology
3.1. The Challenges
- 1. Unbiased and Labor Market-Representative Data
- 2. Noisy or Irrelevant Data
- 3. Language and Translation Issues
- 4. Handling Missing Values
- 5. Variability in Data
- 6. Classification of Ambiguous Data
- 7. Recruiters’ Job Postings
- 8. Changing Labor-Market Dynamics
3.2. Data Gathering
3.2.1. Data-Source Selection
- Local expert suggestions: Country labor-market experts may come from various professions, such as human-resource professionals, government officials, chamber of commerce officers, labor-market analysts, and labor economists. Conducting interviews with these experts can help identify the most appropriate job portals for each country of interest.
- Google search results: Google search results are widely considered a valuable and unbiased method for selecting the most reputable and widely used websites of a particular category. This is because Google’s ranking algorithms take into account a website’s content quality, quantity, and analytics, such as visitors and page views. In the proposed methodology, Google search results are an important criterion for selecting the best job portals in a country. The main concept is to use the Google Trends tool to find the websites that appear in the top search queries of the “Jobs” category for a particular country of interest. To do this, the following filters should be selected in Google Trends: (1) the country of interest, (2) the “Past 12 months” time period, (3) the “Jobs” category, and (4) the “Web Search” option.
- Online SEO marketing tools: Popular SEO marketing tools such as Similarweb.com, Alexa.com, and Moz.com can be used to measure and map the digital world in a timely and comprehensive way. These tools can help us find the most popular and valuable job portals in each country of interest.
3.2.2. Data Extraction
- Job title: The title of the job posting, which usually indicates the occupation of the job.
- Job description: The main text that describes the job vacancy in detail. The job description may contain valuable information that needs to be extracted, as it usually covers the occupation and its responsibilities, the industry of the employer, the workplace, the requested skills and qualifications, etc.
- Employer’s name: The employer’s name indicates the name of the company that is posting the job vacancy. However, in many cases, this field may be empty or “confidential”, or it is a recruiting company that posts on behalf of a client. In such cases, the proposed methodology extracts the employer’s industry from the job-description field.
- Workplace: The workplace information provides the location of the job posting, which can be used to analyze the distribution of job opportunities across different regions.
- Employment type: Employment type is an important factor in labor-market analysis as it provides information about the type of work arrangement between employers and workers. Employment types can include full-time, part-time, temporary, contract, self-employed, and freelance. Understanding the distribution of employment types can help in identifying labor-market trends. It can also provide insights into the availability of different types of jobs in different industries and regions, and help policymakers develop strategies to support job creation and job security for workers.
- Education level: The education level requested in a job posting depends on the specific requirements and qualifications necessary for the position. Some jobs may require a high-school diploma or equivalent, while others may require a bachelor’s or master’s degree in a specific field. Additionally, some jobs may require additional certifications or specialized training. It is important for a job posting to clearly state the minimum education level required for the position. This helps to attract qualified candidates and ensures that all applicants meet the necessary educational qualifications.
- Qualifications/skills: Job postings typically include a list of qualifications and skills that are required or preferred for the position. These qualifications and skills will vary depending on the nature of the job and the level of experience required. Some common qualifications and skills that may be listed in a job posting include the following:
  - Work experience in a related field or industry;
  - Certifications or licenses;
  - Language knowledge;
  - Technical knowledge or expertise;
  - Driving license;
  - Communication skills (both written and verbal);
  - Problem-solving and critical thinking skills;
  - Time management and organizational skills;
  - Interpersonal skills (such as the ability to work well with others and collaborate effectively);
  - Adaptability, flexibility, and attention to detail.
- Estimated salary: The salary information provides an estimate of the salary range for the job posting. This information can be used to analyze the salaries offered for different occupations and to identify the factors that influence salary levels.
- Date posted/expiration date: The extracted data usually contain the job-posting date and the expiration date. Both are crucial for the proposed analysis, as they show how the labor market changes over time.
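The fields listed above can be gathered into one record per posting. The following is a minimal sketch using Python dataclasses. The class name `JobPosting`, the attribute names, the choice of which fields are optional, and the sample values are illustrative assumptions, not part of the proposed methodology.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class JobPosting:
    """One extracted posting (hypothetical schema).

    Assumption: only the title is treated as mandatory; the source notes that
    fields such as the employer name may be empty or "confidential".
    """
    job_title: str
    job_description: Optional[str] = None
    employer_name: Optional[str] = None
    workplace: Optional[str] = None
    employment_type: Optional[str] = None
    education_level: Optional[str] = None
    skills: List[str] = field(default_factory=list)
    estimated_salary: Optional[str] = None  # raw text, normalized in a later step
    date_posted: Optional[date] = None
    expiration_date: Optional[date] = None

    def days_online(self) -> Optional[int]:
        """Days between posting and expiration, or None if either date is missing."""
        if self.date_posted is None or self.expiration_date is None:
            return None
        return (self.expiration_date - self.date_posted).days


# Sample values are invented for illustration only.
posting = JobPosting(
    job_title="Hotel Receptionist",
    workplace="Patras",
    date_posted=date(2021, 6, 1),
    expiration_date=date(2021, 6, 30),
)
```

Keeping the salary as raw text at this stage leaves its parsing to the normalization step, so extraction and normalization can be tested separately.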
3.3. Data Preprocessing
3.3.1. Data Cleansing and Preparation
- Fix encoding problems: Many online job portals use non-English characters in their postings, which can cause issues with data processing if not properly encoded. For example, if a job posting in a language with non-ASCII characters (such as Greek or Chinese) is not encoded properly, the text may appear as a series of unintelligible symbols. Fixing encoding problems involves identifying and correcting these issues to ensure that the data can be properly processed. The chardet library is used to detect the encoding of text and ftfy (fixes text for you) to fix any encoding issues, ensuring all text is properly encoded in UTF-8.
- HTML tags removal: Online job postings often contain HTML tags that are used to format the text, such as bold or italicized text. These tags are not useful for our data-processing steps and must be removed to extract only the relevant text. “BeautifulSoup” is used to parse HTML content and remove all HTML tags, retaining only the text content.
- URLs removal: Some online job postings contain links to external websites that are not relevant to our analysis. These links can be removed to reduce noise in the data and ensure that only relevant information is extracted. At this point, we used regular expressions to identify and remove URLs from the text.
- Remove noise data: Regular expressions are used to replace numbers, addresses, phone numbers, currency symbols, etc., with special tokens.
- Translate to English: Many online job postings are written in languages other than English. Translating these postings into English may be necessary to ensure consistency and ease of analysis. Automated translation tools such as Google Translate API can be used for this purpose, but it is important to note that these tools may not always produce accurate translations.
- Capitalize all fields: Standardizing the capitalization of all fields in the job postings can make the data easier to read and analyze. This involves converting all text to uppercase or lowercase letters, depending on the desired format, using Python string methods.
- Stop-words removal and stemming: Stop-words are common words that do not carry much meaning, such as “the” and “of”. These words can be removed to reduce noise in the data and improve the accuracy of the analysis. Stemming involves reducing words to their base form, such as converting “running” to “run”. This helps to reduce the dimensionality of the data and makes the data easier to analyze. The NLTK library was used for both tasks.
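Several of the cleansing steps above can be chained into one function. The sketch below is a simplified stand-in that uses only the Python standard library: `html.parser` takes the place of BeautifulSoup, and a tiny hand-written stop-word set takes the place of the NLTK list. Encoding repair (chardet, ftfy), translation, and stemming are left out because they depend on external packages or services. The token names `<PHONE>`, `<CUR>`, and `<NUM>`, the regular expressions, and the sample posting are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
CURRENCY_RE = re.compile(r"[€$£]")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
TOKEN_RE = re.compile(r"<[A-Z]+>|[^\W\d_]+")  # special tokens or runs of letters
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "for"}  # illustrative subset


class _TextOnly(HTMLParser):
    """Collects text nodes and drops every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return the text content of an HTML fragment."""
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


def clean_text(raw: str) -> list:
    """HTML removal, lowercasing, URL removal, noise tokens, stop-word removal."""
    text = strip_html(raw).lower()
    text = URL_RE.sub(" ", text)
    text = PHONE_RE.sub(" <PHONE> ", text)  # phones first, before generic numbers
    text = CURRENCY_RE.sub(" <CUR> ", text)
    text = NUMBER_RE.sub(" <NUM> ", text)
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]


sample = ("<p>Apply <b>now</b> for the role at https://example.com/jobs "
          "or call +30 210 1234567. Salary: €1,200</p>")
tokens = clean_text(sample)
```

The order of the substitutions matters: phone numbers are replaced before generic numbers so that a phone number does not break up into several `<NUM>` tokens.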
3.3.2. Entities Normalization
- Employer name: Normalizing the employer’s name is a challenging task, as it may vary between different portals. The first step involves removing punctuation and all non-alphabetic characters (other than spaces). Next, a list of “stop words” in companies’ names should be compiled. These “stop words” include the company’s legal entity type (e.g., ΑΕ, ΙΚΕ, ΟΕ, and ΕΠΕ for Greece), which can easily be found in a country’s list of legal forms; and frequently used words, such as “Company”, “Corporation”, and “Group”.
- Workplace: Workplaces should be normalized to the standard territory name of the NUTS Taxonomy to obtain accurate workplace statistics in the Information Extraction phase. The preferred level of information is NUTS3; however, higher NUTS levels are accepted if NUTS3 data are not available.
- Education level: Normalizing the various education levels in the dataset is essential for accurate analysis. The International Standard Classification of Education (ISCED) [20] is a commonly used standard classification for this purpose.
- Employment type: Employment type is a crucial piece of information in a job posting, and it should be classified using the International Labor Organization (ILO) classification of status in employment standards [21].
- Salary information: During entity normalization, free text is stripped from the salary field and the remaining values are converted to decimals. The final value is either the given range (minimum and maximum values) or a single value that represents the estimated salary. An extra field should be added to indicate whether the figure refers to a monthly or an annual salary.
- Date fields: These fields are converted to date type and are normalized based on a standard date format.
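Three of the normalization rules above (employer name, salary, and date) lend themselves to a short sketch. The function names, the stop-word set, the accepted date formats, and the assumption that “,” is a thousands separator and “.” a decimal point are all illustrative choices rather than details given in the methodology.

```python
import re
from datetime import datetime
from decimal import Decimal
from typing import Optional, Tuple

# Illustrative stop words: legal forms plus frequently used generic words.
COMPANY_STOP_WORDS = {"αε", "ικε", "οε", "επε", "sa", "ltd",
                      "company", "corporation", "group"}


def normalize_employer(name: str) -> str:
    """Delete punctuation and digits, drop company stop words, uppercase the rest."""
    letters = re.sub(r"[^\w\s]|[\d_]", "", name)
    kept = [w for w in letters.split() if w.lower() not in COMPANY_STOP_WORDS]
    return " ".join(kept).upper()


def normalize_salary(raw: str) -> Optional[Tuple[Decimal, Decimal, str]]:
    """Return (min, max, period); min equals max when one value is given."""
    found = re.findall(r"\d[\d,]*(?:\.\d+)?", raw)
    if not found:
        return None
    values = [Decimal(v.replace(",", "")) for v in found]
    lowered = raw.lower()
    if any(k in lowered for k in ("year", "annual", "annum")):
        period = "annual"
    elif "month" in lowered:
        period = "monthly"
    else:
        period = "unknown"
    return min(values), max(values), period


def normalize_date(raw: str, formats=("%d/%m/%Y", "%Y-%m-%d")) -> Optional[str]:
    """Try each known format and return an ISO 8601 date string."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

For example, `normalize_employer("Aegean Hotels Group S.A.")` yields `AEGEAN HOTELS`. Punctuation is deleted rather than replaced by a space so that a dotted legal form such as “S.A.” collapses into one stop word.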
3.3.3. Deduplication
- Training phase: Users provide a sample of matched and unmatched record pairs. dedupe uses this sample to train a model, learning how to distinguish between duplicates and non-duplicates.
- Blocking: To improve efficiency, dedupe employs a blocking technique that partitions data into smaller blocks based on certain criteria, reducing the number of comparisons needed.
- Prediction: Once trained, the model predicts the likelihood of pairs of records being duplicates, allowing for automated deduplication and entity resolution.
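The blocking and pairwise-comparison ideas above can be shown with the standard library alone. This is not the dedupe library and involves no training: a fixed string-similarity threshold from `difflib` stands in for the learned model. The record fields, the blocking key, and the 0.8 threshold are assumptions for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record: dict) -> tuple:
    """Only records sharing employer and workplace are compared with each other."""
    return (record["employer"].lower(), record["workplace"].lower())


def similarity(a: dict, b: dict) -> float:
    """Title similarity in [0, 1]; a stand-in for a trained duplicate classifier."""
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()


def find_duplicates(records: list, threshold: float = 0.8) -> list:
    """Return index pairs whose similarity within a block reaches the threshold."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Invented sample records.
records = [
    {"title": "Hotel Receptionist", "employer": "Aegean Hotels", "workplace": "Patras"},
    {"title": "Hotel receptionist (m/f)", "employer": "Aegean Hotels", "workplace": "Patras"},
    {"title": "Chef", "employer": "Aegean Hotels", "workplace": "Patras"},
    {"title": "Hotel Receptionist", "employer": "Other Employer", "workplace": "Athens"},
]
duplicate_pairs = find_duplicates(records)
```

Blocking limits the comparisons to records inside the same block, so the fourth record is never compared with the first three even though its title is identical to the first one.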
3.3.4. Missing Values Handling
3.4. Information Extraction
- Job title: The job title is typically the first piece of information that can be extracted from a job posting. This can provide insights into the type of job, level of seniority, and responsibilities.
- Job description: The job description outlines the duties and responsibilities of the role, as well as the skills and qualifications required to perform the job. This information can be used to understand the requirements of the job and to assess whether a candidate is a good fit.
- Company information: Job postings may include information about the company, such as its size, industry, location, and mission. This can provide insights into the company culture and values.
- Salary and benefits: Some job postings may include information about the salary and benefits package, such as health insurance, retirement plans, and vacation time. This can help candidates evaluate the compensation package and make informed decisions about whether to apply for the job.
- Required qualifications and skills: Job postings often list the required qualifications, such as education, experience, and skills. This information can be used to assess whether a candidate meets the minimum requirements for the job.
- Application instructions: Job postings may provide instructions on how to apply for the job, such as submitting a resume and cover letter. This information can be used to determine the application process and timeline.
- Key performance indicators: Some job postings may list key performance indicators (KPIs) that the candidate will be responsible for achieving. This can provide insights into the goals and objectives of the role.
3.4.1. Industry Extraction
3.4.2. Occupation Extraction
3.4.3. Skills Extraction
4. Skill-Extraction Use Case
- Surface form extraction: A skill-detection algorithm based on spaCy’s PhraseMatcher class scanned the postings for “surface form” phrases associated with the 13,000+ ESCO skills, refined to exclude non-representative terms. These noun phrases, or “surface forms”, are potential skill descriptors.
- Quality assessment: A machine-learning model is employed to predict the quality of each surface form as a skill entity. This model was trained on a manually labeled dataset of high-quality skill surface forms.
- Skill entity mapping: High-quality surface forms are mapped to skill entities. A skill entity represents a unique skill concept and may be associated with multiple surface forms.
- Clustering analysis: Unsupervised machine learning is used (specifically, hierarchical clustering) to aggregate the skill entities into coherent categories. This results in a three-level skills hierarchy. The resulting skills taxonomy, as illustrated in Table 3, consists of the following:
  - 8 categories at Level 1 (highest level),
  - 15 categories at Level 2,
  - 41 categories at Level 3 (most granular level).
- The extracted surface form,
- The predicted skill entity,
- The three levels of skill categorization,
- A quality score indicating the confidence of the match.
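The four output fields listed above suggest a simple record type. The sketch below pairs such a record with a plain regular-expression lookup, which is a simplified stand-in for the spaCy PhraseMatcher step and does not reproduce the quality model or the clustering. The lexicon entries, entity names, and quality scores are invented for illustration; only the category names are taken from the skills taxonomy.

```python
import re
from dataclasses import dataclass
from typing import List

ICT = "Information and communication technologies"


@dataclass(frozen=True)
class SkillMatch:
    surface_form: str     # phrase found in the posting
    skill_entity: str     # skill concept the phrase maps to
    level1: str           # three levels of skill categorization
    level2: str
    level3: str
    quality_score: float  # confidence of the match


# surface form -> (entity, level 1, level 2, level 3, score); hypothetical entries
LEXICON = {
    "data analysis": ("analyse data", ICT, ICT, "Data analytics", 0.9),
    "english": ("English", "Transversal skills", "Languages", "Languages", 0.8),
}


def extract_skills(text: str) -> List[SkillMatch]:
    """Return one SkillMatch per lexicon surface form found as a whole phrase."""
    lowered = text.lower()
    matches = []
    for surface, (entity, l1, l2, l3, score) in LEXICON.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            matches.append(SkillMatch(surface, entity, l1, l2, l3, score))
    return matches


result = extract_skills("Strong data analysis skills and fluent English required.")
```

In the described pipeline the quality score comes from a trained model; here it is a fixed placeholder stored with each lexicon entry.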
5. Tourism Industry in Greece Use Case
- Unbiased and labor-market representative data: In order to derive useful and valid insights, we should collect as much data as possible from various sources that represent the labor market in a certain period. We applied our proposed method for data-source selection, and the OJV portals it produced were Indeed, Careerjet, Jobfind, and Karriera.
- Industry extraction: The main challenge our methodology addressed in the tourism industry-analysis use case is the question, “what is the tourism industry?” Our collected raw data comprised approximately 140,000 online job advertisements posted between July 2019 and August 2021. These data were preprocessed according to the proposed methodology. The crucial phase was the industry extraction, as we had to reliably annotate each job posting with the right employer industry. In Greece, the official business registry (GEMI) provides limited access to companies’ data, so we relied on private business directories that contain valid and complete business information. An automated scraping mechanism was built to extract the economic activity code and classify the employers in the dataset. However, not all employers in the dataset were classified through this method. Moreover, many job postings are posted by recruiting companies and carry no information about the final employer. For these cases, we proceeded to the industry dictionary step of the proposed methodology. In our case, we focused on tourism-industry terms to build our dictionary, such as “hotel”, “restaurant”, “bar”, “tourism”, “resort”, “real estate”, etc. These procedures resulted in an employer-industry annotated dataset covering over 85% of the original postings.
- Tourism-industry employers: In order to choose only the job postings that belong to the tourism industry, we referred to the work of Demunter and Dimitrakopoulou (2013) [40], as it provides a list of NACE Rev. 2 codes associated with the tourism industries. This resulted in the identification of over 20,000 online job advertisements within the tourism industry.
- A 7.5% increase in part-time and short-term contract positions was noted, reflecting the industry’s adaptation to the pandemic’s uncertainties;
- There was a notable rise in demand for high-skilled blue-collar jobs, particularly those requiring tertiary education;
- The skill requirements for tourism roles evolved, with a new emphasis on healthcare, information management, and food service administration skills.
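The industry dictionary step described for the tourism use case can be illustrated with a keyword lookup over the job description. The sketch below uses the six terms quoted in the text. The function name, the returned label, the optional plural “s”, and the sample sentences are assumptions, and the dictionary actually used in the study is larger than this.

```python
import re
from typing import Optional

# The six example terms quoted in the text.
TOURISM_TERMS = ("hotel", "restaurant", "bar", "tourism", "resort", "real estate")

# Whole-word match, case-insensitive, with an optional plural "s".
_TOURISM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in TOURISM_TERMS) + r")s?\b",
    re.IGNORECASE,
)


def tag_industry(description: str) -> Optional[str]:
    """Return "tourism" if any dictionary term occurs in the text, else None."""
    return "tourism" if _TOURISM_RE.search(description) else None
```

Whole-word matching keeps a term such as “bar” from firing on “Barcelona”. For instance, `tag_industry("Our client, a 5-star resort in Crete, seeks a sous chef.")` returns `"tourism"`, which is the kind of recruiter posting where no final employer is named.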
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Boselli, R.; Cesarini, M.; Marrara, S.; Mercorio, F.; Mezzanzanica, M.; Pasi, G.; Viviani, M. WoLMIS: A labor market intelligence system for classifying web job vacancies. J. Intell. Inf. Syst. 2017, 51, 477–502.
- Pavani, V.; Pujitha, N.; Vaishnavi, P.; Neha, K.; Sahithi, D. Feature Extraction based Online Job Portal. In Proceedings of the 2022 International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 16–18 March 2022; pp. 1676–1683.
- Naveed, H.; Khan, A.; Qiu, S.; Saqib, M.; Anwar, S.; Usman, M.; Barnes, N.; Mian, A. A Comprehensive Overview of Large Language Models. arXiv 2023, arXiv:2307.06435.
- Singh, C.; Askari, A.; Caruana, R.; Gao, J. Augmenting interpretable models with large language models during training. Nat. Commun. 2023, 14, 7913.
- CEDEFOP (European Centre for the Development of Vocational Training). Available online: https://www.cedefop.europa.eu/en/themes/skills-labour-market (accessed on 1 May 2024).
- Cedefop. Online Job Vacancies and Skills Analysis: A Cedefop Pan-European Approach; Publications Office: Luxembourg, 2019.
- Cedefop. The Online Job Vacancy Market in the EU: Driving Forces and Emerging Trends; Publications Office: Luxembourg, 2019; Cedefop Research Paper; No 72.
- Skills-OVATE Cedefop’s Project. Available online: https://www.cedefop.europa.eu/en/tools/skills-online-vacancies (accessed on 1 May 2024).
- Carnevale, A.P.; Jayasundera, T.; Repnikov, D. Understanding Online Job Ads Data; Georgetown Univ.: Washington, DC, USA, 2014; Center Educ. Workforce, Tech. Rep.
- Brancatelli, C.; Brodmann, S.; Marguerie, A. Job Creation and Demand for Skills in Kosovo: What Can We Learn from Job Portal Data? The World Bank: Washington, DC, USA, 2020.
- Betcherman, G.; Giannakopoulos, N.; Laliotis, I.; Pantelaiou, I.; Testaverde, M.; Tzimas, G. Reacting Quickly and Protecting Jobs: The Short-Term Impacts of the COVID-19 Lockdown on the Greek Labor Market. Empir. Econ. 2023, 65, 1273–1307.
- Karakatsanis, I.; Alkhader, W.; MacCrory, F.; Alibasic, A.; Omar, M.; Aung, Z.; Woon, W. Data Mining Approach to Monitoring The Requirements of the Job Market: A Case Study. Inf. Syst. 2017, 65, 1–6.
- Sibarani, E.; Scerri, S.; Morales, C.; Auer, S.; Collarana, D. Ontology-guided Job Market Demand Analysis: A Cross-Sectional Study for the Data Science field. In Proceedings of the 13th International Conference on Semantic Systems, Amsterdam, The Netherlands, 11–14 September 2017.
- Boselli, R.; Cesarini, M.; Mercorio, F.; Mezzanzanica, M. Classifying online Job Advertisements through Machine Learning. Future Gener. Comput. Syst. 2018, 86, 319–328.
- Marrara, S.; Pasi, G.; Viviani, M.; Cesarini, M.; Mercorio, F.; Mezzanzanica, M.; Pappagallo, M. A language modelling approach for discovering novel labor market occupations from the web. In Proceedings of the 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2017), Leipzig, Germany, 23–26 August 2017; pp. 1026–1034, ISBN 978-1-4503-4951-2.
- ISCO-08 Classification (International Standard Classification of Occupations). Available online: https://ilostat.ilo.org/methods/concepts-and-definitions/classification-occupation/ (accessed on 1 May 2024).
- Kim, J.; Angnakoon, P. Research using job advertisements: A methodological assessment. Libr. Inf. Sci. Res. 2016, 38, 327–335.
- Bäck, A.; Hajikhani, A.; Suominen, A. Text mining on job advertisement data: Systematic process for detecting artificial intelligence related jobs. In Proceedings of the 1st Workshop on AI + Informetrics (AII2021) Co-Located with the iConference 2021 (AII 2021); CEUR-WS: Aachen, Germany, 2021; Volume 2871, pp. 111–124. Available online: http://ceur-ws.org/Vol-2871/paper9.pdf (accessed on 1 May 2024).
- Bamieh, O.; Ziegler, L. How Does the COVID-19 Crisis Affect Labor Demand? An Analysis Using Job Board Data from Austria; IZA Institute of Labor Economics: Bonn, Germany, 2020; IZA Discussion Paper No. 13801.
- ISCED (International Standard Classification of Education). Available online: https://ilostat.ilo.org/resources/concepts-and-definitions/classification-education/ (accessed on 1 May 2024).
- ICSE and ICSaW (International Classifications of Status in Employment and Status at Work). Available online: https://ilostat.ilo.org/methods/concepts-and-definitions/classification-status-at-work/ (accessed on 1 May 2024).
- Zhao, Y.; Chen, H.; Mason, C.M. A framework for duplicate detection from online job postings. In Proceedings of the 20th IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Melbourne, Australia, 14–17 December 2021; pp. 249–256.
- Peng, J.; Hahn, J.; Huang, K.-W. Handling Missing Values in Information Systems Research: A Review of Methods and Assumptions. Inf. Syst. Res. 2023, 34, 5–26.
- ESCO (European Skills, Competences, Qualifications and Occupations). Available online: https://esco.ec.europa.eu/en/classification/occupation_main (accessed on 1 May 2024).
- ISIC (International Standard Industrial Classification of All Economic Activities). Available online: https://unstats.un.org/unsd/publication/seriesm/seriesm_4rev4e.pdf (accessed on 1 May 2024).
- NAICS (North American Industry Classification System). Available online: https://www.naics.com/ (accessed on 1 May 2024).
- NACE Rev.2 (Statistical classification of economic activities in the European Community). Available online: https://ec.europa.eu/eurostat/documents/3859598/5902521/KS-RA-07-015-EN.PDF (accessed on 1 May 2024).
- Kühnemann, H.; van Delden, A.; Windmeijer, D. Exploring a knowledge-based approach to predicting NACE codes of enterprises based on web page texts. Stat. J. IAOS 2020, 36, 807–821.
- Roy, S.; Chiticariu, L.; Feldman, V.; Reiss, F.; Zhu, H. Provenance-based dictionary refinement in information extraction. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13), New York, NY, USA, 22–27 June 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 457–468.
- Baronchelli, A.; Caglioti, E.; Loreto, V.; Pizzi, E. Dictionary based methods for information extraction. Phys. A Stat. Mech. Its Appl. 2004, 342, 294–300.
- Albanesi, S.; Kim, J. Effects of the COVID-19 recession on the US labor market: Occupation, family, and gender. J. Econ. Perspect. 2021, 35, 3–24.
- Papoutsoglou, M.; Ampatzoglou, A.; Mittas, N.; Angelis, L. Extracting Knowledge from On-Line Sources for Software Engineering Labor Market: A Mapping Study. IEEE Access 2019, 7, 157595–157613.
- Schierholz, M.; Schonlau, M. Machine learning for occupation coding—A comparison study. J. Surv. Stat. Methodol. 2020, 9, 1013–1034.
- Djumalieva, J.; Lima, A.; Sleeman, C. Classifying Occupations According to Their Skill Requirements in Job Advertisements; Economic Statistics Centre of Excellence: London, UK, 2018; Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-04.
- Dogra, V.; Verma, S.; Kavita Chatterjee, P.; Shafi, J.; Choi, J.; Ijaz, M.F. A Complete Process of Text Classification System Using State-of-the-Art NLP Models. Comput. Intell. Neurosci. 2022, 2022, 1883698.
- Zhang, M.; Jensen, K.; Sonniks, S.; Plank, B. SkillSpan: Hard and Soft Skill Extraction from English Job Postings. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 4962–4984.
- Fareri, S.; Melluso, N.; Chiarello, F.; Fantoni, G. SkillNER: Mining and mapping soft skills from any text. Expert Syst. Appl. 2021, 184, 115544.
- Djumalieva, J.; Sleeman, C. An Open and Data-Driven Taxonomy of Skills Extracted from Online Job Adverts; Economic Statistics Centre of Excellence: London, UK, 2018; Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-13.
- NESTA. The Open Jobs Observatory. Available online: https://www.nesta.org.uk/project/open-jobs-observatory/ (accessed on 1 May 2024).
- Demunter, C.; Dimitrakopoulou, K. One in Seven Businesses Belong to the Tourism Industries, EDC collection. In Industry, Trade and Services; European Union: Brussels, Belgium, 2013; Volumes 32-2013 of Statistics in Focus; ISSN 2314-9647.
Focus Area | Key Findings | References |
---|---|---|
CEDEFOP’s contributions | CEDEFOP’s work on labor-market analysis, focusing on VET systems and the impact of economic and social megatrends on skill demands and mismatches. In the Skills-OVATE project, CEDEFOP built a Business Intelligence platform that provides detailed information on the jobs and skills employers demand across the EU, grouped by regions and sectors. | CEDEFOP [6,7,8]. |
Online Job-Posting Analysis | An analysis of online job-posting data utilizing text-mining and data-mining approaches leads to an understanding of labor-market dynamics and provides real-time insights into job trends and skill demands. | Carnevale et al. (2014) [9]; Brancatelli et al. (2020) [10]; Karakatsanis et al. (2017) [12]; Kim and Angnakoon (2016) [17]. |
Impact of COVID-19 | Analyzing the short-term impacts of the COVID-19 pandemic on the labor market with job-vacancy data from online job portals can help to monitor real-time changes in labor demand during a pandemic. | Betcherman et al. (2023) [11]; Bamieh and Ziegler (2020) [19]. |
Machine-learning Techniques | Machine-learning, Natural Language Processing, and Named Entity Recognition techniques may be used for labor-information extraction to classify job ads on standard occupation taxonomies and identify emerging occupations and skills, improving labor-market insights. | Boselli et al. (2018) [14]; Marrara et al. (2017) [15]. |
Ontology-based information extraction | Application of ontology-based methods and domain-specific vocabularies for extracting data science skills from job postings, showing improved performance over manual methods. | Sibarani et al. (2017) [13]. |
Emerging technologies in job-ads Analyses | Investigation of the emergence of AI-related jobs and technology adoption using job-ad data, highlighting the relevance of these ads in monitoring labor-market trends. | Bäck et al. (2021) [18]. |
The Challenges | Description | Suggested Approach |
---|---|---|
Unbiased and labor market-representative data | The labor data must be diverse and reflect the labor market at the time of the analysis | Online job-posting data are collected from multiple sources. (Section 3.2.1. Data-Source Selection) |
Noisy or irrelevant data | Many job postings, especially in the free-text fields, contain data that are not useful for our data-processing steps and must be removed. | Sequential steps for data cleansing and preparation. (Section 3.3.1. Data Cleansing and Preparation) |
Language and translation issues | Online job-posting data may be available in multiple languages, which can pose challenges for analysis. For example, different languages may have different conventions for job titles or descriptions, making it difficult to classify or extract information from the data. | Data quality-assurance tests (Section 3.3.1. Data Cleansing and Preparation) |
Missing values | Online job-posting data may contain missing or incomplete information, which can affect the quality and reliability of the analysis | NLP techniques to handle missing values. (Section 3.3.4. Missing Values Handling) |
Variability in the data | Job titles, company names, locations, and skills may appear in many different words and expressions but with the same context. | Entity normalization is the process of standardizing entities to a common format in a job-posting analysis. Normalizing entities can help to reduce the variability in the data and make it easier to merge and analyze job postings across different sources. (Section 3.3.2. Entities Normalization) |
Classification | Online job-posting data may contain unstructured or ambiguous information, such as job titles or descriptions that are difficult to classify. | Text-mining and NLP techniques to classify job postings into well-defined categories, such as occupation, industry, and skills. (Section 3.4.1. Industry Extraction; Section 3.4.2. Occupation Extraction; Section 3.4.3. Skills Extraction; and Section 4. Skill-Extraction Use Case) |
Recruiters’ job postings | Many job postings are posted by recruiting companies and do not provide any information about the final employer, making it difficult to classify the ad in a certain industry. | Text-mining and NLP techniques on description field to extract industry information, if it exists. (Section 3.4.1. Industry Extraction) |
Changing labor-market dynamics | The labor market is continuously changing, with new occupations, skills, and industries emerging over time. | Machine-learning, data-driven methods. (Section 4. Skill-Extraction Use Case) |
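
Table 3. Three-level skills taxonomy.

Level 1 | Level 2 | Level 3 |
---|---|---|
Transversal skills | General workplace skills | General workplace skills |
Transversal skills | Languages | Languages |
Healthcare, social work, and research | Care and social work | Care and social work |
Healthcare, social work, and research | Scientific research | Scientific research |
Healthcare, social work, and research | Healthcare | Medical specialist skills |
Healthcare, social work, and research | Healthcare | Public health administration |
Healthcare, social work, and research | Healthcare | Psychology and mental health |
Healthcare, social work, and research | Healthcare | Physiotherapy |
Education | Education | Teaching |
Education | Education | Learning support |
Education | Education | Education management |
Education | Education | Extracurricular and sports activities |
Sales and communication | Communication | Multimedia and product design |
Sales and communication | Communication | Marketing |
Sales and communication | Communication | Public relations |
Sales and communication | Customer services and sales | Customer services |
Sales and communication | Customer services and sales | Sales |
Sales and communication | Procurement, logistics, and trade | International trade |
Sales and communication | Procurement, logistics, and trade | Transport and logistics |
Sales and communication | Procurement, logistics, and trade | Procurement |
Information and communication technologies | Information and communication technologies | Data analytics |
Information and communication technologies | Information and communication technologies | Web and software development |
Information and communication technologies | Information and communication technologies | IT support services |
Information and communication technologies | Information and communication technologies | Security and cybersecurity |
Business administration, finance, and law | Finance and law | Financial services |
Business administration, finance, and law | Finance and law | Accounting |
Business administration, finance, and law | Finance and law | Law |
Business administration, finance, and law | Finance and law | Tax |
Business administration, finance, and law | Business administration | Business and project administration |
Business administration, finance, and law | Business administration | Office administration |
Business administration, finance, and law | Business administration | Human resources |
Engineering, construction, and maintenance | Manufacturing and engineering | Manufacturing and mechanical engineering |
Engineering, construction, and maintenance | Manufacturing and engineering | Electrical engineering |
Engineering, construction, and maintenance | Manufacturing and engineering | Civil engineering |
Engineering, construction, and maintenance | Construction, installation, and maintenance | Automotive maintenance and waste management |
Engineering, construction, and maintenance | Construction, installation, and maintenance | Workplace-safety management |
Engineering, construction, and maintenance | Construction, installation, and maintenance | Horticulture, animal husbandry, and environment |
Engineering, construction, and maintenance | Construction, installation, and maintenance | Electrical, heating, and ventilation installation |
Engineering, construction, and maintenance | Construction, installation, and maintenance | Construction |
Food, cleaning, and hospitality | Food, cleaning, and hospitality | Food, hospitality, and beauty services |
Food, cleaning, and hospitality | Food, cleaning, and hospitality | Cleaning services |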
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tzimas, G.; Zotos, N.; Mourelatos, E.; Giotopoulos, K.C.; Zervas, P. From Data to Insight: Transforming Online Job Postings into Labor-Market Intelligence. Information 2024, 15, 496. https://doi.org/10.3390/info15080496