JobSkape: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching

Antoine Magron, Anna Dai, Mike Zhang, Syrielle Montariol, Antoine Bosselut


Abstract
Recent approaches in skill matching, employing synthetic training data for classification or similarity model training, have shown promising results, reducing the need for time-consuming and expensive annotations. However, previous synthetic datasets have limitations, such as featuring only one skill per sentence and generally comprising short sentences. In this paper, we introduce JobSkape, a framework to generate synthetic data that tackles these limitations, specifically designed to enhance skill-to-taxonomy matching. Within this framework, we create SkillSkape, a comprehensive open-source synthetic dataset of job postings tailored for skill-matching tasks. We introduce several offline metrics that show that our dataset resembles real-world data. Additionally, we present a multi-step pipeline for skill extraction and matching tasks using large language models (LLMs), benchmarking against known supervised methodologies. We show that downstream evaluation results on real-world data can beat baselines, underscoring the efficacy and adaptability of our approach.
Anthology ID:
2024.nlp4hr-1.4
Volume:
Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Estevam Hruschka, Thom Lake, Naoki Otani, Tom Mitchell
Venues:
NLP4HR | WS
Publisher:
Association for Computational Linguistics
Pages:
43–58
URL:
https://aclanthology.org/2024.nlp4hr-1.4
Cite (ACL):
Antoine Magron, Anna Dai, Mike Zhang, Syrielle Montariol, and Antoine Bosselut. 2024. JobSkape: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching. In Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024), pages 43–58, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
JobSkape: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching (Magron et al., NLP4HR-WS 2024)
PDF:
https://aclanthology.org/2024.nlp4hr-1.4.pdf