Thomas Green


2022

Development of a Benchmark Corpus to Support Entity Recognition in Job Descriptions
Thomas Green | Diana Maynard | Chenghua Lin
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present the development of a benchmark suite consisting of an annotation schema, a training corpus and a baseline model for Entity Recognition (ER) in job descriptions, published under a Creative Commons license. It was created to address the lack of resources available to the community for extracting salient entities, such as skills, from job descriptions. The dataset contains 18.6k entities spanning five types (Skill, Qualification, Experience, Occupation, and Domain). We include a benchmark CRF-based ER model that achieves an F1 score of 0.59. By establishing a standard definition of entities and a training/testing corpus, the suite is designed as a foundation for future work on tasks such as the development of job recommender systems.
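As an illustration of how an F1 score for entity recognition is commonly computed, the following is a minimal sketch that scores predicted entity spans against gold spans by exact match. It assumes BIO-style tags and exact-match, micro-averaged scoring, neither of which is stated in the abstract. The helper names (`bio_to_spans`, `entity_f1`) and the toy sentence are hypothetical and are not taken from the corpus or the authors' code.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, entity_type); a hypothetical span representation.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO-tagged sentence into a set of typed spans.

    Assumes tags look like "B-Skill", "I-Skill" or "O" (an assumption,
    not a format confirmed by the source).
    """
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        ends_span = tag == "O" or tag.startswith("B-") or tag[2:] != label
        if label is not None and ends_span:
            spans.add((start, i, label))
            start, label = None, None
        # Open a span on "B-"; tolerate a stray "I-" by opening one too.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match F1 over all sentences."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with its sentence index to keep spans distinct.
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (invented for illustration, not corpus data).
gold = [["B-Skill", "I-Skill", "O", "B-Occupation"]]
pred = [["B-Skill", "I-Skill", "O", "O"]]
score = entity_f1(gold, pred)
```

In this toy case the prediction recovers one of two gold entities with no false positives, so precision is 1.0, recall is 0.5, and the F1 score is 2/3.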