Eva Bühlmann

2022

Evaluation of Transfer Learning and Domain Adaptation for Analyzing German-Speaking Job Advertisements
Ann-Sophie Gnehm | Eva Bühlmann | Simon Clematide
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents text mining approaches on German-speaking job advertisements to enable social science research on the development of the labour market over the last 30 years. In order to build text mining applications providing information about profession and main task of a job, as well as experience and ICT skills needed, we experiment with transfer learning and domain adaptation. Our main contribution consists in building language models which are adapted to the domain of job advertisements, and their assessment on a broad range of machine learning problems. Our findings show the large value of domain adaptation in several respects. First, it boosts the performance of fine-tuned task-specific models consistently over all evaluation experiments. Second, it helps to mitigate rapid data shift over time in our special domain, and enhances the ability to learn from small updates with new, labeled task data. Third, domain-adaptation of language models is efficient: With continued in-domain pre-training we are able to outperform general-domain language models pre-trained on ten times more data. We share our domain-adapted language models and data with the research community.

pdf bib abs

Fine-Grained Extraction and Classification of Skill Requirements in German-Speaking Job Ads
Ann-sophie Gnehm | Eva Bühlmann | Helen Buchs | Simon Clematide
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Monitoring the development of labor market skill requirements is an information need that is more and more approached by applying text mining methods to job advertisement data. We present an approach for fine-grained extraction and classification of skill requirements from German-speaking job advertisements. We adapt pre-trained transformer-based language models to the domain and task of computing meaningful representations of sentences or spans. By using context from job advertisements and the large ESCO domain ontology we improve our similarity-based unsupervised multi-label classification results. Our best model achieves a mean average precision of 0.969 on the skill class level.

Co-authors

Venues

lrec1
nlpcss1

Fix author