Recognising Occupational Titles in German Parliamentary Debates

Johanna Binnewitt


Abstract
Text mining methods are becoming increasingly popular, not only in Digital Humanities (DH) and Computational Social Sciences (CSS) in general, but also in vocational education and training (VET) research. Algorithms make it possible to explore corpora that are too large for manual methods. However, challenges arise when dealing with abstract concepts like occupations or skills, which are crucial subjects of VET research. Since algorithms require concrete instructions, either in the form of rules or annotated examples, these abstract concepts must be broken down as part of the operationalisation process. In our paper, we tackle the task of identifying occupational titles in the plenary protocols of the German Bundestag. The primary focus is a comparative analysis of two distinct approaches: a dictionary-based method and a BERT fine-tuning approach. Both approaches are compared in a quantitative evaluation and applied to a larger corpus sample. Results indicate comparable precision for both approaches (0.93), but the BERT-based models outperform the dictionary-based approach in terms of recall (0.86 vs. 0.77). Errors in the dictionary-based method primarily stem from the ambiguity of occupational titles (e.g., ‘baker’ as both a surname and a profession) and from terms missing in the dictionary. In contrast, the BERT model has difficulty distinguishing occupational titles from other terms that denote persons, such as ‘mother’ or ‘Christians’.
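The two error sources named for the dictionary-based method, and the way precision and recall register them, can be illustrated with a minimal sketch. It is not the paper's implementation: the dictionary, the example sentence, the gold annotation, and the function names `dictionary_match` and `precision_recall` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy dictionary of occupational titles (not the paper's resource).
OCCUPATION_DICTIONARY = {"bäcker", "lehrerin", "ingenieur"}


def dictionary_match(tokens):
    """Return the indices of tokens whose lowercased form is in the dictionary."""
    return {i for i, tok in enumerate(tokens) if tok.lower() in OCCUPATION_DICTIONARY}


def precision_recall(predicted, gold):
    """Token-level precision and recall for two sets of token indices."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example: "Bäcker" is used as a surname here, and "Pfleger"
# is an occupational title that is missing from the toy dictionary.
tokens = "Herr Bäcker dankte der Lehrerin und dem Pfleger".split()
gold = {4, 7}  # assumed gold annotation: "Lehrerin" and "Pfleger"

predicted = dictionary_match(tokens)
precision, recall = precision_recall(predicted, gold)
print(predicted, precision, recall)
```

In this toy case the surname match lowers precision and the missing dictionary entry lowers recall, which mirrors the two error types described for the dictionary-based method. A fine-tuned token classifier would replace `dictionary_match` while the evaluation function stays the same.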
Anthology ID:
2024.latechclfl-1.21
Volume:
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues:
LaTeCHCLfL | WS
Publisher:
Association for Computational Linguistics
Pages:
221–230
URL:
https://aclanthology.org/2024.latechclfl-1.21
Cite (ACL):
Johanna Binnewitt. 2024. Recognising Occupational Titles in German Parliamentary Debates. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 221–230, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Recognising Occupational Titles in German Parliamentary Debates (Binnewitt, LaTeCHCLfL-WS 2024)
PDF:
https://aclanthology.org/2024.latechclfl-1.21.pdf
Supplementary material:
 2024.latechclfl-1.21.SupplementaryMaterial.zip