Beatriz Guerrero García
2025
Legal Terminology Extraction in Spanish: Gold-standard Generation and LLM Evaluation
Lucia Palacios Palacios, Beatriz Guerrero García, Patricia Martín Chozas, Elena Montiel Ponsoda
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
This study aims to develop a gold standard for terminology extraction in Castilian Spanish within the domain of labour law. To achieve this, a methodology was developed based on established linguistic theories and reviewed by a team of expert terminologists. Building on previous extraction studies and reference theoretical frameworks, candidate terms were identified by their morphosyntactic patterns and then enriched by assessing their degree of specialisation in reference resources. The candidate terms were subsequently validated manually. To evaluate the gold standard's applicability, we assessed how well the LLaMA3-8B and Mistral-7B language models extract labour law terms from the latest version of the Real Decreto Legislativo 2/2015 Ley del Estatuto de los Trabajadores (Spain's Workers' Statute). YAKE was also included as a statistical baseline to compare traditional methods with generative approaches. All models were evaluated against the validated gold standard.
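The abstract does not specify the exact morphosyntactic patterns or the evaluation metrics, so the following is only a minimal Python sketch of the two mechanical steps it describes, under stated assumptions. It assumes (a) the text has already been tokenised and POS-tagged with Universal Dependencies tags by some external tagger (not shown), (b) a hypothetical pattern inventory (`PATTERNS`) of common Spanish term shapes such as NOUN ADP NOUN ("contrato de trabajo"), and (c) exact-match precision, recall and F1 as the comparison against the gold standard. The specialisation assessment against reference resources and the manual validation step are deliberately omitted.

```python
from typing import Iterable

# Hypothetical POS-tag patterns (Universal Dependencies tags) for Spanish
# candidate terms; the paper's actual pattern inventory is not reproduced here.
PATTERNS: list[tuple[str, ...]] = [
    ("NOUN",),
    ("NOUN", "ADJ"),
    ("NOUN", "ADP", "NOUN"),
    ("NOUN", "ADP", "NOUN", "ADJ"),
]


def extract_candidates(tagged: list[tuple[str, str]],
                       patterns: Iterable[tuple[str, ...]] = PATTERNS) -> set[str]:
    """Return lower-cased token spans whose POS-tag sequence matches a pattern."""
    tags = [tag for _, tag in tagged]
    candidates: set[str] = set()
    for pattern in patterns:
        n = len(pattern)
        # Slide a window of the pattern's length over the tag sequence.
        for i in range(len(tagged) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                candidates.add(" ".join(tok.lower() for tok, _ in tagged[i:i + n]))
    return candidates


def evaluate(predicted: set[str], gold: set[str]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of predicted terms against a gold standard."""
    tp = len(predicted & gold)  # true positives: terms found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy, hand-tagged sentence and a hypothetical three-term gold standard.
    sentence = [("El", "DET"), ("contrato", "NOUN"), ("de", "ADP"),
                ("trabajo", "NOUN"), ("temporal", "ADJ")]
    gold = {"contrato de trabajo", "contrato de trabajo temporal", "despido"}
    candidates = extract_candidates(sentence)
    scores = evaluate(candidates, gold)
    print(sorted(candidates))
    print({k: round(v, 2) for k, v in scores.items()})
```

On the toy sentence, the pattern matcher proposes five candidates, two of which are in the hypothetical gold set, giving precision 0.4, recall of about 0.67 and F1 of 0.5. Over-generation of this kind (e.g. bare "trabajo") is what the later specialisation and manual validation steps are meant to filter.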