Patricia Martín-Chozas

Also published as: Patricia Martín Chozas, Patricia Martín Chozas

2025

Bringing IATE into the Semantic Web Family
Paula Diez Ibarbia | Patricia Martín Chozas | Elena Montiel Ponsoda
Proceedings of the 5th Conference on Language, Data and Knowledge: The 5th OntoLex Workshop

This paper is an extension of previous work by the authors and other researchers that studies the application of the OntoLex-lemon model for representing the InterActive Terminology for Europe (IATE) database in the Semantic Web. While traditional XML-based approaches have been effective for multilingual terminological work, the Semantic Web enables richer, more interoperable representations. The study evaluates the suitability of OntoLex-lemon for modeling IATE’s complex structure and identifies limitations in existing vocabularies. To address these, this paper tries to identify orher existing vocabularies and ontologies that could satisfy those limitations, which include term reliability, regional usage, lifecycle statuses, lookup forms, and concept cross-references. Still, some representation requirements are not covered by existing vocabularies and may need to be further discussed within the community.

pdf bib abs

Legal Terminology Extraction in Spanish: Gold-standard Generation and LLM Evaluation
Lucia Palacios Palacios | Beatriz Guerrero García | Patricia Martín Chozas | Elena Montiel Ponsoda
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

This study aims to develop a gold-standard for terminological extraction in Castilian Spanish within the domain of labour law. To achieve this, a methodology was developed based on established linguistic theories and reviewed by a team of expert terminologists. Departing from previous extraction studies and reference theoretical frameworks, candidate terms were identified by their morphosyntactic patterns, enriched by assessing their degree of specialisation in reference resources. The candidate terms were then subjected to manual validation. To evaluate its applicability, we assessed the performance of the LLaMA3-8B and Mistral-7B language models in extracting labour law terms from the latest version of the Real Decreto Legislativo 2/2015 Ley del Estatuto de los Trabajadores. YAKE was also included as a statistical baseline for comparison between traditional methods and generative approaches. All models were evaluated against the validated gold-standard.

pdf bib abs

Terminology Enhanced Retrieval Augmented Generation for Spanish Legal Corpora
Patricia Martín Chozas | Pablo Calleja | Carlos Rodríguez Limón
Proceedings of the 5th Conference on Language, Data and Knowledge

This paper intends to highlight the importance of reusing terminologies in the context of Large Language Models (LLMs), particularly within a Retrieval-Augmented Generation (RAG) scenario. We explore the application of query expansion techniques using a controlled terminology enriched with synonyms. Our case study focuses on the Spanish legal domain, investigating both query expansion and improvements in retrieval effectiveness within the RAG model. The experimental setup includes various LLMs, such as Mistral, LLaMA3.2, and Granite 3, along with multiple Spanish-language embedding models. The results demonstrate that integrating current neural approaches with linguistic resources enhances RAG performance, reinforcing the role of structured lexical and terminological knowledge in modern NLP pipelines.

2024

pdf bib

2020

pdf bib abs

Defying Wikidata: Validation of Terminological Relations in the Web of Data
Patricia Martín-Chozas | Sina Ahmadi | Elena Montiel-Ponsoda
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper we present an approach to validate terminological data retrieved from open encyclopaedic knowledge bases. This need arises from the enrichment of automatically extracted terms with information from existing resources in theLinguistic Linked Open Data cloud. Specifically, the resource employed for this enrichment is WIKIDATA, since it is one of the biggest knowledge bases freely available within the Semantic Web. During the experiment, we noticed that certain RDF properties in the Knowledge Base did not contain the data they are intended to represent, but a different type of information. In this paper we propose an approach to validate the retrieved data based on four axioms that rely on two linguistic theories: the x-bar theory and the multidimensional theory of terminology. The validation process is supported by a second knowledge base specialised in linguistic data; in this case, CONCEPTNET. In our experiment, we validate terms from the legal domain in four languages: Dutch, English, German and Spanish. The final aim is to generate a set of sound and reliable terminological resources in RDF to contribute to the population of the Linguistic Linked Open Data cloud.

Co-authors

Katerina Gkirtzou 1

Beatriz Guerrero García 1

Maxim Ionov 1

Fahad Khan 1

Carlos Rodríguez Limón 1

John Philip McCrae 1

Lucia Palacios Palacios 1

Venues

RANLP1

Fix author