2025
Terminology Enhanced Retrieval Augmented Generation for Spanish Legal Corpora
Patricia Martín Chozas | Pablo Calleja | Carlos Rodríguez Limón
Proceedings of the 5th Conference on Language, Data and Knowledge
This paper intends to highlight the importance of reusing terminologies in the context of Large Language Models (LLMs), particularly within a Retrieval-Augmented Generation (RAG) scenario. We explore the application of query expansion techniques using a controlled terminology enriched with synonyms. Our case study focuses on the Spanish legal domain, investigating both query expansion and improvements in retrieval effectiveness within the RAG model. The experimental setup includes various LLMs, such as Mistral, LLaMA3.2, and Granite 3, along with multiple Spanish-language embedding models. The results demonstrate that integrating current neural approaches with linguistic resources enhances RAG performance, reinforcing the role of structured lexical and terminological knowledge in modern NLP pipelines.
Bringing IATE into the Semantic Web Family
Paula Diez Ibarbia | Patricia Martín Chozas | Elena Montiel Ponsoda
Proceedings of the 5th Conference on Language, Data and Knowledge: The 5th OntoLex Workshop
This paper is an extension of previous work by the authors and other researchers that studies the application of the OntoLex-lemon model for representing the InterActive Terminology for Europe (IATE) database in the Semantic Web. While traditional XML-based approaches have been effective for multilingual terminological work, the Semantic Web enables richer, more interoperable representations. The study evaluates the suitability of OntoLex-lemon for modeling IATE’s complex structure and identifies limitations in existing vocabularies. To address these, this paper tries to identify other existing vocabularies and ontologies that could satisfy those limitations, which include term reliability, regional usage, lifecycle statuses, lookup forms, and concept cross-references. Still, some representation requirements are not covered by existing vocabularies and may need to be further discussed within the community.
2024
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024
Christian Chiarcos | Katerina Gkirtzou | Maxim Ionov | Fahad Khan | John P. McCrae | Elena Montiel Ponsoda | Patricia Martín Chozas
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024
2020
Defying Wikidata: Validation of Terminological Relations in the Web of Data
Patricia Martín-Chozas | Sina Ahmadi | Elena Montiel-Ponsoda
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this paper we present an approach to validate terminological data retrieved from open encyclopaedic knowledge bases. This need arises from the enrichment of automatically extracted terms with information from existing resources in the Linguistic Linked Open Data cloud. Specifically, the resource employed for this enrichment is WIKIDATA, since it is one of the biggest knowledge bases freely available within the Semantic Web. During the experiment, we noticed that certain RDF properties in the Knowledge Base did not contain the data they are intended to represent, but a different type of information. In this paper we propose an approach to validate the retrieved data based on four axioms that rely on two linguistic theories: the x-bar theory and the multidimensional theory of terminology. The validation process is supported by a second knowledge base specialised in linguistic data; in this case, CONCEPTNET. In our experiment, we validate terms from the legal domain in four languages: Dutch, English, German and Spanish. The final aim is to generate a set of sound and reliable terminological resources in RDF to contribute to the population of the Linguistic Linked Open Data cloud.