Jorge Gracia


2023

pdf bib
MEAN: Metaphoric Erroneous ANalogies dataset for PTLMs metaphor knowledge probing
Lucia Pitarch | Jordi Bernad | Jorge Gracia
Proceedings of the 4th Conference on Language, Data and Knowledge

pdf bib
No clues good clues: out of context Lexical Relation Classification
Lucia Pitarch | Jordi Bernad | Lacramioara Dranca | Carlos Bobed Lisbona | Jorge Gracia
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The accurate prediction of lexical relations between words is a challenging task in Natural Language Processing (NLP). The most recent advances in this direction come with the use of pre-trained language models (PTLMs). A PTLM typically needs “well-formed” verbalized text to interact with it, either to fine-tune it or to exploit it. However, there are indications that commonly used PTLMs already encode enough linguistic knowledge to allow the use of minimal (or none) textual context for some linguistically motivated tasks, thus notably reducing human effort, the need for data pre-processing, and favoring techniques that are language neutral since do not rely on syntactic structures. In this work, we explore this idea for the tasks of lexical relation classification (LRC) and graded Lexical Entailment (LE). After fine-tuning PTLMs for LRC with different verbalizations, our evaluation results show that very simple prompts are competitive for LRC and significantly outperform graded LE SoTA. In order to gain a better insight into this phenomenon, we perform a number of quantitative statistical analyses on the results, as well as a qualitative visual exploration based on embedding projections.

2022

pdf bib
Cross-Lingual Link Discovery for Under-Resourced Languages
Michael Rosner | Sina Ahmadi | Elena-Simona Apostol | Julia Bosque-Gil | Christian Chiarcos | Milan Dojchinovski | Katerina Gkirtzou | Jorge Gracia | Dagmar Gromann | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Gilles Sérasset | Ciprian-Octavian Truică
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We rst introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We de ne under-resourced languages with a speci c focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

pdf bib
TIAD 2022: The Fifth Translation Inference Across Dictionaries Shared Task
Jorge Gracia | Besim Kabashi | Ilan Kernerman
Proceedings of Globalex Workshop on Linked Lexicography within the 13th Language Resources and Evaluation Conference

The objective of the Translation Inference Across Dictionaries (TIAD) series of shared tasks is to explore and compare methods and techniques that infer translations indirectly between language pairs, based on other bilingual/multilingual lexicographic resources. In this fifth edition, the participating systems were asked to generate new translations automatically among three languages - English, French, Portuguese - based on known indirect translations contained in the Apertium RDF graph. Such evaluation pairs have been the same during the four last TIAD editions. Since the fourth edition, however, a larger graph is used as a basis to produce the translations, namely Apertium RDF v2. The evaluation of the results was carried out by the organisers against manually compiled language pairs of K Dictionaries. For the second time in the TIAD series, some systems beat the proposed baselines. This paper gives an overall description of the shard task, the evaluation data and methodology, and the systems’ results.

pdf bib
Fuzzy Lemon: Making Lexical Semantic Relations More Juicy
Fernando Bobillo | Julia Bosque-Gil | Jorge Gracia | Marta Lanau-Coronas
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

The OntoLex-Lemon model provides a vocabulary to enrich ontologies with linguistic information that can be exploited by Natural Language Processing applications. The increasing uptake of Lemon illustrates the growing interest in combining linguistic information and Semantic Web technologies. In this paper, we present Fuzzy Lemon, an extension of Lemon that allows to assign an uncertainty degree to lexical semantic relations. Our approach is based on an OWL ontology that defines a hierarchy of data properties encoding different types of uncertainty. We also illustrate the usefulness of Fuzzy Lemon by showing that it can be used to represent the confidence degrees of automatically discovered translations between pairs of bilingual dictionaries from the Apertium family.

pdf bib
A Survey of Guidelines and Best Practices for the Generation, Interlinking, Publication, and Validation of Linguistic Linked Data
Fahad Khan | Christian Chiarcos | Thierry Declerck | Maria Pia Di Buono | Milan Dojchinovski | Jorge Gracia | Giedre Valunaite Oleskeviciene | Daniela Gifu
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular it focused on four core tasks in the production/publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally we offer a number of directions for future work in order to address the findings of the survey.

2020

pdf bib
Proceedings of the 2020 Globalex Workshop on Linked Lexicography
Ilan Kernerman | Simon Krek | John P. McCrae | Jorge Gracia | Sina Ahmadi | Besim Kabashi
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

pdf bib
Graph Exploration and Cross-lingual Word Embeddings for Translation Inference Across Dictionaries
Marta Lanau-Coronas | Jorge Gracia
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

This paper describes the participation of two different approaches in the 3rd Translation Inference Across Dictionaries (TIAD 2020) shared task. The aim of the task is to automatically generate new bilingual dictionaries from existing ones. To that end, we essayed two different types of techniques: based on graph exploration on the one hand and, on the other hand, based on cross-lingual word embeddings. The task evaluation results show that graph exploration is very effective, accomplishing relatively high precision and recall values in comparison with the other participating systems, while cross-lingual embeddings reaches high precision but smaller recall.

pdf bib
Orchestrating NLP Services for the Legal Domain
Julian Moreno-Schneider | Georg Rehm | Elena Montiel-Ponsoda | Víctor Rodriguez-Doncel | Artem Revenko | Sotirios Karampatakis | Maria Khvalchik | Christian Sageder | Jorge Gracia | Filippo Maganza
Proceedings of the Twelfth Language Resources and Evaluation Conference

Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe different use cases with which we experiment and develop prototypical solutions.

pdf bib
Recent Developments for the Linguistic Linked Open Data Infrastructure
Thierry Declerck | John Philip McCrae | Matthias Hartung | Jorge Gracia | Christian Chiarcos | Elena Montiel-Ponsoda | Philipp Cimiano | Artem Revenko | Roser Saurí | Deirdre Lee | Stefania Racioppa | Jamal Abdul Nasir | Matthias Orlikowsk | Marta Lanau-Coronas | Christian Fäth | Mariano Rico | Mohammad Fazleh Elahi | Maria Khvalchik | Meritxell Gonzalez | Katharine Cooney
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper we describe the contributions made by the European H2020 project “Prêt-à-LLOD” (‘Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors’) to the further development of the Linguistic Linked Open Data (LLOD) infrastructure. Prêt-à-LLOD aims to develop a new methodology for building data value chains applicable to a wide range of sectors and applications and based around language resources and language technologies that can be integrated by means of semantic technologies. We describe the methods implemented for increasing the number of language data sets in the LLOD. We also present the approach for ensuring interoperability and for porting LLOD data sets and services to other infrastructures, as well as the contribution of the projects to existing standards.

pdf bib
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)
Maxim Ionov | John P. McCrae | Christian Chiarcos | Thierry Declerck | Julia Bosque-Gil | Jorge Gracia
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

2019

pdf bib
Developing and Orchestrating a Portfolio of Natural Legal Language Processing and Document Curation Services
Georg Rehm | Julián Moreno-Schneider | Jorge Gracia | Artem Revenko | Victor Mireles | Maria Khvalchik | Ilan Kernerman | Andis Lagzdins | Marcis Pinnis | Artus Vasilevskis | Elena Leitner | Jan Milde | Pia Weißenhorn
Proceedings of the Natural Legal Language Processing Workshop 2019

We present a portfolio of natural legal language processing and document curation services currently under development in a collaborative European project. First, we give an overview of the project and the different use cases, while, in the main part of the article, we focus upon the 13 different processing services that are being deployed in different prototype applications using a flexible and scalable microservices architecture. Their orchestration is operationalised using a content and document curation workflow manager.

2016

pdf bib
Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries
Marta Villegas | Maite Melero | Núria Bel | Jorge Gracia
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The experiments presented here exploit the properties of the Apertium RDF Graph, principally cycle density and nodes’ degree, to automatically generate new translation relations between words, and therefore to enrich existing bilingual dictionaries with new entries. Currently, the Apertium RDF Graph includes data from 22 Apertium bilingual dictionaries and constitutes a large unified array of linked lexical entries and translations that are available and accessible on the Web (http://linguistic.linkeddata.es/apertium/). In particular, its graph structure allows for interesting exploitation opportunities, some of which are addressed in this paper. Two ‘massive’ experiments are reported: in the first one, the original EN-ES translation set was removed from the Apertium RDF Graph and a new EN-ES version was generated. The results were compared against the previously removed EN-ES data and against the Concise Oxford Spanish Dictionary. In the second experiment, a new non-existent EN-FR translation set was generated. In this case the results were compared against a converted wiktionary English-French file. The results we got are really good and perform well for the extreme case of correlated polysemy. This lead us to address the possibility to use cycles and nodes degree to identify potential oddities in the source data. If cycle density proves efficient when considering potential targets, we can assume that in dense graphs nodes with low degree may indicate potential errors.

pdf bib
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
John Philip McCrae | Christian Chiarcos | Francis Bond | Philipp Cimiano | Thierry Declerck | Gerard de Melo | Jorge Gracia | Sebastian Hellmann | Bettina Klimek | Steven Moran | Petya Osenova | Antonio Pareja-Lora | Jonathan Pool
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud of linguistic resources, which covers various linguistic databases, lexicons, corpora, terminologies, and metadata repositories. We present and summarize five years of progress on the development of the cloud and of advancements in open data in linguistics, and we describe recent community activities. The paper aims to serve as a guideline to orient and involve researchers with the community and/or Linguistic Linked Open Data.

2015

pdf bib
Reconciling Heterogeneous Descriptions of Language Resources
John Philip McCrae | Philipp Cimiano | Victor Rodríguez Doncel | Daniel Vila-Suero | Jorge Gracia | Luca Matteis | Roberto Navigli | Andrejs Abele | Gabriela Vulcu | Paul Buitelaar
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications

2014

pdf bib
Enabling Language Resources to Expose Translations as Linked Data on the Web
Jorge Gracia | Elena Montiel-Ponsoda | Daniel Vila-Suero | Guadalupe Aguado-de-Cea
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Language resources, such as multilingual lexica and multilingual electronic dictionaries, contain collections of lexical entries in several languages. Having access to the corresponding explicit or implicit translation relations between such entries might be of great interest for many NLP-based applications. By using Semantic Web-based techniques, translations can be available on the Web to be consumed by other (semantic enabled) resources in a direct manner, not relying on application-specific formats. To that end, in this paper we propose a model for representing translations as linked data, as an extension of the lemon model. Our translation module represents some core information associated to term translations and does not commit to specific views or translation theories. As a proof of concept, we have extracted the translations of the terms contained in Terminesp, a multilingual terminological database, and represented them as linked data. We have made them accessible on the Web both for humans (via a Web interface) and software agents (with a SPARQL endpoint).

2012

pdf bib
Using Cross-Lingual Explicit Semantic Analysis for Improving Ontology Translation
Kartik Asooja | Jorge Gracia | Nitish Aggarwal | Asunción Gómez Pérez
Proceedings of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT