Jordi Carrera


pdf bib
Technology for Translators: What doesn’t Kill you, Makes you Stronger
Jordi Carrera | Alex Yanishevsky
Proceedings of Machine Translation Summit XII: Commercial MT User Program


pdf bib
Complete and Consistent Annotation of WordNet using the Top Concept Ontology
Javier Álvez | Jordi Atserias | Jordi Carrera | Salvador Climent | Egoitz Laparra | Antoni Oliver | German Rigau
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the complete and consistent ontological annotation of the nominal part of WordNet. The annotation has been carried out using the semantic features defined in the EuroWordNet Top Concept Ontology and made available to the NLP community. Up to now only an initial core set of 1,024 synsets, the so-called Base Concepts, was ontologized in such a way. The work has been achieved by following a methodology based on an iterative and incremental expansion of the initial labeling through the hierarchy while setting inheritance blockage points. Since this labeling has been set on the EuroWordNet’s Interlingual Index (ILI), it can be also used to populate any other wordnet linked to it through a simple porting process. This feature-annotated WordNet is intended to be useful for a large number of semantic NLP tasks and for testing for the first time componential analysis on real environments. Moreover, the quantitative analysis of the work shows that more than 40% of the nominal part of WordNet is involved in structure errors or inadequacies.

pdf bib
Towards Spanish Verbs’ Selectional Preferences Automatic Acquisition: Semantic Annotation of the SenSem Corpus
Jordi Carrera | Irene Castellón | Salvador Climent | Marta Coll-Florit
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present the results of an agreement task carried out in the framework of the KNOW Project and consisting in manually annotating an agreement sample totaling 50 sentences extracted from the SenSem corpus. Diambiguation was carried out for all nouns, proper nouns and adjectives in the sample, all of which were assigned EuroWordNet (EWN) synsets. As a result of the task, Spanish WN has been shown to exhibit 1) lack of explanatory clarity (it does not define word meanings, but glosses and examplifies them instead; it does not systematically encode metaphoric meanings, either); 2) structural inadequacy (some words appear as hyponyms of another sense of the same word; sometimes there even coexist in Spanish WN a general sense and a specific one related to the same concept, but with no structural link in between; hyperonymy relationships have been detected that are likely to raise doubts to human annotators; there can even be found cases of auto-hyponymy); 3) cross-linguistic inconsistency (there exist in English EWN concepts whose lexical equivalent is missing in Spanish WN; glosses in one language more often than not contradict or diverge from glosses in another language).