2024
Humanitarian Corpora for English, French and Spanish
Loryn Isaacs | Santiago Chambó | Pilar León-Araúz
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper presents three corpora of English, French and Spanish humanitarian documents compiled with reports obtained from ReliefWeb through its API. ReliefWeb is a leading database of humanitarian documents operated by the UN Office for the Coordination of Humanitarian Affairs (OCHA). To compile these corpora, documents were selected with language identification and noise reduction techniques. They were subsequently tokenized, lemmatized, tagged by part of speech, and enriched with metadata for use by linguists in corpus query software. These corpora were compiled to satisfy the research needs of the Humanitarian Encyclopedia, a project with a focus on conceptual variation. However, they can also be useful for other humanitarian endeavors, whether they are research- or practitioner-oriented; the source code for generating the corpora is available on GitHub. To compare materials, an exploratory analysis of definitional and generic-specific information was conducted for the concept of ARMED ACTOR with lexical data extracted from an English legacy corpus (where the concept is underrepresented) as well as from the new English and Spanish corpora. Lexical data were compared among corpora and presented by means of online data visualization to illustrate its potential to inform conceptual modelling.
Ideological Knowledge Representation: Framing Climate Change in EcoLexicon
Arianne Reimerink | Melania Cabezas-García | Pilar León-Araúz | Pamela Faber
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Culture is underrepresented in terminological resources and ideology is an especially complicated cultural aspect to convey. This complexity stems from the intertwined relationships among the discourse community of politicians, the media and the general public, as well as their interactions with scientific knowledge. Nevertheless, terminological resources should provide the necessary information to understand the political perspective taken in discourse on scientific issues with a high political profile. As in all specialized domains, environmental concepts and terms are subject to dynamism and variation (León-Araúz, 2017). Cognitive term variants (e.g., climate change, climate crisis) are of particular interest because of their presence in political discourse and their potential to influence climate actions. They can be used to reflect multidimensionality, imprecision or ideological attachment. This paper describes a method based on framing in Communication Studies to extract ideological knowledge from corpora. We used Spanish and English parliamentary debates (ParlaMint 2.1) and annotated the interventions that included a term variant of climate change according to an adapted version of the frames proposed by Bolsen and Shapiro (2018). The results showed how climate change discourse changes across the ideological spectrum, and we propose how to represent that knowledge in an environmental TKB.
2020
Representing Multiword Term Variation in a Terminological Knowledge Base: a Corpus-Based Study
Pilar León-Araúz | Arianne Reimerink | Melania Cabezas-García
Proceedings of the Twelfth Language Resources and Evaluation Conference
In scientific and technical communication, multiword terms (MWTs) are the most frequent type of lexical units. Rendering them in another language is not an easy task due to their cognitive complexity, the proliferation of different forms, and their unsystematic representation in terminographic resources. This often results in a broad spectrum of translations for multiword terms, which themselves foster term variation since they consist of two or more constituents. In this study we carried out a quantitative and qualitative analysis of Spanish translation variants of a set of environment-related concepts by evaluating equivalents in three parallel corpora, two comparable corpora and two terminological resources. Our results showed that MWTs exhibit a significant degree of term variation of different characteristics, which were used to establish a set of criteria according to which term variants should be selected, organized and described in terminological knowledge bases.
Extraction of Hyponymic Relations in French with Knowledge-Pattern-Based Word Sketches
Antonio San Martín | Catherine Trekker | Pilar León-Araúz
Proceedings of the Twelfth Language Resources and Evaluation Conference
Hyponymy is the cornerstone of taxonomies and concept hierarchies. However, the extraction of hypernym-hyponym pairs from a corpus can be time-consuming, and reconstructing the hierarchical network of a domain is often an extremely complex process. This paper presents the development and evaluation of the French EcoLexicon Semantic Sketch Grammar (ESSG-fr), a French hyponymic sketch grammar for Sketch Engine based on knowledge patterns. It offers a user-friendly way of extracting hyponymic pairs in the form of word sketches in any user-owned corpus. The ESSG-fr contains three times more hyponymic patterns than its English counterpart and has been tested in a multidisciplinary corpus. It is thus expected to be domain-independent. Moreover, the following methodological innovations have been included in its development: (1) use of English hyponymic patterns in a parallel corpus to find new French patterns; (2) automatic inclusion of the results of the Sketch Engine thesaurus to find new variants of the patterns. As for its evaluation, the ESSG-fr returns 70% valid hypernyms and hyponyms, measured on 180 extracted pairs of terms in three different domains.
2018
Manzanilla: An Image Annotation Tool for TKB Building
Arianne Reimerink | Pilar León-Araúz
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Evaluating EcoLexiCAT: a Terminology-Enhanced CAT Tool
Pilar León-Araúz | Arianne Reimerink
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Towards the Inference of Semantic Relations in Complex Nominals: a Pilot Study
Melania Cabezas-García | Pilar León-Araúz
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
Pattern-based Word Sketches for the Extraction of Semantic Relations
Pilar León-Araúz | Antonio San Martín | Pamela Faber
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)
Despite advances in computer technology, terminologists still tend to rely on manual work to extract all the semantic information that they need for the description of specialized concepts. In this paper we propose the creation of new word sketches in Sketch Engine for the extraction of semantic relations. Following a pattern-based approach, new sketch grammars are developed in order to extract some of the most common semantic relations used in the field of terminology: generic-specific, part-whole, location, cause and function.
2010
EcoLexicon: An Environmental TKB
Arianne Reimerink | Pilar León Araúz | Pedro J. Magaña Redondo
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
EcoLexicon, a multilingual knowledge resource on the environment, provides an internally coherent information system covering a wide range of specialized linguistic and conceptual needs. Data in our terminological knowledge base (TKB) are primarily hosted in a relational database which is now linked to an ontology in order to apply reasoning techniques and enhance user queries. The advantages of ontological reasoning can only be obtained if conceptual description is based on systematic criteria and a wide inventory of non-hierarchical relations, which confer dynamism to knowledge representation. Thus, our research has mainly focused on conceptual modelling and providing a user-friendly multimodal interface. The dynamic interface, which combines conceptual (networks and definitions), linguistic (contexts, concordances) and graphical information, offers users the freedom to surf it according to their needs. Furthermore, dynamism is also present at the representational level. Contextual constraints have been applied to reconceptualise versatile concepts that cause a great deal of information overload.