Proceedings of the Workshop on Terminology in the 21st century: many faces, many places
Rute Costa, Sara Carvalho, Ana Ostroški Anić, Anas Fahad Khan
Lexicon-driven approach for Terminology: specialized resources on the environment in Brazilian Portuguese
Flávia Lamberti Arraes
This paper presents terminological research on environmental terms in Brazilian Portuguese, carried out from a lexico-semantic perspective on Terminology (L’Homme, 2015, 2016, 2017, 2020; L’Homme et al., 2014, 2020). The work is part of a collaboration on the development of DiCoEnviro (Dictionnaire Fondamental de l’Environnement – Fundamental Dictionary of the Environment), a multilingual terminological resource developed by the Observatoire de Linguistique Sens Texte at the University of Montreal, Canada. Following a methodology specifically devised for terminological work based on a lexicon-driven approach (L’Homme et al., 2020), the terminological analysis shows how the linguistic behavior of terms can be uncovered and how this is effective in identifying the meaning of a term and supporting meaning distinctions.
Knowledge Representation and Language Simplification of Human Rights
Sara Silecchia, Federica Vezzani, Giorgio Maria Di Nunzio
In this paper, we describe a very recent interdisciplinary project that aims to analyse both the conceptual and the linguistic dimensions of human rights terminology. The analysis will result in a new knowledge-based multilingual terminological resource, designed to meet the FAIR principles for Open Science, which will later serve as a prototype for the development of new software for the simplified rewriting of international legal texts relating to human rights. Given the early stage of the project, we focus on its rationale, the planned workflow, and the theoretical approach that will be adopted to achieve the main goal of this ambitious research project.
Converting from the Nordic Terminological Record Format to the TBX Format
Maria Skeppstedt, Marie Mattson, Magnus Ahltorp, Rickard Domeij
Rikstermbanken (Sweden’s National Term Bank), launched in 2009, uses the Nordic Terminological Record Format (NTRF) to organise its terminological data. Since then, new terminology formats have been established as standards, e.g., the TermBase eXchange format (TBX). Here we describe work carried out by the Institute for Language and Folklore within the Federated eTranslation TermBank Network Action. This network is developing a technical infrastructure to facilitate the sharing of terminology resources throughout Europe. To share some of the term collections of Rikstermbanken within this network and export them to EuroTermBank, we have implemented a conversion from NTRF, as used in Rikstermbanken, to the TBX format.
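To illustrate the target side of such a conversion, here is a minimal Python sketch that serialises concept records as a TBX-Basic document in the TBX v3 (ISO 30042:2019) DCA style, using only the standard library. The input structure (a list of dicts with an `id` and, per language, a list of `terms` and an optional `definition`) is a hypothetical simplification, not the actual NTRF record layout, and `records_to_tbx` is an illustrative name. A real converter would also have to map the remaining NTRF fields to TBX data categories and validate the output against the TBX specification.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def records_to_tbx(records: list[dict]) -> str:
    """Serialise simplified (hypothetical) concept records as a minimal TBX-Basic document."""
    root = ET.Element("tbx", {
        "type": "TBX-Basic",
        "style": "dca",
        XML_LANG: "en",
        "xmlns": "urn:iso:std:iso:30042:ed-2",  # TBX v3 core namespace, written as a literal attribute
    })
    # Minimal header: TBX requires a tbxHeader with a fileDesc/sourceDesc.
    header = ET.SubElement(root, "tbxHeader")
    source_desc = ET.SubElement(ET.SubElement(header, "fileDesc"), "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Converted from simplified concept records"

    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for rec in records:
        # One conceptEntry per concept, one langSec per language, one termSec per term.
        entry = ET.SubElement(body, "conceptEntry", {"id": rec["id"]})
        for lang, data in rec["languages"].items():
            lang_sec = ET.SubElement(entry, "langSec", {XML_LANG: lang})
            if data.get("definition"):
                descrip = ET.SubElement(lang_sec, "descrip", {"type": "definition"})
                descrip.text = data["definition"]
            for term in data["terms"]:
                term_sec = ET.SubElement(lang_sec, "termSec")
                ET.SubElement(term_sec, "term").text = term
    return ET.tostring(root, encoding="unicode")
```

For example, `records_to_tbx([{"id": "c1", "languages": {"sv": {"terms": ["terminologi"]}}}])` yields one `conceptEntry` with a Swedish `langSec` containing a single `termSec`.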
A Dataset for Term Extraction in Hindi
Shubhanker Banerjee, Bharathi Raja Chakravarthi, John Philip McCrae
Automatic Term Extraction (ATE) is one of the core problems in natural language processing and a key component of text-mining pipelines for domain-specific corpora. Complex downstream tasks such as machine translation and summarization of domain-specific texts require term extraction systems. However, developing these systems requires large annotated datasets, so little progress has been made on this front for under-resourced languages. As part of ongoing research, in this paper we present a dataset for term extraction from Hindi texts. To the best of our knowledge, this is the first dataset that provides term-annotated documents for Hindi. Furthermore, we evaluate statistical term extraction methods on this dataset, and the results highlight the problems associated with developing term extractors for under-resourced languages.
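The abstract does not name the statistical methods that were evaluated. As one illustration of a classic statistical term extractor, the sketch below implements the well-known C-value measure: a candidate $a$ with frequency $f(a)$ scores $\log_2|a| \cdot f(a)$ if it is not nested in a longer candidate, and otherwise $\log_2|a| \cdot \big(f(a) - \frac{1}{|T_a|}\sum_{b \in T_a} f(b)\big)$, where $T_a$ is the set of longer candidates containing $a$. As simplifying assumptions, candidates are raw contiguous n-grams of 2 to 3 tokens (the usual part-of-speech filter is omitted), input is assumed to be pre-tokenised, and the function names are hypothetical.

```python
import math
from collections import Counter

Ngram = tuple[str, ...]


def count_ngrams(sentences: list[list[str]], n_min: int = 2, n_max: int = 3) -> Counter:
    """Count contiguous n-grams as term candidates (no POS filter is applied here)."""
    counts: Counter = Counter()
    for tokens in sentences:
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def contains(longer: Ngram, shorter: Ngram) -> bool:
    """True if `shorter` occurs as a contiguous subsequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq: Counter) -> dict[Ngram, float]:
    """Score every candidate with the C-value measure."""
    scores: dict[Ngram, float] = {}
    for cand, f in freq.items():
        # Frequencies of the longer candidates in which `cand` is nested (the set T_a).
        nesting = [f_b for b, f_b in freq.items() if len(b) > len(cand) and contains(b, cand)]
        weight = math.log2(len(cand))
        if nesting:
            scores[cand] = weight * (f - sum(nesting) / len(nesting))
        else:
            scores[cand] = weight * f
    return scores
```

For Hindi, the token lists would come from a Hindi-aware tokenizer; plain whitespace splitting is only a rough stand-in.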
Terminology extraction using co-occurrence patterns as predictors of semantic relevance
Rogelio Nazar, David Lindemann
We propose a method for automatic term extraction based on a statistical measure that ranks term candidates according to their semantic relevance to a specialised domain. As a measure of relevance we use term co-occurrence, defined as the repeated instantiation of two terms in the same sentence, in any order and at variable distances. In this way, term candidates are ranked higher if they tend to co-occur with a selected group of other units, as opposed to those showing more uniform distributions. No external resources are needed to apply the method, but performance improves when a pre-existing term list is provided. We present results of applying this method to a Spanish-English linguistics corpus, and the evaluation compares favourably with a standard method based on reference corpora.
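A simplified illustration of the idea, under our own assumptions: sentence-level co-occurrences are counted while ignoring order and distance, as in the definition above, and each non-seed unit is then scored by the share of its co-occurrences that involve a seed list, which stands in for the optional pre-existing term list. This affinity ratio was chosen for brevity and is not the authors' actual statistical measure; `rank_by_seed_affinity` and `min_total` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import combinations


def sentence_cooccurrences(sentences: list[list[str]]) -> dict[str, Counter]:
    """Count unordered co-occurrences of distinct units within each sentence."""
    co: dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        # Order and distance inside the sentence are ignored; each pair counts once per sentence.
        for a, b in combinations(sorted(set(sent)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def rank_by_seed_affinity(sentences: list[list[str]], seeds: set[str],
                          min_total: int = 2) -> list[tuple[str, float]]:
    """Rank non-seed units by the share of their co-occurrences that involve a seed unit."""
    co = sentence_cooccurrences(sentences)
    scores: dict[str, float] = {}
    for unit, partners in co.items():
        if unit in seeds:
            continue
        total = sum(partners.values())
        if total < min_total:  # skip units without repeated co-occurrence
            continue
        with_seeds = sum(c for p, c in partners.items() if p in seeds)
        scores[unit] = with_seeds / total
    # Highest affinity first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A unit such as "the" co-occurs with seed and non-seed units alike, so its share of seed co-occurrences stays lower than that of a domain unit such as "suffix" that clusters around the seeds.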
Evaluating Pre-Trained Language Models for Focused Terminology Extraction from Swedish Medical Records
Oskar Jerdhaf, Marina Santini, Peter Lundberg, Tomas Bjerner, Yosef Al-Abasse, Arne Jonsson, Thomas Vakili
In the experiments briefly presented in this abstract, we compare the performance of a generalist Swedish pre-trained language model with that of a domain-specific Swedish pre-trained model on the downstream task of focused terminology extraction of implant terms, i.e., terms that indicate the presence of implants in a patient's body. The fine-tuning is identical for both models. As the search strategy, we rely on a KD-tree, which we feed with two different lists of term seeds, one with noise and one without. Results show that using a domain-specific pre-trained language model improves focused terminology extraction only when the term seeds are noise-free.
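The abstract does not detail how the KD-tree is used. One plausible reading, assumed here, is nearest-neighbour retrieval: candidate terms are embedded with the fine-tuned model, indexed in a KD-tree, and the seed-term vectors serve as queries. The minimal pure-Python sketch below shows only that retrieval step, with toy 2-D vectors in place of real model embeddings; `build`, `knn` and `expand_seeds` are hypothetical helper names. In practice, a library implementation such as `scipy.spatial.KDTree` would normally be used instead.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Optional

Vector = tuple[float, ...]


@dataclass
class Node:
    label: str
    point: Vector
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items: list[tuple[str, Vector]], depth: int = 0) -> Optional[Node]:
    """Build a KD-tree by splitting at the median, cycling through the axes level by level."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    label, point = items[mid]
    return Node(label, point, axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def knn(root: Optional[Node], query: Vector, k: int) -> list[str]:
    """Return the labels of the k points nearest to `query` (Euclidean), closest first."""
    heap: list[tuple[float, str]] = []  # max-heap emulated with (-distance, label)

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = math.dist(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.label))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.label))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the current k-th best.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(root)
    return [label for _, label in sorted(heap, reverse=True)]


def expand_seeds(candidates: dict[str, Vector], seeds: dict[str, Vector], k: int = 2) -> set[str]:
    """Union of the k nearest candidate terms retrieved for every seed term."""
    root = build(list(candidates.items()))
    return {label for vec in seeds.values() for label in knn(root, vec, k)}
```

With this setup, a noisy seed list simply issues queries from irrelevant regions of the embedding space, so whatever lies near those seeds is retrieved as well.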
D-Terminer: Online Demo for Monolingual and Bilingual Automatic Term Extraction
Ayla Rigouts Terryn, Veronique Hoste, Els Lefever
This contribution presents D-Terminer: an open-access online demo for monolingual and bilingual automatic term extraction, the latter from parallel corpora. Monolingual term extraction is based on a recurrent neural network, with a supervised methodology that relies on pretrained embeddings. Candidate terms are tagged in their original context, and no large corpus is needed, as the methodology works even on single sentences. For bilingual term extraction from parallel corpora, potentially equivalent candidate term pairs are extracted from translation memories, and manual annotation of the results shows that good equivalents are found for most candidate terms. The release of the demo is accompanied by an updated version of the ACTER Annotated Corpora for Term Extraction Research (version 1.5).
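The abstract does not specify the tagging scheme. A common way to tag terms in context with a sequence labeller is IOB tagging (B = first token of a term, I = inside a term, O = outside), which is assumed here. Given per-token tags such as a recurrent network might predict, the term spans can be recovered as sketched below. `iob_to_terms` is a hypothetical helper and is not part of D-Terminer.

```python
def iob_to_terms(tokens: list[str], tags: list[str]) -> list[str]:
    """Join runs of B/I-tagged tokens into term strings, in order of appearance."""
    terms: list[str] = []
    current: list[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            # A new term starts; close any term that is still open.
            if current:
                terms.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            # "O", or a stray "I" with no open term (treated like "O" in this sketch).
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms
```

Because tagging works token by token, a single sentence is enough input, which matches the point above that no large corpus is needed.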