The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods

Behrang QasemiZadeh, Anne-Kathrin Schumann


Abstract
This paper introduces the ACL Reference Dataset for Terminology Extraction and Classification, version 2.0 (ACL RD-TEC 2.0). The ACL RD-TEC 2.0 was developed to provide a benchmark for evaluating term and entity recognition tasks on specialised text from the computational linguistics domain. This release of the corpus consists of 300 abstracts from articles in the ACL Anthology Reference Corpus, published between 1978 and 2006. In these abstracts, terms (i.e., single or multi-word lexical units with a specialised meaning) are manually annotated. In addition to marking their boundaries in running text, annotators classify each term into one of seven categories: method, tool, language resource (LR), LR product, model, measures and measurements, and other. To assess the quality of the annotations and to determine the difficulty of this annotation task, more than 171 of the abstracts are annotated twice, independently, by each of the two annotators. In total, 6,818 terms are identified and annotated in more than 1,300 sentences, resulting in a specialised vocabulary of 3,318 lexical forms mapped to 3,471 concepts. We explain the development of the annotation guidelines and discuss some of the challenges we encountered in this annotation task.
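The abstract describes terms annotated with their boundaries and one of seven categories, with part of the corpus annotated independently by both annotators to gauge agreement. The following is a minimal sketch of one common way to compare such double annotations. It assumes a hypothetical record format (`Term`: sentence index, character offsets, category) and uses exact-match F1 as the agreement measure; both are illustrative assumptions, not necessarily the file format or the agreement measure used for the ACL RD-TEC 2.0.

```python
from typing import NamedTuple

# The seven term categories named in the abstract.
CATEGORIES = {
    "method", "tool", "language resource", "LR product",
    "model", "measures and measurements", "other",
}


class Term(NamedTuple):
    """Hypothetical annotation record (assumed format, not the dataset's)."""
    sentence_id: int
    start: int      # inclusive character offset within the sentence
    end: int        # exclusive character offset within the sentence
    category: str


def pairwise_f1(a: set[Term], b: set[Term], typed: bool = True) -> float:
    """Exact-match F1 between two annotators' term sets.

    With typed=False only the boundaries must agree; with typed=True the
    category must match too. F1 is symmetric, so neither annotator has to
    be treated as the gold standard: F1 = 2|A & B| / (|A| + |B|).
    """
    for t in a | b:
        if t.category not in CATEGORIES:
            raise ValueError(f"unknown category: {t.category!r}")

    def key(t: Term) -> tuple:
        # Drop the category when only span boundaries are compared.
        return tuple(t) if typed else (t.sentence_id, t.start, t.end)

    ka = {key(t) for t in a}
    kb = {key(t) for t in b}
    if not ka and not kb:
        return 1.0  # two empty annotations agree trivially
    matches = len(ka & kb)
    return 2 * matches / (len(ka) + len(kb))


# Usage with made-up offsets: both annotators mark the same two spans
# but disagree on the category of the second one.
ann1 = {Term(0, 0, 15, "method"), Term(0, 30, 42, "tool")}
ann2 = {Term(0, 0, 15, "method"), Term(0, 30, 42, "model")}
print(pairwise_f1(ann1, ann2, typed=False))  # 1.0
print(pairwise_f1(ann1, ann2))               # 0.5
```

Reporting the untyped and typed scores separately distinguishes disagreement about where a term begins and ends from disagreement about its category, which mirrors the two layers of annotation the abstract describes.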
Anthology ID:
L16-1294
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1862–1868
URL:
https://aclanthology.org/L16-1294
Cite (ACL):
Behrang QasemiZadeh and Anne-Kathrin Schumann. 2016. The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1862–1868, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods (QasemiZadeh & Schumann, LREC 2016)
PDF:
https://aclanthology.org/L16-1294.pdf