The Quaero Evaluation Initiative on Term Extraction

Thibault Mondary, Adeline Nazarenko, Haïfa Zargayouna, Sabine Barreaux


Abstract
The Quaero program organized a set of evaluations of terminology extraction systems in 2010 and 2011. The initiative had three objectives: to evaluate the behavior and scalability of term extractors with respect to corpus size, to assess progress between different versions of the same systems, and to measure the influence of corpus type. The protocol was a comparative analysis of 32 runs against a gold standard, with scores computed using metrics that take gradual relevance into account. Systems produced by Quaero partners and publicly available systems were evaluated on English pharmacology corpora composed of European patents or abstracts of scientific articles. The gold standard was an unstructured version of the pharmacology thesaurus used by INIST-CNRS for indexing purposes. Most systems scaled to large corpora, contrasting differences were observed between different versions of the same systems, and results were better on scientific articles than on patents. During the ongoing adjudication phase, domain experts are enriching the thesaurus with terms found by several systems.
Anthology ID:
L12-1479
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
663–669
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/812_Paper.pdf
Cite (ACL):
Thibault Mondary, Adeline Nazarenko, Haïfa Zargayouna, and Sabine Barreaux. 2012. The Quaero Evaluation Initiative on Term Extraction. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 663–669, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
The Quaero Evaluation Initiative on Term Extraction (Mondary et al., LREC 2012)