Detecting Annotation Scheme Variation in Out-of-Domain Treebanks

Yannick Versley, Julius Steen


Abstract
To ensure portability of NLP systems across multiple domains, existing treebanks are often extended by adding trees from interesting domains that were not part of the initial annotation effort. In this paper, we argue that it is both useful from an application viewpoint and enlightening from a linguistic viewpoint to detect and reduce divergence in annotation schemes between the extant and new parts of a set of treebanks that is to be used in evaluation experiments. The results of our correction and harmonization efforts will be made available to the public as a test suite for the evaluation of constituent parsing.
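As a rough illustration of the kind of check the abstract describes (this is not the authors' method; the sample trees, the ratio threshold, and the helper names are hypothetical), one cheap first signal of annotation scheme divergence is a comparison of constituent-label frequencies between the extant and newly added treebank sections:

import re
from collections import Counter

def label_distribution(trees):
    # Relative frequency of each constituent label in Penn-style bracketed trees.
    counts = Counter()
    for tree in trees:
        counts.update(re.findall(r'\(([^\s()]+)', tree))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def divergent_labels(extant, new_part, ratio=2.0, floor=1e-4):
    # Flag labels whose relative frequency differs by more than `ratio`
    # between the two sections; `floor` stands in for labels absent on one side.
    p, q = label_distribution(extant), label_distribution(new_part)
    flagged = []
    for label in set(p) | set(q):
        a, b = p.get(label, floor), q.get(label, floor)
        if max(a, b) / min(a, b) > ratio:
            flagged.append((label, a, b))
    return sorted(flagged, key=lambda t: -max(t[1], t[2]))

# Toy input; in practice each list would hold one bracketed tree per sentence.
extant   = ["(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))"]
new_part = ["(S (NP (DT the) (NN cat)) (VP (VBZ sleeps) (PRT (RP on))))"]
for label, a, b in divergent_labels(extant, new_part):
    print(f"{label}: {a:.3f} (extant) vs {b:.3f} (new)")

On real data one would compare whole treebank sections and might prefer a statistical test or a divergence measure over a fixed ratio; the point is only that systematic differences in label frequencies between sections are an inexpensive first indicator of scheme variation.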
Anthology ID: L16-1373
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 2354–2360
URL: https://aclanthology.org/L16-1373
Cite (ACL): Yannick Versley and Julius Steen. 2016. Detecting Annotation Scheme Variation in Out-of-Domain Treebanks. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2354–2360, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): Detecting Annotation Scheme Variation in Out-of-Domain Treebanks (Versley & Steen, LREC 2016)
PDF: https://aclanthology.org/L16-1373.pdf