Further Developments in Treebank Error Detection Using Derivation Trees

Seth Kulick, Ann Bies, Justin Mott


Abstract
This work describes how derivation tree fragments based on a variant of Tree Adjoining Grammar (TAG) can be used to check treebank consistency. Annotations of word sequences are compared both for their internal structural consistency and for their external relation to the rest of the tree. We expand on earlier work in this area in three ways. First, we provide a more complete description of the system, showing how a naive use of TAG structures will not work, leading to a necessary refinement. We also provide a more complete account of the processing pipeline, including the grouping together of structurally similar errors and the elimination of duplicates. Second, we include the new experimental external relation check to find an additional class of errors. Third, we broaden the evaluation to include both the internal and external relation checks, and evaluate the system on both an Arabic and an English treebank. The evaluation has been successful enough that the internal check has been integrated into the standard pipeline for current English treebank construction at the Linguistic Data Consortium.
Anthology ID:
L12-1100
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1840–1847
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/251_Paper.pdf
Cite (ACL):
Seth Kulick, Ann Bies, and Justin Mott. 2012. Further Developments in Treebank Error Detection Using Derivation Trees. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1840–1847, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Further Developments in Treebank Error Detection Using Derivation Trees (Kulick et al., LREC 2012)