The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions

Joachim Daiber, Rob van der Goot


Abstract
We introduce the Denoised Web Treebank: a treebank including a normalization layer and a corresponding evaluation metric for dependency parsing of noisy text, such as Tweets. This benchmark enables the evaluation of parser robustness as well as text normalization methods, including normalization as machine translation and unsupervised lexical normalization, directly on syntactic trees. Experiments show that text normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy on this test set.
Anthology ID:
L16-1102
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
649–653
URL:
https://aclanthology.org/L16-1102
Cite (ACL):
Joachim Daiber and Rob van der Goot. 2016. The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 649–653, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions (Daiber & van der Goot, LREC 2016)
PDF:
https://aclanthology.org/L16-1102.pdf