Using Linguistic Resources to Evaluate the Quality of Annotated Corpora

Max Silberztein


Abstract
Statistical and neural-network-based methods that compute their results by comparing a given text to be analyzed with a reference corpus assume that the reference corpus is sufficiently complete and reliable. In this article, I conduct several experiments on an extract of the Open American National Corpus to verify this assumption.
Anthology ID:
W18-3802
Volume:
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Peter Machonis, Anabela Barreiro, Kristina Kocijan, Max Silberztein
Venue:
LR4NLP
Publisher:
Association for Computational Linguistics
Pages:
2–11
URL:
https://aclanthology.org/W18-3802
Cite (ACL):
Max Silberztein. 2018. Using Linguistic Resources to Evaluate the Quality of Annotated Corpora. In Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing, pages 2–11, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Using Linguistic Resources to Evaluate the Quality of Annotated Corpora (Silberztein, LR4NLP 2018)
PDF:
https://aclanthology.org/W18-3802.pdf