Automatic Correction of Syntactic Dependency Annotation Differences

Andrew Zupon, Andrew Carnie, Michael Hammond, Mihai Surdeanu


Abstract
Annotation inconsistencies between data sets can cause problems for low-resource NLP, where noisy or inconsistent data cannot be easily replaced. We propose a method for automatically detecting annotation mismatches between dependency parsing corpora, along with three related methods for automatically converting the mismatches. All three methods rely on comparing unseen examples in a new corpus with similar examples in an existing corpus. The three methods are a simple lexical replacement using the most frequent tag of the example in the existing corpus, a GloVe embedding-based replacement that considers related examples, and a BERT-based replacement that uses contextualized embeddings to provide examples fine-tuned to our data. We evaluate these conversions by retraining two dependency parsers, Stanza and Parsing as Tagging (PaT), on the converted and unconverted data. We find that applying our conversions yields significantly better performance in many cases. We also observe some differences between the two parsers. Stanza has a more complex architecture with a quadratic algorithm, so it takes longer to train, but it can generalize from less data. The PaT parser has a simpler architecture with a linear algorithm, which speeds up training but requires more training data to reach comparable or better performance.
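The simple lexical replacement mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the edge representation (head word, dependent word, label), the function names `most_frequent_labels` and `convert`, and the toy data are all assumptions made for illustration, and the paper may define an "example" differently.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation of a dependency edge: (head_word, dependent_word, label).
Edge = Tuple[str, str, str]


def most_frequent_labels(existing: List[Edge]) -> Dict[Tuple[str, str], str]:
    """Map each (head, dependent) pair to its most frequent label in the existing corpus."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for head, dep, label in existing:
        counts[(head, dep)][label] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def convert(new_edges: List[Edge], table: Dict[Tuple[str, str], str]) -> List[Edge]:
    """Relabel edges whose word pair is attested in the existing corpus.

    Pairs that never occur in the existing corpus keep their original label.
    """
    return [(h, d, table.get((h, d), label)) for h, d, label in new_edges]


# Toy data (invented for illustration only).
existing = [
    ("ate", "apple", "obj"),
    ("ate", "apple", "obj"),
    ("ate", "apple", "dobj"),
    ("ran", "fast", "advmod"),
]
new = [("ate", "apple", "dobj"), ("slept", "well", "advmod")]

table = most_frequent_labels(existing)
result = convert(new, table)
print(result)
```

A purely lexical lookup like this leaves unattested word pairs unchanged, which is the gap the embedding-based variants described in the abstract are meant to address by considering related examples.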
Anthology ID:
2022.lrec-1.769
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
7106–7112
URL:
https://aclanthology.org/2022.lrec-1.769
Cite (ACL):
Andrew Zupon, Andrew Carnie, Michael Hammond, and Mihai Surdeanu. 2022. Automatic Correction of Syntactic Dependency Annotation Differences. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 7106–7112, Marseille, France. European Language Resources Association.
Cite (Informal):
Automatic Correction of Syntactic Dependency Annotation Differences (Zupon et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.769.pdf