A Universal Dependencies Corpora Maintenance Methodology Using Downstream Application

Ran Iwamoto, Hiroshi Kanayama, Alexandre Rademaker, Takuya Ohko


Abstract
This paper investigates updates of Universal Dependencies (UD) treebanks in 23 languages and their impact on a downstream application. Numerous people are involved in updating UD's annotation guidelines and treebanks in various languages. However, it is not easy to verify whether the updated resources maintain universality with the resources of other languages. Thus, the validity and consistency of multilingual corpora should be tested through application tasks that involve syntactic structures with PoS tags, dependency labels, and universal features. We apply syntactic parsers trained on UD treebanks from multiple versions (2.0 to 2.7) to a clause-level sentiment extractor. We then analyze the relationship between the attachment scores of the dependency parsers and performance on the application task. To inform future UD development, we show examples of outputs that differ depending on the UD version.
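The abstract relates parser attachment scores to downstream performance. As a rough illustration only (not the authors' code), the sketch below computes unlabeled and labeled attachment scores (UAS/LAS) from aligned gold and predicted (head, deprel) pairs, plus a Pearson correlation that could relate per-version attachment scores to a downstream metric. The function names and toy inputs are hypothetical, and this sketch compares full dependency labels, which may differ from how official UD/CoNLL scorers treat label subtypes.

```python
from math import sqrt


def attachment_scores(gold, pred):
    """Return (UAS, LAS) for aligned token lists of (head, deprel) pairs.

    Hypothetical helper: UAS counts tokens whose head is correct; LAS counts
    tokens whose head AND full deprel label are both correct.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")
    n = len(gold)
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    label_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / n, label_hits / n


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences,
    e.g. LAS per treebank version vs. a downstream score (hypothetical use)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, with gold `[(2, "nsubj"), (0, "root"), (2, "obj")]` and prediction `[(2, "nsubj"), (0, "root"), (2, "iobj")]`, every head is correct but one label is wrong, so UAS is 1.0 and LAS is 2/3.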
Anthology ID: 2021.sigtyp-1.3
Volume: Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
Month: June
Year: 2021
Address: Online
Venues: NAACL | SIGTYP
SIG: SIGTYP
Publisher: Association for Computational Linguistics
Pages: 23–31
URL: https://aclanthology.org/2021.sigtyp-1.3
DOI: 10.18653/v1/2021.sigtyp-1.3
Cite (ACL): Ran Iwamoto, Hiroshi Kanayama, Alexandre Rademaker, and Takuya Ohko. 2021. A Universal Dependencies Corpora Maintenance Methodology Using Downstream Application. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, pages 23–31, Online. Association for Computational Linguistics.
Cite (Informal): A Universal Dependencies Corpora Maintenance Methodology Using Downstream Application (Iwamoto et al., SIGTYP 2021)
PDF: https://aclanthology.org/2021.sigtyp-1.3.pdf