A Dependency Treebank of Spoken Second Language English

Kristopher Kyle, Masaki Eguchi, Aaron Miller, Theodore Sither


Abstract
In this paper, we introduce a dependency treebank of spoken second language (L2) English that is annotated with part-of-speech (Penn POS) tags and syntactic dependencies (Universal Dependencies). We then evaluate the degree to which using this treebank as training data affects POS and UD annotation accuracy for L1 web texts, L2 written texts, and L2 spoken texts, compared with models trained on L1 texts only.
Anthology ID:
2022.bea-1.7
Volume:
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
Month:
July
Year:
2022
Address:
Seattle, Washington
Venues:
BEA | NAACL
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
39–45
URL:
https://aclanthology.org/2022.bea-1.7
DOI:
10.18653/v1/2022.bea-1.7
Cite (ACL):
Kristopher Kyle, Masaki Eguchi, Aaron Miller, and Theodore Sither. 2022. A Dependency Treebank of Spoken Second Language English. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), pages 39–45, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
A Dependency Treebank of Spoken Second Language English (Kyle et al., BEA 2022)
PDF:
https://aclanthology.org/2022.bea-1.7.pdf
Data:
FCE