Unified Syntactic Annotation of English in the CGEL Framework

Brett Reynolds, Aryaman Arora, Nathan Schneider


Abstract
We investigate whether the Cambridge Grammar of the English Language (2002) and its extensive descriptions work well as a corpus annotation scheme. We develop annotation guidelines and in the process outline some interesting linguistic uncertainties that we had to resolve. To test the applicability of CGEL to real-world corpora, we conduct an interannotator study on sentences from the English Web Treebank, showing that consistent annotation of even complex syntactic phenomena like gapping using the CGEL formalism is feasible. Why introduce yet another formalism for English syntax? We argue that CGEL is attractive due to its exhaustive analysis of English syntactic phenomena, its labeling of both constituents and functions, and its accessibility. We look towards expanding CGELBank and augmenting it with automatic conversions from existing treebanks in the future.
Anthology ID:
2023.law-1.22
Volume:
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Jakob Prange, Annemarie Friedrich
Venue:
LAW
Publisher:
Association for Computational Linguistics
Pages:
220–234
URL:
https://aclanthology.org/2023.law-1.22
DOI:
10.18653/v1/2023.law-1.22
Cite (ACL):
Brett Reynolds, Aryaman Arora, and Nathan Schneider. 2023. Unified Syntactic Annotation of English in the CGEL Framework. In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), pages 220–234, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Unified Syntactic Annotation of English in the CGEL Framework (Reynolds et al., LAW 2023)
PDF:
https://aclanthology.org/2023.law-1.22.pdf