Improving Domain Independent Question Parsing with Synthetic Treebanks

Halim-Antoine Boukaram, Nizar Habash, Micheline Ziadee, Majd Sakr


Abstract
Automatic syntactic parsing for question constructions is a challenging task due to the paucity of training examples in most treebanks. The near absence of question constructions is due to the dominance of the news domain in treebanking efforts. In this paper, we compare two synthetic low-cost question treebank creation methods with a conventional manual high-cost annotation method in the context of three domains (news questions, political talk shows, and chatbots) for Modern Standard Arabic, a language with relatively low resources and rich morphology. Our results show that synthetic methods can be effective at significantly reducing parsing errors for a target domain without having to invest large resources in manual annotation, and that the combination of manual and synthetic methods is our best domain-independent performer.
Anthology ID:
W18-4924
Volume:
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venue:
LAW
SIGs:
SIGLEX | SIGANN
Publisher:
Association for Computational Linguistics
Pages:
214–221
URL:
https://aclanthology.org/W18-4924
Cite (ACL):
Halim-Antoine Boukaram, Nizar Habash, Micheline Ziadee, and Majd Sakr. 2018. Improving Domain Independent Question Parsing with Synthetic Treebanks. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pages 214–221, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Improving Domain Independent Question Parsing with Synthetic Treebanks (Boukaram et al., LAW 2018)
PDF:
https://aclanthology.org/W18-4924.pdf