The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error

Katherine Atwell, Anthony Sicilia, Seong Jae Hwang, Malihe Alikhani


Abstract
Discourse analysis allows us to draw inferences from a text document that extend beyond the sentence level. The current performance of discourse models is very low on texts outside of the training distribution’s coverage, diminishing the practical utility of existing models. There is a need for a measure that can inform us to what extent our model generalizes from the training to the test sample when these samples may be drawn from distinct distributions. While this can be estimated via distribution shift, we argue that this does not directly correlate with change in the observed error of a classifier (i.e., error-gap). Thus, we propose to use a statistic from the theoretical domain adaptation literature which can be directly tied to error-gap. We study the bias of this statistic as an estimator of error-gap both theoretically and through a large-scale empirical study of over 2400 experiments on 6 discourse datasets from domains including, but not limited to, news, biomedical texts, TED talks, Reddit posts, and fiction. Our results not only motivate our proposal and help us to understand its limitations, but also provide insight on the properties of discourse models and datasets which improve performance in domain adaptation. For instance, we find that non-news datasets are slightly easier to transfer to than news datasets when the training and test sets are very different. Our code and an associated Python package are available to allow practitioners to make more informed model and dataset choices.
Anthology ID:
2022.findings-acl.68
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
824–845
URL:
https://aclanthology.org/2022.findings-acl.68
DOI:
10.18653/v1/2022.findings-acl.68
Cite (ACL):
Katherine Atwell, Anthony Sicilia, Seong Jae Hwang, and Malihe Alikhani. 2022. The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error. In Findings of the Association for Computational Linguistics: ACL 2022, pages 824–845, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error (Atwell et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-acl.68.pdf
Video:
https://aclanthology.org/2022.findings-acl.68.mp4
Code:
anthonysicilia/change-that-matters-acl2022