Tatsuya Aoyama


2023

What’s Hard in English RST Parsing? Predictive Models for Error Analysis
Yang Janet Liu | Tatsuya Aoyama | Amir Zeldes
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited. In this paper, we examine and model some of the factors associated with parsing difficulties in previous work: the existence of implicit discourse relations, challenges in identifying long-distance relations, out-of-vocabulary items, and more. In order to assess the relative importance of these variables, we also release two annotated English test sets with explicit correct and distracting discourse markers associated with gold standard RST relations. Our results show that, as in shallow discourse parsing, the explicit/implicit distinction plays a role, but that long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem, at least for in-domain parsing. Our final model is able to predict where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser.

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation
Tatsuya Aoyama | Shabnam Behzad | Luke Gessler | Lauren Levine | Jessica Lin | Yang Janet Liu | Siyao Peng | Yilun Zhu | Amir Zeldes
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity recognition, coreference resolution, and discourse parsing. We evaluate state-of-the-art NLP systems on GENTLE and find that their performance degrades severely on at least some genres for every task, which indicates GENTLE’s utility as an evaluation dataset for NLP systems.

2022

Comparing Native and Learner Englishes Using a Large Pre-trained Language Model
Tatsuya Aoyama
Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning

Probe-Less Probing of BERT’s Layer-Wise Linguistic Knowledge with Masked Word Prediction
Tatsuya Aoyama | Nathan Schneider
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop

The current study quantitatively (and, for illustrative purposes, qualitatively) analyzes BERT’s layer-wise masked word prediction on an English corpus, and finds that (1) the layer-wise localization of linguistic knowledge primarily shown in probing studies is replicated in a behavior-based design and (2) syntactic and semantic information is encoded at different layers for words of different syntactic categories. Hypothesizing that the above results are correlated with the number of likely candidates for the masked word prediction, we also investigate how the results differ for tokens within multiword expressions.