Grace Hui Yang


High-Quality Dialogue Diversification by Intermittent Short Extension Ensembles
Zhiwen Tang | Hrishikesh Kulkarni | Grace Hui Yang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


More Diverse Dialogue Datasets via Diversity-Informed Data Collection
Katherine Stasaski | Grace Hui Yang | Marti A. Hearst
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Automated generation of conversational dialogue using modern neural architectures has made notable advances. However, these models often produce uninteresting, predictable responses, a drawback known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection. Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which conversational participants to collect data from. Diversity-Informed Data Collection produces significantly more diverse data than baseline data collection methods, and yields better results on two downstream tasks: emotion classification and dialogue generation. This method is generalizable and can be used with other corpus-level metrics.
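The core idea, using a corpus-level statistic to decide whose data to collect, can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' algorithm: it assumes unigram Shannon entropy as the corpus-level diversity metric, assumes each candidate participant has some pilot utterances available, and greedily picks the participants whose data most increases the metric. The names `unigram_entropy` and `select_participants` are invented for this example, and the `metric` parameter reflects the abstract's point that other corpus-level metrics can be swapped in.

```python
import math
from collections import Counter


def unigram_entropy(utterances):
    """Shannon entropy (bits) of the whitespace-token unigram distribution.

    Assumed stand-in for a corpus-level diversity metric.
    """
    counts = Counter(tok for u in utterances for tok in u.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_participants(corpus, candidates, k, metric=unigram_entropy):
    """Greedily choose up to k participants whose data most raises `metric`.

    corpus: list of utterances collected so far.
    candidates: dict mapping a (hypothetical) participant id to that
        participant's pilot utterances.
    metric: any function mapping a list of utterances to a score where
        higher means more diverse.
    """
    current = list(corpus)
    remaining = dict(candidates)
    chosen = []
    for _ in range(min(k, len(remaining))):
        base = metric(current)
        best_id, best_gain = None, float("-inf")
        # Recompute the corpus statistic for each candidate's contribution.
        for pid, utts in remaining.items():
            gain = metric(current + utts) - base
            if gain > best_gain:
                best_id, best_gain = pid, gain
        chosen.append(best_id)
        # Add the chosen participant's data so later choices see the updated corpus.
        current.extend(remaining.pop(best_id))
    return chosen


if __name__ == "__main__":
    corpus = ["i am fine", "i am fine"]
    candidates = {
        "a": ["i am fine"],
        "b": ["the weather is lovely today"],
    }
    print(select_participants(corpus, candidates, k=1))
```

Here participant `b` is preferred because their utterance introduces new vocabulary, while `a` only repeats what the corpus already contains and leaves the entropy unchanged. Because the statistic is recomputed after each selection, the choice adapts as the corpus grows, which is the sense in which the statistics are "dynamically computed".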