Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation

Shumpei Inoue, Tsungwei Liu, Son Nguyen, Minh-Tien Nguyen


Abstract
This paper introduces JET (Joint learning token Extraction and Text generation), a model for incomplete utterance restoration (IUR). Unlike prior studies that work on only extraction or only abstraction datasets, we design a simple but effective model that handles both IUR scenarios. Our design mirrors the nature of IUR, in which omitted tokens drawn from the context contribute to restoration. Accordingly, we construct a Picker that identifies the omitted tokens. To support the Picker, we design two label creation methods (soft and hard labels) that work even when no annotated data for the omitted tokens is available. Restoration is performed by a Generator that is trained jointly with the Picker. Promising results on four benchmark datasets in extraction and abstraction scenarios show that our model outperforms the pretrained T5 and non-generative language model methods in both rich and limited training data settings.
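The idea of labeling omitted tokens without annotation and training two components jointly can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify how hard or soft labels are built or how the losses are combined, so `hard_labels`, `joint_loss`, and the weight `alpha` are hypothetical names, and the set-difference heuristic is only one plausible way to derive hard labels from a restored utterance.

```python
from typing import List


def hard_labels(context: List[str], incomplete: List[str], restored: List[str]) -> List[int]:
    """Assumed heuristic (not taken from the paper): mark a context token with 1
    if it occurs in the restored utterance but not in the incomplete one."""
    omitted = set(restored) - set(incomplete)
    return [1 if tok in omitted else 0 for tok in context]


def joint_loss(picker_loss: float, generator_loss: float, alpha: float = 0.5) -> float:
    """Assumed combination: generator loss plus a weighted picker loss.
    The weight alpha is a hypothetical hyperparameter."""
    return generator_loss + alpha * picker_loss


# Toy dialogue, invented for illustration only.
context = "do you like the new phone".split()
incomplete = "yes i like it".split()
restored = "yes i like the new phone".split()

labels = hard_labels(context, incomplete, restored)
print(list(zip(context, labels)))
print(joint_loss(picker_loss=0.2, generator_loss=1.0))
```

In this toy case the tokens "the", "new", and "phone" are flagged as omitted, which is the kind of signal a token-level picker could be trained on when no gold annotation exists.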
Anthology ID:
2022.naacl-main.229
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3149–3158
URL:
https://aclanthology.org/2022.naacl-main.229
DOI:
10.18653/v1/2022.naacl-main.229
Cite (ACL):
Shumpei Inoue, Tsungwei Liu, Son Nguyen, and Minh-Tien Nguyen. 2022. Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3149–3158, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Enhance Incomplete Utterance Restoration by Joint Learning Token Extraction and Text Generation (Inoue et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.229.pdf
Video:
https://aclanthology.org/2022.naacl-main.229.mp4
Code:
shumpei19/jet
Data:
CANARD