Generating Paraphrases with Lean Vocabulary

Tadashi Nomoto


Abstract
In this work, we examine whether it is possible to achieve state-of-the-art performance in paraphrase generation with a reduced vocabulary. Our approach consists of building a convolution-to-sequence model (Conv2Seq), partially guided by reinforcement learning, and training it on a subword representation of the input. Experiments on the Quora dataset, which contains over 140,000 pairs of sentences and corresponding paraphrases, found that with fewer than 1,000 token types we were able to achieve performance exceeding that of the current state of the art.
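The abstract does not say how the subword representation is built. One common way to obtain a small, bounded inventory of token types is byte-pair encoding (BPE). The sketch below is a minimal, hypothetical illustration of that general technique, not the paper's own preprocessing; the function names (`learn_bpe`, `merge_pair`, `segment`, `token_types`) and the toy corpus are assumptions made for illustration.

```python
from collections import Counter

END = "</w>"  # end-of-word marker so subwords do not cross word boundaries


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words."""
    # Each word starts as a tuple of characters plus the end-of-word marker.
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge to every word in the working vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into subwords by replaying merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def token_types(words, merges):
    """Set of distinct subword types used to encode `words`."""
    return {s for w in words for s in segment(w, merges)}


if __name__ == "__main__":
    corpus = "how do i learn python how can i learn java quickly".split()
    merges = learn_bpe(corpus, num_merges=10)
    print(segment("learning", merges))
    print(len(token_types(corpus, merges)), "token types")
```

Because each learned merge introduces at most one new symbol, the number of token types is bounded by the number of base characters plus the merge budget, so `num_merges` is the knob one would tune to stay under a ceiling such as the 1,000 token types mentioned above. Unseen words still segment into known pieces, falling back to single characters in the worst case. Tools such as subword-nmt and SentencePiece implement subword segmentation at scale.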
Anthology ID:
W19-8655
Volume:
Proceedings of the 12th International Conference on Natural Language Generation
Month:
October–November
Year:
2019
Address:
Tokyo, Japan
Editors:
Kees van Deemter, Chenghua Lin, Hiroya Takamura
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
438–442
URL:
https://aclanthology.org/W19-8655
DOI:
10.18653/v1/W19-8655
Cite (ACL):
Tadashi Nomoto. 2019. Generating Paraphrases with Lean Vocabulary. In Proceedings of the 12th International Conference on Natural Language Generation, pages 438–442, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Generating Paraphrases with Lean Vocabulary (Nomoto, INLG 2019)
PDF:
https://aclanthology.org/W19-8655.pdf