Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution

Ting Liu, Yiming Cui, Qingyu Yin, Wei-Nan Zhang, Shijin Wang, Guoping Hu


Abstract
Most existing approaches for zero pronoun resolution rely heavily on annotated data, which is often released by shared task organizers. The lack of annotated data is therefore a major obstacle to progress on the zero pronoun resolution task. Moreover, manually labeling more data to improve performance is expensive. To alleviate these problems, we propose a simple but novel approach to automatically generate large-scale pseudo training data for zero pronoun resolution. Furthermore, we successfully transfer a cloze-style reading comprehension neural network model to the zero pronoun resolution task and propose a two-step training mechanism to bridge the gap between the pseudo training data and the real data. Experimental results show that the proposed approach significantly outperforms state-of-the-art systems, with an absolute improvement of 3.1% in F-score on the OntoNotes 5.0 data.
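The abstract does not spell out how the pseudo training data is built. As an illustration only, the following is a minimal sketch of one common cloze-style scheme, not necessarily the paper's exact procedure: pick an eligible word (assumed here to be pre-filtered nouns or pronouns) that occurs in at least two sentences, turn one sentence containing it into a query with that word replaced by a blank, and use the remaining sentences as the document. The names `BLANK`, `ClozeSample`, and `make_cloze_sample`, as well as the "at least two sentences" eligibility rule, are hypothetical choices made for this sketch.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set

BLANK = "<blank>"  # hypothetical placeholder token standing in for the removed word


@dataclass
class ClozeSample:
    document: List[str]  # context tokens (all sentences except the query sentence)
    query: List[str]     # query sentence with one answer occurrence replaced by BLANK
    answer: str          # the removed word, which must be recovered from the document


def make_cloze_sample(
    sentences: Sequence[Sequence[str]],
    candidate_words: Set[str],
    rng: random.Random,
) -> Optional[ClozeSample]:
    """Build one pseudo sample from a tokenized document (hypothetical sketch).

    candidate_words: words allowed as answers; assumed to be nouns/pronouns
    identified by an upstream tagger, which is not part of this sketch.
    """
    # Count how many distinct sentences contain each eligible word.
    sent_counts: Counter = Counter()
    for sent in sentences:
        sent_counts.update(set(sent) & candidate_words)

    # Requiring two sentences guarantees the answer still appears in the
    # document after the query sentence is removed from it.
    eligible = sorted(w for w, c in sent_counts.items() if c >= 2)
    if not eligible:
        return None

    answer = rng.choice(eligible)
    # Choose one sentence containing the answer to act as the query.
    hits = [i for i, s in enumerate(sentences) if answer in s]
    q_idx = rng.choice(hits)
    query_sent = list(sentences[q_idx])
    pos = query_sent.index(answer)  # blank out the first occurrence only
    query = query_sent[:pos] + [BLANK] + query_sent[pos + 1:]

    # The document is every other sentence, flattened into one token list.
    document = [tok for i, s in enumerate(sentences) if i != q_idx for tok in s]
    return ClozeSample(document=document, query=query, answer=answer)
```

For example, with the sentences `the cat sat`, `a dog ran`, `the cat slept` and candidates `{"cat", "dog"}`, only `cat` occurs in two sentences, so it becomes the answer: one `cat` sentence becomes the query with `<blank>` in place of `cat`, and the other two sentences form the document. Because such samples need no manual labels, a generator like this can be run over large unlabeled corpora, which is the property the abstract relies on.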
Anthology ID: P17-1010
Volume: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2017
Address: Vancouver, Canada
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 102–111
URL: https://aclanthology.org/P17-1010
DOI: 10.18653/v1/P17-1010
Cite (ACL): Ting Liu, Yiming Cui, Qingyu Yin, Wei-Nan Zhang, Shijin Wang, and Guoping Hu. 2017. Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 102–111, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal): Generating and Exploiting Large-scale Pseudo Training Data for Zero Pronoun Resolution (Liu et al., ACL 2017)
PDF: https://aclanthology.org/P17-1010.pdf
Video: https://vimeo.com/234953310
Data: CoNLL-2012