Using active learning to expand training data for implicit discourse relation recognition

Yang Xu; Yu Hong (洪宇); Huibin Ruan; Jianmin Yao; Min Zhang (张民); Guodong Zhou (周国栋)

doi:10.18653/v1/D18-1079

Using active learning to expand training data for implicit discourse relation recognition

Yang Xu, Yu Hong, Huibin Ruan, Jianmin Yao, Min Zhang, Guodong Zhou

Abstract

We tackle discourse-level relation recognition, a problem of determining semantic relations between text spans. Implicit relation recognition is challenging due to the lack of explicit relational clues. The increasingly popular neural network techniques have been proven effective for semantic encoding, whereby widely employed to boost semantic relation discrimination. However, learning to predict semantic relations at a deep level heavily relies on a great deal of training data, but the scale of the publicly available data in this field is limited. In this paper, we follow Rutherford and Xue (2015) to expand the training data set using the corpus of explicitly-related arguments, by arbitrarily dropping the overtly presented discourse connectives. On the basis, we carry out an experiment of sampling, in which a simple active learning approach is used, so as to take the informative instances for data expansion. The goal is to verify whether the selective use of external data not only reduces the time consumption of retraining but also ensures a better system performance. Using the expanded training data, we retrain a convolutional neural network (CNN) based classifer which is a simplified version of Qin et al. (2016)’s stacking gated relation recognizer. Experimental results show that expanding the training set with small-scale carefully-selected external data yields substantial performance gain, with the improvements of about 4% for accuracy and 3.6% for F-score. This allows a weak classifier to achieve a comparable performance against the state-of-the-art systems.

Anthology ID:: D18-1079
Volume:: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:: October-November
Year:: 2018
Address:: Brussels, Belgium
Editors:: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:: EMNLP
SIG:: SIGDAT
Publisher:: Association for Computational Linguistics
Note:
Pages:: 725–731
Language:
URL:: https://aclanthology.org/D18-1079/
DOI:: 10.18653/v1/D18-1079
Bibkey:
Cite (ACL):: Yang Xu, Yu Hong, Huibin Ruan, Jianmin Yao, Min Zhang, and Guodong Zhou. 2018. Using active learning to expand training data for implicit discourse relation recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 725–731, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):: Using active learning to expand training data for implicit discourse relation recognition (Xu et al., EMNLP 2018)
Copy Citation:
PDF:: https://aclanthology.org/D18-1079.pdf

PDF Cite Search Fix data