Unregulated Chinese-to-English Data Expansion Does NOT Work for Neural Event Detection

Zhongqiu Li, Yu Hong, Jie Wang, Shiming He, Jianmin Yao, Guodong Zhou


Abstract
We leverage cross-language data expansion and retraining to enhance neural Event Detection (ED) on the English ACE corpus. Machine translation is used to expand the English ED training set with translations of the Chinese one. However, experimental results show that this strategy actually degrades performance. An inspection of the translations suggests that mistakenly aligned triggers in the expanded data negatively influence the retraining process. We refer to this phenomenon as “trigger falsification”. To overcome the issue, we apply heuristic rules to regulate the expanded data, fixing the distracting samples that contain falsified triggers. Supplementary experiments show that the rule-based regulation is beneficial, yielding an improvement of about 1.6% in F1-score for ED. We additionally show that, compared with transfer learning from the translated ED data, straightforward data combination by random pouring surprisingly performs better.
Anthology ID:
2022.coling-1.232
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2633–2638
URL:
https://aclanthology.org/2022.coling-1.232
Cite (ACL):
Zhongqiu Li, Yu Hong, Jie Wang, Shiming He, Jianmin Yao, and Guodong Zhou. 2022. Unregulated Chinese-to-English Data Expansion Does NOT Work for Neural Event Detection. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2633–2638, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Unregulated Chinese-to-English Data Expansion Does NOT Work for Neural Event Detection (Li et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.232.pdf