Shorten the Long Tail for Rare Entity and Event Extraction

Pengfei Yu, Heng Ji


Abstract
The distribution of knowledge elements such as entity types and event types is long-tailed in natural language. Hence, information extraction datasets naturally conform to a long-tailed distribution. Although imbalanced datasets can teach a model useful real-world biases, deep learning models may learn features that do not generalize to rare or unseen expressions of entities or events during evaluation, especially for rare types without sufficient training instances. Existing approaches to the long-tailed learning problem manipulate the training data through re-balancing, augmentation, or the introduction of extra prior knowledge. In contrast, we propose to address the generalization challenge by making the evaluation instances closer to the frequent training cases. We design a new transformation module that transforms infrequent candidate mention representations during evaluation using the average mention representation from the training dataset. Experimental results on three classic entity and event extraction benchmarks demonstrate the effectiveness of our framework.
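The transformation described above can be illustrated with a minimal sketch: interpolating a rare mention's representation toward the average mention representation computed over the training set. The function name `transform_mention`, the interpolation weight `alpha`, and the linear-interpolation form are illustrative assumptions, not the paper's actual module.

```python
import numpy as np

def transform_mention(mention_repr: np.ndarray,
                      train_mean_repr: np.ndarray,
                      alpha: float = 0.5) -> np.ndarray:
    """Shift a mention representation toward the training-set mean.

    A hypothetical linear interpolation; the paper's module may use a
    learned transformation instead.
    """
    return (1 - alpha) * mention_repr + alpha * train_mean_repr

# Average mention representation over (toy) training instances.
train_reprs = np.array([[1.0, 0.0], [0.0, 1.0]])
train_mean = train_reprs.mean(axis=0)  # [0.5, 0.5]

# At evaluation time, pull a rare mention's representation toward the mean.
rare_mention = np.array([2.0, 0.0])
transformed = transform_mention(rare_mention, train_mean, alpha=0.5)
print(transformed)  # [1.25 0.25]
```

The intuition is that rare mentions, after this shift, lie closer to the region of representation space the model saw frequently during training.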
Anthology ID:
2023.eacl-main.97
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1339–1350
URL:
https://aclanthology.org/2023.eacl-main.97
DOI:
10.18653/v1/2023.eacl-main.97
Bibkey:
Cite (ACL):
Pengfei Yu and Heng Ji. 2023. Shorten the Long Tail for Rare Entity and Event Extraction. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 1339–1350, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Shorten the Long Tail for Rare Entity and Event Extraction (Yu & Ji, EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.97.pdf
Video:
https://aclanthology.org/2023.eacl-main.97.mp4