Emancipating Event Extraction from the Constraints of Long-Tailed Distribution Data Utilizing Large Language Models

Zhigang Kan, Liwen Peng, Linbo Qiao, Dongsheng Li


Abstract
Event Extraction (EE) is a challenging task that aims to extract structured event-related information from unstructured text. Traditional EE methods depend on manual annotations, which are both expensive and scarce. Furthermore, most existing datasets follow a long-tailed distribution, which severely hinders previous methods from modeling tail types. Two techniques can address this issue: transfer learning and data generation. However, existing transfer-learning methods still rely on pre-training with a large amount of labeled data in the source domain, and the quality of data produced by previous data generation methods is difficult to control. In this paper, leveraging Large Language Models (LLMs), we propose novel dialogue-based methods for event extraction and generation that remove the reliance on source-domain data and keep data quality under control. Specifically, we recast the EE task as multi-turn dialogues, guiding LLMs to learn event schemas from the dialogue history and output structured events. Furthermore, we introduce a novel LLM-based method for generating high-quality data, which significantly improves the performance of traditional models across various paradigms and structures, especially on tail types. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed event extraction and data generation methods.
Anthology ID:
2024.lrec-main.501
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
5644–5653
URL:
https://aclanthology.org/2024.lrec-main.501
Cite (ACL):
Zhigang Kan, Liwen Peng, Linbo Qiao, and Dongsheng Li. 2024. Emancipating Event Extraction from the Constraints of Long-Tailed Distribution Data Utilizing Large Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 5644–5653, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Emancipating Event Extraction from the Constraints of Long-Tailed Distribution Data Utilizing Large Language Models (Kan et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.501.pdf