MailEx: Email Event and Argument Extraction

Saurabh Srivastava, Gaurav Singh, Shou Matsumoto, Ali Raz, Paulo Costa, Joshua Poore, Ziyu Yao


Abstract
In this work, we present MailEx, the first dataset for event extraction from conversational email threads. To this end, we first proposed a new taxonomy covering 10 event types and 76 arguments in the email domain. Our final dataset includes 1.5K email threads and ~4K emails, annotated with a total of ~8K event instances. To understand the task challenges, we conducted a series of experiments comparing three types of approaches: fine-tuned sequence labeling, fine-tuned generative extraction, and few-shot in-context learning. Our results showed that email event extraction is far from solved, owing to challenges such as extracting non-continuous, shared trigger spans, extracting non-named entity arguments, and modeling the email conversational history. Our work thus calls for further investigation of this domain-specific event extraction task.
Anthology ID:
2023.emnlp-main.801
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
12964–12987
URL:
https://aclanthology.org/2023.emnlp-main.801
DOI:
10.18653/v1/2023.emnlp-main.801
Cite (ACL):
Saurabh Srivastava, Gaurav Singh, Shou Matsumoto, Ali Raz, Paulo Costa, Joshua Poore, and Ziyu Yao. 2023. MailEx: Email Event and Argument Extraction. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12964–12987, Singapore. Association for Computational Linguistics.
Cite (Informal):
MailEx: Email Event and Argument Extraction (Srivastava et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.801.pdf
Video:
https://aclanthology.org/2023.emnlp-main.801.mp4