MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection

Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, Thien Nguyen


Abstract
Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research effort on English text in recent years, ED in other languages has been significantly less explored. For non-English languages, important research questions include how well existing ED models perform on different languages, how challenging ED is in those languages, and how well ED knowledge and annotation can be transferred across languages. Answering these questions requires multilingual ED datasets that provide consistent event annotation across multiple languages. Some multilingual ED datasets exist; however, they tend to cover only a handful of languages and focus mainly on popular ones, leaving many languages uncovered. In addition, current datasets are often small and not publicly accessible. To overcome these shortcomings, we introduce MINION, a new large-scale multilingual dataset for ED that consistently annotates events for 8 different languages, 5 of which are not supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION, which together call for more research effort in this area. We will release the dataset to promote future research on multilingual ED.
Anthology ID: 2022.naacl-main.166
Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: July
Year: 2022
Address: Seattle, United States
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 2286–2299
URL: https://aclanthology.org/2022.naacl-main.166
DOI: 10.18653/v1/2022.naacl-main.166
Cite (ACL): Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, and Thien Nguyen. 2022. MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2286–2299, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection (Pouran Ben Veyseh et al., NAACL 2022)
PDF: https://aclanthology.org/2022.naacl-main.166.pdf