MECI: A Multilingual Dataset for Event Causality Identification

Viet Dac Lai, Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, Thien Huu Nguyen


Abstract
Event Causality Identification (ECI) is the task of detecting causal relations between events mentioned in text. Although this task has been extensively studied for English materials, it is under-explored for many other languages. A major reason is the lack of multilingual datasets that provide consistent annotations for event causality relations in multiple non-English languages. To address this issue, we introduce a new multilingual dataset for ECI, called MECI. The dataset employs consistent annotation guidelines for five typologically different languages, i.e., English, Danish, Spanish, Turkish, and Urdu. Our dataset thus enables a new research direction on cross-lingual transfer learning for ECI. Our extensive experiments demonstrate the high quality of MECI, which offers ample challenges and directions for future research. We will publicly release MECI to promote research on multilingual ECI.
Anthology ID:
2022.coling-1.206
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2346–2356
URL:
https://aclanthology.org/2022.coling-1.206
Cite (ACL):
Viet Dac Lai, Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, and Thien Huu Nguyen. 2022. MECI: A Multilingual Dataset for Event Causality Identification. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2346–2356, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
MECI: A Multilingual Dataset for Event Causality Identification (Lai et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.206.pdf
Data:
ConceptNet