MultiMUC: Multilingual Template Filling on MUC-4

William Gantt, Shabnam Behzad, Hannah An, Yunmo Chen, Aaron White, Benjamin Van Durme, Mahsa Yarmohammadi


Abstract
We introduce MultiMUC, the first multilingual parallel corpus for template filling, comprising translations of the classic MUC-4 template filling benchmark into five languages: Arabic, Chinese, Farsi, Korean, and Russian. We obtain automatic translations from a strong multilingual machine translation system and manually project the original English annotations into each target language. For all languages, we also provide human translations for key portions of the dev and test splits. Finally, we present baselines on MultiMUC both with state-of-the-art template filling models for MUC-4 and with ChatGPT. We release MultiMUC and the supervised baselines to facilitate further work on document-level information extraction in multilingual settings.
Anthology ID:
2024.eacl-long.21
Volume:
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
349–368
URL:
https://aclanthology.org/2024.eacl-long.21
Cite (ACL):
William Gantt, Shabnam Behzad, Hannah An, Yunmo Chen, Aaron White, Benjamin Van Durme, and Mahsa Yarmohammadi. 2024. MultiMUC: Multilingual Template Filling on MUC-4. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 349–368, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
MultiMUC: Multilingual Template Filling on MUC-4 (Gantt et al., EACL 2024)
PDF:
https://aclanthology.org/2024.eacl-long.21.pdf