NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages

Ayu Purwarianti, Dea Adhista, Agung Baptiso, Miftahul Mahfuzh, Yusrina Sabila, Aulia Adila, Samuel Cahyawijaya, Alham Fikri Aji


Abstract
Developing dialogue summarization for extremely low-resource languages is a challenging task. We introduce NusaDialogue, a dialogue summarization dataset for three underrepresented languages in the Malayo-Polynesian language family: Minangkabau, Balinese, and Buginese. NusaDialogue covers 17 topics and 185 subtopics, with annotations provided by 73 native speakers. We also conduct experiments that fine-tune a medium-sized language model designed specifically for Indonesian and that apply zero- and few-shot prompting to various multilingual large language models (LLMs). The results indicate that, for extremely low-resource languages such as Minangkabau, Balinese, and Buginese, fine-tuning yields significantly higher performance than zero- and few-shot prompting, even when the prompted LLMs have considerably more parameters.
Anthology ID:
2025.sealp-1.8
Volume:
Proceedings of the Second Workshop in South East Asian Language Processing
Month:
January
Year:
2025
Address:
Online
Editors:
Derry Wijaya, Alham Fikri Aji, Clara Vania, Genta Indra Winata, Ayu Purwarianti
Venues:
sealp | WS
Publisher:
Association for Computational Linguistics
Pages:
82–100
URL:
https://aclanthology.org/2025.sealp-1.8/
Cite (ACL):
Ayu Purwarianti, Dea Adhista, Agung Baptiso, Miftahul Mahfuzh, Yusrina Sabila, Aulia Adila, Samuel Cahyawijaya, and Alham Fikri Aji. 2025. NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages. In Proceedings of the Second Workshop in South East Asian Language Processing, pages 82–100, Online. Association for Computational Linguistics.
Cite (Informal):
NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages (Purwarianti et al., sealp 2025)
PDF:
https://aclanthology.org/2025.sealp-1.8.pdf