Miftahul Mahfuzh


2025

NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages
Ayu Purwarianti | Dea Adhista | Agung Baptiso | Miftahul Mahfuzh | Yusrina Sabila | Aulia Adila | Samuel Cahyawijaya | Alham Fikri Aji
Proceedings of the Second Workshop in South East Asian Language Processing

Developing dialogue summarization for extremely low-resource languages is a challenging task. We introduce NusaDialogue, a dialogue summarization dataset for three underrepresented languages in the Malayo-Polynesian language family: Minangkabau, Balinese, and Buginese. NusaDialogue covers 17 topics and 185 subtopics, with annotations provided by 73 native speakers. We also report experiments that fine-tune a medium-sized language model designed specifically for Indonesian, alongside zero- and few-shot prompting of various multilingual large language models (LLMs). The results indicate that, for these extremely low-resource languages, fine-tuning yields significantly higher performance than zero- and few-shot prompting, even when the prompted LLMs have considerably more parameters.