Aulia Adila
2025
NusaDialogue: Dialogue Summarization and Generation for Underrepresented and Extremely Low-Resource Languages
Ayu Purwarianti | Dea Adhista | Agung Baptiso | Miftahul Mahfuzh | Yusrina Sabila | Aulia Adila | Samuel Cahyawijaya | Alham Fikri Aji
Proceedings of the Second Workshop in South East Asian Language Processing
Developing dialogue summarization for extremely low-resource languages is a challenging task. We introduce NusaDialogue, a dialogue summarization dataset for three underrepresented languages in the Malayo-Polynesian language family: Minangkabau, Balinese, and Buginese. NusaDialogue covers 17 topics and 185 subtopics, with annotations provided by 73 native speakers. We also conducted experiments that fine-tune a medium-sized language model designed specifically for Indonesian, and that apply zero- and few-shot learning to various multilingual large language models (LLMs). The results indicate that, for extremely low-resource languages such as Minangkabau, Balinese, and Buginese, fine-tuning yields significantly higher performance than zero- and few-shot prompting, even when the prompting is applied to LLMs with considerably more parameters.