Leveraging Machine-Generated Data for Joint Intent Detection and Slot Filling in Bangla: A Resource-Efficient Approach

A H M Rezaul Karim, Özlem Uzuner


Abstract
Natural Language Understanding (NLU) is crucial for conversational AI, yet low-resource languages lag behind in essential tasks like intent detection and slot-filling. To address this gap, we converted the widely-used English SNIPS dataset to Bangla using LLaMA 3, creating a dataset that captures the linguistic complexities of the language. With this translated dataset for model training, our experimental evaluation compares both independent and joint modeling approaches using transformer architecture. Results demonstrate that a joint approach based on multilingual BERT (mBERT) achieves superior performance, with 97.83% intent accuracy and 91.03% F1 score for slot filling. This work advances NLU capabilities for Bangla and provides insights for developing robust models in other low-resource languages.
Anthology ID:
2025.chipsal-1.21
Volume:
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Kengatharaiyer Sarveswaran, Ashwini Vaidya, Bal Krishna Bal, Sana Shams, Surendrabikram Thapa
Venues:
CHiPSAL | WS
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
208–216
Language:
URL:
https://aclanthology.org/2025.chipsal-1.21/
DOI:
Bibkey:
Cite (ACL):
A H M Rezaul Karim and Özlem Uzuner. 2025. Leveraging Machine-Generated Data for Joint Intent Detection and Slot Filling in Bangla: A Resource-Efficient Approach. In Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), pages 208–216, Abu Dhabi, UAE. International Committee on Computational Linguistics.
Cite (Informal):
Leveraging Machine-Generated Data for Joint Intent Detection and Slot Filling in Bangla: A Resource-Efficient Approach (Karim & Uzuner, CHiPSAL 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.chipsal-1.21.pdf