LTG at VarDial 2025 NorSID: More and Better Training Data for Slot and Intent Detection

Marthe Midtgaard; Petter Mæhlum; Yves Scherrer

Abstract

This paper describes the LTG submission to the VarDial 2025 shared task, where we participate in the Norwegian slot and intent detection subtasks. The shared task focuses on Norwegian dialects, which present challenges due to their low-resource nature and variation. We test a variety of neural models and training data configurations, with a focus on improving and extending the available Norwegian training data. This includes automatically re-aligning slot spans in Norwegian Bokmål, as well as re-translating the original English training data into both Bokmål and Nynorsk. We also re-annotate an external Norwegian dataset to augment the training data. Our best models take first place in both subtasks, with a span F1 score of 0.893 for slot filling and an accuracy of 0.980 for intent detection. Our results indicate that while translation quality is less critical, improving the slot labels has a notable impact on slot performance. Moreover, adding more standard Norwegian data improves performance, but incorporating even small amounts of dialectal data leads to greater gains.

Anthology ID: 2025.vardial-1.15
Volume: Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venues: VarDial | WS
Publisher: Association for Computational Linguistics
Pages: 200–208
URL: https://aclanthology.org/2025.vardial-1.15/
Cite (ACL): Marthe Midtgaard, Petter Mæhlum, and Yves Scherrer. 2025. LTG at VarDial 2025 NorSID: More and Better Training Data for Slot and Intent Detection. In Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 200–208, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): LTG at VarDial 2025 NorSID: More and Better Training Data for Slot and Intent Detection (Midtgaard et al., VarDial 2025)
PDF: https://aclanthology.org/2025.vardial-1.15.pdf
