BabelBot at AraFinNLP2024: Fine-tuning T5 for Multi-dialect Intent Detection with Synthetic Data and Model Ensembling

Murhaf Fares, Samia Touileb


Abstract
This paper presents our results for the Arabic Financial NLP (AraFinNLP) shared task at the Second Arabic Natural Language Processing Conference (ArabicNLP 2024). We participated in the first sub-task, Multi-dialect Intent Detection, which focused on cross-dialect intent detection in the banking domain. Our approach involved fine-tuning an encoder-only T5 model, generating synthetic data, and model ensembling. Additionally, we conducted an in-depth analysis of the dataset, addressing annotation errors and problematic translations. Our model ranked third in the shared task, achieving an F1-score of 0.871.
Anthology ID:
2024.arabicnlp-1.40
Volume:
Proceedings of The Second Arabic Natural Language Processing Conference
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Nizar Habash, Houda Bouamor, Ramy Eskander, Nadi Tomeh, Ibrahim Abu Farha, Ahmed Abdelali, Samia Touileb, Injy Hamed, Yaser Onaizan, Bashar Alhafni, Wissam Antoun, Salam Khalifa, Hatem Haddad, Imed Zitouni, Badr AlKhamissi, Rawan Almatham, Khalil Mrini
Venues:
ArabicNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
433–440
URL:
https://aclanthology.org/2024.arabicnlp-1.40
Cite (ACL):
Murhaf Fares and Samia Touileb. 2024. BabelBot at AraFinNLP2024: Fine-tuning T5 for Multi-dialect Intent Detection with Synthetic Data and Model Ensembling. In Proceedings of The Second Arabic Natural Language Processing Conference, pages 433–440, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
BabelBot at AraFinNLP2024: Fine-tuning T5 for Multi-dialect Intent Detection with Synthetic Data and Model Ensembling (Fares & Touileb, ArabicNLP-WS 2024)
PDF:
https://aclanthology.org/2024.arabicnlp-1.40.pdf