2024
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
Salima Mdhaffar | Haroun Elleuch | Fethi Bougares | Yannick Estève
Proceedings of The Second Arabic Natural Language Processing Conference
Speech encoders pretrained through self-supervised learning (SSL) have demonstrated remarkable performance in various downstream tasks, including Spoken Language Understanding (SLU) and Automatic Speech Recognition (ASR). For instance, fine-tuning SSL models for such tasks has shown significant potential, leading to state-of-the-art (SOTA) performance improvements across challenging datasets. In contrast to existing research, this paper contributes by comparing the effectiveness of SSL approaches in the context of (i) the low-resource spoken Tunisian Arabic dialect and (ii) its combination with a low-resource SLU and ASR scenario, where only a few semantic annotations are available for fine-tuning. We conducted experiments using several SSL speech encoders on the TARIC-SLU dataset. These encoders were pretrained on either monolingual or multilingual speech data; some were further refined, without any in-domain or Tunisian data, through multimodal supervised teacher-student learning. The study yields numerous significant findings, which we discuss in the paper.
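To make the fine-tuning setup concrete, here is a minimal sketch (not the authors' exact recipe) of attaching a CTC head to a pretrained SSL speech encoder for low-resource ASR. The checkpoint name "facebook/wav2vec2-xls-r-300m" and the vocabulary size are illustrative assumptions; TARIC-SLU data loading is omitted.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class SSLEncoderForCTC(nn.Module):
    """Pretrained SSL speech encoder with a randomly initialized CTC head."""

    def __init__(self, vocab_size: int, checkpoint: str = "facebook/wav2vec2-xls-r-300m"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(checkpoint)
        self.encoder.freeze_feature_encoder()  # freeze the CNN front-end, common in low-resource fine-tuning
        self.ctc_head = nn.Linear(self.encoder.config.hidden_size, vocab_size)

    def forward(self, input_values: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(input_values).last_hidden_state  # (batch, frames, hidden)
        return self.ctc_head(hidden).log_softmax(dim=-1)       # log-probabilities for CTC loss

model = SSLEncoderForCTC(vocab_size=40)  # e.g., a small grapheme inventory (assumed size)
waveforms = torch.randn(2, 16000)        # two dummy 1-second utterances at 16 kHz
log_probs = model(waveforms)             # (2, frames, 40); train with nn.CTCLoss against transcripts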
2023
ON-TRAC Consortium Systems for the IWSLT 2023 Dialectal and Low-resource Speech Translation Tasks
Antoine Laurent | Souhir Gahbiche | Ha Nguyen | Haroun Elleuch | Fethi Bougares | Antoine Thiol | Hugo Riguidel | Salima Mdhaffar | Gaëlle Laperrière | Lucas Maison | Sameer Khurana | Yannick Estève
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
This paper describes the ON-TRAC consortium speech translation systems developed for the IWSLT 2023 evaluation campaign. Overall, we participated in three speech translation tracks featured in the low-resource and dialectal speech translation shared tasks, namely: (i) spoken Tamasheq to written French, (ii) spoken Pashto to written French, and (iii) spoken Tunisian to written English. All our primary submissions are based on an end-to-end speech-to-text neural architecture using a pretrained SAMU-XLSR model as the speech encoder and an mBART model as the decoder. The SAMU-XLSR model is built from XLS-R 128 to generate language-agnostic sentence-level embeddings; its training is guided by the LaBSE model, which was trained on a multilingual text dataset. This architecture improves the input speech representations and achieves significant gains over conventional end-to-end speech translation systems.
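As a rough illustration of this encoder-decoder wiring, the sketch below uses HuggingFace's SpeechEncoderDecoderModel. The SAMU-XLSR checkpoint is not assumed to be available here, so "facebook/wav2vec2-xls-r-300m" stands in for the speech encoder, and the decoder is initialized from "facebook/mbart-large-50"; this is a sketch of the general architecture, not the consortium's exact system.

from transformers import AutoTokenizer, SpeechEncoderDecoderModel

# Couple a pretrained speech encoder with a pretrained mBART decoder;
# the decoder cross-attends to the encoder's speech representations.
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m",  # stand-in for the SAMU-XLSR speech encoder
    "facebook/mbart-large-50",       # only the decoder half of mBART is used
)
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")

# Minimal decoding configuration, e.g. for Tunisian speech to English text.
model.config.decoder_start_token_id = tokenizer.convert_tokens_to_ids("en_XX")
model.config.pad_token_id = tokenizer.pad_token_id

# At inference, translation is produced with model.generate(input_values).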
ELYADATA at WojoodNER Shared Task: Data and Model-centric Approaches for Arabic Flat and Nested NER
Imen Laouirine | Haroun Elleuch | Fethi Bougares
Proceedings of ArabicNLP 2023
This paper describes our submissions to the WojoodNER shared task organized during the first ArabicNLP conference. We participated in the two proposed sub-tasks of flat and nested Named Entity Recognition (NER). Our systems ranked first out of eight in Nested NER and third out of eleven in Flat NER. All our primary submissions are based on DiffusionNER models (Shen et al., 2023), in which the NER task is formulated as a boundary-denoising diffusion process. Our experiments on nested WojoodNER achieve the best results, with a micro F1-score of 93.73%. For the flat sub-task, our primary system was the third-best system, with a micro F1-score of 91.92%.
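To give an intuition for boundary-denoising diffusion, here is a toy sketch of the forward (noising) process over entity span boundaries; the noise schedule, step count, and shapes are illustrative assumptions, not the paper's exact settings.

import torch

T = 1000                                # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)   # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(boundaries: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Noise clean (start, end) span boundaries, normalized to [0, 1], at step t."""
    noise = torch.randn_like(boundaries)
    a = alphas_cumprod[t].sqrt().view(-1, 1, 1)
    s = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1)
    return a * boundaries + s * noise

spans = torch.tensor([[[0.10, 0.25], [0.60, 0.80]]])  # (batch=1, num_spans=2, 2)
noisy = q_sample(spans, t=torch.tensor([500]))
# A denoising network is trained to recover the clean boundaries from `noisy`;
# at inference, decoding starts from pure noise and iteratively denoises to entity spans.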