Fethi Bougares


2021

pdf bib
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Nizar Habash | Houda Bouamor | Hazem Hajj | Walid Magdy | Wajdi Zaghouani | Fethi Bougares | Nadi Tomeh | Ibrahim Abu Farha | Samia Touileb
Proceedings of the Sixth Arabic Natural Language Processing Workshop

2020

pdf bib
Text and Speech-based Tunisian Arabic Sub-Dialects Identification
Najla Ben Abdallah | Saméh Kchaou | Fethi Bougares
Proceedings of the 12th Language Resources and Evaluation Conference

Dialect IDentification (DID) is a challenging task, and it becomes more complicated when it is about the identification of dialects that belong to the same country. Indeed, dialects of the same country are closely related and exhibit a significant overlapping at the phonetic and lexical levels. In this paper, we present our first results on a dialect classification task covering four sub-dialects spoken in Tunisia. We use the term ’sub-dialect’ to refer to the dialects belonging to the same country. We conducted our experiments aiming to discriminate between Tunisian sub-dialects belonging to four different cities: namely Tunis, Sfax, Sousse and Tataouine. A spoken corpus of 1673 utterances is collected, transcribed and freely distributed. We used this corpus to build several speech- and text-based DID systems. Our results confirm that, at this level of granularity, dialects are much better distinguishable using the speech modality. Indeed, we were able to reach an F-1 score of 93.75% using our best speech-based identification system while the F-1 score is limited to 54.16% using text-based DID on the same test set.

pdf bib
ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020
Maha Elbayad | Ha Nguyen | Fethi Bougares | Natalia Tomashenko | Antoine Caubrière | Benjamin Lecouteux | Yannick Estève | Laurent Besacier
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track. Our contributions focused on data augmentation and ensembling of multiple models. In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask. For speech-to-text simultaneous translation, we attach a wait-k MT system to a hybrid ASR system. We propose an algorithm to control the latency of the ASR+MT cascade and achieve a good latency-quality trade-off on both subtasks.

pdf bib
Proceedings of the Fifth Conference on Machine Translation
Loïc Barrault | Ondřej Bojar | Fethi Bougares | Rajen Chatterjee | Marta R. Costa-jussà | Christian Federmann | Mark Fishel | Alexander Fraser | Yvette Graham | Paco Guzman | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Makoto Morishita | Christof Monz | Masaaki Nagata | Toshiaki Nakazawa | Matteo Negri
Proceedings of the Fifth Conference on Machine Translation

pdf bib
Findings of the First Shared Task on Lifelong Learning Machine Translation
Loïc Barrault | Magdalena Biesialska | Marta R. Costa-jussà | Fethi Bougares | Olivier Galibert
Proceedings of the Fifth Conference on Machine Translation

A lifelong learning system can adapt to new data without forgetting previously acquired knowledge. In this paper, we introduce the first benchmark for lifelong learning machine translation. For this purpose, we provide training, lifelong and test data sets for two language pairs: English-German and English-French. Additionally, we report the results of our baseline systems, which we make available to the public. The goal of this shared task is to encourage research on the emerging topic of lifelong learning machine translation.

pdf bib
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Imed Zitouni | Muhammad Abdul-Mageed | Houda Bouamor | Fethi Bougares | Mahmoud El-Haj | Nadi Tomeh | Wajdi Zaghouani
Proceedings of the Fifth Arabic Natural Language Processing Workshop

2019

pdf bib
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Wassim El-Hajj | Lamia Hadrich Belguith | Fethi Bougares | Walid Magdy | Imed Zitouni | Nadi Tomeh | Mahmoud El-Haj
Proceedings of the Fourth Arabic Natural Language Processing Workshop

pdf bib
LIUM-MIRACL Participation in the MADAR Arabic Dialect Identification Shared Task
Saméh Kchaou | Fethi Bougares | Lamia Hadrich-Belguith
Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper describes the joint participation of the LIUM and MIRACL Laboratories at the Arabic dialect identification challenge of the MADAR Shared Task (Bouamor et al., 2019) conducted during the Fourth Arabic Natural Language Processing Workshop (WANLP 2019). We participated to the Travel Domain Dialect Identification subtask. We built several systems and explored different techniques including conventional machine learning methods and deep learning algorithms. Deep learning approaches did not perform well on this task. We experimented several classification systems and we were able to identify the dialect of an input sentence with an F1-score of 65.41% on the official test set using only the training data supplied by the shared task organizers.

pdf bib
LIUM’s Contributions to the WMT2019 News Translation Task: Data and Systems for German-French Language Pairs
Fethi Bougares | Jane Wottawa | Anne Baillot | Loïc Barrault | Adrien Bardet
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the neural machine translation (NMT) systems of the LIUM Laboratory developed for the French↔German news translation task of the Fourth Conference onMachine Translation (WMT 2019). The chosen language pair is included for the first time in the WMT news translation task. We de-scribe how the training and the evaluation data was created. We also present our participation in the French↔German translation directions using self-attentional Transformer networks with small and big architectures.

pdf bib
Étude de l’apprentissage par transfert de systèmes de traduction automatique neuronaux (Study on transfer learning in neural machine translation )
Adrien Bardet | Fethi Bougares | Loïc Barrault
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

L’apprentissage par transfert est une solution au problème de l’apprentissage de systèmes de traduction automatique neuronaux pour des paires de langues peu dotées. Dans cet article, nous proposons une analyse de cette méthode. Nous souhaitons évaluer l’impact de la quantité de données et celui de la proximité des langues impliquées pour obtenir le meilleur transfert possible. Nous prenons en compte ces deux paramètres non seulement pour une tâche de traduction “classique” mais également lorsque les corpus de données font défaut. Enfin, il s’agit de proposer une approche où volume de données et proximité des langues sont combinées afin de ne plus avoir à trancher entre ces deux éléments.

2018

pdf bib
Findings of the Third Shared Task on Multimodal Machine Translation
Loïc Barrault | Fethi Bougares | Lucia Specia | Chiraag Lala | Desmond Elliott | Stella Frank
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present the results from the third shared task on multimodal machine translation. In this task a source sentence in English is supplemented by an image and participating systems are required to generate a translation for such a sentence into German, French or Czech. The image can be used in addition to (or instead of) the source sentence. This year the task was extended with a third target language (Czech) and a new test set. In addition, a variant of this task was introduced with its own test set where the source sentence is given in multiple languages: English, French and German, and participating systems are required to generate a translation in Czech. Seven teams submitted 45 different systems to the two variants of the task. Compared to last year, the performance of the multimodal submissions improved, but text-only systems remain competitive.

pdf bib
LIUM-CVC Submissions for WMT18 Multimodal Translation Task
Ozan Caglayan | Adrien Bardet | Fethi Bougares | Loïc Barrault | Kai Wang | Marc Masana | Luis Herranz | Joost van de Weijer
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final constrained submissions ranked first for English→French and second for English→German language pairs among the constrained submissions according to the automatic evaluation metric METEOR.

2017

pdf bib
Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments
Salima Medhaffar | Fethi Bougares | Yannick Estève | Lamia Hadrich-Belguith
Proceedings of the Third Arabic Natural Language Processing Workshop

Dialectal Arabic (DA) is significantly different from the Arabic language taught in schools and used in written communication and formal speech (broadcast news, religion, politics, etc.). There are many existing researches in the field of Arabic language Sentiment Analysis (SA); however, they are generally restricted to Modern Standard Arabic (MSA) or some dialects of economic or political interest. In this paper we are interested in the SA of the Tunisian Dialect. We utilize Machine Learning techniques to determine the polarity of comments written in Tunisian Dialect. First, we evaluate the SA systems performances with models trained using freely available MSA and Multi-dialectal data sets. We then collect and annotate a Tunisian Dialect corpus of 17.000 comments from Facebook. This corpus allows us a significant accuracy improvement compared to the best model trained on other Arabic dialects or MSA data. We believe that this first freely available corpus will be valuable to researchers working in the field of Tunisian Sentiment Analysis and similar areas.

pdf bib
Word Representations in Factored Neural Machine Translation
Franck Burlot | Mercedes García-Martínez | Loïc Barrault | Fethi Bougares | François Yvon
Proceedings of the Second Conference on Machine Translation

pdf bib
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
Desmond Elliott | Stella Frank | Loïc Barrault | Fethi Bougares | Lucia Specia
Proceedings of the Second Conference on Machine Translation

pdf bib
LIUM Machine Translation Systems for WMT17 News Translation Task
Mercedes García-Martínez | Ozan Caglayan | Walid Aransa | Adrien Bardet | Fethi Bougares | Loïc Barrault
Proceedings of the Second Conference on Machine Translation

pdf bib
LIUM-CVC Submissions for WMT17 Multimodal Translation Task
Ozan Caglayan | Walid Aransa | Adrien Bardet | Mercedes García-Martínez | Fethi Bougares | Loïc Barrault | Marc Masana | Luis Herranz | Joost van de Weijer
Proceedings of the Second Conference on Machine Translation

2016

pdf bib
Does Multimodality Help Human and Machine for Translation and Image Captioning?
Ozan Caglayan | Walid Aransa | Yaxing Wang | Marc Masana | Mercedes García-Martínez | Fethi Bougares | Loïc Barrault | Joost van de Weijer
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
SHEF-LIUM-NN: Sentence level Quality Estimation with Neural Network Features
Kashif Shah | Fethi Bougares | Loïc Barrault | Lucia Specia
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

pdf bib
Investigating Continuous Space Language Models for Machine Translation Quality Estimation
Kashif Shah | Raymond W. M. Ng | Fethi Bougares | Lucia Specia
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
SHEF-NN: Translation Quality Estimation with Neural Networks
Kashif Shah | Varvara Logacheva | Gustavo Paetzold | Frederic Blain | Daniel Beck | Fethi Bougares | Lucia Specia
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
UMMU@QALB-2015 Shared Task: Character and Word level SMT pipeline for Automatic Error Correction of Arabic Text
Fethi Bougares | Houda Bouamor
Proceedings of the Second Workshop on Arabic Natural Language Processing

pdf bib
Incremental Adaptation Strategies for Neural Network Language Models
Alex Ter-Sarkisov | Holger Schwenk | Fethi Bougares | Loïc Barrault
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality

pdf bib
Continuous Adaptation to User Feedback for Statistical Machine Translation
Frédéric Blain | Fethi Bougares | Amir Hazem | Loïc Barrault | Holger Schwenk
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
Kyunghyun Cho | Bart van Merriënboer | Caglar Gulcehre | Dzmitry Bahdanau | Fethi Bougares | Holger Schwenk | Yoshua Bengio
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

pdf bib
Avancées dans le domaine de la transcription automatique par décodage guidé (Improvements on driven decoding system combination) [in French]
Fethi Bougares | Yannick Estève | Paul Deléglise | Mickael Rouvier | Georges Linarès
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

2011

pdf bib
LIUM’s systems for the IWSLT 2011 speech translation tasks
Anthony Rousseau | Fethi Bougares | Paul Deléglise | Holger Schwenk | Yannick Estève
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the three systems developed by the LIUM for the IWSLT 2011 evaluation campaign. We participated in three of the proposed tasks, namely the Automatic Speech Recognition task (ASR), the ASR system combination task (ASR_SC) and the Spoken Language Translation task (SLT), since these tasks are all related to speech translation. We present the approaches and specificities we developed on each task.

2009

pdf bib
LIG approach for IWSLT09
Fethi Bougares | Laurent Besacier | Hervé Blanchon
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the LIG experiments in the context of IWSLT09 evaluation (Arabic to English Statistical Machine Translation task). Arabic is a morphologically rich language, and recent experimentations in our laboratory have shown that the performance of Arabic to English SMT systems varies greatly according to the Arabic morphological segmenters applied. Based on this observation, we propose to use simultaneously multiple segmentations for machine translation of Arabic. The core idea is to keep the ambiguity of the Arabic segmentation in the system input (using confusion networks or lattices). Then, we hope that the best segmentation will be chosen during MT decoding. The mathematics of this multiple segmentation approach are given. Practical implementations in the case of verbatim text translation as well as speech translation (outside of the scope of IWSLT09 this year) are proposed. Experiments conducted in the framework of IWSLT evaluation campaign show the potential of the multiple segmentation approach. The last part of this paper explains in detail the different systems submitted by LIG at IWSLT09 and the results obtained.