Tegawendé F. Bissyandé

Also published as: Tegawendé F. Bissyandé

2026

Neural Machine Translation for French–Mooré: Adapting Large Language Models to Low-Resource Languages
Walker Stanislas Rocksane COMPAORE | Maimouna Ouattara | Rodrique Kafando | Tegawendé F. Bissyandé | Abdoul Kader Kabore | Aminata Sabane
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)

This work focuses on neural machine translation between French and Mooré, leveraging the capabilities of Large Language Models (LLMs) in a low-resource language context. Mooré is a local language widely spoken in Burkina Faso but remains underrepresented in digital resources. Alongside Mooré, French, now a working language, remains widely used in administration, education, justice, etc. The coexistence of these two languages creates a growing demand for effective translation tools. However, Mooré, like many low-resource languages, poses significant challenges for machine translation due to the scarcity of parallel corpora and its complex morphology. The main objective of this work is to adapt LLMs for French–Mooré translation. Three pre-trained models were selected: No Language Left Behind (NLLB-200), mBART50, and AfroLM. A corpus of approximately 83,000 validated sentence pairs was compiled from an initial collection of 97,060 pairs through pre-processing, semantic filtering, and human evaluation. Specific adaptations to tokenizers and model architectures were applied to improve translation quality. The results show that the fine-tuned NLLB model outperforms the others, highlighting the importance of native language support. mBART50 achieves comparable performance after fine-tuning, while AfroLM remains less effective. Despite existing limitations, this study demonstrates the potential of fine-tuned LLMs for African low-resource languages.

Contributing to Speech-to-Speech Translation for African Low-Resource Languages : Study of French-Mooré Pair
Fayçal S. A. Ouedraogo | Maimouna Ouattara | Rodrique Kafando | Abdoul Kader Kabore | Aminata Sabane | Tegawendé F. Bissyandé
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)

Most African low-resource languages are primarily spoken rather than written and lack large, standardized textual resources. In many communities, low literacy rates and limited access to formal education mean that text-based translation technologies alone are insufficient for effective communication. As a result, speech-to-speech translation systems play a crucial role by enabling direct and natural interaction across languages without requiring reading or writing skills. Such systems are essential for improving access to information, public services, healthcare, and education. The goal of our work is to build powerful transcription and speech synthesis models for the Mooré language. These models were then used to build a cascaded voice translation system between French and Mooré, since we already had a French-Mooré machine translation model. We collected Mooré audio-text pairs, reaching a total audio duration of 150 hours. We then fine-tuned Orpheus-3B and XTTS-v2 for speech synthesis and Wav2Vec-Bert-2.0 for the transcription task. After fine-tuning and evaluation by 36 Mooré native speakers, XTTS-v2 achieved a MOS of 4.36 out of 5 compared to 3.47 out of 5 for Orpheus-3B. The UTMOS evaluation resulted in 3.47 out of 5 for XTTS-v2 and 2.80 out of 5 for Orpheus-3B. The A/B tests revealed that the evaluators preferred XTTS-v2 Mooré audio in 77.8% of cases compared to 22.2% for Orpheus-3B. After fine-tuning on Mooré, Wav2Vec-Bert-2.0 achieved a WER of 4.24% and a CER of 1.11%. Using these models, we successfully implemented a French-Mooré Speech-to-Speech Translation system.

Small language models (SLMs) offer computationally efficient alternatives to large language models, yet their translation quality for low-resource languages (LRLs) remains severely limited. This work presents the first large-scale evaluation of SLMs across 200 languages, revealing systematic underperformance in LRLs and identifying key sources of linguistic disparity. We show that knowledge distillation from strong teacher models using predominantly monolingual LRL data substantially boosts SLM translation quality—often enabling 2B–3B models to match or surpass systems up to 70B parameters. Our study highlights three core findings: (1) a comprehensive benchmark exposing the limitations of SLMs on 200 languages; (2) evidence that LRL-focused distillation improves translation without inducing catastrophic forgetting, with full-parameter fine-tuning and decoder-only teachers outperforming LoRA and encoder–decoder approaches; and (3) consistent cross-lingual gains demonstrating the scalability and robustness of the method. These results establish an effective, low-cost pathway for improving LRL translation and provide practical guidance for deploying SLMs in truly low-resource settings.

2025

Recent advancements in large language models (LLMs) have significantly improved software development automation, including bug localization, code synthesis, program repair, and test generation. However, most prior work on program repair focuses on isolated elements, such as classes or functions, neglecting their interdependencies, which limits repair accuracy. We present SynFix, a RelationGraph-based approach that integrates LLMs with structural search and synchronization techniques for coordinated program repair across codebases. SynFix constructs a RelationGraph to capture relationships among classes, functions, variables, and their interactions (e.g., imports, inheritance, dependencies). Each RelationGraph node includes detailed code descriptions to help LLMs understand root causes and retrieve relevant contexts. By analyzing one-hop nodes in the RelationGraph, SynFix ensures repairs account for dependent updates across components. Patch validation is conducted using regression tests from the SWE-bench benchmark suite. Evaluated on SWE-bench datasets, SynFix resolves 52.33% of issues in SWE-bench-lite (300 GitHub issues), 55.8% in SWE-bench-verified (500 issues), and 29.86% in SWE-bench-full (2,294 issues), outperforming baselines such as Swe-Agent, Agentless and AutoCodeRover. The codebase is available at https://anonymous.4open.science/r/AutoFix-EC86/.

Bridging Literacy Gaps in African Informal Business Management with Low-Resource Conversational Agents
Maimouna Ouattara | Abdoul Kader Kaboré | Jacques Klein | Tegawendé F. Bissyandé
Proceedings of the First Workshop on Language Models for Low-Resource Languages

Position paper: In many African countries, the informal business sector represents the backbone of the economy, providing essential livelihoods and opportunities where formal employment is limited. However, despite the growing adoption of digital tools, entrepreneurs in this sector often face significant challenges due to lack of literacy and language barriers. These barriers not only limit accessibility but also increase the risk of fraud and financial insecurity. This position paper explores the potential of conversational agents (CAs) adapted to low-resource languages (LRLs), focusing specifically on Mooré, a language widely spoken in Burkina Faso. By enabling natural language interactions in local languages, AI-driven conversational agents offer a promising solution to enable informal traders to manage their financial transactions independently, thus promoting greater autonomy and security in business, while providing a step towards formalization of their business. Our study examines the main challenges in developing AI for African languages, including data scarcity and linguistic diversity, and reviews viable strategies for addressing them, such as cross-lingual transfer learning and data augmentation techniques.

2024

Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models (PLMs) to specific tasks by inserting learnable embeddings, or soft prompts, at the input layer of the PLM, without modifying its parameters. This paper investigates the potential of SPT for cross-lingual transfer. Unlike previous studies on SPT for cross-lingual transfer that often fine-tune both the soft prompt and the model parameters, we adhere to the original intent of SPT by keeping the model parameters frozen and only training the soft prompt. This not only reduces the computational cost and storage overhead of full-model fine-tuning, but we also demonstrate that this very parameter efficiency intrinsic to SPT can enhance cross-lingual transfer performance to linguistically distant languages. Moreover, we explore how different factors related to the prompt, such as the length or its reparameterization, affect cross-lingual transfer performance.

Code review, which aims at ensuring the overall quality and reliability of software, is a cornerstone of software development. Unfortunately, while crucial, code review is a labor-intensive process that the research community is looking to automate. Existing automated methods rely on single input-output generative models and thus generally struggle to emulate the collaborative nature of code review. This work introduces CodeAgent, a novel multi-agent Large Language Model (LLM) system for code review automation. CodeAgent incorporates a supervisory agent, QA-Checker, to ensure that all the agents’ contributions address the initial review question. We evaluated CodeAgent on critical code review tasks: (1) detect inconsistencies between code changes and commit messages, (2) identify vulnerability introductions, (3) validate code style adherence, and (4) suggest code revisions. The results demonstrate CodeAgent’s effectiveness, contributing to a new state-of-the-art in code review automation. Our data and code are publicly available (https://github.com/Daniel4SE/codeagent).

2020

Evaluating Pretrained Transformer-based Models on the Task of Fine-Grained Named Entity Recognition
Cedric Lothritz | Kevin Allix | Lisa Veiber | Tegawendé F. Bissyandé | Jacques Klein
Proceedings of the 28th International Conference on Computational Linguistics

Named Entity Recognition (NER) is a fundamental Natural Language Processing (NLP) task and has remained an active research field. In recent years, transformer models, and more specifically the BERT model developed at Google, revolutionised the field of NLP. While the performance of transformer-based approaches such as BERT has been studied for NER, there has not yet been a study for the fine-grained Named Entity Recognition (FG-NER) task. In this paper, we compare three transformer-based models (BERT, RoBERTa, and XLNet) to two non-transformer-based models (CRF and BiLSTM-CNN-CRF). Furthermore, we apply each model to a multitude of distinct domains. We find that transformer-based models incrementally outperform the studied non-transformer-based models in most domains with respect to the F1 score. Furthermore, we find that the choice of domains significantly influenced the performance regardless of the respective data size or the model chosen.

Venues

LoResMT (1)

MOOMIN (1)
