Audrey Durand
2026
Active Learning with Non-Uniform Costs for African Natural Language Processing
Bonaventure F. P. Dossou | Ines Arous | Audrey Durand | Jackie Chi Kit Cheung
Findings of the Association for Computational Linguistics: EACL 2026
Labeling datasets for African languages poses substantial challenges due to the diverse settings in which annotations are collected, leading to highly variable labeling costs. These costs vary with task complexity, annotator expertise, and data availability. Yet most active learning (AL) frameworks assume uniform annotation costs, limiting their applicability in real-world, resource-constrained scenarios. To address this, we introduce KnapsackBALD, a novel cost-aware active learning method that integrates the BatchBALD acquisition strategy with a 0-1 knapsack optimization objective to select informative and budget-efficient samples. We evaluate KnapsackBALD on the MasakhaNEWS dataset, a multilingual news classification benchmark covering 11 African languages. Our method consistently outperforms seven strong active learning baselines, including BALD, BatchBALD, and stochastic sampling variants such as PowerBALD and Softmax-BALD, across all three cost scenarios. The performance gap widens as annotation cost imbalances become more extreme, demonstrating the robustness of KnapsackBALD across cost settings. These findings show that when annotation costs are heterogeneous, cost-sensitive acquisition is critical for effective active learning in African language NLP and similar settings. Our code base is open-sourced.
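The core idea of the abstract, selecting a batch of unlabeled samples that maximizes total acquisition score subject to a total annotation-cost budget, can be illustrated with a standard 0-1 knapsack dynamic program. This is a minimal sketch, not the authors' implementation: the function name, the example scores, the integer costs, and the budget are all hypothetical, and real-valued annotation costs would first need to be discretized.

```python
def knapsack_select(scores, costs, budget):
    """Return indices of samples maximizing the summed acquisition score
    (e.g. BALD mutual information) subject to sum(costs) <= budget.
    Costs and budget are assumed to be non-negative integers."""
    n = len(scores)
    # dp[c] = best total score achievable with total cost <= c
    dp = [0.0] * (budget + 1)
    # keep[i][c] = True if item i is taken at remaining-budget c
    keep = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        # Iterate budgets downward so each item is used at most once (0-1 knapsack)
        for c in range(budget, costs[i] - 1, -1):
            if dp[c - costs[i]] + scores[i] > dp[c]:
                dp[c] = dp[c - costs[i]] + scores[i]
                keep[i][c] = True
    # Backtrack through the keep table to recover the selected indices
    chosen, c = [], budget
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            chosen.append(i)
            c -= costs[i]
    return sorted(chosen)

# Hypothetical pool: per-sample acquisition scores and annotation costs
scores = [0.9, 0.8, 0.6, 0.5]
costs = [4, 3, 2, 1]
print(knapsack_select(scores, costs, budget=5))  # → [1, 2]
```

With a budget of 5, the sketch skips the single highest-scoring sample (cost 4) in favor of two cheaper samples whose combined score is higher, which is exactly the budget-efficiency trade-off the cost-aware acquisition targets.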
2020
A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings: Making the Method Robustly Reproducible as Well
Nicolas Garneau | Mathieu Godbout | David Beauchemin | Audrey Durand | Luc Lamontagne
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this paper, we reproduce the experiments of Artetxe et al. (2018b) regarding the robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. We show that the reproduction of their method is indeed feasible with some minor assumptions. We further investigate the robustness of their model by introducing four new languages that are less similar to English than the ones proposed by the original paper. In order to assess the stability of their model, we also conduct a grid search over sensible hyperparameters. We then propose key recommendations that apply to any research project in order to deliver fully reproducible research.