TOXIFRENCH: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection

Axel Delaval; Shujian Yang; Haicheng Wang; Han Qiu; Jialiang Lu

Abstract

Detecting toxic content using language models is crucial yet challenging. While substantial progress has been made in English, toxicity detection in French remains underdeveloped, primarily due to the lack of culturally relevant, human-annotated, large-scale datasets. In this work, we release TOXIFRENCH, a dataset of 53,622 French online comments, together with a 1,388-sample balanced benchmark split for systematic evaluation. The dataset is constructed via a semi-automated annotation pipeline that reduces manual labeling to only 10% of the data through high-confidence LLM-based pre-annotation and human verification, while ensuring statistically near-perfect alignment with human-only annotation. We then benchmark a broad range of models and uncover a counterintuitive insight: Small Language Models (SLMs) often surpass larger models in robustness and generalization on this task. Motivated by this finding, we propose a novel Chain-of-Thought (CoT) fine-tuning strategy using a dynamic weighted loss that progressively emphasizes the model’s final decision, significantly improving faithfulness. Our fine-tuned 4B model (Qwen3-4B) achieves state-of-the-art performance on the benchmark, improving balanced accuracy by 10% over its baseline and outperforming GPT-4o and DeepSeek-R1, while retaining cross-lingual capabilities.
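
The abstract reports results in terms of balanced accuracy. As a reminder of what that metric measures, here is a minimal sketch, assuming the standard definition (the mean of per-class recall) and a binary toxic / non-toxic labeling. The function name and the toy labels are illustrative only and are not taken from the paper or its released code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)  # per-class count of correct predictions
    total = defaultdict(int)    # per-class count of gold examples
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("y_true is empty")
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy labels (1 = toxic, 0 = non-toxic), made up for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
score = balanced_accuracy(y_true, y_pred)  # (3/4 + 2/4) / 2 = 0.625
```

If the 1,388-sample benchmark split contains equal numbers of toxic and non-toxic comments, which is an assumption about what "balanced" means here, balanced accuracy coincides with plain accuracy on that split. The two metrics diverge only when the classes are unevenly represented.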

Anthology ID: 2026.findings-acl.1074
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 21354–21375
URL: https://aclanthology.org/2026.findings-acl.1074/
Cite (ACL): Axel Delaval, Shujian Yang, Haicheng Wang, Han Qiu, and Jialiang LU. 2026. TOXIFRENCH: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection. In Findings of the Association for Computational Linguistics: ACL 2026, pages 21354–21375, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): TOXIFRENCH: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection (Delaval et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1074.pdf
Checklist: 2026.findings-acl.1074.checklist.pdf
