Cheese it up: CamemBERT Outperforms Large Language Models for Identification of French Multi-word Expressions

Sergei Bagdasarov, Diego Alves, Elke Teich


Abstract
In recent years, language models, both encoder-only and generative, have been applied to a variety of downstream NLP tasks, including sequence labeling tasks like automatic multi-word expression identification (MWEI). Multiple studies show that, in general, fine-tuned encoder-only models like BERT tend to outperform pretrained generative LLMs on downstream tasks (Arzideh et al., 2025; Ochoa et al., 2025; Bucher and Martini, 2024; Sebok et al., 2025). However, such comparisons are sparse for MWEI, in particular for French, in part due to the lack of comprehensive gold-standard datasets. In this study, we address this research gap by comparing CamemBERT with gpt-oss and Qwen3 for MWEI, using the French subcorpus of the newly released PARSEME dataset. CamemBERT outperforms both LLMs by large margins in precision, recall, and F1. We complement this numerical evaluation with a qualitative analysis of prediction errors.
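The abstract frames MWEI as a sequence labeling task. As a hedged illustration (not drawn from the paper itself), MWE identification is commonly cast as per-token BIO tagging, with spans decoded from the tag sequence afterwards; the function and the French example below are hypothetical:

```python
# Minimal sketch of MWE identification as BIO sequence labeling
# (illustrative only; not the paper's actual pipeline or tagset).

def decode_bio(tokens, tags):
    """Extract multi-word expression spans from a BIO tag sequence."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-MWE":
            # A new MWE starts; flush any span in progress.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-MWE" and current:
            # Continue the current MWE span.
            current.append(token)
        else:
            # Outside any MWE; flush the span in progress, if any.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

# French example: "pris la fuite" ("fled", lit. "took the flight")
# is a verbal MWE.
tokens = ["Il", "a", "pris", "la", "fuite", "hier"]
tags = ["O", "O", "B-MWE", "I-MWE", "I-MWE", "O"]
print(decode_bio(tokens, tags))  # → ['pris la fuite']
```

In this framing, a fine-tuned encoder like CamemBERT predicts one tag per token, while a generative LLM must produce the spans in some output format that is then parsed, one plausible source of the performance gap the paper reports.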
Anthology ID:
2026.mwe-1.6
Volume:
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Atul Kr. Ojha, Verginica Barbu Mititelu, Mathieu Constant, Ivelina Stoyanova, A. Seza Doğruöz, Alexandre Rademaker
Venues:
MWE | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
54–60
Language:
URL:
https://aclanthology.org/2026.mwe-1.6/
DOI:
Bibkey:
Cite (ACL):
Sergei Bagdasarov, Diego Alves, and Elke Teich. 2026. Cheese it up: CamemBERT Outperforms Large Language Models for Identification of French Multi-word Expressions. In Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026), pages 54–60, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Cheese it up: CamemBERT Outperforms Large Language Models for Identification of French Multi-word Expressions (Bagdasarov et al., MWE 2026)
PDF:
https://aclanthology.org/2026.mwe-1.6.pdf