Simple Morphology, Complex Models: A Benchmark Study and Error Analysis of POS Tagging for Martinican Creole

Ludovic Mompelat


Abstract
Part-of-speech (POS) tagging is a foundational task in NLP pipelines, but its development for Creole languages remains limited due to sparse annotated data and structural divergence from high-resource languages. This paper presents the first POS tagging benchmarks for Martinican Creole (MC), together with a linguistically motivated evaluation framework, comparing three fine-tuned transformer-based models (mBERT, XLM-RoBERTa, and CreoleVal). Rather than focusing solely on aggregate metrics, we perform a detailed error analysis, examining model-specific confusion patterns, lexical disambiguation, and out-of-vocabulary (OOV) behavior. Our results yield F1 scores of 0.92 for mBERT (best on the X tag and connector distinctions), 0.91 for XLM-RoBERTa (strongest on numeric tags and conjunction structures), and 0.94 for CreoleVal (leading on both functional and content categories, with the lowest OOV error rate). We propose future directions involving model fusion, targeted and linguistically motivated annotation, and reward-guided data augmentation with Large Language Models (LLMs) to improve our current tagger. Our linguistically grounded error analysis for MC exposes key tagging challenges and demonstrates how targeted annotation and ensemble methods can meaningfully boost accuracy in under-resourced settings.
Anthology ID: 2025.clasp-main.1
Volume: Proceedings of the 2025 CLASP Conference on Language models And RePresentations (LARP)
Month: September
Year: 2025
Address: Gothenburg, Sweden
Editors: Nikolai Ilinykh, Mattias Appelgren, Erik Lagerstedt
Venues: CLASP | WS
Publisher: Association for Computational Linguistics
Pages: 1–10
URL: https://aclanthology.org/2025.clasp-main.1/
Cite (ACL): Ludovic Mompelat. 2025. Simple Morphology, Complex Models: A Benchmark Study and Error Analysis of POS Tagging for Martinican Creole. In Proceedings of the 2025 CLASP Conference on Language models And RePresentations (LARP), pages 1–10, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal): Simple Morphology, Complex Models: A Benchmark Study and Error Analysis of POS Tagging for Martinican Creole (Mompelat, CLASP 2025)
PDF: https://aclanthology.org/2025.clasp-main.1.pdf