Escaping the Probability Trap: Mitigating Semantic Drift in Cantonese-Mandarin Translation

Yuzhi Liang, Fangqi Chen


Abstract
Fine-tuning multilingual models for low-resource dialect translation frequently encounters a “plausibility over faithfulness” dilemma, resulting in severe semantic drift on dialect-specific tokens. We term this phenomenon the “Probability Trap,” where models prioritize statistical fluency over semantic fidelity. To address this, we propose MVS-Rank (Multi-View Scoring Reranking), a generate-then-rerank framework that decouples evaluation from generation. Our method assesses translation candidates through three complementary perspectives: (1) Source-Side Faithfulness via a Reverse Translation Model to anchor semantic fidelity; (2) Local Fluency using Masked Language Models to ensure syntactic precision; and (3) Global Fluency leveraging Large Language Models to capture discourse coherence. Extensive experiments on Cantonese-Mandarin benchmarks demonstrate that MVS-Rank achieves state-of-the-art performance, significantly outperforming strong fine-tuning baselines by effectively rectifying hallucinations while maintaining high fluency.
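The generate-then-rerank idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three views are combined, so everything below is an assumption made for illustration: the weighted sum, the min-max normalization, the weight values, the names `rerank` and `min_max`, and the stub scores that stand in for the reverse translation model, the masked language model, and the large language model. It is not the authors' implementation.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

# A scorer maps (source, candidate) to a score where higher is better.
Scorer = Callable[[str, str], float]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so different views are comparable.

    A constant list maps to 0.5 for every entry (assumed convention).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rerank(
    source: str,
    candidates: Sequence[str],
    scorers: Mapping[str, Scorer],
    weights: Mapping[str, float],
) -> List[Tuple[str, float]]:
    """Return candidates sorted by a weighted sum of normalized view scores.

    The weighted sum is an assumed combination rule, not taken from the paper.
    """
    combined = [0.0] * len(candidates)
    for name, scorer in scorers.items():
        raw = [scorer(source, cand) for cand in candidates]
        for i, score in enumerate(min_max(raw)):
            combined[i] += weights[name] * score
    order = sorted(range(len(candidates)), key=lambda i: combined[i], reverse=True)
    return [(candidates[i], combined[i]) for i in order]


# Hypothetical stub scores (log-probability-like, higher is better).
# In a real system these would come from the three models named in the abstract.
FAITHFULNESS = {"fluent_but_drifted": -9.0, "faithful": -2.0, "garbled": -5.0}
LOCAL_FLUENCY = {"fluent_but_drifted": -1.0, "faithful": -1.5, "garbled": -8.0}
GLOBAL_FLUENCY = {"fluent_but_drifted": -1.0, "faithful": -1.2, "garbled": -7.0}

scorers = {
    "faithfulness": lambda src, cand: FAITHFULNESS[cand],
    "local_fluency": lambda src, cand: LOCAL_FLUENCY[cand],
    "global_fluency": lambda src, cand: GLOBAL_FLUENCY[cand],
}
# Assumed weights; the faithfulness view is weighted most heavily here.
weights = {"faithfulness": 0.6, "local_fluency": 0.2, "global_fluency": 0.2}

candidates = ["fluent_but_drifted", "faithful", "garbled"]
ranked = rerank("source sentence placeholder", candidates, scorers, weights)
for cand, score in ranked:
    print(f"{score:.3f}  {cand}")
```

In this toy setup the most fluent candidate has drifted from the source, so a ranking based on fluency alone would select it. Adding a source-side faithfulness score moves the faithful candidate to the top, which is the behaviour the abstract attributes to decoupling evaluation from generation.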
Anthology ID:
2026.loreslm-1.41
Volume:
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue:
LoResLM
Publisher:
Association for Computational Linguistics
Pages:
471–483
URL:
https://aclanthology.org/2026.loreslm-1.41/
Cite (ACL):
Yuzhi Liang and Fangqi Chen. 2026. Escaping the Probability Trap: Mitigating Semantic Drift in Cantonese-Mandarin Translation. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 471–483, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Escaping the Probability Trap: Mitigating Semantic Drift in Cantonese-Mandarin Translation (Liang & Chen, LoResLM 2026)
PDF:
https://aclanthology.org/2026.loreslm-1.41.pdf