Evaluating Pixel Language Models on Non-Standardized Languages

Alberto Muñoz-Ortiz; Verena Blaschke; Barbara Plank

Abstract

We explore the potential of pixel-based models for transfer learning from standard languages to dialects. These models convert text into images that are divided into patches, enabling a continuous vocabulary representation that proves especially useful for out-of-vocabulary words common in dialectal data. Using German as a case study, we compare the performance of pixel-based models to token-based models across various syntactic and semantic tasks. Our results show that pixel-based models outperform token-based models in part-of-speech tagging, dependency parsing and intent detection for zero-shot dialect evaluation by up to 26 percentage points in some scenarios, though not in Standard German. However, pixel-based models fall short in topic classification. These findings emphasize the potential of pixel-based models for handling dialectal data, though further research should be conducted to assess their effectiveness in various linguistic contexts.

Anthology ID: 2025.coling-main.427
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 6412–6419
URL: https://aclanthology.org/2025.coling-main.427/
Cite (ACL): Alberto Muñoz-Ortiz, Verena Blaschke, and Barbara Plank. 2025. Evaluating Pixel Language Models on Non-Standardized Languages. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6412–6419, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Evaluating Pixel Language Models on Non-Standardized Languages (Muñoz-Ortiz et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.427.pdf
