@inproceedings{alshaikh-saleh-ahmad-2026-seeing,
    title = "Seeing Words Differently: Visual Embeddings for Robust {E}nglish-{A}rabic Machine Translation",
    author = "Alshaikh Saleh, Mahdi and
      Ahmad, Irfan",
    booktitle = "Proceedings of the 2nd Workshop on {NLP} for Languages Using {A}rabic Script",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.abjadnlp-1.9/",
    pages = "66--74",
    abstract = "Context: Natural Language Processing (NLP) has become an essential field with widespread applications, including Large Language Models (LLMs). One of the core applications of NLP is machine translation (MT). A major challenge in MT is handling out-of-vocabulary (OOV) words and spelling mistakes, which can lead to poor translation quality. Objective: This study compares traditional text-based embeddings with visual embeddings for English-to-Arabic translation. It investigates the effectiveness of each approach, especially in handling noisy inputs and OOV terms. Method: Using the IWSLT 2017 English-Arabic dataset, we trained a baseline transformer encoder-decoder model with standard text embeddings and compared it against models using several visual embedding strategies, including vowel-removal preprocessing and trigram-based image rendering. Translated outputs were evaluated with BLEU scores. Results: Although traditional BPE-based models achieve higher BLEU on clean data, visual embedding models are substantially more robust to spelling noise, achieving up to 2.4{\texttimes} higher BLEU scores at 50{\%} character corruption."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="alshaikh-saleh-ahmad-2026-seeing">
    <titleInfo>
      <title>Seeing Words Differently: Visual Embeddings for Robust English-Arabic Machine Translation</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Mahdi</namePart>
      <namePart type="family">Alshaikh Saleh</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Irfan</namePart>
      <namePart type="family">Ahmad</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2026-03</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script</title>
      </titleInfo>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Rabat, Morocco</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Context: Natural Language Processing (NLP) has become an essential field with widespread applications, including Large Language Models (LLMs). One of the core applications of NLP is machine translation (MT). A major challenge in MT is handling out-of-vocabulary (OOV) words and spelling mistakes, which can lead to poor translation quality. Objective: This study compares traditional text-based embeddings with visual embeddings for English-to-Arabic translation. It investigates the effectiveness of each approach, especially in handling noisy inputs and OOV terms. Method: Using the IWSLT 2017 English-Arabic dataset, we trained a baseline transformer encoder-decoder model with standard text embeddings and compared it against models using several visual embedding strategies, including vowel-removal preprocessing and trigram-based image rendering. Translated outputs were evaluated with BLEU scores. Results: Although traditional BPE-based models achieve higher BLEU on clean data, visual embedding models are substantially more robust to spelling noise, achieving up to 2.4× higher BLEU scores at 50% character corruption.</abstract>
    <identifier type="citekey">alshaikh-saleh-ahmad-2026-seeing</identifier>
    <location>
      <url>https://aclanthology.org/2026.abjadnlp-1.9/</url>
    </location>
    <part>
      <date>2026-03</date>
      <extent unit="page">
        <start>66</start>
        <end>74</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T Seeing Words Differently: Visual Embeddings for Robust English-Arabic Machine Translation
%A Alshaikh Saleh, Mahdi
%A Ahmad, Irfan
%S Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%F alshaikh-saleh-ahmad-2026-seeing
%X Context: Natural Language Processing (NLP) has become an essential field with widespread applications, including Large Language Models (LLMs). One of the core applications of NLP is machine translation (MT). A major challenge in MT is handling out-of-vocabulary (OOV) words and spelling mistakes, which can lead to poor translation quality. Objective: This study compares traditional text-based embeddings with visual embeddings for English-to-Arabic translation. It investigates the effectiveness of each approach, especially in handling noisy inputs and OOV terms. Method: Using the IWSLT 2017 English-Arabic dataset, we trained a baseline transformer encoder-decoder model with standard text embeddings and compared it against models using several visual embedding strategies, including vowel-removal preprocessing and trigram-based image rendering. Translated outputs were evaluated with BLEU scores. Results: Although traditional BPE-based models achieve higher BLEU on clean data, visual embedding models are substantially more robust to spelling noise, achieving up to 2.4× higher BLEU scores at 50% character corruption.
%U https://aclanthology.org/2026.abjadnlp-1.9/
%P 66-74
Markdown (Informal)
[Seeing Words Differently: Visual Embeddings for Robust English-Arabic Machine Translation](https://aclanthology.org/2026.abjadnlp-1.9/) (Alshaikh Saleh & Ahmad, AbjadNLP 2026)
ACL
Mahdi Alshaikh Saleh and Irfan Ahmad. 2026. Seeing Words Differently: Visual Embeddings for Robust English-Arabic Machine Translation. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 66–74, Rabat, Morocco. Association for Computational Linguistics.
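The abstract evaluates robustness by measuring BLEU under increasing character corruption (up to 50%). Below is a minimal, hypothetical Python sketch of that kind of noise injection. The function name and the substitution-only noise model are assumptions for illustration; the paper's exact corruption procedure is not specified in this record.

```python
import random
import string


def corrupt_characters(text: str, rate: float, seed=None) -> str:
    """Randomly replace a fraction `rate` of alphabetic characters.

    Illustrative sketch of the spelling-noise setting described in the
    abstract. Assumption: substitution-only noise over letters; the
    paper may also use insertions, deletions, or swaps.
    """
    rng = random.Random(seed)
    chars = list(text)
    # Only letters are eligible for corruption; punctuation and spaces stay.
    positions = [i for i, c in enumerate(chars) if c.isalpha()]
    rng.shuffle(positions)
    n_corrupt = int(len(positions) * rate)
    for i in positions[:n_corrupt]:
        # Substitute a random *different* lowercase letter.
        choices = [c for c in string.ascii_lowercase if c != chars[i].lower()]
        chars[i] = rng.choice(choices)
    return "".join(chars)


if __name__ == "__main__":
    sentence = "visual embeddings are robust to spelling noise"
    # 50% corruption, matching the harshest condition in the abstract.
    print(corrupt_characters(sentence, rate=0.5, seed=0))
```

One could apply such a function to the source side of a held-out test set at several corruption rates and re-score each model's output with BLEU to reproduce the robustness comparison the abstract reports.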