Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework

Bajiyo Baiju, Kavya Manohar, Leena G. Pillai, Elizabeth Sherly


Abstract
In this work, we present a reverse transliteration model that converts romanized Malayalam to native script using an encoder-decoder framework built on an attention-based bidirectional Long Short-Term Memory (Bi-LSTM) architecture. To train the model, we used a curated, combined collection of 4.3 million transliteration pairs derived from two publicly available Indic-language transliteration datasets, Dakshina and Aksharantar. We evaluated the model on two test sets provided by the IndoNLP-2025-Shared-Task, containing (1) general typing patterns and (2) ad hoc typing patterns, respectively. On Test Set-1, we obtained a character error rate (CER) of 7.42%. On Test Set-2, however, which contains ad hoc typing patterns where most vowel indicators are missing, the model gave a CER of 22.8%.
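The abstract reports results as character error rate (CER) but does not spell out how it is computed. As a minimal sketch, assuming the commonly used definition (total Levenshtein edit distance divided by total reference length, expressed as a percentage) and assuming that a "character" is a Unicode code point, CER could be computed as follows. The function names `edit_distance` and `cer` are illustrative and not taken from the paper; the authors' evaluation script may differ, for example by counting grapheme clusters or by averaging per word.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference character
            insertion = curr[j - 1] + 1  # extra character in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER in percent (assumed definition, see lead-in)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical example: the model output drops one vowel sign (ാ).
    refs = ["മലയാളം"]
    hyps = ["മലയളം"]
    print(f"CER: {cer(refs, hyps):.2f}%")
```

Under the code-point assumption, a dropped vowel sign counts as one deletion, which matches the kind of error the abstract describes for ad hoc typing where vowel indicators are missing.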
Anthology ID:
2025.indonlp-1.20
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages
Month:
January
Year:
2025
Address:
Abu Dhabi
Editors:
Ruvan Weerasinghe, Isuri Anuradha, Deshan Sumanathilaka
Venues:
IndoNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
174–178
URL:
https://aclanthology.org/2025.indonlp-1.20/
Cite (ACL):
Bajiyo Baiju, Kavya Manohar, Leena G. Pillai, and Elizabeth Sherly. 2025. Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework. In Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages, pages 174–178, Abu Dhabi. Association for Computational Linguistics.
Cite (Informal):
Romanized to Native Malayalam Script Transliteration Using an Encoder-Decoder Framework (Baiju et al., IndoNLP 2025)
PDF:
https://aclanthology.org/2025.indonlp-1.20.pdf