Maria Regina Justina E. Estuar
2025
bAI-bAI: A Context-Aware Transliteration System for Baybayin Scripts
Jacob Simon D. Bernardo, Maria Regina Justina E. Estuar
Proceedings of the Second Workshop in South East Asian Language Processing
Baybayin, a pre-colonial writing system from the Philippines, has seen a resurgence in recent years. Research in computational linguistics has shown an increasing interest in Baybayin OCR, which focuses on the recognition and classification of script characters. However, existing studies face challenges with ambiguous Baybayin words that have multiple possible transliterations. This study introduces a disambiguation technique that employs word embeddings (WE) for contextual analysis and uses part-of-speech (POS) tagging as an initial filtering step. This approach is compared with an LLM method that prompts GPT-4o mini to determine the most appropriate transliteration given a sentence input. The proposed disambiguation process is integrated into existing Baybayin OCR systems to develop bAI-bAI, a context-aware Baybayin transliteration system capable of handling ambiguous words. Results show that incorporating POS as a filter does not significantly affect performance. The WE-Only method yields an accuracy of 77.46% and takes 5.35 ms to process one sample, while the GPT-4o mini method reaches a higher accuracy of 90.52% but with a much longer runtime of 3280 ms per sample. These findings present an opportunity to further explore and improve NLP approaches to disambiguation.
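The WE-based idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the embedding model or the scoring rule, so the sketch assumes cosine similarity between each candidate transliteration and the mean vector of the surrounding words. The toy three-dimensional embeddings, the function names, and the example words are all hypothetical, and the optional POS filtering step is omitted. The example relies on the fact that Baybayin writes e and i with the same vowel mark, so one written word can be read as either "mesa" (table) or "misa" (mass).

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# A real system would load pretrained Filipino word vectors instead.
TOY_EMBEDDINGS = {
    "mesa": [0.9, 0.1, 0.0],      # "table"
    "misa": [0.1, 0.9, 0.0],      # "mass" (church service)
    "kumain": [0.8, 0.2, 0.1],    # "ate"
    "simbahan": [0.0, 1.0, 0.1],  # "church"
    "sa": [0.3, 0.3, 0.3],        # common preposition
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def disambiguate(candidates, context_words, embeddings):
    """Pick the candidate whose embedding is closest to the context average.

    Falls back to the first candidate when no context word or no candidate
    has an embedding (an assumed policy, not one taken from the paper).
    """
    context_vecs = [embeddings[w] for w in context_words if w in embeddings]
    known = [c for c in candidates if c in embeddings]
    if not context_vecs or not known:
        return candidates[0]
    context = mean_vector(context_vecs)
    return max(known, key=lambda c: cosine(embeddings[c], context))


# Same ambiguous word, two different contexts.
church_reading = disambiguate(["mesa", "misa"], ["sa", "simbahan"], TOY_EMBEDDINGS)
meal_reading = disambiguate(["mesa", "misa"], ["kumain", "sa"], TOY_EMBEDDINGS)
print(church_reading, meal_reading)
```

With these toy vectors, the church context selects "misa" and the eating context selects "mesa". A POS filter, where used, would simply shrink the candidate list before this scoring step.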