Reconsidering SMT Over NMT for Closely Related Languages: A Case Study of Persian-Hindi Pair

Waisullah Yousofi, Pushpak Bhattacharyya


Abstract
This paper demonstrates that Phrase-Based Statistical Machine Translation (PBSMT) can outperform Transformer-based Neural Machine Translation (NMT) in moderate-resource scenarios, specifically for structurally similar languages, the Persian-Hindi pair in our case. Despite the Transformer architecture's typical preference for large parallel corpora, our results show that PBSMT achieves a BLEU score of 66.32, significantly exceeding the Transformer-NMT score of 53.7 when trained on the same dataset.
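
The comparison above is reported in corpus-level BLEU. As a minimal, hedged sketch of how such a score can be computed (using the standard sacrebleu library; the sentences below are hypothetical placeholders, not the paper's Persian-Hindi data or the authors' evaluation script):

import sacrebleu

# Hypothetical system outputs (e.g., from a PBSMT or Transformer-NMT system).
hypotheses = [
    "वह किताब पढ़ रहा है",
    "मैं कल बाज़ार गया",
]

# One reference translation per hypothesis; sacrebleu expects a list of
# reference streams, hence the outer list.
references = [[
    "वह किताब पढ़ रहा है",
    "मैं कल बाज़ार जा रहा था",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")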
Anthology ID:
2024.icon-1.17
Volume:
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2024
Address:
AU-KBC Research Centre, Chennai, India
Editors:
Sobha Lalitha Devi, Karunesh Arora
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
149–156
URL:
https://aclanthology.org/2024.icon-1.17/
Cite (ACL):
Waisullah Yousofi and Pushpak Bhattacharyya. 2024. Reconsidering SMT Over NMT for Closely Related Languages: A Case Study of Persian-Hindi Pair. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 149–156, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal):
Reconsidering SMT Over NMT for Closely Related Languages: A Case Study of Persian-Hindi Pair (Yousofi & Bhattacharyya, ICON 2024)
PDF:
https://aclanthology.org/2024.icon-1.17.pdf