Developing an Open-domain English-Farsi Translation System Using AFEC: Amirkabir Bilingual Farsi-English Corpus

Fattaneh Jabbari, Somayeh Bakshaei, Seyyed Mohammad Mohammadzadeh Ziabary, Shahram Khadivi


Abstract
The translation quality of Statistical Machine Translation (SMT) depends on the amount of input data, especially for morphologically rich languages. Farsi (Persian) is such a language, and it has few NLP resources. It also suffers from non-standard written characters, which cause large variation in the written form of each character. Moreover, the structural differences between Farsi and English result in long-range reorderings that cannot be modeled by common SMT reordering models. Here, we try to improve the existing English-Farsi SMT system by addressing these challenges. First, we expand our limited-domain bilingual corpus into an open-domain one. Then, to alleviate the character variations, we propose a new text normalization algorithm. Finally, we apply some hand-crafted rules to reduce the structural differences. Using the new corpus, the experimental results show an 8.82% BLEU improvement when the new normalization method is applied and 9.1% BLEU when the rules are used.
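To illustrate the kind of character variation the abstract refers to, the following is a minimal sketch of Farsi text normalization. It is not the algorithm proposed in the paper, whose details are not given here. The mapping table and the function name `normalize_farsi` are assumptions chosen for illustration; they cover only a few well-known cases in which Arabic code points are used in place of their Farsi counterparts.

```python
# Illustrative sketch only: the mapping below is an assumption, not the
# normalization table used by the authors.
CHAR_MAP = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH  -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF  -> ARABIC LETTER KEHEH
    "\u0640": "",        # ARABIC TATWEEL (elongation mark) is dropped
}

# Map Arabic-Indic digits (U+0660..U+0669) to the Extended Arabic-Indic
# digits (U+06F0..U+06F9) conventionally used in Farsi text.
for offset in range(10):
    CHAR_MAP[chr(0x0660 + offset)] = chr(0x06F0 + offset)

_TABLE = str.maketrans(CHAR_MAP)


def normalize_farsi(text: str) -> str:
    """Replace common Arabic code points with their Farsi equivalents."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # The same word typed with an Arabic KAF and with a Farsi KEHEH
    # becomes a single surface form after normalization.
    arabic_form = "\u0643\u062A\u0627\u0628"
    farsi_form = "\u06A9\u062A\u0627\u0628"
    print(normalize_farsi(arabic_form) == farsi_form)
```

Collapsing such variants before training means that the two spellings of a word are counted as one token, which reduces data sparsity in a small corpus.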
Anthology ID:
2012.amta-caas14.3
Volume:
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages
Month:
November 1
Year:
2012
Address:
San Diego, California, USA
Editors:
Ali Farghaly, Farhad Oroumchian
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
17–23
URL:
https://aclanthology.org/2012.amta-caas14.3
Cite (ACL):
Fattaneh Jabbari, Somayeh Bakshaei, Seyyed Mohammad Mohammadzadeh Ziabary, and Shahram Khadivi. 2012. Developing an Open-domain English-Farsi Translation System Using AFEC: Amirkabir Bilingual Farsi-English Corpus. In Fourth Workshop on Computational Approaches to Arabic-Script-based Languages, pages 17–23, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Developing an Open-domain English-Farsi Translation System Using AFEC: Amirkabir Bilingual Farsi-English Corpus (Jabbari et al., AMTA 2012)
PDF:
https://aclanthology.org/2012.amta-caas14.3.pdf