Creating a Large-Scale Arabic to French Statistical MachineTranslation System

Saša Hasan, Anas El Isbihani, Hermann Ney


Abstract
In this work, the creation of a large-scale Arabic to French statistical machine translation system is presented. We introduce all necessary steps from corpus aquisition, preprocessing the data to training and optimizing the system and eventual evaluation. Since no corpora existed previously, we collected large amounts of data from the web. Arabic word segmentation was crucial to reduce the overall number of unknown words. We describe the phrase-based SMT system used for training and generation of the translation hypotheses. Results on the second CESTA evaluation campaign are reported. The setting was inthe medical domain. The prototype reaches a favorable BLEU score of40.8%.
Anthology ID:
L06-1237
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/405_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Saša Hasan, Anas El Isbihani, and Hermann Ney. 2006. Creating a Large-Scale Arabic to French Statistical MachineTranslation System. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Creating a Large-Scale Arabic to French Statistical MachineTranslation System (Hasan et al., LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/405_pdf.pdf