An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora

Valérie Hanoka, Benoît Sagot


Abstract
This paper describes YaMTG (Yet another Multilingual Translation Graph), a new open-source heavily multilingual translation database (over 664 languages represented) built using several sources, namely various wiktionaries and the OPUS parallel corpora (Tiedemann, 2009). We detail the translation extraction process for 21 wiktionary language editions, and provide an evaluation of the translations contained in YaMTG.
Anthology ID:
L14-1616
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3179–3186
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/792_Paper.pdf
Cite (ACL):
Valérie Hanoka and Benoît Sagot. 2014. An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3179–3186, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora (Hanoka & Sagot, LREC 2014)