Building a Better Bitext for Structurally Different Languages through Self-training

Jungyeul Park, Loïc Dugast, Jeen-Pyo Hong, Chang-Uk Shin, Jeong-Won Cha


Abstract
We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages by combining a pivot language with self-training. The pivot language makes it possible to use existing translation models to bootstrap the alignment, and the self-training procedure then improves the alignment at both the document and the sentence level. We also propose several methods for evaluating the resulting alignment.
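The abstract gives no implementation details, so the following is only a minimal sketch of the general idea (align through a pivot representation, keep the confident pairs, learn from them, realign), not the authors' method. It makes several assumptions: a small seed lexicon stands in for the existing translation models, Jaccard token overlap stands in for an alignment score, and the function names, the threshold, and the toy tokens are all hypothetical.

```python
from typing import Dict, List, Tuple

Sentence = List[str]
Pair = Tuple[int, int, float]  # (source index, target index, score)


def overlap(a: Sentence, b: Sentence) -> float:
    """Jaccard overlap between two token lists (assumed similarity score)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def to_pivot(sentence: Sentence, lexicon: Dict[str, str]) -> Sentence:
    """Map source tokens into the pivot language; unknown tokens pass through."""
    return [lexicon.get(tok, tok) for tok in sentence]


def align(src: List[Sentence], tgt: List[Sentence],
          lexicon: Dict[str, str], threshold: float) -> List[Pair]:
    """Pair each source sentence with its best-scoring target, if confident."""
    pairs = []
    for i, s in enumerate(src):
        pivot = to_pivot(s, lexicon)
        j, score = max(((j, overlap(pivot, t)) for j, t in enumerate(tgt)),
                       key=lambda item: item[1])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def self_train(src: List[Sentence], tgt: List[Sentence],
               seed: Dict[str, str], threshold: float = 0.5,
               rounds: int = 3) -> Tuple[List[Pair], Dict[str, str]]:
    """Alternate between aligning and learning lexicon entries from the alignment."""
    lexicon = dict(seed)
    for _ in range(rounds):
        new_entries = {}
        for i, j, _score in align(src, tgt, lexicon, threshold):
            unknown = [t for t in src[i] if t not in lexicon]
            covered = set(to_pivot(src[i], lexicon))
            unmatched = [t for t in tgt[j] if t not in covered]
            # Only learn when the correspondence is unambiguous (one-to-one).
            if len(unknown) == 1 and len(unmatched) == 1:
                new_entries[unknown[0]] = unmatched[0]
        if not new_entries:
            break  # nothing new was learned, so the alignment has stabilised
        lexicon.update(new_entries)
    return align(src, tgt, lexicon, threshold), lexicon


# Toy data: invented source tokens, with English playing the pivot/target side.
src = [["na", "bap", "meok"], ["na", "mul", "masi"]]
tgt = [["i", "drink", "water"], ["i", "eat", "rice"]]
seed = {"na": "i", "bap": "rice", "meok": "eat", "mul": "water"}

pairs, learned = self_train(src, tgt, seed)
print(pairs)
print(learned["masi"])
```

In this toy run the second source sentence is first aligned with only partial overlap, because one of its tokens is missing from the seed lexicon. The confident pair supplies the missing entry, and the next alignment pass scores that pair higher. A real system would replace the lexicon with translation models and would need a more robust confidence criterion.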
Anthology ID:
W17-5601
Volume:
Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Haithem Afli, Chao-Hong Liu
Venue:
WS
Publisher:
Asian Federation of Natural Language Processing
Pages:
1–10
URL:
https://aclanthology.org/W17-5601/
Cite (ACL):
Jungyeul Park, Loïc Dugast, Jeen-Pyo Hong, Chang-Uk Shin, and Jeong-Won Cha. 2017. Building a Better Bitext for Structurally Different Languages through Self-training. In Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora, pages 1–10, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Building a Better Bitext for Structurally Different Languages through Self-training (Park et al., 2017)
PDF:
https://aclanthology.org/W17-5601.pdf