Log-linear Models for Uyghur Segmentation in Spoken Language Translation

Chenggang Mi, Yating Yang, Rui Dong, Xi Zhou, Lei Wang, Xiao Li, Tonghai Jiang


Abstract
To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear morphological segmentation approach. Instead of learning a model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation using both bilingual and monolingual corpora. Our approach relies on several features: a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that our proposed segmentation model for Uyghur spoken translation achieves a 1.6 BLEU score improvement over the state-of-the-art baseline.
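As a rough illustration of how a log-linear model can combine several feature scores to rank candidate segmentations, consider the minimal sketch below. It is not the authors' implementation: the feature names (`crf`, `alignment`, `cooccurrence`), the weights, the candidate labels, and all numeric values are hypothetical placeholders, and the abstract does not state how the features are actually computed or weighted.

```python
import math

def log_linear_probs(candidates, weights):
    """Return a normalized probability for each candidate segmentation.

    candidates: dict mapping a candidate label to a dict of feature values.
    weights: dict mapping a feature name to its weight (lambda).
    """
    # Weighted sum of feature values for each candidate.
    scores = {
        seg: sum(weights[name] * value for name, value in feats.items())
        for seg, feats in candidates.items()
    }
    # Subtract the maximum score before exponentiating, for numerical stability.
    max_score = max(scores.values())
    exp_scores = {seg: math.exp(s - max_score) for seg, s in scores.items()}
    z = sum(exp_scores.values())
    return {seg: e / z for seg, e in exp_scores.items()}

# Hypothetical weights and log-scale feature values, for illustration only.
weights = {"crf": 1.0, "alignment": 0.5, "cooccurrence": 0.5}
candidates = {
    "stem+suffix": {"crf": -0.4, "alignment": -0.2, "cooccurrence": -0.3},
    "stem": {"crf": -0.9, "alignment": -1.0, "cooccurrence": -1.2},
}

probs = log_linear_probs(candidates, weights)
best = max(probs, key=probs.get)
```

In a setup like this, the weights would typically be tuned on held-out data so that the selected segmentations improve downstream translation quality.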
Anthology ID:
R17-1065
Volume:
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
492–500
URL:
https://doi.org/10.26615/978-954-452-049-6_065
DOI:
10.26615/978-954-452-049-6_065
Cite (ACL):
Chenggang Mi, Yating Yang, Rui Dong, Xi Zhou, Lei Wang, Xiao Li, and Tonghai Jiang. 2017. Log-linear Models for Uyghur Segmentation in Spoken Language Translation. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 492–500, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Log-linear Models for Uyghur Segmentation in Spoken Language Translation (Mi et al., RANLP 2017)
PDF:
https://doi.org/10.26615/978-954-452-049-6_065