Statistical Machine Transliteration Baselines for NEWS 2018

Snigdha Singhania, Minh Nguyen, Gia H. Ngo, Nancy Chen


Abstract
This paper reports the results of our transliteration experiments conducted on the NEWS 2018 Shared Task dataset. We focus on creating baseline systems trained using two open-source statistical transliteration tools, namely Sequitur and Moses. We discuss the pre-processing steps performed on this dataset for both systems. We also provide a re-ranking system which uses the top hypotheses from Sequitur and Moses to create a consolidated list of transliterations. The results obtained from each of these models can serve as a good starting point for the participating teams.
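The abstract does not say how the re-ranker combines the Sequitur and Moses outputs. Purely as an illustrative sketch, and not the authors' method, one simple way to consolidate two n-best lists is to turn each system's scores into a distribution over its own list and then linearly interpolate the two. The function names, the `weight` parameter, and the assumption that each system emits `(candidate, score)` pairs whose scores are log-probabilities (higher is better) are all hypothetical.

```python
import math
from collections import defaultdict


def normalize(nbest):
    """Turn hypothetical (candidate, score) pairs into a distribution.

    Assumes higher scores are better (e.g. log-probabilities). A softmax over
    the n-best list puts scores from different systems on a comparable scale.
    """
    if not nbest:
        return {}
    top = max(score for _, score in nbest)  # subtract the max for numerical stability
    exps = {cand: math.exp(score - top) for cand, score in nbest}
    total = sum(exps.values())
    return {cand: value / total for cand, value in exps.items()}


def rerank(sequitur_nbest, moses_nbest, weight=0.5):
    """Merge two n-best lists into one consolidated ranking (illustrative only).

    `weight` is a hypothetical interpolation parameter: the share of the
    combined score given to the Sequitur list; the rest goes to Moses.
    A candidate missing from one list gets no contribution from that list.
    """
    combined = defaultdict(float)
    for cand, prob in normalize(sequitur_nbest).items():
        combined[cand] += weight * prob
    for cand, prob in normalize(moses_nbest).items():
        combined[cand] += (1.0 - weight) * prob
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up n-best lists, for illustration only.
    sequitur = [("kumar", -0.2), ("kumaar", -1.8)]
    moses = [("kumar", -0.5), ("cumar", -1.0)]
    print(rerank(sequitur, moses)[0][0])  # kumar
```

In a setup like this, the interpolation weight would typically be tuned on a development set; other combination schemes (for example, rank-based voting or a learned re-ranker) are equally plausible readings of the abstract.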
Anthology ID:
W18-2410
Volume:
Proceedings of the Seventh Named Entities Workshop
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Nancy Chen, Rafael E. Banchs, Xiangyu Duan, Min Zhang, Haizhou Li
Venue:
NEWS
Publisher:
Association for Computational Linguistics
Pages:
74–78
URL:
https://aclanthology.org/W18-2410
DOI:
10.18653/v1/W18-2410
Cite (ACL):
Snigdha Singhania, Minh Nguyen, Gia H. Ngo, and Nancy Chen. 2018. Statistical Machine Transliteration Baselines for NEWS 2018. In Proceedings of the Seventh Named Entities Workshop, pages 74–78, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Statistical Machine Transliteration Baselines for NEWS 2018 (Singhania et al., NEWS 2018)
PDF:
https://aclanthology.org/W18-2410.pdf