Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task

Grant Erdmann, Jeremy Gwinnup


Abstract
The WMT 2018 Parallel Corpus Filtering Task aims to test various methods of filtering a noisy parallel corpus, to make it useful for training machine translation systems. We describe the AFRL submissions, including their preprocessing methods and quality metrics. Numerical results indicate relative benefits of different options and show where our methods are competitive.
Anthology ID:
W18-6475
Volume:
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month:
October
Year:
2018
Address:
Belgium, Brussels
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
872–876
URL:
https://aclanthology.org/W18-6475
DOI:
10.18653/v1/W18-6475
Cite (ACL):
Grant Erdmann and Jeremy Gwinnup. 2018. Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 872–876, Belgium, Brussels. Association for Computational Linguistics.
Cite (Informal):
Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task (Erdmann & Gwinnup, WMT 2018)
PDF:
https://aclanthology.org/W18-6475.pdf
Data
WMT 2018, WMT 2018 News