Abstract
The WMT 2018 Parallel Corpus Filtering Task aims to test various methods of filtering a noisy parallel corpus, to make it useful for training machine translation systems. We describe the AFRL submissions, including their preprocessing methods and quality metrics. Numerical results indicate relative benefits of different options and show where our methods are competitive.- Anthology ID:
- W18-6475
- Volume:
- Proceedings of the Third Conference on Machine Translation: Shared Task Papers
- Month:
- October
- Year:
- 2018
- Address:
- Belgium, Brussels
- Editors:
- Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 872–876
- Language:
- URL:
- https://aclanthology.org/W18-6475
- DOI:
- 10.18653/v1/W18-6475
- Bibkey:
- Cite (ACL):
- Grant Erdmann and Jeremy Gwinnup. 2018. Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 872–876, Belgium, Brussels. Association for Computational Linguistics.
- Cite (Informal):
- Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task (Erdmann & Gwinnup, WMT 2018)
- Copy Citation:
- PDF:
- https://aclanthology.org/W18-6475.pdf
- Data
- WMT 2018, WMT 2018 News
Export citation
@inproceedings{erdmann-gwinnup-2018-coverage, title = "Coverage and Cynicism: The {AFRL} Submission to the {WMT} 2018 Parallel Corpus Filtering Task", author = "Erdmann, Grant and Gwinnup, Jeremy", editor = "Bojar, Ond{\v{r}}ej and Chatterjee, Rajen and Federmann, Christian and Fishel, Mark and Graham, Yvette and Haddow, Barry and Huck, Matthias and Yepes, Antonio Jimeno and Koehn, Philipp and Monz, Christof and Negri, Matteo and N{\'e}v{\'e}ol, Aur{\'e}lie and Neves, Mariana and Post, Matt and Specia, Lucia and Turchi, Marco and Verspoor, Karin", booktitle = "Proceedings of the Third Conference on Machine Translation: Shared Task Papers", month = oct, year = "2018", address = "Belgium, Brussels", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/W18-6475", doi = "10.18653/v1/W18-6475", pages = "872--876", abstract = "The WMT 2018 Parallel Corpus Filtering Task aims to test various methods of filtering a noisy parallel corpus, to make it useful for training machine translation systems. We describe the AFRL submissions, including their preprocessing methods and quality metrics. Numerical results indicate relative benefits of different options and show where our methods are competitive.", }
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="erdmann-gwinnup-2018-coverage"> <titleInfo> <title>Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task</title> </titleInfo> <name type="personal"> <namePart type="given">Grant</namePart> <namePart type="family">Erdmann</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Jeremy</namePart> <namePart type="family">Gwinnup</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2018-10</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the Third Conference on Machine Translation: Shared Task Papers</title> </titleInfo> <name type="personal"> <namePart type="given">Ondřej</namePart> <namePart type="family">Bojar</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Rajen</namePart> <namePart type="family">Chatterjee</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christian</namePart> <namePart type="family">Federmann</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mark</namePart> <namePart type="family">Fishel</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yvette</namePart> <namePart type="family">Graham</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Barry</namePart> <namePart type="family">Haddow</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matthias</namePart> <namePart type="family">Huck</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Antonio</namePart> <namePart type="given">Jimeno</namePart> <namePart type="family">Yepes</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Philipp</namePart> <namePart type="family">Koehn</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christof</namePart> <namePart type="family">Monz</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matteo</namePart> <namePart type="family">Negri</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Aurélie</namePart> <namePart type="family">Névéol</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mariana</namePart> <namePart type="family">Neves</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matt</namePart> <namePart type="family">Post</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Lucia</namePart> <namePart type="family">Specia</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marco</namePart> <namePart type="family">Turchi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Karin</namePart> <namePart type="family">Verspoor</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Belgium, Brussels</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>The WMT 2018 Parallel Corpus Filtering Task aims to test various methods of filtering a noisy parallel corpus, to make it useful for training machine translation systems. We describe the AFRL submissions, including their preprocessing methods and quality metrics. Numerical results indicate relative benefits of different options and show where our methods are competitive.</abstract> <identifier type="citekey">erdmann-gwinnup-2018-coverage</identifier> <identifier type="doi">10.18653/v1/W18-6475</identifier> <location> <url>https://aclanthology.org/W18-6475</url> </location> <part> <date>2018-10</date> <extent unit="page"> <start>872</start> <end>876</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings %T Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task %A Erdmann, Grant %A Gwinnup, Jeremy %Y Bojar, Ondřej %Y Chatterjee, Rajen %Y Federmann, Christian %Y Fishel, Mark %Y Graham, Yvette %Y Haddow, Barry %Y Huck, Matthias %Y Yepes, Antonio Jimeno %Y Koehn, Philipp %Y Monz, Christof %Y Negri, Matteo %Y Névéol, Aurélie %Y Neves, Mariana %Y Post, Matt %Y Specia, Lucia %Y Turchi, Marco %Y Verspoor, Karin %S Proceedings of the Third Conference on Machine Translation: Shared Task Papers %D 2018 %8 October %I Association for Computational Linguistics %C Belgium, Brussels %F erdmann-gwinnup-2018-coverage %X The WMT 2018 Parallel Corpus Filtering Task aims to test various methods of filtering a noisy parallel corpus, to make it useful for training machine translation systems. We describe the AFRL submissions, including their preprocessing methods and quality metrics. Numerical results indicate relative benefits of different options and show where our methods are competitive. %R 10.18653/v1/W18-6475 %U https://aclanthology.org/W18-6475 %U https://doi.org/10.18653/v1/W18-6475 %P 872-876
Markdown (Informal)
[Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task](https://aclanthology.org/W18-6475) (Erdmann & Gwinnup, WMT 2018)
- Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task (Erdmann & Gwinnup, WMT 2018)
ACL
- Grant Erdmann and Jeremy Gwinnup. 2018. Coverage and Cynicism: The AFRL Submission to the WMT 2018 Parallel Corpus Filtering Task. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 872–876, Belgium, Brussels. Association for Computational Linguistics.