Abstract
This work describes our submission to the WMT18 Parallel Corpus Filtering shared task. We use a slightly modified version of the Zipporah Corpus Filtering toolkit (Xu and Koehn, 2017), which computes an adequacy score and a fluency score on a sentence pair, and use a weighted sum of the scores as the selection criteria. This work differs from Zipporah in that we experiment with using the noisy corpus to be filtered to compute the combination weights, and thus avoids generating synthetic data as in standard Zipporah.- Anthology ID:
- W18-6479
- Volume:
- Proceedings of the Third Conference on Machine Translation: Shared Task Papers
- Month:
- October
- Year:
- 2018
- Address:
- Belgium, Brussels
- Editors:
- Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 896–899
- Language:
- URL:
- https://aclanthology.org/W18-6479
- DOI:
- 10.18653/v1/W18-6479
- Bibkey:
- Cite (ACL):
- Huda Khayrallah, Hainan Xu, and Philipp Koehn. 2018. The JHU Parallel Corpus Filtering Systems for WMT 2018. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 896–899, Belgium, Brussels. Association for Computational Linguistics.
- Cite (Informal):
- The JHU Parallel Corpus Filtering Systems for WMT 2018 (Khayrallah et al., WMT 2018)
- Copy Citation:
- PDF:
- https://aclanthology.org/W18-6479.pdf
Export citation
@inproceedings{khayrallah-etal-2018-jhu, title = "The {JHU} Parallel Corpus Filtering Systems for {WMT} 2018", author = "Khayrallah, Huda and Xu, Hainan and Koehn, Philipp", editor = "Bojar, Ond{\v{r}}ej and Chatterjee, Rajen and Federmann, Christian and Fishel, Mark and Graham, Yvette and Haddow, Barry and Huck, Matthias and Yepes, Antonio Jimeno and Koehn, Philipp and Monz, Christof and Negri, Matteo and N{\'e}v{\'e}ol, Aur{\'e}lie and Neves, Mariana and Post, Matt and Specia, Lucia and Turchi, Marco and Verspoor, Karin", booktitle = "Proceedings of the Third Conference on Machine Translation: Shared Task Papers", month = oct, year = "2018", address = "Belgium, Brussels", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/W18-6479", doi = "10.18653/v1/W18-6479", pages = "896--899", abstract = "This work describes our submission to the WMT18 Parallel Corpus Filtering shared task. We use a slightly modified version of the Zipporah Corpus Filtering toolkit (Xu and Koehn, 2017), which computes an adequacy score and a fluency score on a sentence pair, and use a weighted sum of the scores as the selection criteria. This work differs from Zipporah in that we experiment with using the noisy corpus to be filtered to compute the combination weights, and thus avoids generating synthetic data as in standard Zipporah.", }
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="khayrallah-etal-2018-jhu"> <titleInfo> <title>The JHU Parallel Corpus Filtering Systems for WMT 2018</title> </titleInfo> <name type="personal"> <namePart type="given">Huda</namePart> <namePart type="family">Khayrallah</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Hainan</namePart> <namePart type="family">Xu</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Philipp</namePart> <namePart type="family">Koehn</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2018-10</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the Third Conference on Machine Translation: Shared Task Papers</title> </titleInfo> <name type="personal"> <namePart type="given">Ondřej</namePart> <namePart type="family">Bojar</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Rajen</namePart> <namePart type="family">Chatterjee</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christian</namePart> <namePart type="family">Federmann</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mark</namePart> <namePart type="family">Fishel</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yvette</namePart> <namePart type="family">Graham</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Barry</namePart> <namePart type="family">Haddow</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matthias</namePart> <namePart type="family">Huck</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Antonio</namePart> <namePart type="given">Jimeno</namePart> <namePart type="family">Yepes</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Philipp</namePart> <namePart type="family">Koehn</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christof</namePart> <namePart type="family">Monz</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matteo</namePart> <namePart type="family">Negri</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Aurélie</namePart> <namePart type="family">Névéol</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mariana</namePart> <namePart type="family">Neves</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matt</namePart> <namePart type="family">Post</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Lucia</namePart> <namePart type="family">Specia</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marco</namePart> <namePart type="family">Turchi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Karin</namePart> <namePart type="family">Verspoor</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Belgium, Brussels</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>This work describes our submission to the WMT18 Parallel Corpus Filtering shared task. We use a slightly modified version of the Zipporah Corpus Filtering toolkit (Xu and Koehn, 2017), which computes an adequacy score and a fluency score on a sentence pair, and use a weighted sum of the scores as the selection criteria. This work differs from Zipporah in that we experiment with using the noisy corpus to be filtered to compute the combination weights, and thus avoids generating synthetic data as in standard Zipporah.</abstract> <identifier type="citekey">khayrallah-etal-2018-jhu</identifier> <identifier type="doi">10.18653/v1/W18-6479</identifier> <location> <url>https://aclanthology.org/W18-6479</url> </location> <part> <date>2018-10</date> <extent unit="page"> <start>896</start> <end>899</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings %T The JHU Parallel Corpus Filtering Systems for WMT 2018 %A Khayrallah, Huda %A Xu, Hainan %A Koehn, Philipp %Y Bojar, Ondřej %Y Chatterjee, Rajen %Y Federmann, Christian %Y Fishel, Mark %Y Graham, Yvette %Y Haddow, Barry %Y Huck, Matthias %Y Yepes, Antonio Jimeno %Y Koehn, Philipp %Y Monz, Christof %Y Negri, Matteo %Y Névéol, Aurélie %Y Neves, Mariana %Y Post, Matt %Y Specia, Lucia %Y Turchi, Marco %Y Verspoor, Karin %S Proceedings of the Third Conference on Machine Translation: Shared Task Papers %D 2018 %8 October %I Association for Computational Linguistics %C Belgium, Brussels %F khayrallah-etal-2018-jhu %X This work describes our submission to the WMT18 Parallel Corpus Filtering shared task. We use a slightly modified version of the Zipporah Corpus Filtering toolkit (Xu and Koehn, 2017), which computes an adequacy score and a fluency score on a sentence pair, and use a weighted sum of the scores as the selection criteria. This work differs from Zipporah in that we experiment with using the noisy corpus to be filtered to compute the combination weights, and thus avoids generating synthetic data as in standard Zipporah. %R 10.18653/v1/W18-6479 %U https://aclanthology.org/W18-6479 %U https://doi.org/10.18653/v1/W18-6479 %P 896-899
Markdown (Informal)
[The JHU Parallel Corpus Filtering Systems for WMT 2018](https://aclanthology.org/W18-6479) (Khayrallah et al., WMT 2018)
- The JHU Parallel Corpus Filtering Systems for WMT 2018 (Khayrallah et al., WMT 2018)
ACL
- Huda Khayrallah, Hainan Xu, and Philipp Koehn. 2018. The JHU Parallel Corpus Filtering Systems for WMT 2018. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 896–899, Belgium, Brussels. Association for Computational Linguistics.