The AIC System for the WMT 2022 Unsupervised MT and Very Low Resource Supervised MT Task
Ahmad Shapiro, Mahmoud Salama, Omar Abdelhakim, Mohamed Fayed, Ayman Khalafallah, Noha Adly
Abstract
This paper presents our submissions to the WMT 22 shared task in the Unsupervised and Very Low Resource Supervised Machine Translation tasks. The task revolves around translating between German ↔ Upper Sorbian (de ↔ hsb), German ↔ Lower Sorbian (de ↔ dsb) and Upper Sorbian ↔ Lower Sorbian (hsb ↔ dsb) in both unsupervised and supervised settings. For the unsupervised system, we trained an unsupervised phrase-based statistical machine translation (UPBSMT) system on each pair independently. We pretrained a De-Slavic mBART model on the following languages: Polish (pl), Czech (cs), German (de), Upper Sorbian (hsb) and Lower Sorbian (dsb). We then fine-tuned our mBART on the synthetic parallel data generated by the UPBSMT model along with authentic parallel data (de ↔ pl, de ↔ cs). We further fine-tuned our unsupervised system on authentic parallel data (hsb ↔ dsb, de ↔ dsb, de ↔ hsb) to submit our supervised low-resource system.
- Anthology ID:
- 2022.wmt-1.110
- Volume:
- Proceedings of the Seventh Conference on Machine Translation (WMT)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Editors:
- Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1117–1121
- URL:
- https://aclanthology.org/2022.wmt-1.110/
- DOI:
- 10.18653/v1/2022.wmt-1.110
- Bibkey:
- shapiro-etal-2022-aic
- Cite (ACL):
- Ahmad Shapiro, Mahmoud Salama, Omar Abdelhakim, Mohamed Fayed, Ayman Khalafallah, and Noha Adly. 2022. The AIC System for the WMT 2022 Unsupervised MT and Very Low Resource Supervised MT Task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 1117–1121, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- The AIC System for the WMT 2022 Unsupervised MT and Very Low Resource Supervised MT Task (Shapiro et al., WMT 2022)
- PDF:
- https://aclanthology.org/2022.wmt-1.110.pdf
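As a rough illustration of the fine-tuning data described in the abstract, the sketch below pools synthetic parallel pairs (as would be produced by the per-pair UPBSMT systems) with authentic de ↔ pl and de ↔ cs pairs, and emits every pair in both translation directions. The record layout, the function names `bidirectional` and `build_finetuning_mix`, and the choice to treat every corpus symmetrically are hypothetical assumptions for illustration only; the paper's actual preprocessing, sampling and language tagging are not specified here.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (source_lang, target_lang, source_text, target_text).
Example = Tuple[str, str, str, str]
# Hypothetical corpus layout: (lang_a, lang_b) -> list of aligned (text_a, text_b) pairs.
Corpus = Dict[Tuple[str, str], List[Tuple[str, str]]]


def bidirectional(pairs: List[Tuple[str, str]], lang_a: str, lang_b: str) -> List[Example]:
    """Turn one aligned corpus into training examples for both directions (assumption)."""
    examples: List[Example] = []
    for text_a, text_b in pairs:
        examples.append((lang_a, lang_b, text_a, text_b))  # lang_a -> lang_b
        examples.append((lang_b, lang_a, text_b, text_a))  # lang_b -> lang_a
    return examples


def build_finetuning_mix(synthetic: Corpus, authentic: Corpus) -> List[Example]:
    """Concatenate synthetic (UPBSMT-style) and authentic corpora into one list.

    Synthetic corpora are added first, then authentic ones, each in dict order.
    No up- or down-sampling is applied; that is a simplifying assumption.
    """
    mix: List[Example] = []
    for corpus in (synthetic, authentic):
        for (lang_a, lang_b), pairs in corpus.items():
            mix.extend(bidirectional(pairs, lang_a, lang_b))
    return mix


if __name__ == "__main__":
    # Placeholder strings stand in for real sentences.
    synthetic = {("de", "hsb"): [("<de sentence>", "<hsb sentence>")]}
    authentic = {("de", "pl"): [("<de sentence>", "<pl sentence>")]}
    for example in build_finetuning_mix(synthetic, authentic):
        print(example)
```

Keeping the direction explicit in each record makes it straightforward to map source and target language codes onto whatever language tokens a multilingual model such as mBART expects, without changing the mixing logic.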
Export citation
@inproceedings{shapiro-etal-2022-aic,
title = "The {AIC} System for the {WMT} 2022 Unsupervised {MT} and Very Low Resource Supervised {MT} Task",
author = "Shapiro, Ahmad and
Salama, Mahmoud and
Abdelhakim, Omar and
Fayed, Mohamed and
Khalafallah, Ayman and
Adly, Noha",
editor = {Koehn, Philipp and
Barrault, Lo{\"i}c and
Bojar, Ond{\v{r}}ej and
Bougares, Fethi and
Chatterjee, Rajen and
Costa-juss{\`a}, Marta R. and
Federmann, Christian and
Fishel, Mark and
Fraser, Alexander and
Freitag, Markus and
Graham, Yvette and
Grundkiewicz, Roman and
Guzman, Paco and
Haddow, Barry and
Huck, Matthias and
Jimeno Yepes, Antonio and
Kocmi, Tom and
Martins, Andr{\'e} and
Morishita, Makoto and
Monz, Christof and
Nagata, Masaaki and
Nakazawa, Toshiaki and
Negri, Matteo and
N{\'e}v{\'e}ol, Aur{\'e}lie and
Neves, Mariana and
Popel, Martin and
Turchi, Marco and
Zampieri, Marcos},
booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates (Hybrid)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.wmt-1.110/",
doi = "10.18653/v1/2022.wmt-1.110",
pages = "1117--1121",
abstract = "This paper presents our submissions to WMT 22 shared task in the Unsupervised and Very Low Resource Supervised Machine Translation tasks. The task revolves around translating between German {\ensuremath{\leftrightarrow}} Upper Sorbian (de {\ensuremath{\leftrightarrow}} hsb), German {\ensuremath{\leftrightarrow}} Lower Sorbian (de {\ensuremath{\leftrightarrow}} dsb) and Upper Sorbian {\ensuremath{\leftrightarrow}} Lower Sorbian (hsb {\ensuremath{\leftrightarrow}} dsb) in both unsupervised and supervised manner. For the unsupervised system, we trained an unsupervised phrase-based statistical machine translation (UPBSMT) system on each pair independently. We pretrained a De-Salvic mBART model on the following languages Polish (pl), Czech (cs), German (de), Upper Sorbian (hsb), Lower Sorbian (dsb). We then fine-tuned our mBART on the synthetic parallel data generated by the (UPBSMT) model along with authentic parallel data (de {\ensuremath{\leftrightarrow}} pl, de {\ensuremath{\leftrightarrow}} cs). We further fine-tuned our unsupervised system on authentic parallel data (hsb {\ensuremath{\leftrightarrow}} dsb, de {\ensuremath{\leftrightarrow}} dsb, de {\ensuremath{\leftrightarrow}} hsb) to submit our supervised low-resource system."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="shapiro-etal-2022-aic">
<titleInfo>
<title>The AIC System for the WMT 2022 Unsupervised MT and Very Low Resource Supervised MT Task</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ahmad</namePart>
<namePart type="family">Shapiro</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mahmoud</namePart>
<namePart type="family">Salama</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Omar</namePart>
<namePart type="family">Abdelhakim</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohamed</namePart>
<namePart type="family">Fayed</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ayman</namePart>
<namePart type="family">Khalafallah</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Noha</namePart>
<namePart type="family">Adly</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Seventh Conference on Machine Translation (WMT)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Philipp</namePart>
<namePart type="family">Koehn</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Loïc</namePart>
<namePart type="family">Barrault</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ondřej</namePart>
<namePart type="family">Bojar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fethi</namePart>
<namePart type="family">Bougares</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rajen</namePart>
<namePart type="family">Chatterjee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marta</namePart>
<namePart type="given">R</namePart>
<namePart type="family">Costa-jussà</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christian</namePart>
<namePart type="family">Federmann</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mark</namePart>
<namePart type="family">Fishel</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alexander</namePart>
<namePart type="family">Fraser</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Markus</namePart>
<namePart type="family">Freitag</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yvette</namePart>
<namePart type="family">Graham</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Roman</namePart>
<namePart type="family">Grundkiewicz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Paco</namePart>
<namePart type="family">Guzman</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Barry</namePart>
<namePart type="family">Haddow</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Matthias</namePart>
<namePart type="family">Huck</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Antonio</namePart>
<namePart type="family">Jimeno Yepes</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tom</namePart>
<namePart type="family">Kocmi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">André</namePart>
<namePart type="family">Martins</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Makoto</namePart>
<namePart type="family">Morishita</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christof</namePart>
<namePart type="family">Monz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Masaaki</namePart>
<namePart type="family">Nagata</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Toshiaki</namePart>
<namePart type="family">Nakazawa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Matteo</namePart>
<namePart type="family">Negri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aurélie</namePart>
<namePart type="family">Névéol</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mariana</namePart>
<namePart type="family">Neves</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Martin</namePart>
<namePart type="family">Popel</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marco</namePart>
<namePart type="family">Turchi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marcos</namePart>
<namePart type="family">Zampieri</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, United Arab Emirates (Hybrid)</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>This paper presents our submissions to WMT 22 shared task in the Unsupervised and Very Low Resource Supervised Machine Translation tasks. The task revolves around translating between German ↔ Upper Sorbian (de ↔ hsb), German ↔ Lower Sorbian (de ↔ dsb) and Upper Sorbian ↔ Lower Sorbian (hsb ↔ dsb) in both unsupervised and supervised manner. For the unsupervised system, we trained an unsupervised phrase-based statistical machine translation (UPBSMT) system on each pair independently. We pretrained a De-Salvic mBART model on the following languages Polish (pl), Czech (cs), German (de), Upper Sorbian (hsb), Lower Sorbian (dsb). We then fine-tuned our mBART on the synthetic parallel data generated by the (UPBSMT) model along with authentic parallel data (de ↔ pl, de ↔ cs). We further fine-tuned our unsupervised system on authentic parallel data (hsb ↔ dsb, de ↔ dsb, de ↔ hsb) to submit our supervised low-resource system.</abstract>
<identifier type="citekey">shapiro-etal-2022-aic</identifier>
<identifier type="doi">10.18653/v1/2022.wmt-1.110</identifier>
<location>
<url>https://aclanthology.org/2022.wmt-1.110/</url>
</location>
<part>
<date>2022-12</date>
<extent unit="page">
<start>1117</start>
<end>1121</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings %T The AIC System for the WMT 2022 Unsupervised MT and Very Low Resource Supervised MT Task %A Shapiro, Ahmad %A Salama, Mahmoud %A Abdelhakim, Omar %A Fayed, Mohamed %A Khalafallah, Ayman %A Adly, Noha %Y Koehn, Philipp %Y Barrault, Loïc %Y Bojar, Ondřej %Y Bougares, Fethi %Y Chatterjee, Rajen %Y Costa-jussà, Marta R. %Y Federmann, Christian %Y Fishel, Mark %Y Fraser, Alexander %Y Freitag, Markus %Y Graham, Yvette %Y Grundkiewicz, Roman %Y Guzman, Paco %Y Haddow, Barry %Y Huck, Matthias %Y Jimeno Yepes, Antonio %Y Kocmi, Tom %Y Martins, André %Y Morishita, Makoto %Y Monz, Christof %Y Nagata, Masaaki %Y Nakazawa, Toshiaki %Y Negri, Matteo %Y Névéol, Aurélie %Y Neves, Mariana %Y Popel, Martin %Y Turchi, Marco %Y Zampieri, Marcos %S Proceedings of the Seventh Conference on Machine Translation (WMT) %D 2022 %8 December %I Association for Computational Linguistics %C Abu Dhabi, United Arab Emirates (Hybrid) %F shapiro-etal-2022-aic %X This paper presents our submissions to WMT 22 shared task in the Unsupervised and Very Low Resource Supervised Machine Translation tasks. The task revolves around translating between German ↔ Upper Sorbian (de ↔ hsb), German ↔ Lower Sorbian (de ↔ dsb) and Upper Sorbian ↔ Lower Sorbian (hsb ↔ dsb) in both unsupervised and supervised manner. For the unsupervised system, we trained an unsupervised phrase-based statistical machine translation (UPBSMT) system on each pair independently. We pretrained a De-Salvic mBART model on the following languages Polish (pl), Czech (cs), German (de), Upper Sorbian (hsb), Lower Sorbian (dsb). We then fine-tuned our mBART on the synthetic parallel data generated by the (UPBSMT) model along with authentic parallel data (de ↔ pl, de ↔ cs). We further fine-tuned our unsupervised system on authentic parallel data (hsb ↔ dsb, de ↔ dsb, de ↔ hsb) to submit our supervised low-resource system. %R 10.18653/v1/2022.wmt-1.110 %U https://aclanthology.org/2022.wmt-1.110/ %U https://doi.org/10.18653/v1/2022.wmt-1.110 %P 1117-1121
Markdown (Informal)
[The AIC System for the WMT 2022 Unsupervised MT and Very Low Resource Supervised MT Task](https://aclanthology.org/2022.wmt-1.110/) (Shapiro et al., WMT 2022)