Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task
Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, José A. R. Fonollosa
Abstract
In this work, we give a description of the TALP-UPC systems submitted for the WMT19 Biomedical Translation Task. Our proposed strategy is NMT model-independent and relies only on one ingredient, a biomedical terminology list. We first extracted such a terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data preparation strategy to insert the terms information at a token level. Finally, we trained the Transformer model with this terms-informed data. Our best-submitted system ranked 2nd and 3rd for Spanish-English and English-Spanish translation directions, respectively.
- Anthology ID: W19-5418
- Volume: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
- Month: August
- Year: 2019
- Address: Florence, Italy
- Editors: Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor
- Venue: WMT
- SIG: SIGMT
- Publisher: Association for Computational Linguistics
- Pages: 151–155
- URL: https://aclanthology.org/W19-5418/
- DOI: 10.18653/v1/W19-5418
- Bibkey: carrino-etal-2019-terminology
- Cite (ACL): Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, and José A. R. Fonollosa. 2019. Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 151–155, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task (Carrino et al., WMT 2019)
- PDF: https://aclanthology.org/W19-5418.pdf
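
The abstract mentions inserting terminology information "at a token level" before training. As a rough illustration only, and not the authors' implementation, one way such a data-preparation step might look is to attach a binary domain label to every token. The function name `tag_tokens`, the labels `TERM` and `GEN`, the `|` separator, and the hand-written term set (standing in for a list extracted with BabelNet) are all assumptions made for this sketch.

```python
from typing import Iterable, List, Set


def tag_tokens(tokens: Iterable[str], terms: Set[str], sep: str = "|") -> List[str]:
    """Append a hypothetical domain label to each token.

    A token whose lowercased form appears in `terms` is labelled TERM,
    every other token is labelled GEN.
    """
    tagged = []
    for tok in tokens:
        label = "TERM" if tok.lower() in terms else "GEN"
        tagged.append(f"{tok}{sep}{label}")
    return tagged


# Hypothetical terminology list; the paper derives its list with the BabelNet API.
terms = {"insulin", "glucose"}
sentence = "Insulin lowers blood glucose levels".split()
print(" ".join(tag_tokens(sentence, terms)))
# prints: Insulin|TERM lowers|GEN blood|GEN glucose|TERM levels|GEN
```

Because the labels live in the training text itself, a scheme of this kind would not depend on a particular NMT architecture, which is consistent with the abstract's description of the strategy as model-independent.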
Export citation
@inproceedings{carrino-etal-2019-terminology,
    title = "Terminology-Aware Segmentation and Domain Feature for the {WMT}19 Biomedical Translation Task",
    author = "Carrino, Casimiro Pio  and
      Rafieian, Bardia  and
      Costa-juss{\`a}, Marta R.  and
      Fonollosa, Jos{\'e} A. R.",
    editor = "Bojar, Ond{\v{r}}ej  and
      Chatterjee, Rajen  and
      Federmann, Christian  and
      Fishel, Mark  and
      Graham, Yvette  and
      Haddow, Barry  and
      Huck, Matthias  and
      Yepes, Antonio Jimeno  and
      Koehn, Philipp  and
      Martins, Andr{\'e}  and
      Monz, Christof  and
      Negri, Matteo  and
      N{\'e}v{\'e}ol, Aur{\'e}lie  and
      Neves, Mariana  and
      Post, Matt  and
      Turchi, Marco  and
      Verspoor, Karin",
    booktitle = "Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-5418/",
    doi = "10.18653/v1/W19-5418",
    pages = "151--155",
    abstract = "In this work, we give a description of the TALP-UPC systems submitted for the WMT19 Biomedical Translation Task. Our proposed strategy is NMT model-independent and relies only on one ingredient, a biomedical terminology list. We first extracted such a terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data preparation strategy to insert the terms information at a token level. Finally, we trained the Transformer model with this terms-informed data. Our best-submitted system ranked 2nd and 3rd for Spanish-English and English-Spanish translation directions, respectively."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="carrino-etal-2019-terminology">
    <titleInfo>
        <title>Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Casimiro</namePart>
        <namePart type="given">Pio</namePart>
        <namePart type="family">Carrino</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Bardia</namePart>
        <namePart type="family">Rafieian</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Marta</namePart>
        <namePart type="given">R</namePart>
        <namePart type="family">Costa-jussà</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">José</namePart>
        <namePart type="given">A</namePart>
        <namePart type="given">R</namePart>
        <namePart type="family">Fonollosa</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2019-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Ondřej</namePart>
            <namePart type="family">Bojar</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rajen</namePart>
            <namePart type="family">Chatterjee</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christian</namePart>
            <namePart type="family">Federmann</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mark</namePart>
            <namePart type="family">Fishel</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Yvette</namePart>
            <namePart type="family">Graham</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Barry</namePart>
            <namePart type="family">Haddow</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Matthias</namePart>
            <namePart type="family">Huck</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Antonio</namePart>
            <namePart type="given">Jimeno</namePart>
            <namePart type="family">Yepes</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Philipp</namePart>
            <namePart type="family">Koehn</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">André</namePart>
            <namePart type="family">Martins</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christof</namePart>
            <namePart type="family">Monz</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Matteo</namePart>
            <namePart type="family">Negri</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Aurélie</namePart>
            <namePart type="family">Névéol</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mariana</namePart>
            <namePart type="family">Neves</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Matt</namePart>
            <namePart type="family">Post</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marco</namePart>
            <namePart type="family">Turchi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Karin</namePart>
            <namePart type="family">Verspoor</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Florence, Italy</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this work, we give a description of the TALP-UPC systems submitted for the WMT19 Biomedical Translation Task. Our proposed strategy is NMT model-independent and relies only on one ingredient, a biomedical terminology list. We first extracted such a terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data preparation strategy to insert the terms information at a token level. Finally, we trained the Transformer model with this terms-informed data. Our best-submitted system ranked 2nd and 3rd for Spanish-English and English-Spanish translation directions, respectively.</abstract>
    <identifier type="citekey">carrino-etal-2019-terminology</identifier>
    <identifier type="doi">10.18653/v1/W19-5418</identifier>
    <location>
        <url>https://aclanthology.org/W19-5418/</url>
    </location>
    <part>
        <date>2019-08</date>
        <extent unit="page">
            <start>151</start>
            <end>155</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task
%A Carrino, Casimiro Pio
%A Rafieian, Bardia
%A Costa-jussà, Marta R.
%A Fonollosa, José A. R.
%Y Bojar, Ondřej
%Y Chatterjee, Rajen
%Y Federmann, Christian
%Y Fishel, Mark
%Y Graham, Yvette
%Y Haddow, Barry
%Y Huck, Matthias
%Y Yepes, Antonio Jimeno
%Y Koehn, Philipp
%Y Martins, André
%Y Monz, Christof
%Y Negri, Matteo
%Y Névéol, Aurélie
%Y Neves, Mariana
%Y Post, Matt
%Y Turchi, Marco
%Y Verspoor, Karin
%S Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
%D 2019
%8 August
%I Association for Computational Linguistics
%C Florence, Italy
%F carrino-etal-2019-terminology
%X In this work, we give a description of the TALP-UPC systems submitted for the WMT19 Biomedical Translation Task. Our proposed strategy is NMT model-independent and relies only on one ingredient, a biomedical terminology list. We first extracted such a terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data preparation strategy to insert the terms information at a token level. Finally, we trained the Transformer model with this terms-informed data. Our best-submitted system ranked 2nd and 3rd for Spanish-English and English-Spanish translation directions, respectively.
%R 10.18653/v1/W19-5418
%U https://aclanthology.org/W19-5418/
%U https://doi.org/10.18653/v1/W19-5418
%P 151-155
Markdown (Informal)
[Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task](https://aclanthology.org/W19-5418/) (Carrino et al., WMT 2019)