AppTek’s Submission to the IWSLT 2022 Isometric Spoken Language Translation Task

Patrick Wilken, Evgeny Matusov


Abstract
To participate in the constrained condition of the Isometric Spoken Language Translation Task of the IWSLT 2022 evaluation, AppTek developed neural Transformer-based systems for English-to-German with various mechanisms of length control, ranging from source-side and target-side pseudo-tokens to an encoding of the remaining length in characters that replaces positional encoding. We further increased translation length compliance by sentence-level selection of length-compliant hypotheses from different system variants, as well as by rescoring N-best candidates from a single system. Length-compliant back-translated and forward-translated synthetic data, as well as other parallel data variants derived from the original MuST-C training corpus, were important for a good trade-off between translation quality and desired length. Our experimental results show that length compliance levels above 90% can be reached while minimizing losses in MT quality as measured by BERTScore and BLEU.
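The sentence-level selection of length-compliant hypotheses mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(text, score)` candidate format, and the fallback rule are hypothetical, and the 10% character-length tolerance is an assumption about how compliance is judged.

```python
from typing import List, Tuple

# A candidate is (translation text, model score); higher score is better.
# This pairing is an illustrative assumption, not taken from the paper.
Candidate = Tuple[str, float]


def length_deviation(source: str, hypothesis: str) -> int:
    """Absolute difference in character count between hypothesis and source."""
    return abs(len(hypothesis) - len(source))


def is_length_compliant(source: str, hypothesis: str, tolerance: float = 0.10) -> bool:
    """True if the hypothesis length deviates from the source length by at most
    `tolerance` (as a fraction of the source length). The 0.10 default is assumed."""
    return length_deviation(source, hypothesis) <= tolerance * len(source)


def select_hypothesis(source: str, candidates: List[Candidate], tolerance: float = 0.10) -> str:
    """Pick the best-scoring length-compliant candidate.

    If no candidate is compliant, fall back to the one whose length is closest
    to the source length (a hypothetical fallback chosen for this sketch).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    compliant = [c for c in candidates if is_length_compliant(source, c[0], tolerance)]
    if compliant:
        return max(compliant, key=lambda c: c[1])[0]
    return min(candidates, key=lambda c: length_deviation(source, c[0]))[0]
```

The same function covers both uses described in the abstract: the candidates may be the outputs of several system variants for one sentence, or the N-best list of a single system.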
Anthology ID:
2022.iwslt-1.34
Volume:
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
Month:
May
Year:
2022
Address:
Dublin, Ireland (in-person and online)
Venues:
ACL | IWSLT
Publisher:
Association for Computational Linguistics
Pages:
369–378
URL:
https://aclanthology.org/2022.iwslt-1.34
DOI:
10.18653/v1/2022.iwslt-1.34
Cite (ACL):
Patrick Wilken and Evgeny Matusov. 2022. AppTek’s Submission to the IWSLT 2022 Isometric Spoken Language Translation Task. In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), pages 369–378, Dublin, Ireland (in-person and online). Association for Computational Linguistics.
Cite (Informal):
AppTek’s Submission to the IWSLT 2022 Isometric Spoken Language Translation Task (Wilken & Matusov, IWSLT 2022)
PDF:
https://aclanthology.org/2022.iwslt-1.34.pdf
Data
MuST-C