Mitra: Improving Terminologically Constrained Translation Quality with Backtranslations and Flag Diacritics

Iikka Hauhio, Théo Friberg


Abstract
Terminologically constrained machine translation is an active research topic in the field of neural machine translation. One major way to categorize constrained translation methods is to divide them into “hard” constraints, which are forced into the target-language sentence using a special decoding algorithm, and “soft” constraints, which are included in the input given to the model.

We present a constrained translation pipeline that combines soft and hard constraints while being completely model-agnostic, i.e. our method can be used with any NMT model or LLM. In the “soft” part, we replace the source-language terms in the input sentence with the backtranslations of their target-language equivalents. This makes the source sentence more similar to the intended translation and therefore easier for the model to translate. In the “hard” part, we use a novel constraint recognition algorithm based on nondeterministic finite-state transducers (NDFSTs) that utilizes flag diacritics to force the model to use the desired target-language terms.

We test our method with both Finnish–English and English–Finnish real-world vocabularies. We find that our methods consistently improve translation quality compared to previous constrained decoding algorithms, while the improvement over unconstrained translation depends on how familiar the model is with the subject vocabulary and on the quality of the vocabulary.
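To make the “hard” part more concrete, the following is a minimal, hypothetical sketch of nondeterministic term-constraint recognition; it is not the authors' implementation. It assumes whitespace-tokenized output and exact-match terms (no inflection handling), and the class name `TermConstraintRecognizer` is invented for illustration. Completing a term raises a per-term flag, loosely mirroring a positive-set flag diacritic (`@P.TERMi.ON@`), and acceptance requires every flag to be set, loosely mirroring a require flag (`@R.TERMi.ON@`).

```python
from typing import FrozenSet, Iterable, Optional, Sequence, Set, Tuple

# A configuration of the toy recognizer: the set of raised flags (indices of
# completed terms) plus an optional in-progress match (term index, tokens matched).
State = Tuple[FrozenSet[int], Optional[Tuple[int, int]]]


class TermConstraintRecognizer:
    """Hypothetical sketch: tracks which target terms a hypothesis contains."""

    def __init__(self, terms: Sequence[Sequence[str]]):
        self.terms = [tuple(t) for t in terms]
        if any(len(t) == 0 for t in self.terms):
            raise ValueError("terms must be non-empty token sequences")

    def start(self) -> Set[State]:
        return {(frozenset(), None)}

    def step(self, states: Set[State], token: str) -> Set[State]:
        """Advance every live configuration by one output token."""
        nxt: Set[State] = set()
        for flags, current in states:
            if current is None:
                # Branch 1: the token is an ordinary word outside any term.
                nxt.add((flags, None))
                # Branch 2: the token may start any of the terms.
                candidates = [(i, 0) for i in range(len(self.terms))]
            else:
                # Inside a term: the only option is to continue that term.
                candidates = [current]
            for i, pos in candidates:
                term = self.terms[i]
                if term[pos] != token:
                    continue  # this branch dies
                if pos + 1 == len(term):
                    nxt.add((flags | {i}, None))  # term finished: raise flag i
                else:
                    nxt.add((flags, (i, pos + 1)))
        return nxt

    def run(self, tokens: Iterable[str]) -> Set[State]:
        states = self.start()
        for tok in tokens:
            states = self.step(states, tok)
        return states

    def accepts(self, tokens: Iterable[str]) -> bool:
        """Accept only if some branch is outside a term with all flags raised."""
        required = frozenset(range(len(self.terms)))
        return any(cur is None and flags == required
                   for flags, cur in self.run(tokens))


if __name__ == "__main__":
    r = TermConstraintRecognizer([["neural", "network"], ["training", "data"]])
    print(r.accepts("the neural network needs training data".split()))  # True
    print(r.accepts("the network needs data".split()))                  # False
```

The recognizer keeps a set of configurations rather than a single one because a token such as “training” can be either an ordinary word or the start of a term; carrying every branch forward is what makes it nondeterministic. In a decoder, such a state set could be updated per beam hypothesis to reject or down-rank outputs that cannot satisfy all constraints, though how Mitra integrates its NDFST with the model is described in the paper, not here.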
Anthology ID:
2024.eamt-1.12
Volume:
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
Month:
June
Year:
2024
Address:
Sheffield, UK
Editors:
Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
Venue:
EAMT
Publisher:
European Association for Machine Translation (EAMT)
Pages:
100–115
URL:
https://aclanthology.org/2024.eamt-1.12
Cite (ACL):
Iikka Hauhio and Théo Friberg. 2024. Mitra: Improving Terminologically Constrained Translation Quality with Backtranslations and Flag Diacritics. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 100–115, Sheffield, UK. European Association for Machine Translation (EAMT).
Cite (Informal):
Mitra: Improving Terminologically Constrained Translation Quality with Backtranslations and Flag Diacritics (Hauhio & Friberg, EAMT 2024)
PDF:
https://aclanthology.org/2024.eamt-1.12.pdf