Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking across Diverse Vocabularies

Sai Koneru, Matthias Huck, Miriam Exel, Jan Niehues


Abstract
Recent advances in NLP have produced models with specialized strengths, such as processing multimodal inputs or excelling in specific domains. However, real-world tasks often require a combination of these strengths; multimodal translation, for instance, requires both translation and image processing. While individual translation and vision models are powerful, they typically cannot perform both tasks in a single system. Combining these models is challenging, particularly because their vocabularies differ, which restricts traditional ensembling to post-generation techniques such as N-best list re-ranking. In this work, we propose a novel zero-shot ensembling strategy that integrates different models during the decoding phase without any additional training. Our approach re-ranks beams during decoding by combining scores at the word level, using heuristics to predict when a word is completed. We demonstrate the effectiveness of this method in machine translation scenarios, showing that it enables the generation of translations that are both speech- and image-aware while also improving overall translation quality.
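The abstract describes the decoding mechanism only in outline. The sketch below illustrates one way word-level re-ranking across two different vocabularies could work; it is a toy under stated assumptions, not the authors' implementation. All of the following are hypothetical: model A is a subword model whose pieces mark word starts with "▁" (SentencePiece-style), model B scores whole words, both models are tiny hand-written bigram tables, a word counts as completed once a later piece opens a new word or the hypothesis ends, and the two log-probabilities are fused by linear interpolation with a weight `lam`. Function names such as `joint_beam_search` and `completed_words` are illustrative only.

```python
import math
from typing import Dict, List, Tuple

EOS = "</s>"
FLOOR = 1e-6  # hypothetical probability for events a toy model has never seen

# Hypothetical toy model A: subword vocabulary where "▁" marks the start of a
# new word (SentencePiece-style). Bigram over pieces; numbers are illustrative.
A_BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>": {"▁the": 1.0},
    "▁the": {"▁ca": 1.0},
    "▁ca": {"t": 0.6, "r": 0.4},
    "t": {EOS: 1.0},
    "r": {EOS: 1.0},
}

# Hypothetical toy model B: word-level vocabulary (e.g. a model that also sees an image).
B_BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 1.0},
    "the": {"car": 0.9, "cat": 0.1},
    "car": {EOS: 1.0},
    "cat": {EOS: 1.0},
}


def pieces_to_words(pieces: List[str]) -> List[str]:
    """Detokenize model-A pieces into whitespace-separated words."""
    text = "".join(p for p in pieces if p != EOS).replace("▁", " ")
    return text.split()


def completed_words(pieces: List[str]) -> List[str]:
    """Assumed heuristic: the last word may still grow unless the hypothesis ended."""
    words = pieces_to_words(pieces)
    finished = bool(pieces) and pieces[-1] == EOS
    return words if finished else words[:-1]


def model_b_logprob(words: List[str], finished: bool) -> float:
    """Score completed words with model B's own word-level vocabulary."""
    total, prev = 0.0, "<s>"
    for w in words + ([EOS] if finished else []):
        total += math.log(B_BIGRAM.get(prev, {}).get(w, FLOOR))
        prev = w
    return total


def joint_beam_search(beam_size: int = 2, lam: float = 0.5, max_len: int = 10):
    # Each hypothesis is (model-A pieces, cumulative model-A log-prob).
    beam: List[Tuple[List[str], float]] = [([], 0.0)]

    def fused(h: Tuple[List[str], float]) -> float:
        pieces, a_score = h
        finished = bool(pieces) and pieces[-1] == EOS
        b_score = model_b_logprob(completed_words(pieces), finished)
        # Assumed fusion rule: linear interpolation of log-probs with weight lam.
        return (1 - lam) * a_score + lam * b_score

    for _ in range(max_len):
        candidates: List[Tuple[List[str], float]] = []
        for pieces, a_score in beam:
            if pieces and pieces[-1] == EOS:
                candidates.append((pieces, a_score))  # carry finished hypotheses over
                continue
            prev = pieces[-1] if pieces else "<s>"
            for piece, p in A_BIGRAM[prev].items():
                candidates.append((pieces + [piece], a_score + math.log(p)))
        # Re-rank all expansions by the fused word-level score, keep the top beam_size.
        beam = sorted(candidates, key=fused, reverse=True)[:beam_size]
        if all(p and p[-1] == EOS for p, _ in beam):
            break
    best = beam[0]
    return pieces_to_words(best[0]), fused(best)
```

With `beam_size=2` and `lam=0.5`, model B's preference for "car" overturns model A's preference for "cat" once the final word is completed at the end-of-sentence step, so the search returns `["the", "car"]`; with `beam_size=1`, or with `lam=0.0`, it returns `["the", "cat"]`. The beam matters here because model B can only score a word after the heuristic judges it complete, so its influence arrives later than a token-level ensemble's would.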
Anthology ID: 2024.wmt-1.133
Volume: Proceedings of the Ninth Conference on Machine Translation
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 1467–1481
URL: https://aclanthology.org/2024.wmt-1.133
Cite (ACL): Sai Koneru, Matthias Huck, Miriam Exel, and Jan Niehues. 2024. Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking across Diverse Vocabularies. In Proceedings of the Ninth Conference on Machine Translation, pages 1467–1481, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking across Diverse Vocabularies (Koneru et al., WMT 2024)
PDF: https://aclanthology.org/2024.wmt-1.133.pdf