Amin Farajian
2024
Tower v2: Unbabel-IST 2024 Submission for the General MT Shared Task
Ricardo Rei | Jose Pombal | Nuno M. Guerreiro | João Alves | Pedro Henrique Martins | Patrick Fernandes | Helena Wu | Tania Vaz | Duarte Alves | Amin Farajian | Sweta Agrawal | Antonio Farinhas | José G. C. De Souza | André Martins
Proceedings of the Ninth Conference on Machine Translation
In this work, we present Tower v2, an improved iteration of the state-of-the-art open-weight Tower models, and the backbone of our submission to the WMT24 General Translation shared task. Tower v2 introduces key improvements including expanded language coverage, enhanced data quality, and increased model capacity up to 70B parameters. Our final submission combines these advancements with quality-aware decoding strategies, selecting translations based on multiple translation quality signals. The resulting system demonstrates significant improvement over previous versions, outperforming closed commercial systems like GPT-4o, Claude 3.5, and DeepL even at a smaller 7B scale.
Findings of the WMT 2024 Shared Task on Chat Translation
Wafaa Mohammed | Sweta Agrawal | Amin Farajian | Vera Cabarrão | Bryan Eikema | Ana C Farinha | José G. C. De Souza
Proceedings of the Ninth Conference on Machine Translation
This paper presents the findings from the third edition of the Chat Translation Shared Task. As with previous editions, the task involved translating bilingual customer support conversations, specifically focusing on the impact of conversation context on translation quality and evaluation. We also include two new language pairs, English-Korean and English-Dutch, in addition to the language pairs from previous editions: English-German, English-French, and English-Brazilian Portuguese. We received 22 primary submissions and 32 contrastive submissions from eight teams, with each language pair having participation from at least three teams. We evaluated the systems comprehensively using both automatic metrics and human judgments via a direct assessment framework. The official rankings for each language pair were determined based on human evaluation scores, considering performance in both translation directions (agent and customer). Our analysis shows that while the systems excelled at translating individual turns, there is room for improvement in overall conversation-level translation quality.