Arkady Arkhangorodsky
2025
Command-A-Translate: Raising the Bar of Machine Translation with Difficulty Filtering
Tom Kocmi | Arkady Arkhangorodsky | Alexandre Bérard | Phil Blunsom | Samuel Cahyawijaya | Théo Dehaze | Marzieh Fadaee | Nicholas Frosst | Matthias Gallé | Aidan Gomez | Nithya Govindarajan | Wei-Yin Ko | Julia Kreutzer | Kelly Marchisio | Ahmet Üstün | Sebastian Vincent | Ivan Zhang
Proceedings of the Tenth Conference on Machine Translation
We present Command A Translate, an LLM-based machine translation model built on Cohere’s Command A. It reaches state-of-the-art machine translation quality via direct preference optimization. Our meticulously designed data preparation pipeline emphasizes robust quality control and a novel difficulty filtering step, a key innovation that distinguishes Command A Translate. Furthermore, we extend our model and participate at WMT with a system (CommandA-WMT) that uses two models and post-editing steps of step-by-step reasoning and limited Minimum Bayes Risk decoding.
2021
MeetDot: Videoconferencing with Live Translation Captions
Arkady Arkhangorodsky | Christopher Chu | Scot Fang | Yiqi Huang | Denglin Jiang | Ajay Nagesh | Boliang Zhang | Kevin Knight
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. Additionally, our system has very strict latency requirements to maintain acceptable call quality. We implement several features to enhance user experience and reduce users' cognitive load, such as smooth scrolling captions and reduced caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.
Findings of the 2021 Conference on Machine Translation (WMT21)
Farhad Akhbardeh | Arkady Arkhangorodsky | Magdalena Biesialska | Ondřej Bojar | Rajen Chatterjee | Vishrav Chaudhary | Marta R. Costa-jussà | Cristina España-Bonet | Angela Fan | Christian Federmann | Markus Freitag | Yvette Graham | Roman Grundkiewicz | Barry Haddow | Leonie Harter | Kenneth Heafield | Christopher Homan | Matthias Huck | Kwabena Amponsah-Kaakyire | Jungo Kasai | Daniel Khashabi | Kevin Knight | Tom Kocmi | Philipp Koehn | Nicholas Lourie | Christof Monz | Makoto Morishita | Masaaki Nagata | Ajay Nagesh | Toshiaki Nakazawa | Matteo Negri | Santanu Pal | Allahsera Auguste Tapo | Marco Turchi | Valentin Vydrin | Marcos Zampieri
Proceedings of the Sixth Conference on Machine Translation
This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories. The task was also opened up to additional test suites to probe specific aspects of translation.
2020
DiDi Labs’ End-to-end System for the IWSLT 2020 Offline Speech Translation Task
Arkady Arkhangorodsky | Yiqi Huang | Amittai Axelrod
Proceedings of the 17th International Conference on Spoken Language Translation
This paper describes the system that DiDi Labs submitted to the offline speech translation task for IWSLT 2020. We trained an end-to-end system that translates audio from English TED talks to German text, without producing intermediate English text. We use the S-Transformer architecture and train using the MuST-C dataset. We also describe several additional experiments that we attempted but that did not yield improved results.
Co-authors
- Yiqi Huang 2
- Kevin Knight 2
- Tom Kocmi 2
- Ajay Nagesh 2
- Farhad Akhbardeh 1
- Kwabena Amponsah-Kaakyire 1
- Amittai Axelrod 1
- Magdalena Biesialska 1
- Phil Blunsom 1
- Ondřej Bojar 1
- Alexandre Bérard 1
- Samuel Cahyawijaya 1
- Rajen Chatterjee 1
- Vishrav Chaudhary 1
- Christopher Chu 1
- Marta R. Costa-jussà 1
- Théo Dehaze 1
- Cristina España-Bonet 1
- Marzieh Fadaee 1
- Angela Fan 1
- Scot Fang 1
- Christian Federmann 1
- Markus Freitag 1
- Nicholas Frosst 1
- Matthias Gallé 1
- Aidan Gomez 1
- Nithya Govindarajan 1
- Yvette Graham 1
- Roman Grundkiewicz 1
- Barry Haddow 1
- Leonie Harter 1
- Kenneth Heafield 1
- Christopher Homan 1
- Matthias Huck 1
- Denglin Jiang 1
- Jungo Kasai 1
- Daniel Khashabi 1
- Wei-Yin Ko 1
- Philipp Koehn 1
- Julia Kreutzer 1
- Nicholas Lourie 1
- Kelly Marchisio 1
- Christof Monz 1
- Makoto Morishita 1
- Masaaki Nagata 1
- Toshiaki Nakazawa 1
- Matteo Negri 1
- Santanu Pal 1
- Allahsera Auguste Tapo 1
- Marco Turchi 1
- Sebastian Vincent 1
- Valentin Vydrin 1
- Marcos Zampieri 1
- Boliang Zhang 1
- Ivan Zhang 1
- Ahmet Üstün 1