Dmitriy Genzel


2021

Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task
Yun Tang | Juan Pino | Xian Li | Changhan Wang | Dmitriy Genzel
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Pretraining and multitask learning are widely used to improve speech translation performance. In this study, we are interested in training a speech translation model along with an auxiliary text translation task. We conduct a detailed analysis to understand the impact of the auxiliary task on the primary task within the multitask learning framework. Our analysis confirms that multitask learning tends to generate similar decoder representations from different modalities and to preserve more information from the pretrained text translation modules. We observe minimal negative transfer between the two tasks, and sharing more parameters helps transfer knowledge from the text task to the speech task. The analysis also reveals that the modality representation difference at the top decoder layers is still not negligible, and that those layers are critical for translation quality. Inspired by these findings, we propose three methods to improve translation quality. First, a parameter sharing and initialization strategy is proposed to enhance information sharing between the tasks. Second, a novel attention-based regularization is proposed for the encoders, pulling the representations from different modalities closer. Third, an online knowledge distillation method is proposed to enhance knowledge transfer from the text task to the speech task. Our experiments show that the proposed approach improves translation performance by more than 2 BLEU over a strong baseline and achieves state-of-the-art results on the MuST-C English-German, English-French and English-Spanish language pairs.
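The online knowledge distillation idea mentioned in the abstract can be pictured as an auxiliary loss in which the text translation task acts as a teacher for the speech translation task during joint training. The sketch below is an illustrative reconstruction, not the authors' implementation; the tensor names, temperature, and kd_weight hyperparameter are assumptions, and padding positions are not masked in the distillation term for brevity.

```python
# Minimal sketch of multitask training with online knowledge distillation:
# the text-translation decoder's output distribution is used as a soft target
# for the speech-translation decoder, on top of the usual label losses.
import torch
import torch.nn.functional as F

def multitask_loss(speech_logits, text_logits, targets, pad_id=0,
                   kd_weight=0.8, temperature=1.0):
    """speech_logits, text_logits: (batch, time, vocab); targets: (batch, time)."""
    vocab = speech_logits.size(-1)

    # Standard cross-entropy for both the speech task and the auxiliary text task.
    ce_speech = F.cross_entropy(speech_logits.view(-1, vocab),
                                targets.view(-1), ignore_index=pad_id)
    ce_text = F.cross_entropy(text_logits.view(-1, vocab),
                              targets.view(-1), ignore_index=pad_id)

    # Online distillation: the text task is the teacher, so no gradient flows
    # back through its distribution; KL pulls the speech distribution toward it.
    teacher = F.softmax(text_logits.detach() / temperature, dim=-1)
    student_logp = F.log_softmax(speech_logits / temperature, dim=-1)
    kd = F.kl_div(student_logp, teacher, reduction="batchmean") * temperature ** 2

    return ce_speech + ce_text + kd_weight * kd
```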

2020

Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)
Janice Campbell | Dmitriy Genzel | Ben Huyck | Patricia O’Neill-Brown
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

Machine Translation quality across demographic dialectal variation in Social Media.
Adithya Renduchintala | Dmitriy Genzel
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

TICO-19: the Translation Initiative for COvid-19
Antonios Anastasopoulos | Alessandro Cattelan | Zi-Yi Dou | Marcello Federico | Christian Federmann | Dmitriy Genzel | Francisco Guzmán | Junjie Hu | Macduff Hughes | Philipp Koehn | Rosie Lazar | Will Lewis | Graham Neubig | Mengmeng Niu | Alp Öktem | Eric Paquin | Grace Tang | Sylwia Tur
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The COVID-19 pandemic is the worst pandemic to strike the world in over a century. Crucial to stemming the tide of the SARS-CoV-2 virus is communicating to vulnerable populations the means by which they can protect themselves. To this end, the collaborators forming the Translation Initiative for COvid-19 (TICO-19) have made test and development data available to AI and MT researchers in 35 different languages in order to foster the development of tools and resources for improving access to information about COVID-19 in these languages. In addition to 9 high-resourced "pivot" languages, the team is targeting 26 lesser-resourced languages, in particular languages of Africa, South Asia and South-East Asia, whose populations may be the most vulnerable to the spread of the virus. The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set. Further, the team is converting the test and development data into translation memories (TMXs) that can be used by localizers from and to any of the languages.
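Because the same segments are translated into every language in the set, any two languages can be paired into a parallel test set. The snippet below is only an illustrative sketch of that property, not an official TICO-19 tool; the one-file-per-language TSV layout and the function names are hypothetical assumptions about how such multi-way parallel data might be stored.

```python
# Sketch: build parallel test sets for arbitrary language pairs from
# multi-way parallel data (same segments, same order, in every language).
import csv
from itertools import permutations
from pathlib import Path

def load_translations(tsv_dir):
    """Read one hypothetical TSV file per language (e.g. 'en.tsv', 'sw.tsv'),
    each holding the same segments in the same order, first column = text."""
    data = {}
    for path in Path(tsv_dir).glob("*.tsv"):
        with path.open(encoding="utf-8") as f:
            data[path.stem] = [row[0] for row in csv.reader(f, delimiter="\t") if row]
    return data

def make_pairs(data):
    """Yield (src_lang, tgt_lang, list of aligned (src, tgt) segment pairs)
    for every ordered pairing of languages in the collection."""
    for src, tgt in permutations(data, 2):
        yield src, tgt, list(zip(data[src], data[tgt]))
```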

2010

“Poetic” Statistical Machine Translation: Rhyme and Meter
Dmitriy Genzel | Jakob Uszkoreit | Franz Och
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation
Dmitriy Genzel
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Creating a High-Quality Machine Translation System for a Low-Resource Language: Yiddish
Dmitriy Genzel | Klaus Macherey | Jakob Uszkoreit
Proceedings of Machine Translation Summit XII: Papers

2005

Inducing a Multilingual Dictionary from a Parallel Multitext in Related Languages
Dmitriy Genzel
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2003

Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number
Dmitriy Genzel | Eugene Charniak
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

2002

Entropy Rate Constancy in Text
Dmitriy Genzel | Eugene Charniak
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics