Danni Liu - ACL Anthology

Danni Liu

2025

PsyAdvisor: A Plug-and-Play Strategy Advice Planner with Proactive Questioning in Psychological Conversations
Yuxin Hu | Danni Liu | Bo Liu | Yida Chen | Jiuxin Cao | Yan Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proactive questioning is essential in psychological conversations as it helps uncover deeper issues and unspoken concerns. Current psychological LLMs are constrained by passive response mechanisms, limiting their capacity to deploy proactive strategies for psychological counseling. To bridge this gap, we first develop the ProPsyC (Proactive Psychological Conversation) dataset, a multi-turn conversation dataset with interpretive labels including strategy decision logic and reaction attribution. Based on ProPsyC, we propose PsyAdvisor by supervised fine-tuning, a plug-and-play proactive questioning strategy planner that empowers psychological LLMs to initiate well-timed questioning through strategic prompting. Experimental results demonstrate that psychological LLMs integrated with PsyAdvisor substantially improve proactive questioning capacity, conversation depth, and response quality.Furthermore, PsyAdvisor shows promising potential in assisting novice counselors by providing strategy recommendations. This study provides new optimization directions for psychological conversation systems and offers valuable insights for future research on proactive questioning mechanisms in psychological LLMs.

Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs
Danni Liu | Jan Niehues
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While large language models demonstrate remarkable capabilities at task-specific applications through fine-tuning, extending these benefits across diverse languages is essential for broad accessibility. However, effective cross-lingual transfer is hindered by LLM performance gaps across languages and the scarcity of fine-tuning data in many languages. Through analysis of LLM internal representations from over 1,000+ language pairs, we discover that middle layers exhibit the strongest potential for cross-lingual alignment. Building on this finding, we propose a middle-layer alignment objective integrated into task-specific training. Our experiments on slot filling, machine translation, and structured text generation show consistent improvements in cross-lingual transfer, especially to lower-resource languages. The method is robust to the choice of alignment languages and generalizes to languages unseen during alignment. Furthermore, we show that separately trained alignment modules can be merged with existing task-specific modules, improving cross-lingual capabilities without full re-training. The code is provided in the supplementary materials.

KIT’s Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization
Zhaolin Li | Yining Liu | Danni Liu | Tuan Nam Nguyen | Enes Yavuz Ugan | Tu Anh Dinh | Carlos Mullov | Alexander Waibel | Jan Niehues
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)

This paper presents KIT’s submissions to the IWSLT 2025 low-resource track. We develop both cascaded systems, consisting of Automatic Speech Recognition (ASR) and Machine Translation (MT) models, and end-to-end (E2E) Speech Translation (ST) systems for three language pairs: Bemba, North Levantine Arabic, and Tunisian Arabic into English. Building upon pre-trained models, we fine-tune our systems with different strategies to utilize resources efficiently. This study further explores system enhancement with synthetic data and model regularization. Specifically, we investigate MT-augmented ST by generating translations from ASR data using MT models. For North Levantine, which lacks parallel ST training data, a system trained solely on synthetic data slightly surpasses the cascaded system trained on real data. We also explore augmentation using text-to-speech models by generating synthetic speech from MT data, demonstrating the benefits of synthetic data in improving both ASR and ST performance for Bemba. Additionally, we apply intra-distillation to enhance model performance. Our experiments show that this approach consistently improves results across ASR, MT, and ST tasks, as well as across different pre-trained models. Finally, we apply Minimum Bayes Risk decoding to combine the cascaded and end-to-end systems, achieving an improvement of approximately 1.5 BLEU points.

Findings of the IWSLT 2025 Evaluation Campaign
Idris Abdulmumin | Victor Agostinelli | Tanel Alumäe | Antonios Anastasopoulos | Luisa Bentivogli | Ondřej Bojar | Claudia Borg | Fethi Bougares | Roldano Cattoni | Mauro Cettolo | Lizhong Chen | William Chen | Raj Dabre | Yannick Estève | Marcello Federico | Mark Fishel | Marco Gaido | Dávid Javorský | Marek Kasztelnik | Fortuné Kponou | Mateusz Krubiński | Tsz Kin Lam | Danni Liu | Evgeny Matusov | Chandresh Kumar Maurya | John P. McCrae | Salima Mdhaffar | Yasmin Moslem | Kenton Murray | Satoshi Nakamura | Matteo Negri | Jan Niehues | Atul Kr. Ojha | John E. Ortega | Sara Papi | Pavel Pecina | Peter Polák | Piotr Połeć | Ashwin Sankar | Beatrice Savoldi | Nivedita Sethiya | Claytone Sikasote | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Brian Thompson | Marco Turchi | Alex Waibel | Patrick Wilken | Rodolfo Zevallos | Vilém Zouhar | Maike Züfle
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)

This paper presents the outcomes of the shared tasks conducted at the 22nd International Workshop on Spoken Language Translation (IWSLT). The workshop addressed seven critical challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, model compression, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks garnered significant participation, with 32 teams submitting their runs. The field’s growing importance is reflected in the increasing diversity of shared task organizers and contributors to this overview paper, representing a balanced mix of industrial and academic institutions. This broad participation demonstrates the rising prominence of spoken language translation in both research and practical applications.

Conditions for Catastrophic Forgetting in Multilingual Translation
Danni Liu | Jan Niehues
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)

Fine-tuning multilingual foundation models on specific languages often induces catastrophic forgetting, degrading performance on languages unseen in fine-tuning. While this phenomenon is widely-documented, the literature presents fragmented results about when forgetting occurs. To address this ambiguity, we conduct a systematic empirical study using machine translation as a testbed to identify the conditions that trigger catastrophic forgetting in multilingual fine-tuning. Through controlled experiments across different model architectures, data scales, and fine-tuning approaches, we reveal that the relative scale between model and data size is a primary determinant of forgetting. Moreover, we demonstrate that a model’s instruction-following ability is more critical for retaining multilingual knowledge than its architecture. Contrary to assumptions, parameter-efficient fine-tuning offers no clear advantage over full fine-tuning in mitigating forgetting. Lastly, we show that cross-lingual alignment can mitigate forgetting while also facilitating positive transfer to unseen target languages.

How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
Hyunji Lee | Danni Liu | Supriti Sinhamahapatra | Jan Niehues
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Multimodal foundation models aim to create a unified representation space that abstracts away from surface features like language syntax or modality differences. To investigate this, we study the internal representations of three recent models, analyzing the model activations from semantically equivalent sentences across languages in the text and speech modalities. Our findings reveal that: 1) Cross-modal representations converge over model layers, except in the initial layers specialized at text and speech processing. 2) Length adaptation is crucial for reducing the cross-modal gap between text and speech, although current approaches’ effectiveness is primarily limited to high-resource languages. 3) Speech exhibits larger cross-lingual differences than text. 4) For models not explicitly trained for modality-agnostic representations, the modality gap is more prominent than the language gap.

2024

Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers
Lukas Hilgert | Danni Liu | Jan Niehues
Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)

With the number of scientific papers published every year growing and current large language models (LLMs) showing state-of-the-art performance on natural language processing (NLP) tasks, we ask the question if LLMs could be utilized to answer questions on scientific papers.We investigate how well state-of-the-art large language models (LLMs) can answer questions on scientific paper by experimenting with long-context versions of the LLaMA 2 model and evaluating and training on the Qasper dataset.We analyze how well the LLMs handle longer papers and questions that can only be answered by accessing information from far out paragraphs. During our experiments, we see that the performance of these LLMs drops with growing length and position of relevant information.We employ different measures from simple prompts to chain-of-thought prompts and zero-shot usage to fine-tuning with QLoRA.While we still observe a performance loss with increased context length, our measures reduce the effects of this flaw, and we can achieve F₁ scores similar to bigger models like GPT-4.

How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?
Danni Liu | Jan Niehues
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Customizing machine translation models to comply with desired attributes (e.g., formality or grammatical gender) is a well-studied topic. However, most current approaches rely on (semi-)supervised data with attribute annotations. This data scarcity bottlenecks democratizing such customization possibilities to a wider range of languages, particularly lower-resource ones. This gap is out of sync with recent progress in pretrained massively multilingual translation models. In response, we transfer the attribute controlling capabilities to languages without attribute-annotated data with an NLLB-200 model as a foundation. Inspired by techniques from controllable generation, we employ a gradient-based inference-time controller to steer the pretrained model. The controller transfers well to zero-shot conditions, as it is operates on pretrained multilingual representations and is attribute- rather than language-specific. With a comprehensive comparison to finetuning-based control, we demonstrate that, despite finetuning’s clear dominance in supervised settings, the gap to inference-time control closes when moving to zero-shot conditions, especially with new and distant target languages. The latter also shows stronger domain robustness. We further show that our inference-time control complements finetuning. Moreover, a human evaluation on a real low-resource language, Bengali, confirms our findings. Our code is in the supplementary material.

Benchmarking Diffusion Models for Machine Translation
Yunus Demirag | Danni Liu | Jan Niehues
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Diffusion models have recently shown great potential on many generative tasks.In this work, we explore diffusion models for machine translation (MT).We adapt two prominent diffusion-based text generation models, Diffusion-LM and DiffuSeq, to perform machine translation.As the diffusion models generate non-autoregressively (NAR),we draw parallels to NAR machine translation models.With a comparison to conventional Transformer-based translation models, as well as to the Levenshtein Transformer,an established NAR MT model,we show that the multimodality problem that limits NAR machine translation performance is also a challenge to diffusion models.We demonstrate that knowledge distillation from an autoregressive model improves the performance of diffusion-based MT.A thorough analysis on the translation quality of inputs of different lengths shows that the diffusion models struggle more on long-range dependencies than other models.

With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx - a benchmark consisting of university computer science exam questions, to evaluate LLMs’ ability on solving scientific tasks. SciEx is (1) multilingual, containing both English and German exams, and (2) multi-modal, containing questions that involve images, and (3) contains various types of freeform questions with different difficulty levels, due to the nature of university exams. We evaluate the performance of various state-of-the-art LLMs on our new benchmark. Since SciEx questions are freeform, it is not straightforward to evaluate LLM performance. Therefore, we provide human expert grading of the LLM outputs on SciEx. We show that the free-form exams in SciEx remain challenging for the current LLMs, where the best LLM only achieves 59.4% exam grade on average. We also provide detailed comparisons between LLM performance and student performance on SciEx. To enable future evaluation of new LLMs, we propose using LLM-as-a-judge to grade the LLM answers on SciEx. Our experiments show that, although they do not perform perfectly on solving the exams, LLMs are decent as graders, achieving 0.948 Pearson correlation with expert grading.

Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach
Siqi Li | Danni Liu | Jan Niehues
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Direct speech translation (ST) models often struggle with rare words. Incorrect translation of these words can have severe consequences, impacting translation quality and user trust. While rare word translation is inherently challenging for neural models due to sparse learning signals, real-world scenarios often allow access to translations of past recordings on similar topics. To leverage these valuable resources, we propose a retrieval-and-demonstration approach to enhance rare word translation accuracy in direct ST models. First, we adapt existing ST models to incorporate retrieved examples for rare word translation, which allows the model to benefit from prepended examples, similar to in-context learning. We then develop a cross-modal (speech-to-speech, speech-to-text, text-to-text) retriever to locate suitable examples. We demonstrate that standard ST models can be effectively adapted to leverage examples for rare word translation, improving rare word translation accuracy over the baseline by 17.6% with gold examples and 8.5% with retrieved examples. Moreover, our speech-to-speech retrieval approach outperforms other modalities and exhibits higher robustness to unseen speakers. Our code is publicly available.

Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024
Sai Koneru | Thai Binh Nguyen | Ngoc-Quan Pham | Danni Liu | Zhaolin Li | Alexander Waibel | Jan Niehues
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST). In this paper, we present KIT’s offline submission in the constrained + LLM track by incorporating recently proposed techniques that can be added to any cascaded speech translation. Specifically, we integrate Mistral-7B into our system to enhance it in two ways. Firstly, we refine the ASR outputs by utilizing the N-best lists generated by our system and fine-tuning the LLM to predict the transcript accurately. Secondly, we refine the MT outputs at the document level by fine-tuning the LLM, leveraging both ASR and MT predictions to improve translation quality. We find that integrating the LLM into the ASR and MT systems results in an absolute improvement of 0.3% in Word Error Rate and 0.65% in COMET for tst2019 test set. In challenging test sets with overlapping speakers and background noise, we find that integrating LLM is not beneficial due to poor ASR performance. Here, we use ASR with chunked long-form decoding to improve context usage that may be unavailable when transcribing with Voice Activity Detection segmentation alone.

The KIT Speech Translation Systems for IWSLT 2024 Dialectal and Low-resource Track
Zhaolin Li | Enes Yavuz Ugan | Danni Liu | Carlos Mullov | Tu Anh Dinh | Sai Koneru | Alexander Waibel | Jan Niehues
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

This paper presents KIT’s submissions to the IWSLT 2024 dialectal and low-resource track. In this work, we build systems for translating into English from speech in Maltese, Bemba, and two Arabic dialects Tunisian and North Levantine. Under the unconstrained condition, we leverage the pre-trained multilingual models by fine-tuning them for the target language pairs to address data scarcity problems in this track. We build cascaded and end-to-end speech translation systems for different language pairs and show the cascaded system brings slightly better overall performance. Besides, we find utilizing additional data resources boosts speech recognition performance but slightly harms machine translation performance in cascaded systems. Lastly, we show that Minimum Bayes Risk is effective in improving speech translation performance by combining the cascaded and end-to-end systems, bringing a consistent improvement of around 1 BLUE point.

Recent Highlights in Multilingual and Multimodal Speech Translation
Danni Liu | Jan Niehues
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)

Speech translation has witnessed significant progress driven by advancements in modeling techniques and the growing availability of training data. In this paper, we highlight recent advances in two ongoing research directions in ST: scaling the models to 1) many translation directions (multilingual ST) and 2) beyond the text output modality (multimodal ST). We structure this review by examining the sequential stages of a model’s development lifecycle: determining training resources, selecting model architecture, training procedures, evaluation metrics, and deployment considerations. We aim to highlight recent developments in each stage, with a particular focus on model architectures (dedicated speech translation models and LLM-based general-purpose model) and training procedures (task-specific vs. task-invariant approaches). Based on the reviewed advancements, we identify and discuss ongoing challenges within the field of speech translation.

Language-Independent Representations Improve Zero-Shot Summarization
Vladimir Solovyev | Danni Liu | Jan Niehues
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Finetuning pretrained models on downstream generation tasks often leads to catastrophic forgetting in zero-shot conditions. In this work, we focus on summarization and tackle the problem through the lens of language-independent representations. After training on monolingual summarization, we perform zero-shot transfer to new languages or language pairs. We first show naively finetuned models are highly language-specific in both output behavior and internal representations, resulting in poor zero-shot performance. Next, we propose query-key (QK) finetuning to decouple task-specific knowledge from the pretrained language generation abilities. Then, after showing downsides of the standard adversarial language classifier, we propose a balanced variant that more directly enforces language-agnostic representations. Moreover, our qualitative analyses show removing source language identity correlates to zero-shot summarization performance. Our code is openly available.

2023

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation
Christian Huber | Tu Anh Dinh | Carlos Mullov | Ngoc-Quan Pham | Thai Binh Nguyen | Fabian Retkowski | Stefan Constantin | Enes Ugan | Danni Liu | Zhaolin Li | Sai Koneru | Jan Niehues | Alexander Waibel
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The challenge of low-latency speech translation has recently draw significant interest in the research community as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated and often it is not possible to compare different approaches. In this work, we propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion. This includes the segmentation of the audio as well as the run-time of the different components. Secondly, we compare different approaches to low-latency speech translation using this framework. We evaluate models with the option to revise the output as well as methods with fixed output. Furthermore, we directly compare state-of-the-art cascaded as well as end-to-end systems. Finally, the framework allows to automatically evaluate the translation quality as well as latency and also provides a web interface to show the low-latency model outputs to the user.

KIT’s Multilingual Speech Translation System for IWSLT 2023
Danni Liu | Thai Binh Nguyen | Sai Koneru | Enes Yavuz Ugan | Ngoc-Quan Pham | Tuan Nam Nguyen | Tu Anh Dinh | Carlos Mullov | Alexander Waibel | Jan Niehues
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which focuses on the translation of scientific conference talks. The test condition features accented input speech and terminology-dense contents. The tasks requires translation into 10 languages of varying amounts of resources. In absence of training data from the target domain, we use a retrieval-based approach (kNN-MT) for effective adaptation (+0.8 BLEU for speech translation). We also use adapters to easily integrate incremental training data from data augmentation, and show that it matches the performance of re-training. We observe that cascaded systems are more easily adaptable towards specific target domains, due to their separate modules. Our cascaded speech system outperforms its end-to-end counterpart on scientific talk translation, although their performance remains similar on TED talks.

Towards Efficient Simultaneous Speech Translation: CUNI-KIT System for Simultaneous Track at IWSLT 2023
Peter Polák | Danni Liu | Ngoc-Quan Pham | Jan Niehues | Alexander Waibel | Ondřej Bojar
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

In this paper, we describe our submission to the Simultaneous Track at IWSLT 2023. This year, we continue with the successful setup from the last year, however, we adopt the latest methods that further improve the translation quality. Additionally, we propose a novel online policy for attentional encoder-decoder models. The policy prevents the model to generate translation beyond the current speech input by using an auxiliary CTC output layer. We show that the proposed simultaneous policy can be applied to both streaming blockwise models and offline encoder-decoder models. We observe significant improvements in quality (up to 1.1 BLEU) and the computational footprint (up to 45% relative RTF).

2022

Effective combination of pretrained models - KIT@IWSLT2022
Ngoc-Quan Pham | Tuan Nam Nguyen | Thai-Binh Nguyen | Danni Liu | Carlos Mullov | Jan Niehues | Alexander Waibel
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

Pretrained models in acoustic and textual modalities can potentially improve speech translation for both Cascade and End-to-end approaches. In this evaluation, we aim at empirically looking for the answer by using the wav2vec, mBART50 and DeltaLM models to improve text and speech translation models. The experiments showed that the presence of these models together with an advanced audio segmentation method results in an improvement over the previous end-to-end system by up to 7 BLEU points. More importantly, the experiments showed that given enough data and modeling capacity to overcome the training difficulty, we can outperform even very competitive Cascade systems. In our experiments, this gap can be as large as 2.0 BLEU points, the same gap that the Cascade often led over the years.

CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022
Peter Polák | Ngoc-Quan Pham | Tuan Nam Nguyen | Danni Liu | Carlos Mullov | Jan Niehues | Ondřej Bojar | Alexander Waibel
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

In this paper, we describe our submission to the Simultaneous Speech Translation at IWSLT 2022. We explore strategies to utilize an offline model in a simultaneous setting without the need to modify the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being 3x faster than offline in terms of latency on the test set. We also show that the onlinized offline model outperforms the best IWSLT2021 simultaneous system in medium and high latency regimes and is almost on par in the low latency regime. We make our system publicly available.

Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation
Danni Liu | Jan Niehues
Proceedings of the Seventh Conference on Machine Translation (WMT)

The cornerstone of multilingual neural translation is shared representations across languages. Given the theoretically infinite representation power of neural networks, semantically identical sentences are likely represented differently. While representing sentences in the continuous latent space ensures expressiveness, it introduces the risk of capturing of irrelevant features which hinders the learning of a common representation. In this work, we discretize the encoder output latent space of multilingual models by assigning encoder states to entries in a codebook,which in effect represents source sentences in a new artificial language. This discretization process not only offers a new way to interpret the otherwise black-box model representations,but, more importantly, gives potential for increasing robustness in unseen testing conditions. We validate our approach on large-scale experiments with realistic data volumes and domains. When tested in zero-shot conditions, our approach is competitive with two strong alternatives from the literature. We also use the learned artificial language to analyze model behavior, and discover that using a similar bridge language increases knowledge-sharing among the remaining languages.

2021

Improving Zero-Shot Translation by Disentangling Positional Information
Danni Liu | Jan Niehues | James Cross | Francisco Guzmán | Xian Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional correspondence to input tokens. We show that this can be easily alleviated by removing residual connections in an encoder layer. With this modification, we gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions. The improvements are particularly prominent between related languages, where our proposed model outperforms pivot-based translation. Moreover, our approach allows easy integration of new languages, which substantially expands translation coverage. By thorough inspections of the hidden layer outputs, we show that our approach indeed leads to more language-independent representations.

Unsupervised Machine Translation On Dravidian Languages
Sai Koneru | Danni Liu | Jan Niehues
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Unsupervised Neural Machine translation (UNMT) is beneficial especially for under-resourced languages such as from the Dravidian family. They learn to translate between the source and target, relying solely on only monolingual corpora. However, UNMT systems fail in scenarios that occur often when dealing with low resource languages. Recent works have achieved state-of-the-art results by adding auxiliary parallel data with similar languages. In this work, we focus on unsupervised translation between English and Kannada by using limited amounts of auxiliary data between English and other Dravidian languages. We show that transliteration is essential in unsupervised translation between Dravidian languages, as they do not share a common writing system. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for dissimilar language pairs. We show from our experiments it is crucial for Kannada and reference languages to be similar. Further, we propose a method to measure language similarity to choose the most beneficial reference languages.

Maastricht University’s Multilingual Speech Translation System for IWSLT 2021
Danni Liu | Jan Niehues
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes Maastricht University’s participation in the IWSLT 2021 multilingual speech translation track. The task in this track is to build multilingual speech translation systems in supervised and zero-shot directions. Our primary system is an end-to-end model that performs both speech transcription and translation. We observe that the joint training for the two tasks is complementary especially when the speech translation data is scarce. On the source and target side, we use data augmentation and pseudo-labels respectively to improve the performance of our systems. We also introduce an ensembling technique that consistently improves the quality of transcriptions and translations. The experiments show that the end-to-end system is competitive with its cascaded counterpart especially in zero-shot conditions.

Maastricht University’s Large-Scale Multilingual Machine Translation System for WMT 2021
Danni Liu | Jan Niehues
Proceedings of the Sixth Conference on Machine Translation

We present our development of the multilingual machine translation system for the large-scale multilingual machine translation task at WMT 2021. Starting form the provided baseline system, we investigated several techniques to improve the translation quality on the target subset of languages. We were able to significantly improve the translation quality by adapting the system towards the target subset of languages and by generating synthetic data using the initial model. Techniques successfully applied in zero-shot multilingual machine translation (e.g. similarity regularizer) only had a minor effect on the final translation performance.

2020

Adapting End-to-End Speech Recognition for Readable Subtitles
Danni Liu | Jan Niehues | Gerasimos Spanakis
Proceedings of the 17th International Conference on Spoken Language Translation

Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task challenging for supervised approaches due to the scarcity of training data. We first investigate a cascaded system, where an unsupervised compression model is used to post-edit the transcribed speech. We then compare several methods of end-to-end speech recognition under output length constraints. The experiments show that with limited data far less than needed for training a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities. Furthermore, the best performance in terms of WER and ROUGE scores is achieved by explicitly modeling the length constraints within the end-to-end ASR system.

Co-authors

Tuan-Nam Nguyen 4

Thai-Binh Nguyen 4

Ondřej Bojar 3

Enes Yavuz Ugan 3

Idris Abdulmumin 1

Victor Agostinelli 1

Tanel Alumäe 1

Antonios Anastasopoulos 1

Michael Beigl 1

Luisa Bentivogli 1

Fethi Bougares 1

Leonard Bärmann 1

Klemens Böhm 1

Roldano Cattoni 1

Mauro Cettolo 1

Stefan Constantin 1

Carsten Dachsbacher 1

Yunus Demirag 1

Yannick Estève 1

Marcello Federico 1

Francisco Guzmán 1

Lukas Hilgert 1

Christian Huber 1

Dávid Javorský 1

Marek Kasztelnik 1

Fortuné Kponou 1

Mateusz Krubiński 1

Nathan Lerzer 1

Evgeny Matusov 1

Chandresh Kumar Maurya 1

John Philip McCrae 1

Salima Mdhaffar 1

Yasmin Moslem 1

Kenton Murray 1

Satoshi Nakamura 1

Atul Kr. Ojha 1

John E. Ortega 1

Fabian Peller-Konrad 1

Piotr Połeć 1

Fabian Retkowski 1

Tobias Röddiger 1

Ashwin Sankar 1

Beatrice Savoldi 1

Nivedita Sethiya 1

Claytone Sikasote 1

Supriti Sinhamahapatra 1

Vladimir Solovyev 1

Gerasimos Spanakis 1

Matthias Sperber 1

Rainer Stiefelhagen 1

Sebastian Stüker 1

Katsuhito Sudoh 1

Brian Thompson 1

Patrick Wilken 1

Rodolfo Zevallos 1

Vilém Zouhar 1

Venues

DravidianLangTech1