Beomseok Lee

2025

Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection
Beomseok Lee | Marco Gaido | Ioan Calapodescu | Laurent Besacier | Matteo Negri
Proceedings of the 31st International Conference on Computational Linguistics

While crowdsourcing is an established solution for facilitating and scaling the collection of speech data, the involvement of non-experts necessitates protocols to ensure final data quality. To reduce the costs of these essential controls, this paper investigates the use of Speech Foundation Models (SFMs) to automate the validation process, examining for the first time the cost/quality trade-off in data acquisition. Experiments conducted on French, German, and Korean data demonstrate that SFM-based validation has the potential to reduce reliance on human validation, resulting in an estimated cost saving of over 40.0% without degrading final data quality. These findings open new opportunities for more efficient, cost-effective, and scalable speech data acquisition.

pdf bib abs

NAVER LABS Europe Submission to the Instruction-following Track
Beomseok Lee | Marcely Zanon Boito | Laurent Besacier | Ioan Calapodescu
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)

In this paper we describe NAVER LABS Europe submission to the instruction-following speech processing short track at IWSLT 2025. We participate in the constrained settings, developing systems that can simultaneously perform ASR, ST, and SQA tasks from English speech input into the following target languages: Chinese, Italian, and German. Our solution leverages two pretrained modules: (1) a speech-to-LLM embedding projector trained using representations from the SeamlessM4T-v2-large speech encoder; and (2) LoRA adapters trained on text data on top of Llama-3.1-8B-Instruct. These modules are jointly loaded and further instruction-tuned for 1K steps on multilingual and multimodal data to form our final system submitted for evaluation.

2024

pdf bib abs

XDetox: Text Detoxification with Token-Level Toxicity Explanations
Beomseok Lee | Hyunwoo Kim | Keon Kim | Yong Suk Choi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Methods for mitigating toxic content through masking and infilling often overlook the decision-making process, leading to either insufficient or excessive modifications of toxic tokens. To address this challenge, we propose XDetox, a novel method that integrates token-level toxicity explanations with the masking and infilling detoxification process. We utilized this approach with two strategies to enhance the performance of detoxification. First, identifying toxic tokens to improve the quality of masking. Second, selecting the regenerated sentence by re-ranking the least toxic sentence among candidates. Our experimental results show state-of-the-art performance across four datasets compared to existing detoxification methods. Furthermore, human evaluations indicate that our method outperforms baselines in both fluency and toxicity reduction. These results demonstrate the effectiveness of our method in text detoxification.

2022

pdf bib abs

Language Model Augmented Monotonic Attention for Simultaneous Translation
Sathish Reddy Indurthi | Mohd Abbas Zaidi | Beomseok Lee | Nikhil Kumar Lakumarapu | Sangha Kim
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The state-of-the-art adaptive policies for Simultaneous Neural Machine Translation (SNMT) use monotonic attention to perform read/write decisions based on the partial source and target sequences. The lack of sufficient information might cause the monotonic attention to take poor read/write decisions, which in turn negatively affects the performance of the SNMT model. On the other hand, human translators make better read/write decisions since they can anticipate the immediate future words using linguistic information and domain knowledge. In this work, we propose a framework to aid monotonic attention with an external language model to improve its decisions. Experiments on MuST-C English-German and English-French speech-to-text translation tasks show the future information from the language model improves the state-of-the-art monotonic multi-head attention model further.

2020

pdf bib abs

End-to-End Simultaneous Translation System for IWSLT2020 Using Modality Agnostic Meta-Learning
Hou Jeung Han | Mohd Abbas Zaidi | Sathish Reddy Indurthi | Nikhil Kumar Lakumarapu | Beomseok Lee | Sangha Kim
Proceedings of the 17th International Conference on Spoken Language Translation

In this paper, we describe end-to-end simultaneous speech-to-text and text-to-text translation systems submitted to IWSLT2020 online translation challenge. The systems are built by adding wait-k and meta-learning approaches to the Transformer architecture. The systems are evaluated on different latency regimes. The simultaneous text-to-text translation achieved a BLEU score of 26.38 compared to the competition baseline score of 14.17 on the low latency regime (Average latency ≤ 3). The simultaneous speech-to-text system improves the BLEU score by 7.7 points over the competition baseline for the low latency regime (Average Latency ≤ 1000).

pdf bib abs

End-to-End Offline Speech Translation System for IWSLT 2020 using Modality Agnostic Meta-Learning
Nikhil Kumar Lakumarapu | Beomseok Lee | Sathish Reddy Indurthi | Hou Jeung Han | Mohd Abbas Zaidi | Sangha Kim
Proceedings of the 17th International Conference on Spoken Language Translation

In this paper, we describe the system submitted to the IWSLT 2020 Offline Speech Translation Task. We adopt the Transformer architecture coupled with the meta-learning approach to build our end-to-end Speech-to-Text Translation (ST) system. Our meta-learning approach tackles the data scarcity of the ST task by leveraging the data available from Automatic Speech Recognition (ASR) and Machine Translation (MT) tasks. The meta-learning approach combined with synthetic data augmentation techniques improves the model performance significantly and achieves BLEU scores of 24.58, 27.51, and 27.61 on IWSLT test 2015, MuST-C test, and Europarl-ST test sets respectively.

Co-authors

Marcely Zanon Boito 1

Venues

Fix author