Bokai Yu
2024
Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation
Marta Costa-jussà
|
David Dale
|
Maha Elbayad
|
Bokai Yu
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
Machine translation models sometimes lead to added toxicity: translated outputs may contain more toxic content that the original input. In this paper, we introduce MinTox, a novel pipeline to automatically identify and mitigate added toxicity at inference time, without further model training. MinTox leverages a multimodal (speech and text) toxicity classifier that can scale across languages.We demonstrate the capabilities of MinTox when applied to SEAMLESSM4T, a multi-modal and massively multilingual machine translation system. MinTox significantly reduces added toxicity: across all domains, modalities and language directions, 25% to95% of added toxicity is successfully filtered out, while preserving translation quality
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Bandhav Veluri
|
Benjamin N Peloquin
|
Bokai Yu
|
Hongyu Gong
|
Shyamnath Gollakota
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Despite broad interest in modeling spoken dialogue agents, most approaches are inherently “half-duplex” – restricted to turn-based interaction with responses requiring explicit prompting by the user or implicit tracking of interruption or silence events. Human dialogue, by contrast, is “full-duplex” allowing for rich synchronicity in the form of quick and dynamic turn-taking, overlapping speech, and backchanneling. Technically, the challenge of achieving full-duplex dialogue with LLMs lies in modeling synchrony as pre-trained LLMs do not have a sense of “time”. To bridge this gap, we propose Synchronous LLMs for full-duplex spoken dialogue modeling. We design a novel mechanism to integrate time information into Llama3-8b so that they run synchronously with the real-world clock. We also introduce a training recipe that uses 212k hours of synthetic spoken dialogue data generated from text dialogue data to create a model that generates meaningful and natural spoken dialogue, with just 2k hours of real-world spoken dialogue data. Synchronous LLMs outperform state-of-the-art in dialogue meaningfulness while maintaining naturalness. Finally, we demonstrate the model’s ability to participate in full-duplex dialogue by simulating interaction between two agents trained on different datasets, while considering Internet-scale latencies of up to 240 ms.
Search
Fix data
Co-authors
- Marta Costa-jussà 1
- David Dale 1
- Maha Elbayad 1
- Shyamnath Gollakota 1
- Hongyu Gong 1
- show all...