Tianduo Wang

2025

From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
Tianduo Wang | Lu Xu | Wei Lu | Shanbo Cheng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent advances in Automatic Speech Recognition (ASR) have been largely fueled by massive speech corpora. However, extending coverage to diverse languages with limited resources remains a formidable challenge. This paper introduces Speech Back-Translation, a a scalable pipeline that improves multilingual ASR models by converting large-scale text corpora into synthetic speech via off-the-shelf text-to-speech (TTS) models. We demonstrate that just tens of hours of real transcribed speech can effectively train TTS models to generate synthetic speech at hundreds of times the original volume while maintaining high quality. To evaluate synthetic speech quality, we develop an intelligibility-based assessment framework and establish clear thresholds for when synthetic data benefits ASR training. Using Speech Back-Translation, we generate more than 500,000 hours of synthetic speech in ten languages and continue pre-training Whisper-large-v3, achieving average transcription error reductions of over 30%. These results highlight the scalability and effectiveness of Speech Back-Translation for enhancing multilingual ASR systems.

2024

pdf bib abs

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
Tianduo Wang | Shichen Li | Wei Lu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Teaching small-scale language models to perform math reasoning is a valuable yet challenging task. Besides obtaining labeled data from human experts, one of the most common ways to collect high-quality data is by sampling from a larger and more powerful language model. Although previous works have demonstrated the effectiveness of this method, such a knowledge distillation paradigm can be costly and unstable, especially considering that many large language models, such as GPT-4, are closed-sourced, proprietary, and their behaviors are unpredictable. In this work, to avoid relying on outputs from large models, we demonstrate that the reasoning abilities of small-scale language models can be enhanced through self-training, which involves training models with their own outputs. We also show that the vanilla self-training can be further augmented by an alignment algorithm, direct preference optimization (DPO). We empirically found that models trained with the DPO objective are capable of making better generations that largely benefit multi-turn self-training. The experiments show our models outperform the state-of-the-art models with comparable sizes on a series of downstream math reasoning tasks with minimal resource requirements.

2023

pdf bib abs

Learning Multi-Step Reasoning by Solving Arithmetic Tasks
Tianduo Wang | Wei Lu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Mathematical reasoning is regarded as a necessary ability for Language Models (LMs). Recent works demonstrate large LMs’ impressive performance in solving math problems. The success is attributed to their Chain-of-Thought (CoT) reasoning abilities, i.e., the ability to decompose complex questions into step-by-step reasoning chains, but such ability seems only to emerge from models with abundant parameters. This work investigates how to incorporate relatively small LMs with the capabilities of multi-step reasoning. We propose to inject such abilities by continually pre-training LMs on a synthetic dataset MsAT which is composed of Multi-step Arithmetic Tasks. Our experiments on four math word problem datasets show the effectiveness of the proposed method in enhancing LMs’ math reasoning abilities.

2022

pdf bib abs

Differentiable Data Augmentation for Contrastive Sentence Representation Learning
Tianduo Wang | Wei Lu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Fine-tuning a pre-trained language model via the contrastive learning framework with a large amount of unlabeled sentences or labeled sentence pairs is a common way to obtain high-quality sentence representations. Although the contrastive learning framework has shown its superiority on sentence representation learning over previous methods, the potential of such a framework is under-explored so far due to the simple method it used to construct positive pairs. Motivated by this, we propose a method that makes hard positives from the original training examples. A pivotal ingredient of our approach is the use of prefix that attached to a pre-trained language model, which allows for differentiable data augmentation during contrastive learning. Our method can be summarized in two steps: supervised prefix-tuning followed by joint contrastive fine-tuning with unlabeled or labeled examples. Our experiments confirm the effectiveness of our data augmentation approach. The proposed method yields significant improvements over existing methods under both semi-supervised and supervised settings. Our experiments under a low labeled data setting also show that our method is more label-efficient than the state-of-the-art contrastive learning methods.

Co-authors

Venues

ACL2
EMNLP2

Fix author