Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation

Keqi Deng; Phil Woodland

doi:10.18653/v1/2024.acl-long.448

Abstract

While the neural transducer is popular for online speech recognition, simultaneous speech translation (SST) requires both streaming and re-ordering capabilities. This paper presents the LS-Transducer-SST, a label-synchronous neural transducer for SST, which naturally possesses these two properties. The LS-Transducer-SST dynamically decides when to emit translation tokens based on an Auto-regressive Integrate-and-Fire (AIF) mechanism. A latency-controllable AIF is also proposed, which can control the quality-latency trade-off either during decoding only or during both decoding and training. The LS-Transducer-SST can naturally utilise monolingual text-only data via its prediction network, which helps alleviate the key issue of data sparsity for E2E SST. During decoding, a chunk-based incremental joint decoding technique is designed to refine and expand the search space. Experiments on the Fisher-CallHome Spanish (Es-En) and MuST-C En-De data show that the LS-Transducer-SST gives a better quality-latency trade-off than existing popular methods. For example, the LS-Transducer-SST gives a 3.1/2.9 point BLEU increase (Es-En/En-De) relative to CAAT at a similar latency and a 1.4 s reduction in average lagging latency with similar BLEU scores relative to Wait-k.

Anthology ID: 2024.acl-long.448
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 8235–8251
URL: https://aclanthology.org/2024.acl-long.448/
DOI: 10.18653/v1/2024.acl-long.448
Cite (ACL): Keqi Deng and Phil Woodland. 2024. Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8235–8251, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation (Deng & Woodland, ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.448.pdf