BibTeX
@inproceedings{raffel-chen-2023-implicit,
    title = "Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation",
    author = "Raffel, Matthew  and
      Chen, Lizhong",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.816",
    doi = "10.18653/v1/2023.findings-acl.816",
    pages = "12900--12907",
    abstract = "Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment{'}s attention calculation. Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass with nearly identical translation quality when compared with the state-of-the-art approach that employs both left context and memory banks.",
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="raffel-chen-2023-implicit">
    <titleInfo>
      <title>Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Matthew</namePart>
      <namePart type="family">Raffel</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Lizhong</namePart>
      <namePart type="family">Chen</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: ACL 2023</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Anna</namePart>
        <namePart type="family">Rogers</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jordan</namePart>
        <namePart type="family">Boyd-Graber</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Naoaki</namePart>
        <namePart type="family">Okazaki</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Toronto, Canada</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment’s attention calculation. Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass with nearly identical translation quality when compared with the state-of-the-art approach that employs both left context and memory banks.</abstract>
    <identifier type="citekey">raffel-chen-2023-implicit</identifier>
    <identifier type="doi">10.18653/v1/2023.findings-acl.816</identifier>
    <location>
      <url>https://aclanthology.org/2023.findings-acl.816</url>
    </location>
    <part>
      <date>2023-07</date>
      <extent unit="page">
        <start>12900</start>
        <end>12907</end>
      </extent>
    </part>
  </mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
%A Raffel, Matthew
%A Chen, Lizhong
%Y Rogers, Anna
%Y Boyd-Graber, Jordan
%Y Okazaki, Naoaki
%S Findings of the Association for Computational Linguistics: ACL 2023
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F raffel-chen-2023-implicit
%X Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment’s attention calculation. Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass with nearly identical translation quality when compared with the state-of-the-art approach that employs both left context and memory banks.
%R 10.18653/v1/2023.findings-acl.816
%U https://aclanthology.org/2023.findings-acl.816
%U https://doi.org/10.18653/v1/2023.findings-acl.816
%P 12900-12907
Markdown (Informal)
[Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation](https://aclanthology.org/2023.findings-acl.816) (Raffel & Chen, Findings 2023)

ACL
Matthew Raffel and Lizhong Chen. 2023. Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 12900–12907, Toronto, Canada. Association for Computational Linguistics.
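
The abstract above describes the paper's core mechanism: the attention output of the previous segment is reused as left context in the keys and values of the current segment's attention, removing the separate memory bank. The sketch below is a minimal single-head PyTorch illustration of that idea, not the authors' implementation; the function name, the choice to carry the last few output positions, the re-projection of the carried-over output through the key/value weights, and the stop-gradient are all illustrative assumptions.

```python
# Hedged sketch of the mechanism described in the abstract: instead of a
# memory bank, the attention *output* of segment i-1 is prepended to the
# keys/values of segment i. Single-head, unbatched, for clarity.
import torch
import torch.nn.functional as F


def implicit_memory_attention(segments, w_q, w_k, w_v, left_context_size=4):
    """Run segment-wise attention over a list of [seg_len, d_model] tensors.

    The last `left_context_size` attention outputs of the previous segment
    serve as left context for the current one (assumption: tail positions
    are carried; the paper's exact selection may differ).
    """
    d_model = segments[0].shape[-1]
    scale = d_model ** -0.5
    left_ctx = None  # attention output carried over from the previous segment
    outputs = []
    for seg in segments:
        q = seg @ w_q
        k = seg @ w_k
        v = seg @ w_v
        if left_ctx is not None:
            # Project the carried-over output like a normal input, then
            # prepend it to this segment's keys and values.
            k = torch.cat([left_ctx @ w_k, k], dim=0)
            v = torch.cat([left_ctx @ w_v, v], dim=0)
        attn = F.softmax((q @ k.T) * scale, dim=-1)  # [seg_len, seg_len + ctx]
        out = attn @ v                               # [seg_len, d_model]
        outputs.append(out)
        # Stop-gradient on the carried state is an illustrative choice,
        # in the style of Transformer-XL-like caching.
        left_ctx = out[-left_context_size:].detach()
    return torch.cat(outputs, dim=0)


# Toy usage: 3 segments of 8 frames each, model width 16.
torch.manual_seed(0)
d = 16
w_q, w_k, w_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
segs = [torch.randn(8, d) for _ in range(3)]
print(implicit_memory_attention(segs, w_q, w_k, w_v).shape)  # torch.Size([24, 16])
```

Under these assumptions the only cross-segment state is the cached attention output, which is how the paper attributes its encoder speedup: no extra forward computation is spent building an explicit memory-bank representation.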