Bridging the Granularity Gap for Acoustic Modeling

Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, Jingbo Zhu


Abstract
While the Transformer has become the de facto standard for speech, modeling fine-grained frame-level features remains an open challenge, because it is difficult to capture long-distance dependencies and to distribute the attention weights. We propose Progressive Down-Sampling (PDS), which gradually compresses the acoustic features into coarser-grained units containing more complete semantic information, similar to text-level representations. In addition, we develop a representation fusion method to alleviate the information loss that inevitably occurs under high compression. In this way, we compress the acoustic features to 1/32 of their initial length while achieving better or comparable performance on the speech recognition task. As a bonus, it yields inference speedups ranging from 1.20x to 1.47x. By reducing the modeling burden, we also achieve competitive results when training on the more challenging speech translation task.
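To make the 1/32 compression ratio concrete, the following minimal sketch tracks how a frame sequence shrinks across successive down-sampling stages. The abstract does not state the number of stages or the per-stage strides, so the five stride-2 stages, the ceiling-based length rule, the function name `progressive_lengths`, and the 1000-frame input are all assumptions made for illustration, not the authors' configuration.

```python
import math


def progressive_lengths(num_frames, strides):
    """Return the sequence length before and after each down-sampling stage.

    Assumes each stage pads the input so that the output length is
    ceil(length / stride); this rule is an illustrative assumption.
    """
    lengths = [num_frames]
    for stride in strides:
        lengths.append(math.ceil(lengths[-1] / stride))
    return lengths


# Hypothetical schedule: five stride-2 stages, overall factor 2**5 = 32.
STRIDES = [2, 2, 2, 2, 2]

# Hypothetical utterance of 1000 acoustic frames.
lengths = progressive_lengths(1000, STRIDES)
print(lengths)  # [1000, 500, 250, 125, 63, 32]
```

Under these assumptions, self-attention in the final stage operates on 32 positions instead of 1000, which illustrates why compressing gradually can lighten the modeling burden in the later layers.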
Anthology ID:
2023.findings-acl.688
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10816–10833
URL:
https://aclanthology.org/2023.findings-acl.688
DOI:
10.18653/v1/2023.findings-acl.688
Cite (ACL):
Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, and Jingbo Zhu. 2023. Bridging the Granularity Gap for Acoustic Modeling. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10816–10833, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Bridging the Granularity Gap for Acoustic Modeling (Xu et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.688.pdf