Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation

Tengfei Yu, Xuebo Liu, Liang Ding, Kehai Chen, Dacheng Tao, Min Zhang


Abstract
End-to-end speech translation (ST) presents notable disambiguation challenges as it necessitates simultaneous cross-modal and cross-lingual transformations. While word sense disambiguation is an extensively investigated topic in textual machine translation, the exploration of disambiguation strategies for ST models remains limited. Addressing this gap, this paper introduces the concept of speech sense disambiguation (SSD), specifically emphasizing homophones - words pronounced identically but with different meanings. To facilitate this, we first create a comprehensive homophone dictionary and an annotated dataset rich with homophone information established based on speech-text alignment. Building on this unique dictionary, we introduce AmbigST, an innovative homophone-aware contrastive learning approach that integrates a homophone-aware masking strategy. Our experiments on different MuST-C and CoVoST ST benchmarks demonstrate that AmbigST sets new performance standards. Specifically, it achieves SOTA results on BLEU scores for English to German, Spanish, and French ST tasks, underlining its effectiveness in reducing speech sense ambiguity. Data, code and scripts are freely available at https://github.com/ytf-philp/AmbigST.
Anthology ID:
2024.acl-long.435
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
8020–8035
Language:
URL:
https://aclanthology.org/2024.acl-long.435
DOI:
Bibkey:
Cite (ACL):
Tengfei Yu, Xuebo Liu, Liang Ding, Kehai Chen, Dacheng Tao, and Min Zhang. 2024. Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8020–8035, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation (Yu et al., ACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.acl-long.435.pdf