Centroid-Based Efficient Minimum Bayes Risk Decoding

Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe, Hideki Tanaka, Masao Utiyama


Abstract
Minimum Bayes risk (MBR) decoding achieved state-of-the-art translation performance by using COMET, a neural metric that has a high correlation with human evaluation. However, MBR decoding requires quadratic time since it computes the expected score between a translation hypothesis and all reference translations. We propose centroid-based MBR (CBMBR) decoding to improve the speed of MBR decoding. Our method clusters the reference translations in the feature space, and then calculates the score using the centroids of each cluster. The experimental results show that our CBMBR not only improved the decoding speed of the expected score calculation 5.7 times, but also outperformed vanilla MBR decoding in translation quality by up to 0.5 COMET in the WMT’22 EnJa, EnDe, EnZh, and WMT’23 EnJa translation tasks.
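The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each translation is already represented by a fixed-size feature vector, uses cosine similarity as a hypothetical stand-in for the COMET scoring step, runs a plain k-means with random initialization, and averages the scores over centroids without weighting. The names `kmeans`, `toy_score`, `mbr_select`, and `cbmbr_select` are illustrative only.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed clustering method); returns a (k, d) centroid array."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def toy_score(hyp_feats, ref_feats):
    """Stand-in utility (NOT COMET): cosine similarity, shape (n_hyps, n_refs)."""
    h = hyp_feats / np.linalg.norm(hyp_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    return h @ r.T


def mbr_select(hyp_feats, ref_feats):
    """Vanilla MBR: score every hypothesis against every reference."""
    expected = toy_score(hyp_feats, ref_feats).mean(axis=1)
    return int(expected.argmax())


def cbmbr_select(hyp_feats, ref_feats, k):
    """Centroid-based variant: score every hypothesis against k centroids only."""
    centroids = kmeans(ref_feats, k)
    expected = toy_score(hyp_feats, centroids).mean(axis=1)
    return int(expected.argmax())
```

With N hypotheses and N references, `mbr_select` evaluates N × N hypothesis-reference pairs, whereas `cbmbr_select` evaluates N × k pairs after clustering, which is where the saving in the expected score calculation comes from when k is much smaller than N.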
Anthology ID:
2024.findings-acl.654
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11009–11018
URL:
https://aclanthology.org/2024.findings-acl.654
Cite (ACL):
Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe, Hideki Tanaka, and Masao Utiyama. 2024. Centroid-Based Efficient Minimum Bayes Risk Decoding. In Findings of the Association for Computational Linguistics ACL 2024, pages 11009–11018, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Centroid-Based Efficient Minimum Bayes Risk Decoding (Deguchi et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.654.pdf