Title: Speeding up Transformer Decoding via an Attention Refinement Network
Authors: Kaixin Wu, Yue Zhang, Bojie Hu, Tong Zhang
Booktitle: Proceedings of the 29th International Conference on Computational Linguistics
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Publisher: International Committee on Computational Linguistics
Address: Gyeongju, Republic of Korea
Date: October 2022
Type: Conference publication
Pages: 5109–5118
Anthology ID: wu-etal-2022-speeding
URL: https://aclanthology.org/2022.coling-1.453/