Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Authors: Yaroslav Aksenov, Nikita Balagansky, Sofia Lo Cicero Vaina, Boris Shaposhnikov, Alexey Gorbatovski, Daniil Gavrilov
Date: 2024-08
Venue: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Type: conference publication
Identifier: aksenov-etal-2024-linear
DOI: 10.18653/v1/2024.acl-long.518
URL: https://aclanthology.org/2024.acl-long.518/
Pages: 9584-9597