Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications

Authors: Matthew Khoury, Rumen Dangovski, Longwu Ou, Preslav Nakov, Yichen Shen, Li Jing
Published in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Publisher: Association for Computational Linguistics
Location: Online
Date: 2020-11
Pages: 7975-7984
Type: conference publication
Citation key: khoury-etal-2020-vector
DOI: 10.18653/v1/2020.emnlp-main.640
URL: https://aclanthology.org/2020.emnlp-main.640/