Title: An Architecture for Accelerated Large-Scale Inference of Transformer-Based Language Models
Authors: Amir Ganiev, Colton Chapin, Anderson De Andrade, Chen Liu
Date: 2021-06
Resource type: text
Genre: conference publication
Published in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers
Editors: Young-bum Kim, Yunyao Li, Owen Rambow
Publisher: Association for Computational Linguistics
Location: Online
Pages: 163-169
Citation key: ganiev-etal-2021-architecture
DOI: 10.18653/v1/2021.naacl-industry.21
URL: https://aclanthology.org/2021.naacl-industry.21/