Title: TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction
Authors: Junyi Liu, Liangzhi Li, Tong Xiang, Bowen Wang, Yiming Qian
Date: 2023-12
Type: text (conference publication)
Published in: Findings of the Association for Computational Linguistics: EMNLP 2023
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Publisher: Association for Computational Linguistics
Place: Singapore
Pages: 9796-9810
Identifier: liu-etal-2023-tcra
DOI: 10.18653/v1/2023.findings-emnlp.655
URL: https://aclanthology.org/2023.findings-emnlp.655/