Marjan Slavkovski


2023

Gatekeeper to save COGS and improve efficiency of Text Prediction
Nidhi Tiwari | Sneha Kola | Milos Milunovic | Si-qing Chen | Marjan Slavkovski
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

The text prediction (TP) workflow calls a Large Language Model (LLM) after almost every character to get the subsequent sequence of characters, until the user accepts a suggestion. The confidence score of the prediction is commonly used to filter the results so that only correct predictions are shown to the user. As LLMs require massive amounts of computation and storage, such an approach incurs both network cost and high execution cost. We therefore propose a model gatekeeper (GK) that stops, at the client application level itself, the LLM calls that would result in incorrect predictions. In this way a GK can save the cost of model inference and improve the user experience by not showing incorrect predictions. We demonstrate that using a model gatekeeper saved approximately 46.6% of COGS for TP, at the cost of an approximately 4.5% loss in character savings. Using the GK also improved the efficiency (suggestion rate) of the TP model by 73%.
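
The gating idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the gatekeeper is built, so the class name GatedPredictor, both thresholds, and the toy scoring and LLM functions below are all hypothetical stand-ins. The sketch only shows the control flow, in which a cheap client-side check runs first and the expensive LLM call is made only when that check passes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class GatedPredictor:
    """Client-side wrapper that decides whether to call the LLM at all.

    All names and default thresholds here are illustrative assumptions.
    """
    # Cheap local scorer: estimated chance that an LLM call on this
    # prefix yields a suggestion worth showing (hypothetical).
    gate_score: Callable[[str], float]
    # Expensive remote call returning (suggestion, confidence) (hypothetical).
    call_llm: Callable[[str], Tuple[str, float]]
    gate_threshold: float = 0.5
    confidence_threshold: float = 0.7
    llm_calls: int = 0
    blocked_calls: int = 0

    def suggest(self, prefix: str) -> Optional[str]:
        # Gatekeeper step: skip the LLM call when the local score is low.
        if self.gate_score(prefix) < self.gate_threshold:
            self.blocked_calls += 1
            return None
        # Only now pay for model inference and the network round trip.
        self.llm_calls += 1
        suggestion, confidence = self.call_llm(prefix)
        # Usual confidence filter applied to the LLM output.
        return suggestion if confidence >= self.confidence_threshold else None


def toy_gate_score(prefix: str) -> float:
    # Toy heuristic (assumption): more words of context, higher score.
    return min(1.0, len(prefix.split()) / 4)


def toy_call_llm(prefix: str) -> Tuple[str, float]:
    # Toy stand-in for the remote model (assumption).
    if prefix.endswith("your"):
        return " help", 0.9
    return "", 0.1
```

In this sketch, the ratio of blocked_calls to total keystroke events plays the role of the saved inference calls, while any suggestion the LLM would have produced correctly for a blocked prefix corresponds to the lost character savings the abstract mentions.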