SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?

Xudong Lu; Haohao Gao; Renshou Wu; Shuai Ren; Xiaoxin Chen; Hongsheng Li; Fangyuan Li

doi:10.18653/v1/2025.emnlp-main.194

Abstract

Large Language Models (LLMs) have become integral to daily life, especially advancing as intelligent assistants through on-device deployment on smartphones. However, existing LLM evaluation benchmarks predominantly focus on objective tasks like mathematics and coding in English, which do not necessarily reflect the practical use cases of on-device LLMs in real-world mobile scenarios, especially for Chinese users. To address these gaps, we introduce **SmartBench**, the first benchmark designed to evaluate the capabilities of on-device LLMs in Chinese mobile contexts. We analyze functionalities provided by representative smartphone manufacturers and divide them into five categories: text summarization, text Q&A, information extraction, content creation, and notification management, further detailed into 20 specific tasks. For each task, we construct high-quality datasets comprising 50 to 200 question-answer pairs that reflect everyday mobile interactions, and we develop automated evaluation criteria tailored for these tasks. We conduct comprehensive evaluations of on-device LLMs and MLLMs using SmartBench and also assess their performance after quantized deployment on real smartphone NPUs. Our contributions provide a standardized framework for evaluating on-device LLMs in Chinese, promoting further development and optimization in this critical area. Code and data will be available at https://github.com/vivo-ai-lab/SmartBench.

Anthology ID: 2025.emnlp-main.194
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 3906–3931
URL: https://aclanthology.org/2025.emnlp-main.194/
DOI: 10.18653/v1/2025.emnlp-main.194
Cite (ACL): Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen, Hongsheng Li, and Fangyuan Li. 2025. SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3906–3931, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant? (Lu et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.194.pdf
Checklist: 2025.emnlp-main.194.checklist.pdf
