Fengxian Ji
2026
FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting
Fengxian Ji | Jingpu Yang | Zirui Song | Yuanxi Wang | Zhexuan Cui | Yuke Li | Qian Jiang | Xiuying Chen
Findings of the Association for Computational Linguistics: ACL 2026
Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-state definitions, and an overreliance on final-task success, obscuring where and why agents fail. To address this gap, we introduce FineState-Bench, a benchmark that evaluates whether an agent can correctly ground an instruction to the intended UI control and reach the exact target state. FineState-Bench comprises 2,209 instances across desktop, web, and mobile platforms, spanning four interaction families and 23 UI component types, with each instance explicitly specifying an exact target state for fine-grained state setting. We further propose FineState-Metrics, a four-stage diagnostic pipeline with stage-wise success rates: Localization Success Rate (SR@Loc), Interaction Success Rate (SR@Int), Exact State Success Rate at Locate (ES-SR@Loc), and Exact State Success Rate at Interact (ES-SR@Int). We also propose a plug-and-play Visual Diagnostic Assistant (VDA) that generates a Description and a bounding-box Localization Hint, so that the contribution of visual grounding to failures can be diagnosed through controlled comparisons with and without VDA. On FineState-Bench, exact goal-state success remains low: ES-SR@Int peaks at 32.8% on Web and 22.8% on average across platforms. With VDA localization hints, Gemini-2.5-Flash gains +14.9 ES-SR@Int points, suggesting substantial headroom from improved visual grounding, yet overall accuracy is still insufficient for reliable fine-grained state-conditioned interaction.
ServImage: An Image Generation and Editing Benchmark from Real-world Commercial Imaging Services
Fengxian Ji | Jingpu Yang | Zirui Song | Lang Gao | Junhong Liang | Zhenhao Chen | Jinghui Zhang | Xiuying Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent image generation and editing models demonstrate robust adherence to instructions and high visual quality on academic benchmarks. However, their performance on paid, real-world design projects remains uncertain. We introduce ServImage, a benchmark that explicitly correlates model outputs with economic value in commercial design projects. ServImage consists of three parts. (i) ServImageBench: a dataset of 1.07k paid commercial design tasks and 2.05k designer deliverables totaling over $295k, covering portrait, product, and digital content, along with 33k candidate images and 33k human annotations. (ii) ServImageScore: an integrated scoring system that combines three quality dimensions: baseline requirements fulfilment, visual execution quality, and commercial necessity satisfaction. These three dimensions are designed to characterize the factors that drive human payment decisions and indicate whether an image is commercially acceptable. (iii) ServImageModel: under this scoring system, we propose a payment prediction model trained on the human-annotated candidate images, achieving 82.00% accuracy in predicting human payment decisions and producing calibrated payment probabilities. ServImage provides a comprehensive foundation for assessing the commercial viability of image generation models and offers a scalable resource for future research on economically grounded vision systems.
FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosure
Fan Zhang | Mingzi Song | Rania Elbadry | Yankai Chen | Shaobo Wang | Yixi Zhou | Xunwen Zheng | Yueru He | Yuyang Dai | Georgi Nenkov Georgiev | Ayesha Gull | Muhammad Usman Safder | Fan Wu | Liyuan Meng | Fengxian Ji | Junning Zhao | Xueqing Peng | Jimin Huang | YU Chen | Xue Liu | Preslav Nakov | Zhuohan Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Financial reporting systems increasingly leverage Large Language Models (LLMs) to extract and summarize corporate disclosures. However, most existing approaches assume a single-market setting and overlook structural differences across jurisdictions. Variations in accounting taxonomies, tagging infrastructures (e.g., XBRL vs. PDF), and aggregation conventions introduce substantial challenges for semantic alignment and reliable verification. To bridge this gap, we present FinReporting, an agentic workflow for localized cross-jurisdiction financial reporting. The system constructs a unified canonical ontology spanning the income statement, balance sheet, and cash flow statement, and decomposes reporting into auditable stages, including filing acquisition, extraction, canonical mapping, and anomaly logging. Rather than treating LLMs as free-form generators, FinReporting employs them as constrained verifiers operating under explicit decision rules with evidence grounding. Evaluated on annual filings from the USA, Japan, and China, FinReporting improves consistency and reliability under heterogeneous reporting regimes. We further release an interactive demo that enables cross-market inspection and supports structured export of localized financial statements. Our demo is available at https://huggingface.co/spaces/BoomQ/FinReporting-Demo. A video describing our system is available at https://www.youtube.com/watch?v=f65jdEL31Kk.
Co-authors
- Xiuying Chen 2
- Zirui Song 2
- Jingpu Yang 2
- YU Chen 1
- Yankai Chen 1
- Zhenhao Chen 1
- Zhexuan Cui 1
- Yuyang Dai 1
- Rania Elbadry 1
- Lang Gao 1
- Georgi Nenkov Georgiev 1
- Ayesha Gull 1
- Yueru He 1
- Jimin Huang 1
- Qian Jiang 1
- Yuke Li 1
- Junhong Liang 1
- Xue Liu 1
- Liyuan Meng 1
- Preslav Nakov 1
- Xueqing Peng 1
- Muhammad Usman Safder 1
- Mingzi Song 1
- Shaobo Wang 1
- Yuanxi Wang 1
- Fan Wu 1
- Zhuohan Xie 1
- Fan Zhang 1
- Jinghui Zhang 1
- Junning Zhao 1
- Xunwen Zheng 1
- Yixi Zhou 1