2025
pdf
bib
abs
APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts
Honghua Dong
|
Qidong Su
|
Yubo Gao
|
Zhaoyu Li
|
Yangjun Ruan
|
Gennady Pekhimenko
|
Chris J. Maddison
|
Xujie Si
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer programs and LLMs, allowing seamless embedding of prompts into Python functions, and vice versa. APPL provides an intuitive and Python-native syntax, an efficient parallelized runtime with asynchronous semantics, and a tracing module supporting effective failure diagnosis and replaying without extra costs. We demonstrate that APPL programs are intuitive, concise, and efficient through representative scenarios including Chain-of-Thought with self-consistency (CoT-SC) and ReAct tool-use agent. We further use LLMs to judge the language design between APPL and previous work, where the results indicate that codes written in APPL are more readable and intuitive. Our code, tutorial and documentation are available at https://github.com/appl-team/appl.
pdf
bib
abs
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
Song Dai
|
Yibo Yan
|
Jiamin Su
|
Zihao Dongfang
|
Yubo Gao
|
Yonghua Hei
|
Jungang Li
|
Junyan Zhang
|
Sicheng Tao
|
Zhuoran Gao
|
Xuming Hu
Findings of the Association for Computational Linguistics: EMNLP 2025
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in diverse reasoning tasks, yet their application to complex physics reasoning remains underexplored. Physics reasoning presents unique challenges, requiring grounding in physical conditions and the interpretation of multimodal information. Current physics benchmarks are limited, often focusing on text-only inputs or solely on problem-solving, thereby overlooking the critical intermediate steps of variable identification and process formulation. To address these limitations, we introduce **PhysicsArena, the first multimodal physics reasoning benchmark designed to holistically evaluate MLLMs across three critical dimensions: variable identification, physical process formulation, and solution derivation.** PhysicsArena aims to provide a comprehensive platform for assessing and advancing the multimodal physics reasoning abilities of MLLMs.
pdf
bib
abs
Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?
Junyan Zhang
|
Yiming Huang
|
Shuliang Liu
|
Yubo Gao
|
Xuming Hu
Findings of the Association for Computational Linguistics: EMNLP 2025
The rapid adoption of LLMs has overshadowed the potential advantages of traditional BERT-like models in text classification. This study challenges the prevailing “LLM-centric” trend by systematically comparing three category methods, *i.e.,* BERT-like models fine-tuning, LLM internal state utilization, and LLM zero-shot inference across six challenging datasets. Our findings reveal that BERT-like models often outperform LLMs. We further categorize datasets into three types, perform PCA and probing experiments, and identify task-specific model strengths: BERT-like models excel in pattern-driven tasks, while LLMs dominate those requiring deep semantics or world knowledge. Subsequently, we conducted experiments on a broader range of text classification tasks to demonstrate the generalizability of our findings. We further investigated how the relative performance of different models varies under different levels of data availability. Finally, based on these findings, we propose **TaMAS**, a fine-grained task selection strategy, advocating for a nuanced, task-driven approach over a one-size-fits-all reliance on LLMs. Code is available at [https://github.com/jyzhang2002/TaMAS-TextClass](https://github.com/jyzhang2002/TaMAS-TextClass).