Yuchi Wang

2025

pdf bib abs
Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints
Kaikai An | Shuzheng Si | Helan Hu | Haozhe Zhao | Yuchi Wang | Qingyan Guo | Baobao Chang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Semantic Parsing aims to capture the meaning of a sentence and convert it into a logical, structured form. Previous studies show that semantic parsing enhances the performance of smaller models (e.g., BERT) on downstream tasks. However, it remains unclear whether the improvements extend similarly to LLMs. In this paper, our empirical findings reveal that, unlike smaller models, directly adding semantic parsing results into LLMs reduces their performance. To overcome this, we propose SENSE, a novel prompting approach that embeds semantic hints within the prompt. Experiments show that SENSE consistently improves LLMs’ performance across various tasks, highlighting the potential of integrating semantic information to improve LLM capabilities.

pdf bib abs
Modeling Interactions Between Stocks Using LLM-Enhanced Graphs for Volume Prediction
Zhiyu Xu | Yi Liu | Yuchi Wang | Ruihan Bao | Keiko Harimoto | Xu Sun
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

Accurate trading volume prediction is essential for portfolio optimization, market regulation, and financial risk control. An effective method for predicting trading volume involves building a graph to model relations between stock. Recent research has enhanced these models by integrating stock news to improve forecasting ability. However, existing approaches primarily integrate news data as auxiliary features for nodes in Graph Neural Networks (GNNs), overlooking the relational information between stocks embedded in news. To address this, we propose LLM-Enhanced Dynamic Graph Neural Network (LED-GNN), a framework that constructs dynamic graphs using inter-stock relationships extracted from news via a large language model (LLM)-centered pipeline, combined with graphs learned from historical price-volume data. A dynamic GNN then processes these graphs to generate predictions. Evaluated on a real-world dataset, TOPIX, with Reuters Financial News, LED-GNN consistently outperformed all baseline models, achieving a 2% improvement over the strongest baseline.

pdf bib abs
Proxy Tuning for Financial Sentiment Analysis: Overcoming Data Scarcity and Computational Barriers
Yuxiang Wang | Yuchi Wang | Yi Liu | Ruihan Bao | Keiko Harimoto | Xu Sun
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

Financial sentiment analysis plays a pivotal role in the financial domain. However, the task remains challenging due to the nuanced nature of financial sentiment, the need for high interpretability, and the scarcity of high-quality datasets. To address these issues, we leverage recent advancements in large language models (LLMs) and propose to adapt proxy tuning for financial sentiment analysis. Proxy tuning efficiently transfers knowledge from a pre-trained expert model to a controllable base model by incorporating logit differences, steering the base model toward the desired sentiment representation. Our method offers significant advantages: (1) it is training-free, reducing computational demands and data dependency; (2) it achieves promising performance, with a 36.67% improvement over the base model and over 90% of the tuned model’s performance; and (3) it is highly adaptable, functioning in a plug-and-play manner without requiring access to model architectures or weights. These results demonstrate the potential of proxy tuning as an efficient and practical solution for financial sentiment analysis in data-scarce scenarios.

2024

We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-Bench introduces three complex scenarios: autonomous driving, domestic robotics, and open-world games. Given task instructions and diverse contexts, the model is required to seamlessly integrate multiple capabilities of Perception, Cognition, and Action in a reasoning chain to make accurate decisions. Moreover, PCA-Bench features error localization capabilities, scrutinizing model inaccuracies in areas such as perception, knowledge, or reasoning. This enhances the reliability of deploying MLLMs. To balance accuracy and efficiency in evaluation, we propose PCA-Eval, an automatic evaluation protocol, and assess 10 prevalent MLLMs. The results reveal significant performance disparities between open-source models and powerful proprietary models like GPT-4 Vision. To address this, we introduce Embodied-Instruction-Evolution (EIE), an automatic framework for synthesizing instruction tuning examples in multimodal embodied environments. EIE generates 7,510 training examples in PCA-Bench and enhances the performance of open-source MLLMs, occasionally surpassing GPT-4 Vision (+3% in decision accuracy), thereby validating the effectiveness of EIE. Our findings suggest that robust MLLMs like GPT4-Vision show promise for decision-making in embodied agents, opening new avenues for MLLM research. All benchmark data and evaluation code are made public.

pdf bib abs
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Yuchi Wang | Shuhuai Ren | Rundong Gao | Linli Yao | Qingyan Guo | Kaikai An | Jianhong Bai | Xu Sun
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Diffusion models have exhibited remarkable capabilities in text-to-image generation. However, their performance in image-to-text generation, specifically image captioning, has lagged behind Auto-Regressive (AR) models, casting doubt on their applicability for such tasks. In this work, we revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding. With these benefits, diffusion models can alleviate the inherent limitations of AR methods, including their slow inference speed, error propagation, and unidirectional constraints. Furthermore, we identify the prior underperformance of diffusion models stemming from the absence of an effective latent space for image-text alignment, and the discrepancy between continuous diffusion processes and discrete textual data. In response, we introduce a novel architecture, LaDiC, which utilizes a split BERT to create a dedicated latent space for captions and integrates a regularization module to manage varying text lengths. Our framework also includes a diffuser for semantic image-to-text conversion and a Back&Refine technique to enhance token interactivity during inference. LaDiC achieves state-of-the-art performance for diffusion-based methods on the MS COCO dataset with 38.2 BLEU@4 and 126.2 CIDEr, demonstrating exceptional performance without pre-training or ancillary modules. This indicates strong competitiveness with AR models, revealing the previously untapped potential of diffusion models in image-to-text generation.