Zeyi Wen


2025

EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference
Yuebin Xu | Zhiyi Chen | Zeyi Wen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Tuning inference hyperparameters, such as temperature and maximum output tokens, on downstream tasks can enhance inference performance. However, directly applying hyperparameter optimization (HPO) to these hyperparameters is token-expensive. Multi-fidelity optimization improves HPO efficiency with low-fidelity evaluations, but its static scheduling strategies ignore token consumption, leading to high costs. To address these limitations, we propose a token-efficient multi-fidelity optimization method, which enhances inference performance while minimizing token usage. Our method is empowered by (i) a token-based fidelity definition with explicit token cost modeling of configurations; (ii) a novel Token-Aware Expected Improvement acquisition function that selects configurations based on performance gain per token; and (iii) a dynamic fidelity scheduling mechanism that adapts to real-time budget status. We evaluate our method on the LLaMA-2 and LLaMA-3 series across MMLU, HumanEval, MedQA, and OpenBookQA. Our method improves over the HELM leaderboard by 7.1%, 24.3%, 21.9%, and 4.6%, respectively. Compared to existing multi-fidelity HPO baselines, our method reduces token consumption by over 80% while maintaining or surpassing performance, demonstrating state-of-the-art token efficiency for inference-time optimization.
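
The abstract does not give the exact acquisition formula, but its central idea, scoring each candidate configuration by expected performance gain per predicted token, can be sketched as below. The Gaussian surrogate outputs, the token-cost estimates, and all names are illustrative assumptions for exposition, not the authors' implementation.

# Hedged sketch: Expected Improvement per predicted token, the general idea
# behind a token-aware acquisition function. The surrogate statistics and the
# token-cost estimates below are illustrative stand-ins only.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far, xi=0.01):
    """Standard EI for maximization, given a surrogate's predictive mean/std."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_so_far - xi) / sigma
    return (mu - best_so_far - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def token_aware_ei(mu, sigma, best_so_far, predicted_tokens):
    """Rank candidate configurations by expected gain per token spent."""
    return expected_improvement(mu, sigma, best_so_far) / np.maximum(predicted_tokens, 1.0)

# Example: three candidate inference configurations (e.g. temperature, max output tokens).
mu = np.array([0.62, 0.65, 0.66])                     # surrogate-predicted task accuracy
sigma = np.array([0.05, 0.08, 0.03])                  # surrogate uncertainty
predicted_tokens = np.array([800.0, 3000.0, 1500.0])  # modeled token cost per evaluation
scores = token_aware_ei(mu, sigma, best_so_far=0.60, predicted_tokens=predicted_tokens)
print("next configuration to evaluate:", int(np.argmax(scores)))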

SEAL: Structure and Element Aware Learning Improves Long Structured Document Retrieval
Xinhao Huang | Zhibo Ren | Yipeng Yu | Ying Zhou | Zulong Chen | Zeyi Wen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

In long structured document retrieval, existing methods typically fine-tune pre-trained language models (PLMs) using contrastive learning on datasets lacking explicit structural information. This practice suffers from two critical issues: 1) current methods fail to leverage structural features and element-level semantics effectively, and 2) datasets containing structural metadata are scarce. To bridge these gaps, we propose SEAL, a novel contrastive learning framework. It leverages structure-aware learning to preserve semantic hierarchies and masked element alignment for fine-grained semantic discrimination. Furthermore, we release StructDocRetrieval, a long structured document retrieval dataset with rich structural annotations. Extensive experiments on both the released and industrial datasets across various modern PLMs, together with online A/B testing, demonstrate consistent improvements, boosting NDCG@10 from 73.96% to 77.84% on BGE-M3. The resources are available at https://github.com/xinhaoH/SEAL.
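
The abstract does not spell out the training objective; the sketch below shows the generic in-batch contrastive (InfoNCE) loss that PLM-based retrievers such as BGE-M3 are typically fine-tuned with, as a reference point for the structure-aware and element-alignment objectives SEAL adds on top. All names and the temperature value are assumptions for illustration only.

# Hedged sketch: generic in-batch contrastive loss for query-document retrieval.
# SEAL's structure-aware learning and masked element alignment are not
# reproduced here; this only illustrates the common starting point.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, doc_emb, temperature=0.05):
    """query_emb, doc_emb: (batch, dim); row i of doc_emb is the positive for query i."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    logits = query_emb @ doc_emb.T / temperature             # (batch, batch) similarities
    labels = torch.arange(query_emb.size(0), device=query_emb.device)
    return F.cross_entropy(logits, labels)                   # other rows serve as negatives

# Example with random tensors standing in for encoder outputs.
q, d = torch.randn(8, 768), torch.randn(8, 768)
print(in_batch_contrastive_loss(q, d).item())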

Enhancing Talent Search Ranking with Role-Aware Expert Mixtures and LLM-based Fine-Grained Job Descriptions
Jihang Li | Bing Xu | Zulong Chen | Chuanfei Xu | Minping Chen | Suyu Liu | Ying Zhou | Zeyi Wen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Talent search is a cornerstone of modern recruitment systems, yet existing approaches often struggle to capture nuanced job-specific preferences, model recruiter behavior at a fine-grained level, and mitigate noise from subjective human judgments. We present a novel framework that enhances talent search effectiveness and delivers substantial business value through two key innovations: (i) leveraging LLMs to extract fine-grained recruitment signals from job descriptions and historical hiring data, and (ii) employing a role-aware multi-gate mixture-of-experts (MoE) network to capture behavioral differences across recruiter roles. To further reduce noise, we introduce a multi-task learning module that jointly optimizes click-through rate (CTR), conversion rate (CVR), and resume matching relevance. Experiments on real-world recruitment data and online A/B testing show relative AUC gains of 1.70% (CTR) and 5.97% (CVR), and a 17.29% lift in click-through conversion rate. These improvements reduce dependence on external sourcing channels, enabling an estimated annual cost saving of millions of CNY.
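
The abstract names a role-aware multi-gate MoE with joint CTR and CVR objectives; the sketch below shows a plain multi-gate mixture-of-experts (MMoE) with one tower per task, the standard architecture such a design builds on. Dimensions, the missing role-conditioning, and all names are illustrative assumptions rather than the paper's model.

# Hedged sketch of a multi-gate mixture-of-experts with CTR and CVR towers.
# A role-aware variant could additionally condition the gates on a recruiter-role
# embedding; that detail is not specified in the abstract and is omitted here.
import torch
import torch.nn as nn

class MMoE(nn.Module):
    def __init__(self, input_dim, expert_dim=64, n_experts=4, n_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU()) for _ in range(n_experts)
        )
        self.gates = nn.ModuleList(nn.Linear(input_dim, n_experts) for _ in range(n_tasks))
        self.towers = nn.ModuleList(nn.Linear(expert_dim, 1) for _ in range(n_tasks))

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        preds = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)           # (B, E, 1) per-task gate
            task_repr = (w * expert_out).sum(dim=1)                    # (B, D)
            preds.append(torch.sigmoid(tower(task_repr)).squeeze(-1))
        return preds                                                   # e.g. [p_ctr, p_cvr]

p_ctr, p_cvr = MMoE(input_dim=128)(torch.randn(16, 128))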

Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu | Yao Chen | Zeyi Wen | Weiguo Liu | Bingsheng He
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The increasing demand for efficient summarization tools in resource-constrained environments highlights the need for effective solutions. While large language models (LLMs) deliver superior summarization quality, their high computational resource requirements limit their practical application. In contrast, small language models (SLMs) present a more accessible alternative, capable of real-time summarization on edge devices. However, their summarization capabilities and comparative performance against LLMs remain underexplored. This paper addresses this gap by presenting a comprehensive evaluation of 19 SLMs for news summarization across 2,000 news samples, focusing on relevance, coherence, factual consistency, and summary length. Our findings reveal significant variations in SLM performance, with top-performing models such as Phi3-Mini and Llama3.2-3B-Ins achieving results comparable to those of 70B LLMs while generating more concise summaries. Notably, SLMs are better suited to simple prompts, as overly complex prompts may lead to a decline in summary quality. Additionally, our analysis indicates that instruction tuning does not consistently enhance the news summarization capabilities of SLMs. This research not only contributes to the understanding of SLMs but also provides practical insights for researchers seeking efficient summarization solutions that balance performance and resource use.
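
As a concrete illustration of the evaluation setup described above, the sketch below prompts a small instruction-tuned model with a deliberately simple instruction and records the summary length. The model name, prompt, and decoding settings are placeholders rather than the paper's exact protocol, and the relevance, coherence, and factual-consistency scoring is not included.

# Hedged sketch of a minimal SLM news-summarization loop. Model name, prompt,
# and decoding settings are placeholders; the paper's quality metrics would be
# scored separately.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")

def summarize(article: str) -> str:
    # The paper observes that simple prompts tend to suit SLMs better than complex ones.
    prompt = f"Summarize the following news article in three sentences.\n\n{article}\n\nSummary:"
    out = generator(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"]
    return out[len(prompt):].strip()

article = "..."  # placeholder for one of the 2,000 news samples
summary = summarize(article)
print(len(summary.split()), "words:", summary)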

2024

Exploiting Careful Design of SVM Solution for Aspect-term Sentiment Analysis
Hanfeng Liu | Minping Chen | Zhenya Zheng | Zeyi Wen
Findings of the Association for Computational Linguistics: EMNLP 2024

Aspect-term sentiment analysis (ATSA) identifies fine-grained sentiments towards specific aspects mentioned in a text. While pre-trained language models (PLMs) have set the state-of-the-art (SOTA) for ATSA, they are resource-intensive due to their large model sizes, restricting their wide application in resource-constrained scenarios. Conversely, conventional machine learning methods, such as Support Vector Machines (SVMs), require far fewer resources but have lower predictive accuracy. This paper introduces an innovative pipeline, termed SVM-ATSA, which bridges the gap between the efficiency of SVM-based methods and the accuracy of PLM-based methods. To improve the feature expression of SVMs and better adapt to the ATSA task, SVM-ATSA decomposes the learning problem into multiple view subproblems and dynamically selects and constructs features with reinforcement learning. The experimental results demonstrate that SVM-ATSA surpasses SOTA PLM-based methods in predictive accuracy while maintaining a faster inference speed and significantly reducing the number of model parameters.
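
For context, the sketch below shows the conventional SVM pipeline that SVM-ATSA improves upon: a linear SVM over sparse lexical features with the aspect term appended to the input. The multi-view decomposition and reinforcement-learning-based feature selection and construction are not reproduced; the data, the [ASP] marker, and all names are toy assumptions for illustration.

# Hedged sketch of a plain SVM baseline for aspect-term sentiment analysis.
# SVM-ATSA's multi-view decomposition and RL-driven feature selection are not
# shown; this only illustrates the conventional pipeline it starts from.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data: each input pairs a sentence with its aspect term; the label is the
# sentiment expressed towards that aspect.
texts = [
    "the battery life is great [ASP] battery life",
    "the battery life is great but the screen is dim [ASP] screen",
]
labels = ["positive", "negative"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["the screen looks fantastic [ASP] screen"]))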