Yu Lei
2026
ReaGeo: Reasoning-Enhanced End-to-End Geocoding with LLMs
Jian Cui | Zhiyuan Ren | Desheng Weng | Yongqi Zhao | Gong Wenbin | Yu Lei | Zhenning Dong
Findings of the Association for Computational Linguistics: ACL 2026
This paper proposes ReaGeo, an end-to-end geocoding framework based on large language models, designed to overcome the limitations of traditional multi-stage approaches that rely on text or vector similarity retrieval over geographic databases, including workflow complexity, error propagation, and heavy dependence on structured geographic knowledge bases. The method converts geographic coordinates into geohash sequences, reformulating the coordinate prediction task as a text generation problem, and introduces a Chain-of-Thought mechanism to enhance the model’s reasoning over spatial relationships. Furthermore, reinforcement learning with a distance-deviation-based reward is applied to optimize the generation accuracy. Comprehensive experiments show that ReaGeo can accurately handle explicit address queries in single-point predictions and effectively resolve vague relative location queries. In addition, the model demonstrates strong predictive capability for non-point geometric regions, highlighting its versatility and generalization ability in geocoding tasks.
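The coordinate-to-geohash conversion mentioned in the abstract can be illustrated with the standard public geohash encoding. This is a minimal sketch, not the authors' implementation: the function name `geohash_encode` and the `precision` parameter are assumptions made for illustration, and the abstract does not state which geohash length or variant ReaGeo uses.

```python
# Standard geohash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a geohash string.

    Hypothetical helper for illustration: bits alternate between
    longitude and latitude, each bit halving the current interval,
    and every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulator for the current 5-bit group
    bit_count = 0     # number of bits collected in the current group
    use_lon = True    # the first bit refines longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


# A point in northern Denmark, encoded to 5 characters.
print(geohash_encode(57.64911, 10.40744, precision=5))
```

Because nearby points share a common prefix, a model that emits the string one character at a time refines its location estimate from coarse to fine, which is one plausible reason the text-generation framing suits this task.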
A Unified Framework for Modeling Heterogeneous Financial Data via Dual-Granularity Prompting
Yu Lei | Zixuan Wang | Yiqing Feng | Junru Zhang | Yahui Li | Liu Chu | Wang Tongyao | Dongyang Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Industrial credit scoring models remain heavily reliant on manually tuned statistical learning methods. Despite their potential, deep learning architectures have struggled to consistently outperform traditional statistical models in industrial credit scoring, largely due to the complexity of heterogeneous financial data and the challenge of modeling evolving creditworthiness. To bridge this gap, we introduce FinLangNet, a novel framework that reformulates credit scoring as a multi-scale sequential learning problem. FinLangNet processes heterogeneous financial data through a dual-module architecture that combines tabular feature extraction with temporal sequence modeling, generating probability distributions of users’ future financial behaviors across multiple time horizons. A key innovation is our dual-prompt mechanism within the sequential module, which introduces learnable prompts operating at both feature-level granularity, for capturing fine-grained temporal patterns, and user-level granularity, for aggregating holistic risk profiles. Notably, real-world deployment yielded a 6.3 pp improvement in KS, along with a 9.9% reduction in bad debt rate.
2025
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang | Lin Gui | Yu Lei | Yuanzhao Zhai | Yehong Zhang | Zhuo Zhang | Yulan He | Hui Wang | Yue Yu | Kam-Fai Wong | Bin Liang | Ruifeng Xu
Findings of the Association for Computational Linguistics: ACL 2025
Reinforcement Learning from Human Feedback (RLHF) is effective for aligning Large Language Models (LLMs) with human preferences. However, RLHF’s complex process limits its ability to continually learn from human feedback, making it impractical for real-world applications where the deployed model continuously receives feedback from users. Non-RL-based methods, such as Direct Preference Optimization (DPO), are not inherently suited to Continual Learning (CL). We observe that when combined with Experience Replay (ER) for CL, DPO tends to significantly widen the gap between the probabilities of human-preferred and dispreferred responses. Consequently, this diminishes the diversity of model generation, potentially leading to model collapse. To overcome these challenges, we propose Continual Optimal Policy Regularization (COPR), a novel non-RL offline method that converts the historical optimal policies into optimization constraints when continually learning new preferences. We first derive a moderate reward function from the pairwise ranking loss and then use the moderate reward to calculate a new sampling distribution to construct novel learning objectives and constraints. We also provide a formal proof of the learnability of COPR. The experimental results show that COPR outperforms strong CL baselines on our proposed benchmark in terms of reward-based evaluation, GPT-4 evaluation, and human assessment.
2016
Content-based Influence Modeling for Opinion Behavior Prediction
Chengyao Chen | Zhitao Wang | Yu Lei | Wenjie Li
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Nowadays, social media has become a popular platform for companies to understand their customers. It provides valuable opportunities to gain new insights into how a person’s opinion about a product is influenced by his or her friends. Though various approaches have been proposed to study the opinion formation problem, they all formulate opinions as derived sentiment values, either discrete or continuous, without considering the semantic information. In this paper, we propose a Content-based Social Influence Model to study the implicit mechanism underlying the change of opinions. We then apply the learned model to predict users’ future opinions. The advantage of the proposed model is its ability to handle the semantic information and to learn two influence components: the opinion influence of the content information and the social relation factors. In the experiments conducted on Twitter datasets, our model significantly outperforms other popular opinion formation models.
2015
Learning to Adapt Credible Knowledge in Cross-lingual Sentiment Analysis
Qiang Chen | Wenjie Li | Yu Lei | Xule Liu | Yanxiang He
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Co-authors
- Wenjie Li 2
- Chengyao Chen 1
- Qiang Chen 1
- Liu Chu 1
- Jian Cui 1
- Zhenning Dong 1
- Yiqing Feng 1
- Lin Gui 1
- Yanxiang He 1
- Yulan He 1
- Dongyang Li 1
- Yahui Li 1
- Bin Liang (梁斌) 1
- Xule Liu 1
- Zhiyuan Ren 1
- Wang Tongyao 1
- Hui Wang 1
- Zhitao Wang 1
- Zixuan Wang 1
- Gong Wenbin 1
- Desheng Weng 1
- Kam-Fai Wong 1
- Ruifeng Xu (徐睿峰) 1
- Yue Yu 1
- Yuanzhao Zhai 1
- Han Zhang 1
- Junru Zhang 1
- Yehong Zhang 1
- Zhuo Zhang 1
- Yongqi Zhao 1