Jingyuan Ma
2025
Overview of CCL25-Eval Task 1: The Fifth Spatial Cognition Evaluation (SpaCE2025)
Yuhang Qin | Liming Xiao | Nan Hu | Sirui Deng | Jingyuan Ma | Hyang Cui | Zihan Zhang | Chi Hsu Tsai | Jinkun Ding | Sumin Kang | Zhifang Sui | Weidong Zhan
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
The Fifth Spatial Cognition Evaluation (SpaCE2025) presents a benchmark aimed at evaluating the spatial semantic understanding and reasoning capabilities of Large Language Models (LLMs), primarily in Chinese. It consists of five subtasks: (1) Retrieving Spatial Referents (RSR), (2) Detecting Spatial Semantic Anomalies (DSA), (3) Recognizing Synonymous Spatial Expression (RSE), (4) Spatial Position Reasoning (SPR) in Chinese, and (5) SPR in English. The fourth and fifth subtasks share the same content and structure, differing only in language, and are designed to assess the cross-linguistic spatial reasoning capability of LLMs. A total of 12 teams submitted their final results, and the best-performing team achieved an accuracy of 0.7931. The results suggest that while LLMs are capable of handling basic spatial semantic understanding tasks such as RSR, their performance on more complex tasks, such as DSA and RSE, still requires improvement. Additionally, finetuning methods that effectively activate LLMs’ reasoning ability are essential to improve their performance.
2024
A Survey on In-context Learning
Qingxiu Dong | Lei Li | Damai Dai | Ce Zheng | Jingyuan Ma | Rui Li | Heming Xia | Jingjing Xu | Zhiyong Wu | Baobao Chang | Xu Sun | Lei Li | Zhifang Sui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
With the increasing capabilities of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP), where LLMs make predictions based on contexts augmented with a few examples. It has been a significant trend to explore ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress and challenges of ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques, including training strategies, prompt designing strategies, and related analysis. Additionally, we explore various ICL application scenarios, such as data engineering and knowledge updating. Finally, we address the challenges of ICL and suggest potential directions for further research. We hope that our work can encourage more research on uncovering how ICL works and improving ICL.
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
Rui Li | Peiyi Wang | Jingyuan Ma | Di Zhang | Lei Sha | Zhifang Sui
Findings of the Association for Computational Linguistics: EMNLP 2024
Large Language Models (LLMs) have gained increasing attention for their remarkable capacity, alongside concerns about safety arising from their potential to produce harmful content. Red teaming aims to find prompts that could elicit harmful responses from LLMs, and is essential to discover and mitigate safety risks before real-world deployment. However, manual red teaming is both time-consuming and expensive, rendering it unscalable. In this paper, we propose RTPE, a scalable evolution framework to evolve red teaming prompts across both breadth and depth dimensions, facilitating the automatic generation of numerous high-quality and diverse red teaming prompts. Specifically, in-breadth evolving employs a novel enhanced in-context learning method to create a multitude of quality prompts, whereas in-depth evolving applies customized transformation operations to enhance both content and form of prompts, thereby increasing diversity. Extensive experiments demonstrate that RTPE surpasses existing representative automatic red teaming methods on both attack success rate and diversity. In addition, based on 4,800 red teaming prompts created by RTPE, we further provide a systematic analysis of 8 representative LLMs across 8 sensitive topics.
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
Zhexin Zhang | Yida Lu | Jingyuan Ma | Di Zhang | Rui Li | Pei Ke | Hao Sun | Lei Sha | Zhifang Sui | Hongning Wang | Minlie Huang
Findings of the Association for Computational Linguistics: EMNLP 2024
The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but there still lacks a comprehensive approach for detecting safety issues within LLMs’ responses in an aligned, customizable and explainable manner. In this paper, we propose ShieldLM, an LLM-based safety detector, which aligns with common safety standards, supports customizable detection rules, and provides explanations for its decisions. To train ShieldLM, we compile a large bilingual dataset comprising 14,387 query-response pairs, annotating the safety of responses based on various safety standards. Through extensive experiments, we demonstrate that ShieldLM surpasses strong baselines across four test sets, showcasing remarkable customizability and explainability. Besides performing well on standard detection datasets, ShieldLM has also been shown to be effective as a safety evaluator for advanced LLMs. ShieldLM is released at https://github.com/thu-coai/ShieldLM to support accurate and explainable safety detection under various safety standards.
Co-authors
- Zhifang Sui 4
- Rui Li 3
- Lei Sha 2
- Di Zhang 2
- Baobao Chang (常宝宝) 1
- Hyang Cui 1
- Damai Dai 1
- Sirui Deng 1
- Jinkun Ding 1
- Qingxiu Dong 1
- Nan Hu 1
- Minlie Huang 1
- Sumin Kang 1
- Pei Ke 1
- Lei Li 1
- Lei Li 1
- Yida Lu 1
- Yuhang Qin 1
- Xu Sun 1
- Hao Sun 1
- Chi Hsu Tsai 1
- Peiyi Wang (王培懿) 1
- Hongning Wang 1
- Zhiyong Wu 1
- Heming Xia 1
- Liming Xiao 1
- Jingjing Xu 1
- Weidong Zhan (詹卫东) 1
- Zhexin Zhang 1
- Zihan Zhang 1
- Ce Zheng 1