Binyang Li - ACL Anthology

2025

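A Retrieval-Augmented Thought Prompting Method for Chinese Frame Semantic Parsing (基于检索增强思维提示的汉语框架语义解析方法)
Yingxu Li | Tao Chen | Yize Li | Binyang Li
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)

Chinese frame semantic parsing, grounded in the theory of frame semantics, aims to identify the semantic frames evoked by words in a sentence and to analyze the semantic roles of the sentence's constituents, thereby revealing the deep semantic structure behind language and enabling better extraction of event relations and contextual information. Since the emergence of large language models (LLMs), their strong general-purpose text understanding and generation capabilities have been widely applied to natural language processing tasks. However, current LLMs suffer from simplistic reasoning paths and low accuracy on Chinese frame semantic parsing, with particular shortcomings in the logical coherence of chain-of-thought reasoning and in the in-depth application of retrieval-augmented generation. To address this, this paper proposes a thought prompting method for Chinese frame semantic parsing. The method combines retrieval-augmented generation (RAG) with chain-of-thought (CoT) techniques to guide LLMs through the Chinese frame semantic parsing task. Experimental results on the CFN2.1 dataset show that, compared with the best existing method, the proposed method improves frame identification accuracy by 13.52%, argument identification F1 by 2.24%, and role identification F1 by 5.09%.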

uir-cis at SemEval-2025 Task 3: Detection of Hallucinations in Generated Text
Jia Huang | Shuli Zhao | Yaru Zhao | Tao Chen | Weijia Zhao | Hangui Lin | Yiyang Chen | Binyang Li
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

The widespread deployment of large language models (LLMs) across diverse domains has underscored the critical need to ensure the credibility and accuracy of their generated content, particularly in the presence of hallucinations. These hallucinations can severely compromise both the practical performance of models and the security of their applications. In response to this issue, SemEval-2025 Task 3 Mu-SHROOM: Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes introduces a more granular task for hallucination detection. This task seeks to identify hallucinations in text, accurately locate hallucinated segments, and assess their credibility. In this paper, we present a three-stage method for fine-grained hallucination detection and localization. First, we transform the text into a triplet representation, facilitating more precise hallucination analysis. Next, we leverage a large language model to generate fact-reference texts that correspond to the triplets. Finally, we employ a fact alignment strategy to identify and localize hallucinated segments by evaluating the semantic consistency between the extracted triplets and the generated reference texts. We evaluate our method on the unlabelled test set across all languages in Task 3, demonstrating strong detection performance and validating its effectiveness in multilingual contexts.

T²: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering
Zhengyi Zhao | Shubo Zhang | Zezhong Wang | Huimin Wang | Yutian Zhao | Bin Liang | Yefeng Zheng | Binyang Li | Kam-Fai Wong | Xian Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent advances in large language models have demonstrated remarkable performance on Contextual Question Answering (CQA). However, prior approaches typically employ elaborate reasoning strategies regardless of question complexity, leading to low adaptability. Recent efficient test-time scaling methods introduce budget constraints or early-stop mechanisms to avoid overthinking on straightforward questions, but they add human bias to the reasoning process and fail to leverage models’ inherent reasoning capabilities. To address these limitations, we present T²: Think-to-Think, a novel framework that dynamically adapts reasoning depth based on question complexity. T² leverages the insight that if an LLM can effectively solve similar questions using specific reasoning strategies, it can apply the same strategy to the original question. This insight enables the adoption of concise reasoning for straightforward questions while maintaining detailed analysis for complex problems. T² works through four key steps: decomposing questions into structural elements, generating similar examples with candidate reasoning strategies, evaluating these strategies against multiple criteria, and applying the most appropriate strategy to the original question. Experimental evaluation across seven diverse CQA benchmarks demonstrates that T² not only achieves higher accuracy than baseline methods but also reduces computational overhead by up to 25.2%.

MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models
Zhengyi Zhao | Shubo Zhang | Yuxi Zhang | Yanxi Zhao | Yifan Zhang | Zezhong Wang | Huimin Wang | Yutian Zhao | Bin Liang | Yefeng Zheng | Binyang Li | Kam-Fai Wong | Xian Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Memes have emerged as a popular form of multimodal online communication, where their interpretation heavily depends on the specific context in which they appear. Current approaches predominantly focus on isolated meme analysis, either for harmful content detection or standalone interpretation, overlooking a fundamental challenge: the same meme can express different intents depending on its conversational context. This oversight creates an evaluation gap: although humans intuitively recognize how context shapes meme interpretation, Large Vision Language Models (LVLMs) can hardly understand context-dependent meme intent. To address this critical limitation, we introduce MemeReaCon, a novel benchmark specifically designed to evaluate how LVLMs understand memes in their original context. We collected memes from five different Reddit communities, keeping each meme’s image, the post text, and user comments together. We carefully labeled how the text and meme work together, what the poster intended, how the meme is structured, and how the community responded. Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the context or focus excessively on visual details while overlooking communicative purpose. MemeReaCon thus serves both as a diagnostic tool exposing current limitations and as a challenging benchmark to drive development toward more sophisticated, context-aware LVLMs.

2024

MUCH: A Multimodal Corpus Construction for Conversational Humor Recognition Based on Chinese Sitcom
Hongyu Guo | Wenbo Shang | Xueyao Zhang | Shubo Zhang | Xu Han | Binyang Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Conversational humor is the key to capturing dialogue semantics and dialogue comprehension, which is usually generated in multiple modalities, such as linguistic rhetoric (textual modality), exaggerated facial expressions or movements (visual modality), and quirky intonation (acoustic modality). However, existing multimodal corpora for conversation humor are coarse-grained, and the modality is insufficient to support the conversational humor recognition task. This paper designed an annotation scheme for multimodal humor datasets, and constructed a corpus based on a Chinese sitcom for conversational humor recognition, named MUCH. The MUCH corpus consists of 34,804 utterances in total, and 7,079 of them are humorous. We employed both unimodal and multimodal methods to test our MUCH corpus. Experimental results showed that the multimodal approach could achieve 75.94% in terms of F1-score and surpassed the performance of most unimodal methods, which demonstrated that the MUCH corpus was effective for multimodal humor recognition tasks.

UIR-ISC at SemEval-2024 Task 3: Textual Emotion-Cause Pair Extraction in Conversations
Hongyu Guo | Xueyao Zhang | Yiyang Chen | Lin Deng | Binyang Li
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

The goal of Emotion Cause Pair Extraction (ECPE) is to identify what causes a given emotion or change of emotion. This paper proposes a three-step learning approach, named ECSP, for the task of Textual Emotion-Cause Pair Extraction in Conversations in SemEval-2024 Task 3. First, we perform data preprocessing on the original dataset to construct negative samples. Second, we use a pre-trained model to construct token sequence representations with contextual information to obtain emotion predictions. Third, we treat textual emotion-cause pair extraction as a machine reading comprehension task and fine-tune two pre-trained models, RoBERTa and SpanBERT. Our system performed well in the official rankings, placing 3rd under strict matching with a Strict F1-score of 15.18%, which further demonstrates its robustness.

2023

UIRISC at SemEval-2023 Task 10: Explainable Detection of Online Sexism by Ensembling Fine-tuning Language Models
Tianyun Zhong | Runhui Song | Xunyuan Liu | Juelin Wang | Boya Wang | Binyang Li
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Under the umbrella of anonymous social networks, many women have suffered from abuse, discrimination, and other sexist expressions online. However, existing methods based on keyword filtering and matching perform poorly on online sexism detection because they lack the capability to identify implicit stereotypes and discrimination. Therefore, this paper proposes a System of Ensembling Fine-tuning Models (SEFM) for SemEval-2023 Task 10: Explainable Detection of Online Sexism. First, we use four task-adaptive pre-trained language models to flag all texts. Second, we alleviate the data imbalance from two perspectives: over-sampling the labelled data and adjusting the loss function. Third, we add indicators and feedback modules to enhance the overall performance. Our system attained macro F1 scores of 0.8538, 0.6619, and 0.4641 for Subtasks A, B, and C, respectively. Our system exhibited strong performance across multiple tasks, with particularly noteworthy performance in Subtask B. Comparison experiments and ablation studies demonstrate the effectiveness of our system.

System Report for CCL23-Eval Task 3: UIR-ISC Pre-trained Language Model for Chinese Frame Semantic Parsing
Yingxuan Guan | Xunyuan Liu | Lu Zhang | Zexian Xie | Binyang Li
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

Chinese Frame Semantic Parsing (CFSP) is a semantic parsing task based on Chinese FrameNet (CFN). This paper presents a solution for CCL2023-Eval Task 3. We first attempt various pre-trained models for different sub-tasks. Then, we explore multiple approaches to solving each task from the perspectives of feature engineering, model structure, and other tricks. Finally, we provide prospects for the task and propose potential alternative solutions. We conducted extensive comparative experiments to validate the effectiveness of our system.

Arthur Caplan at SemEval-2023 Task 4: Enhancing Human Value Detection through Fine-tuned Pre-trained Models
Xianxian Song | Jinhui Zhao | Ruiqi Cao | Linchi Sui | Binyang Li | Tingyue Guan
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The computational identification of human values is a novel and challenging research area that holds the potential to offer valuable insights into the nature of human behavior and cognition. This paper presents the methodology adopted by the Arthur-Caplan research team for SemEval-2023 Task 4, which entailed the detection of human values behind arguments. The proposed system integrates fine-tuned BERT, ERNIE2.0, RoBERTa and XLNet models. Experimental results show that our system achieved a macro F1 score of 0.512, outperforming baseline methods by 9.2% on the test set.

2022

Social Bot-Aware Graph Neural Network for Early Rumor Detection
Zhen Huang | Zhilong Lv | Xiaoyun Han | Binyang Li | Menglong Lu | Dongsheng Li
Proceedings of the 29th International Conference on Computational Linguistics

Early rumor detection is a key and challenging task for preventing rumors from spreading widely. Sociological research shows that social bots’ behavior in the early stage has become the main reason for rumors’ wide spread. However, current models do not explicitly distinguish genuine users from social bots, and they fail to identify rumors in a timely manner. Therefore, this paper aims at early rumor detection by accounting for social bots’ behavior, and presents a Social Bot-Aware Graph Neural Network, named SBAG. SBAG first pre-trains a multi-layer perceptron network to capture social bot features, and then constructs multiple graph neural networks by embedding the features to model the early propagation of posts, which is further used to detect rumors. Extensive experiments on three benchmark datasets show that SBAG achieves significant improvements over the baselines and also identifies rumors within 3 hours while maintaining more than 90% accuracy.

“I Know Who You Are”: Character-Based Features for Conversational Humor Recognition in Chinese
Wenbo Shang | Jiangjiang Zhao | Zezhong Wang | Binyang Li | Fangchun Yang | Kam-Fai Wong
Findings of the Association for Computational Linguistics: EMNLP 2022

Humor plays an important role in our daily life, as it is an essential and fascinating element of communication between people. Therefore, how to recognize punchlines in dialogue, i.e. conversational humor recognition, has attracted much interest in the computational linguistics community. However, most existing work attempts to understand conversational humor by analyzing the contextual information of the dialogue, but neglects the character of the interlocutor, such as age, gender, occupation, and so on. For instance, the same utterance could be humorous coming from a serious person, but a plain expression coming from a naive person. To this end, this paper proposes a Character Fusion Conversational Humor Recognition model (CFCHR) that exploits character information to recognize conversational humor. CFCHR utilizes a multi-task learning framework that unifies two highly pertinent tasks, i.e., character extraction and punchline identification. Based on deep neural networks, we trained both tasks jointly by sharing weights to extract the common and task-invariant features while each task could still learn its task-specific features. Experiments were conducted on a Chinese sitcom corpus, which consisted of 12,677 utterances from 22 characters. The experimental results demonstrated that CFCHR could achieve a 33.08% improvement in terms of F1-score over some strong baselines, and proved the effectiveness of character information for identifying punchlines.

2020

CHIME: Cross-passage Hierarchical Memory Network for Generative Review Question Answering
Junru Lu | Gabriele Pergola | Lin Gui | Binyang Li | Yulan He
Proceedings of the 28th International Conference on Computational Linguistics

We introduce CHIME, a cross-passage hierarchical memory network for question answering (QA) via text generation. It extends XLNet introducing an auxiliary memory module consisting of two components: the context memory collecting cross-passage evidences, and the answer memory working as a buffer continually refining the generated answers. Empirically, we show the efficacy of the proposed architecture in the multi-passage generative QA, outperforming the state-of-the-art baselines with better syntactically well-formed answers and increased precision in addressing the questions of the AmazonQA review dataset. An additional qualitative analysis revealed the interpretability introduced by the memory module.

2019

Context-aware Embedding for Targeted Aspect-based Sentiment Analysis
Bin Liang | Jiachen Du | Ruifeng Xu | Binyang Li | Hejiao Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Attention-based neural models were employed to detect the different aspects and sentiment polarities of the same target in targeted aspect-based sentiment analysis (TABSA). However, existing methods do not specifically pre-train reasonable embeddings for targets and aspects in TABSA. This may result in targets or aspects having the same vector representations in different contexts and losing the context-dependent information. To address this problem, we propose a novel method to refine the embeddings of targets and aspects. Such pivotal embedding refinement utilizes a sparse coefficient vector to adjust the embeddings of target and aspect from the context. Hence the embeddings of targets and aspects can be refined from the highly correlative words instead of using context-independent or randomly initialized vectors. Experiment results on two benchmark datasets show that our approach yields the state-of-the-art performance in TABSA task.

Early Rumour Detection
Kaimin Zhou | Chang Shu | Binyang Li | Jey Han Lau
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Rumours can spread quickly through social media, and malicious ones can bring about significant economic and social impact. Motivated by this, our paper focuses on the task of rumour detection; particularly, we are interested in understanding how early we can detect them. Although there are numerous studies on rumour detection, few are concerned with the timing of the detection. A successfully-detected malicious rumour can still cause significant damage if it isn’t detected in a timely manner, and so timing is crucial. To address this, we present a novel methodology for early rumour detection. Our model treats social media posts (e.g. tweets) as a data stream and integrates reinforcement learning to learn the minimum number of posts required before we classify an event as a rumour. Experiments on Twitter and Weibo demonstrate that our model identifies rumours earlier than state-of-the-art systems while maintaining a comparable accuracy.

2018

ISCLAB at SemEval-2018 Task 1: UIR-Miner for Affect in Tweets
Meng Li | Zhenyuan Dong | Zhihao Fan | Kongming Meng | Jinghua Cao | Guanqi Ding | Yuhan Liu | Jiawei Shan | Binyang Li
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper presents the UIR-Miner system for emotion and sentiment analysis on Twitter at SemEval 2018. Our system consists of the following modules: a preprocessing module, a stacking module for predicting emotion and sentiment intensity, an LSTM network module for multi-label classification, and a hierarchical attention network module for emotion and sentiment classification. According to the metrics of SemEval 2018, our system obtains final scores of 0.636, 0.531, 0.731, 0.708, and 0.408 on the 5 subtasks, respectively.

The UIR Uncertainty Corpus for Chinese: Annotating Chinese Microblog Corpus for Uncertainty Identification from Social Media
Binyang Li | Jun Xiang | Le Chen | Xu Han | Xiaoyan Yu | Ruifeng Xu | Tengjiao Wang | Kam-fai Wong
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

ACE: Automatic Colloquialism, Typographical and Orthographic Errors Detection for Chinese Language
Shichao Dong | Gabriel Pui Cheong Fung | Binyang Li | Baolin Peng | Ming Liao | Jia Zhu | Kam-fai Wong
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

We present a system called ACE for Automatic Colloquialism and Errors detection for written Chinese. ACE is based on a combination of an N-gram model and a rule-based model. Although it focuses on detecting colloquial Cantonese (a dialect of Chinese) at the current stage, it can be extended to detect other dialects. We chose Cantonese because it has many interesting properties, such as a unique grammar system and a large number of colloquial terms, that make the detection task extremely challenging. We conducted experiments using real data and synthetic data. The results indicated that ACE is highly reliable and effective.

2015

Overview of Topic-based Chinese Message Polarity Classification in SIGHAN 2015
Xiangwen Liao | Binyang Li | Liheng Xu
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

UIR-PKU: Twitter-OpinMiner System for Sentiment Analysis in Twitter at SemEval 2015
Xu Han | Binyang Li | Jing Ma | Yuxiao Zhang | Gaoyan Ou | Tengjiao Wang | Kam-fai Wong
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Web Information Mining and Decision Support Platform for the Modern Service Industry
Binyang Li | Lanjun Zhou | Zhongyu Wei | Kam-fai Wong | Ruifeng Xu | Yunqing Xia
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

The CUHK Discourse TreeBank for Chinese: Annotating Explicit Discourse Connectives for the Chinese TreeBank
Lanjun Zhou | Binyang Li | Zhongyu Wei | Kam-Fai Wong
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The lack of an open discourse corpus for Chinese limits many natural language processing tasks. In this work, we present the first open discourse treebank for Chinese, namely, the Discourse Treebank for Chinese (DTBC). At the current stage, we have annotated explicit intra-sentence discourse connectives, their corresponding arguments and senses for all 890 documents of the Chinese Treebank 5. We started by analysing the characteristics of discourse annotation for Chinese, and adapted the annotation scheme of Penn Discourse Treebank 2 (PDTB2) to the Chinese language while maintaining compatibility as far as possible. Following previous studies in Chinese linguistics, we made adjustments to three essential aspects: sense hierarchy, argument scope and semantics of arguments. An agreement study showed that our annotation scheme could achieve highly reliable results.

Exploiting Community Emotion for Microblog Event Detection
Gaoyan Ou | Wei Chen | Tengjiao Wang | Zhongyu Wei | Binyang Li | Dongqing Yang | Kam-Fai Wong
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Is Twitter A Better Corpus for Measuring Sentiment Similarity?
Shi Feng | Le Zhang | Binyang Li | Daling Wang | Ge Yu | Kam-Fai Wong
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

An Empirical Study on Uncertainty Identification in Social Media Context
Zhongyu Wei | Junwen Chen | Wei Gao | Binyang Li | Lanjun Zhou | Yulan He | Kam-Fai Wong
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Cross-Lingual Identification of Ambiguous Discourse Connectives for Resource-Poor Language
Lanjun Zhou | Wei Gao | Binyang Li | Zhongyu Wei | Kam-Fai Wong
Proceedings of COLING 2012: Posters

2011

Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities
Lanjun Zhou | Binyang Li | Wei Gao | Zhongyu Wei | Kam-Fai Wong
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

A Unified Graph Model for Sentence-Based Opinion Retrieval
Binyang Li | Lanjun Zhou | Shi Feng | Kam-Fai Wong
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Co-authors

Bin Liang (梁斌) 3

Tengjiao Wang 3

Ruifeng Xu (徐睿峰) 3

Zhenyuan Dong 1

Gabriel Pui Cheong Fung 1

Yingxuan Guan 1

Xiangwen Liao 1

Kongming Meng 1

Gabriele Pergola 1

Xianxian Song 1

Fangchun Yang 1

Dongqing Yang 1

Ge Yu (于戈) 1

Jiangjiang Zhao 1

Tianyun Zhong 1
