Liang Yang (杨亮) - ACL Anthology

Liang Yang

Also published as: 亮杨

2025

pdf bib abs
Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech?
Jingjie Zeng | Liang Yang | Zekun Wang | Yuanyuan Sun | Hongfei Lin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit hate speech has become a significant challenge for online platforms, as it often avoids detection by large language models (LLMs) due to its indirectly expressed hateful intent. This study identifies the limitations of LLMs in detecting implicit hate speech, particularly when disguised as seemingly harmless expressions in a rhetorical device. To address this challenge, we employ a Jailbreaking strategy and Energy-based Constrained Decoding techniques, and design a small model for measuring the energy of metaphorical rhetoric. This approach can lead to LLMs generating metaphorical implicit hate speech. Our research reveals that advanced LLMs, like GPT-4o, frequently misinterpret metaphorical implicit hate speech, and fail to prevent its propagation effectively. Even specialized models, like ShieldGemma and LlamaGuard, demonstrate inadequacies in blocking such content, often misclassifying it as harmless speech. This work points out the vulnerability of current LLMs to implicit hate speech, and emphasizes the improvements to address hate speech threats better.

pdf bib abs
It’s Not Bragging If You Can Back It Up: Can LLMs Understand Braggings?
Jingjie Zeng | Huayang Li | Liang Yang | Yuanyuan Sun | Hongfei Lin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Bragging, as a pervasive social-linguistic phenomenon, reflects complex human interaction patterns. However, the understanding and generation of appropriate bragging behavior in large language models (LLMs) remains underexplored. In this paper, we propose a comprehensive study that combines analytical and controllable approaches to examine bragging in LLMs. We design three tasks, bragging recognition, bragging explanation, and bragging generation, along with novel evaluation metrics to assess the models’ ability to identify bragging intent, social appropriateness, and account for context sensitivity. Our analysis reveals the challenges of bragging in the social context, such as recognizing bragging and responding appropriately with bragging in conversation. This work provides new insights into how LLMs process bragging and highlights the need for more research on generating contextually appropriate behavior in LLMs.

Large Language Models (LLMs) have become essential for offensive language detection, yet their ability to handle annotation disagreement remains underexplored. Disagreement samples, which arise from subjective interpretations, pose a unique challenge due to their ambiguous nature. Understanding how LLMs process these cases, particularly their confidence levels, can offer insight into their alignment with human annotators. This study systematically evaluates the performance of multiple LLMs in detecting offensive language at varying levels of annotation agreement. We analyze binary classification accuracy, examine the relationship between model confidence and human disagreement, and explore how disagreement samples influence model decision-making during few-shot learning and instruction fine-tuning. Our findings reveal that LLMs struggle with low-agreement samples, often exhibiting overconfidence in these ambiguous cases. However, utilizing disagreement samples in training improves both detection accuracy and model alignment with human judgment. These insights provide a foundation for enhancing LLM-based offensive language detection in real-world moderation tasks.

The proliferation of hate speech has caused significant harm to society. The intensity and directionality of hate are closely tied to the target and argument it is associated with. However, research on hate speech detection in Chinese has lagged behind, and existing datasets lack span-level fine-grained annotations. Furthermore, the lack of research on Chinese hateful slang poses a significant challenge. In this paper, we provide two valuable fine-grained Chinese hate speech detection research resources. First, we construct a Span-level Target-Aware Toxicity Extraction dataset (STATE ToxiCN), which is the first span-level Chinese hate speech dataset. Secondly, we evaluate the span-level hate speech detection performance of existing models using STATE ToxiCN. Finally, we conduct the first study on Chinese hateful slang and evaluate the ability of LLMs to understand hate semantics. Our work contributes valuable resources and insights to advance span-level hate speech detection in Chinese.

pdf bib abs
Sarcasm-R1: Enhancing Sarcasm Detection through Focused Reasoning
Qi Yang | Jingjie Zeng | Liang Yang | Kai Ma | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2025

Sarcasm detection is a crucial yet challenging task in natural language processing. Existing methods primarily rely on supervised learning or prompt engineering, which often struggle to capture the complex reasoning process required for effective sarcasm detection. This paper proposes a novel approach that decomposes sarcasm detection into three fundamental dimensions: language, context, and emotion, meticulously modeling the sarcasm reasoning process. To enhance the quality of reasoning, we employ reinforcement learning algorithms and design customized reward models for each dimension. We utilize five widely used sarcasm detection datasets and annotate the sarcasm reasoning process from these three dimensions to improve the performance of the reward models. Experiments demonstrate that our method outperforms state-of-the-art baseline methods in most cases. Additionally, we observe the central role of emotional contrast in sarcasm detection. Our research provides empirical insights into the mechanism of sarcasm, emphasizing that emotional contrast is at its core, supported by linguistic and contextual cues.

pdf bib abs
Human-Inspired Obfuscation for Model Unlearning: Local and Global Strategies with Hyperbolic Representations
Zekun Wang | Jingjie Zeng | Yingxu Li | Liang Yang | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) achieve remarkable performance across various domains, largely due to training on massive datasets. However, this also raises growing concerns over the exposure of sensitive and private information, making model unlearning increasingly critical.However, existing methods often struggle to balance effective forgetting with maintaining model utility. In this work, we propose HyperUnlearn, a human-inspired unlearning framework. We construct two types of fuzzy data—local and global—to simulate forgetting, and represent them in hyperbolic and Euclidean spaces, respectively. Unlearning is performed on a model with frozen early layers to isolate forgetting and preserve useful knowledge.Experiments demonstrate that HyperUnlearn effectively forgets sensitive content while maintaining the model’s language understanding, fluency, and benchmark performance, offering a practical trade-off between forgetting and capability preservation.

pdf bib abs
Commonality and Individuality! Integrating Humor Commonality with Speaker Individuality for Humor Recognition
Haohao Zhu | Junyu Lu | Zeyuan Zeng | Zewen Bai | Xiaokun Zhang | Liang Yang | Hongfei Lin
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Humor recognition aims to identify whether a specific speaker’s text is humorous. Current methods for humor recognition mainly suffer from two limitations: (1) they solely focus on one aspect of humor commonalities, ignoring the multifaceted nature of humor; and (2) they typically overlook the critical role of speaker individuality, which is essential for a comprehensive understanding of humor expressions. To bridge these gaps, we introduce the Commonality and Individuality Incorporated Network for Humor Recognition (CIHR), a novel model designed to enhance humor recognition by integrating multifaceted humor commonalities with the distinctive individuality of speakers. The CIHR features a Humor Commonality Analysis module that explores various perspectives of multifaceted humor commonality within user texts, and a Speaker Individuality Extraction module that captures both static and dynamic aspects of a speaker’s profile to accurately model their distinctive individuality. Additionally, Static and Dynamic Fusion modules are introduced to effectively incorporate the humor commonality with speaker’s individuality in the humor recognition process. Extensive experiments demonstrate the effectiveness of CIHR, underscoring the importance of concurrently addressing both multifaceted humor commonality and distinctive speaker individuality in humor recognition.

pdf bib abs
DUTtask10 at SemEval-2025 Task 10: ThoughtFlow: Hierarchical Narrative Classification via Stepwise Prompting
Du Py | Huayang Li | Liang Yang | Zhang Shaowu
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes our system for SemEval-2025 Task 10: Hierarchical Narrative Classification. We propose a two-step hierarchical approach that combines generative reasoning and fine-tuning for sub-narrative classification. The main techniques of our system are: 1) leveraging a large pre-trained model to generate a reasoning process for better context understanding, 2) fine-tuning the model for precise sub-narrative categorization, 3) using a multi-label classification strategy for more accurate sub-narrative identification, and 4) incorporating data augmentation to increase the diversity and robustness of the training data. Our system ranked 1st in Subtask 2 for Hindi, achieving an F1 macro coarse score of 0.56900 and an F1 samples score of 0.53500. The results demonstrate the effectiveness of our approach in classifying narratives and sub-narratives in a multilingual setting, with the additional benefit of enhanced model performance through data augmentation.

pdf bib abs
DUTir at SemEval-2025 Task 4: Optimized Fine-Tuning of Linear Layers for Balanced Knowledge Forgetting and Retention
Zekun Wang | Jingjie Zeng | Yingxu Li | Liang Yang | Hongfei Lin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes our system used in SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models. In this work, we propose a method for controlling the fine-tuning of a model’s linear layers, referred to as CTL-Finetune (Control-Tuned Linear Fine-tuning). The goal of our method is to allow the model to forget specific information while preserving the knowledge it needs to retain. The method consists of four main components: 1) shuffling data labels, 2) shuffling label gradient calculation, 3) determination of control layers, and 4) fine-tuning using a combination of gradient ascent and gradient descent. Experimental results demonstrate that our approach effectively enables the model to forget targeted knowledge while minimizing the impact on retained information, thus maintaining the model’s overall performance.

pdf bib abs
DUTJBD at SemEval-2025 Task 3: A Range of Approaches for Predicting Hallucination Generation in Models
Shengdi Yin | Zekun Wang | Liang Yang | Hongfei Lin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper details the various methods we explored.Thank you.

pdf bib abs
Zhoumou at SemEval-2025 Task 1: Leveraging Multimodal Data Augmentation and Large Language Models for Enhanced Idiom Understanding
Yingzhou Zhao | Bowen Guan | Liang Yang | Hongfei Lin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper elaborates on my task content regarding Semeval 2025 Task 1 Subtask A. Please refer to it.

pdf bib abs
dutir914 at SemEval-2025 Task 1: An integrated approach for Multimodal Idiomaticity Representations
Yanan Wang | Dailin Li | Yicen Tian | Bo Zhang | Wang Jian | Liang Yang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

SemEval-2025 Task 1 introduces multimodal datasets for idiomatic expression representation. Subtask A focuses on ranking images based on potentially idiomatic noun compounds in given sentences. Idiom comprehension demands the fusion of visual and auditory elements with contextual semantics, yet existing datasets exhibit phrase-image discordance and culture-specific opacity, impeding cross-modal semantic alignment. To address these challenges, we propose an integrated approach that combines data augmentation and model fine-tuning in subtask A. First, we construct two idiom datasets by generating visual metaphors for idiomatic expressions to fine-tune the CLIP model. Next, We propose a three-stage multimodal chain-of-thought method, fine-tuning Qwen2.5-VL-7B-Instruct to generate rationales and perform inference, alongside zero-shot experiments with Qwen2.5-VL-72B-Instruct. Finally, we integrate the output of different models through a voting mechanism to enhance the accuracy of multimodal semantic matching. This approach achieves {textbf{0.92}} accuracy on the Portuguese test set and {textbf{0.93}} on the English test set, ranking {textbf{3rd}} and {textbf{4th}}, respectively. The implementation code is publicly available here{footnote{{url{ https://github.com/wyn1015/semeval}}}}.

2024

The rapid development of social media has led to an increase in online harassment and offensive speech, posing significant challenges for effective content moderation. Existing automated detection models often exhibit a bias towards predicting offensive speech based on specific vocabulary, which not only compromises model fairness but also potentially exacerbates biases against vulnerable and minority groups. Addressing these issues, this paper proposes a bias self-awareness and data self-iteration framework for mitigating model biases. This framework aims to “giving control back to models: enabling offensive language detection models to autonomously identify and mitigate biases” through bias self-awareness algorithms and self-iterative data augmentation method. Experimental results demonstrate that the proposed framework effectively reduces the false positive rate of models in both in-distribution and out-of-distribution tests, enhances model accuracy and fairness, and shows promising performance improvements in detecting offensive speech on larger-scale datasets.

Disclaimer: Samples in this paper may be harmful and cause discomfort! Patronizing and condescending language (PCL) is a form of speech directed at vulnerable groups. As an essential branch of toxic language, this type of language exacerbates conflicts and confrontations among Internet communities and detrimentally impacts disadvantaged groups. Traditional pre-trained language models (PLMs) perform poorly in detecting PCL due to its implicit toxicity traits like hypocrisy and false sympathy. With the rise of large language models (LLMs), we can harness their rich emotional semantics to establish a paradigm for exploring implicit toxicity. In this paper, we introduce PclGPT, a comprehensive LLM benchmark designed specifically for PCL. We collect, annotate, and integrate the Pcl-PT/SFT dataset, and then develop a bilingual PclGPT-EN/CN model group through a comprehensive pre-training and supervised fine-tuning staircase process to facilitate implicit toxic detection. Group detection results and fine-grained detection from PclGPT and other models reveal significant variations in the degree of bias in PCL towards different vulnerable groups, necessitating increased societal attention to protect them.

pdf bib abs
Exploring the Capability of Multimodal LLMs with Yonkoma Manga: The YManga Dataset and Its Challenging Tasks
Qi Yang | Jingjie Zeng | Liang Yang | Zhihao Yang | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2024

Yonkoma Manga, characterized by its four-panel structure, presents unique challenges due to its rich contextual information and strong sequential features. To address the limitations of current multimodal large language models (MLLMs) in understanding this type of data, we create a novel dataset named YManga from the Internet. After filtering out low-quality content, we collect a dataset of 1,015 yonkoma strips, containing 10,150 human annotations. We then define three challenging tasks for this dataset: panel sequence detection, generation of the author’s creative intention, and description generation for masked panels. These tasks progressively introduce the complexity of understanding and utilizing such image-text data. To the best of our knowledge, YManga is the first dataset specifically designed for yonkoma manga strips understanding. Extensive experiments conducted on this dataset reveal significant challenges faced by current multimodal large language models. Our results show a substantial performance gap between models and humans across all three tasks.

pdf bib abs
“Barking up the Right Tree”, a GAN-Based Pun Generation Model through Semantic Pruning
JingJie Zeng | Liang Yang | Jiahao Kang | Yufeng Diao | Zhihao Yang | Hongfei Lin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In the realm of artificial intelligence and linguistics, the automatic generation of humor, particularly puns, remains a complex task. This paper introduces an innovative approach that employs a Generative Adversarial Network (GAN) and semantic pruning techniques to generate humorous puns. We initiate our process by identifying potential pun candidates via semantic pruning. This is followed by the use of contrastive learning to decode the unique characteristics of puns, emphasizing both correct and incorrect interpretations. The learned features from contrastive learning are utilized within our GAN model to better capture the semantic nuances of puns. Specifically, the generator exploits the pruned semantic tree to generate pun texts, while the discriminator evaluates the generated puns, ensuring both linguistic correctness and humor. Evaluation results highlight our model’s capacity to produce semantically coherent and humorous puns, demonstrating an enhancement over prior methods and approach human-level performance. This work contributes significantly to the field of computational humor, advancing the capabilities of automatic pun generation.

pdf bib abs
Leveraging Social Context for Humor Recognition and Sense of Humor Evaluation in Social Media with a New Chinese Humor Corpus - HumorWB
Zeyuan Zeng | Zefeng Li | Liang Yang | Hongfei Lin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

With the development of the Internet, social media has produced a large amount of user-generated data, which brings new challenges for humor computing. Traditional humor computing research mainly focuses on the content, while neglecting the information of interaction relationships in social media. In addition, both content and users are important in social media, while existing humor computing research mainly focuses on content rather than people. To address these problems, we model the information transfer and entity interactions in social media as a heterogeneous graph, and create the first dataset which introduces the social context information - HumorWB, which is collected from Chinese social media - Weibo. Two humor-related tasks are designed in the dataset. One is a content-oriented humor recognition task, and the other is a novel humor evaluation task. For the above tasks, we purpose a graph-based model called SCOG, which uses heterogeneous graph neural networks to optimize node representation for downstream tasks. Experimental results demonstrate the effectiveness of feature extraction and graph representation learning methods in the model, as well as the necessity of introducing social context information.

pdf bib abs
Take Its Essence, Discard Its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect
Junyu Lu | Bo Xu | Xiaokun Zhang | Kaiyuan Liu | Dongyu Zhang | Liang Yang | Hongfei Lin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Researchers have attempted to mitigate lexical bias in toxic language detection (TLD). However, existing methods fail to disentangle the “useful” and “misleading” impact of lexical bias on model decisions. Therefore, they do not effectively exploit the positive effects of the bias and lead to a degradation in the detection performance of the debiased model. In this paper, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the “useful impact” of lexical bias and eliminates the “misleading impact”. Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias from the total effect. Empirical evaluations demonstrate that the debiased TLD model incorporating CCDF achieves state-of-the-art performance in both accuracy and fairness compared to competitive baselines applied on several vanilla models. The generalization capability of our model outperforms current debiased models for out-of-distribution data.

pdf bib abs
yangqi at SemEval-2024 Task 9: Simulate Human Thinking by Large Language Model for Lateral Thinking Challenges
Qi Yang | Jingjie Zeng | Liang Yang | Hongfei Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes our system used in the SemEval-2024 Task 9 on two sub-tasks, BRAINTEASER: A Novel Task Defying Common Sense. In this work, we developed a system SHTL, which means simulate human thinking capabilities by Large Language Model (LLM). Our approach bifurcates into two main components: Common Sense Reasoning and Rationalize Defying Common Sense. To mitigate the hallucinations of LLM, we implemented a strategy that combines Retrieval-augmented Generation (RAG) with the the Self-Adaptive In-Context Learning (SAICL), thereby sufficiently leveraging the powerful language ability of LLM. The effectiveness of our method has been validated by its performance on the test set, with an average performance on two subtasks that is 30.1 higher than ChatGPT setting zero-shot and only 0.8 lower than that of humans.

pdf bib abs
Werkzeug at SemEval-2024 Task 8: LLM-Generated Text Detection via Gated Mixture-of-Experts Fine-Tuning
Youlin Wu | Kaichun Wang | Kai Ma | Liang Yang | Hongfei Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Recent advancements in Large Language Models (LLMs) have propelled text generation to unprecedented heights, approaching human-level quality. However, it poses a new challenge to distinguish LLM-generated text from human-written text. Presently, most methods address this issue through classification, achieved by fine-tuning on small language models. Unfortunately, small language models suffer from anisotropy issue, where encoded text embeddings become difficult to differentiate in the latent space. Moreover, LLMs possess the ability to alter language styles with versatility, further complicating the classification task. To tackle these challenges, we propose Gated Mixture-of-Experts Fine-tuning (GMoEF) to detect LLM-generated text. GMoEF leverages parametric whitening to normalize text embeddings, thereby mitigating the anisotropy problem. Additionally, GMoEF employs the mixture-of-experts framework equipped with gating router to capture features of LLM-generated text from multiple perspectives. Our GMoEF achieved an impressive ranking of #8 out of 70 teams. The source code is available on https://gitlab.com/sigrs/gmoef.

pdf bib abs
LMEME at SemEval-2024 Task 4: Teacher Student Fusion - Integrating CLIP with LLMs for Enhanced Persuasion Detection
Shiyi Li | Yike Wang | Liang Yang | Shaowu Zhang | Hongfei Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes our system used in the SemEval-2024 Task 4 Multilingual Detection of Persuasion Techniques in Memes. Our team proposes a detection system that employs a Teacher Student Fusion framework. Initially, a Large Language Model serves as the teacher, engaging in abductive reasoning on multimodal inputs to generate background knowledge on persuasion techniques, assisting in the training of a smaller downstream model. The student model adopts CLIP as an encoder for text and image features, and we incorporate an attention mechanism for modality alignment. Ultimately, our proposed system achieves a Macro-F1 score of 0.8103, ranking 1st out of 20 on the leaderboard of Subtask 2b in English. In Bulgarian, Macedonian and Arabic, our detection capabilities are ranked 1/15, 3/15 and 14/15.

Detecting persuasive communication is an important topic in Natural Language Processing (NLP), as it can be useful in identifying fake information on social media. We have developed a system to identify applied persuasion techniques in text fragments across four languages: English, Bulgarian, North Macedonian, and Arabic. Our system uses data augmentation methods and employs an ensemble strategy that combines the strengths of both RoBERTa and DeBERTa models. Due to limited resources, we concentrated solely on task 1, and our solution achieved the top ranking in the English track during the official assessments. We also analyse the impact of architectural decisions, data constructionand training strategies.

2023

pdf bib abs
Just Like a Human Would, Direct Access to Sarcasm Augmented with Potential Result and Reaction
Changrong Min | Ximing Li | Liang Yang | Zhilin Wang | Bo Xu | Hongfei Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sarcasm, as a form of irony conveying mockery and contempt, has been widespread in social media such as Twitter and Weibo, where the sarcastic text is commonly characterized as an incongruity between the surface positive and negative situation. Naturally, it has an urgent demand to automatically identify sarcasm from social media, so as to illustrate people’s real views toward specific targets. In this paper, we develop a novel sarcasm detection method, namely Sarcasm Detector with Augmentation of Potential Result and Reaction (SD-APRR). Inspired by the direct access view, we treat each sarcastic text as an incomplete version without latent content associated with implied negative situations, including the result and human reaction caused by its observable content. To fill the latent content, we estimate the potential result and human reaction for each given training sample by [xEffect] and [xReact] relations inferred by the pre-trained commonsense reasoning tool COMET, and integrate the sample with them as an augmented one. We can then employ those augmented samples to train the sarcasm detector, whose encoder is a graph neural network with a denoising module. We conduct extensive empirical experiments to evaluate the effectiveness of SD-APRR. The results demonstrate that SD-APRR can outperform strong baselines on benchmark datasets.

pdf bib abs
Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks
Junyu Lu | Bo Xu | Xiaokun Zhang | Changrong Min | Liang Yang | Hongfei Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly due to limited datasets. Existing datasets suffer from a lack of fine-grained annotations, such as the toxic type and expressions with indirect toxicity. These fine-grained annotations are crucial factors for accurately detecting the toxicity of posts involved with lexical knowledge, which has been a challenge for researchers. To tackle this problem, we facilitate the fine-grained detection of Chinese toxic language by building a new dataset with benchmark results. First, we devised Monitor Toxic Frame, a hierarchical taxonomy to analyze the toxic type and expressions. Then, we built a fine-grained dataset ToxiCN, including both direct and indirect toxic samples. ToxiCN is based on an insulting vocabulary containing implicit profanity. We further propose a benchmark model, Toxic Knowledge Enhancement (TKE), by incorporating lexical features to detect toxic language. We demonstrate the usability of ToxiCN and the effectiveness of TKE based on a systematic quantitative and qualitative analysis.

pdf bib abs
基于动态常识推理与多维语义特征的幽默识别(Humor Recognition based on Dynamically Commonsense Reasoning and Multi-Dimensional Semantic Features)
Tuerxun Tunike (吐妮可吐尔逊) | Hongfei Lin (林鸿飞) | Dongyu Zhang (张冬瑜) | Liang Yang (杨亮) | Changrong Min (闵昶荣)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“随着社交媒体的飞速发展,幽默识别任务在近年来受到研究者的广泛关注。该任务的目标是判断给定的文本是否表达幽默。现有的幽默识别方法主要是在幽默产生理论的支撑下,利用规则或者设计神经网络模型来提取多种幽默相关特征,比如不一致性特征、情感特征以及语音特征等等。这些方法一方面说明情感信息在建模幽默语义当中的重要地位,另一方面说明幽默语义的构建依赖多个维度的特征。然而,这些方法没有充分捕捉文本内部的情感特征,忽略了幽默文本中的隐式情感表达,影响幽默识别的准确性。为了解决这一问题,本文提出一种动态常识与多维语义特征驱动的幽默识别方法CMSOR。该方法首先利用外部常识信息从文本中动态推理出说话者的隐式情感表达,然后引入外部词典WordNet计算文本内部词级语义距离进而捕捉不一致性,同时计算文本的模糊性特征。最后,根据上述三个特征维度构建幽默语义,实现幽默识别。本文在三个公开数据集上进行实验,结果表明本文所提方法CMSOR相比于当前基准模型有明显提升。”

pdf bib abs
MultiCMET: A Novel Chinese Benchmark for Understanding Multimodal Metaphor
Dongyu Zhang | Jingwei Yu | Senyuan Jin | Liang Yang | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2023

Metaphor is a pervasive aspect of human communication, and its presence in multimodal forms has become more prominent with the progress of mass media. However, there is limited research on multimodal metaphor resources beyond the English language. Furthermore, the existing work in natural language processing does not address the exploration of categorizing the source and target domains in metaphors. This omission is significant considering the extensive research conducted in the fields of cognitive linguistics, which emphasizes that a profound understanding of metaphor relies on recognizing the differences and similarities between domain categories. We, therefore, introduce MultiCMET, a multimodal Chinese metaphor dataset, consisting of 13,820 text-image pairs of advertisements with manual annotations of the occurrence of metaphors, domain categories, and sentiments metaphors convey. We also constructed a domain lexicon that encompasses categorizations of metaphorical source domains and target domains and propose a Cascading Domain Knowledge Integration (CDKI) benchmark to detect metaphors by introducing domain-specific lexical features. Experimental results demonstrate the effectiveness of CDKI. The dataset and code are publicly available.

pdf bib abs
Poetry Generation Combining Poetry Theme Labels Representations
Yingyu Yan | Dongzhen Wen | Liang Yang | Dongyu Zhang | Hongfei Lin
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Ancient Chinese poetry is the earliest literary genre that took shape in Chinese literature and has a dissemination effect, showing China’s profound cultural heritage. At the same time, the generation of ancient poetry is an important task in the field of digital humanities, which is of great significance to the inheritance of national culture and the education of ancient poetry. The current work in the field of poetry generation is mainly aimed at improving the fluency and structural accuracy of words and sentences, ignoring the theme unity of poetry generation results. In order to solve this problem, this paper proposes a graph neural network poetry theme representation model based on label embedding. On the basis of the network representation of poetry, the topic feature representation of poetry is constructed and learned from the granularity of words. Then, the features of the poetry theme representation model are combined with the autoregressive language model to construct a theme-oriented ancient Chinese poetry generation model TLPG (Poetry Generation with Theme Label). Through machine evaluation and evaluation by experts in related fields, the model proposed in this paper has significantly improved the topic consistency of poetry generation compared with existing work on the premise of ensuring the fluency and format accuracy of poetry.

2021

pdf bib abs
MultiMET: A Multimodal Dataset for Metaphor Understanding
Dongyu Zhang | Minghao Zhang | Heting Zhang | Liang Yang | Hongfei Lin
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Metaphor involves not only a linguistic phenomenon, but also a cognitive phenomenon structuring human thought, which makes understanding it challenging. As a means of cognition, metaphor is rendered by more than texts alone, and multimodal information in which vision/audio content is integrated with the text can play an important role in expressing and understanding metaphor. However, previous metaphor processing and understanding has focused on texts, partly due to the unavailability of large-scale datasets with ground truth labels of multimodal metaphor. In this paper, we introduce MultiMET, a novel multimodal metaphor dataset to facilitate understanding metaphorical information from multimodal text and image. It contains 10,437 text-image pairs from a range of sources with multimodal annotations of the occurrence of metaphors, domain relations, sentiments metaphors convey, and author intents. MultiMET opens the door to automatic metaphor understanding by investigating multimodal cues and their interplay. Moreover, we propose a range of strong baselines and show the importance of combining multimodal cues for metaphor understanding. MultiMET will be released publicly for research.

pdf bib abs
Hate Speech Detection Based on Sentiment Knowledge Sharing
Xianbing Zhou | Yang Yong | Xiaochao Fan | Ge Ren | Yunfeng Song | Yufeng Diao | Liang Yang | Hongfei Lin
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The wanton spread of hate speech on the internet brings great harm to society and families. It is urgent to establish and improve automatic detection and active avoidance mechanisms for hate speech. While there exist methods for hate speech detection, they stereotype words and hence suffer from inherently biased training. In other words, getting more affective features from other affective resources will significantly affect the performance of hate speech detection. In this paper, we propose a hate speech detection framework based on sentiment knowledge sharing. While extracting the affective features of the target sentence itself, we make better use of the sentiment features from external resources, and finally fuse features from different feature extraction units to detect hate speech. Experimental results on two public datasets demonstrate the effectiveness of our model.

pdf bib abs
结合标签转移关系的多任务笑点识别方法(Multi-task punchlines recognition method combined with label transfer relationship)
Tongyue Zhang (张童越) | Shaowu Zhang (张绍武) | Bo Xu (徐博) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

幽默在人类交流中扮演着重要角色,并大量存在于情景喜剧中。笑点(punchline)是情景喜剧实现幽默效果的形式之一,在情景喜剧笑点识别任务中,每条句子的标签代表该句是否为笑点,但是以往的笑点识别工作通常只通过建模上下文语义关系识别笑点,对标签的利用并不充分。为了充分利用标签序列中的信息,本文提出了一种新的识别方法,即结合条件随机场的单词级-句子级多任务学习模型,该模型在两方面进行了改进,首先将标签序列中相邻两个标签之间的转移关系看作幽默理论中不一致性的一种体现,并使用条件随机场学习这种转移关系,其次由于学习相邻标签之间的转移关系以及上下文语义关系均能够学习到铺垫和笑点之间的不一致性,两者之间存在相关性,为了使模型通过利用这种相关性提高笑点识别的效果,该模型引入了多任务学习方法,使用多任务学习方法同时学习每条句子的句义、组成每条句子的所有字符的词义,单词级别的标签转移关系以及句子级别的标签转移关系。本文在CCL2020“小牛杯”幽默计算—情景喜剧笑点识别评测任务的英文数据集上进行实验,结果表明,本文提出的方法比目前最好的方法提高了3.2%,在情景喜剧幽默笑点识别任务上取得了最好的效果,并通过消融实验证明了上述两方面改进的有效性。

pdf bib abs
基于风格化嵌入的中文文本风格迁移(Chinese text style transfer based on stylized embedding)
Chenguang Wang (王晨光) | Hongfei Lin (林鸿飞) | Liang Yang (杨亮)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

对话风格能够反映对话者的属性,例如情感、性别和教育背景等。在对话系统中,通过理解用户的对话风格,能够更好地对用户进行建模。同样的,面对不同背景的用户,对话机器人也应该使用不同的语言风格与之交流。语言表达风格是文本的内在属性,然而现有的大多数文本风格迁移研究,集中在英文领域,在中文领域则研究较少。本文构建了三个可用于中文文本风格迁移研究的数据集,并将多种已有的文本风格迁移方法应用于该数据集。同时,本文提出了基于DeepStyle算法与Transformer的风格迁移模型,通过预训练可以获得不同风格的隐层向量表示。并基于Transformer构建生成端模型,在解码阶段,通过重建源文本的方式,保留生成文本的内容信息,并且引入对立风格的嵌入表示,使得模型能够生成不同风格的文本。实验结果表明,本文提出的模型在构建的中文数据集上均优于现有模型。

pdf bib abs
面向法律文本的实体关系联合抽取算法(Joint Entity and Relation Extraction for Legal Texts)
Wenhui Song (宋文辉) | Xiang Zhou (周翔) | Ping Yang (杨萍) | Yuanyuan Sun (孙媛媛) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

法律文本中包含的丰富信息可以通过结构化的实体关系三元组进行表示,便于法律知识的存储和查询。传统的流水线方法在自动抽取三元组时执行了大量冗余计算,造成了误差传播。而现有的联合学习方法无法适用于有大量重叠关系的法律文本,也并未关注语法结构信息对文本表示的增强,因此本文提出一种面向法律文本的实体关系联合抽取模型。该模型首先通过ON-LSTM注入语法信息,然后引入多头注意力机制分解重叠关系。相较于流水线和其他联合学习方法本文模型抽取效果最佳,在涉毒类法律文本数据集上抽取结果的F1值达到78.7%。

软件源代码的理解则是软件协同开发与维护的核心,而源代码中占半数以上的标识符的理解则在软件理解中起到重要作用,传统软件工程主要研究通过命名规范限制标识符的命名过程以构造更易理解和交流的标识符。本文则在梳理分析常见编程语言命名规范的基础上,提出一种全新的标识符可理解性评价标准。具体而言,本文首先总结梳理了常见主流编程语言中的命名规范并类比自然语言语素概念本文提出基于软件语素的标识符构成过程,即标识符的构成可被视为软件语素的生成、排列和连接过程。在此基础上,本文提出一种结合自然语料库的软件标识符规范性评价方法,用来衡量软件标识符是否易于理解。最后,本文通过源代码理解数据集和乇乩乴乨乵乢平台中开源项目对规范性指标进行了验证性实验,结果表明本文提出的规范性分数能够很好衡量软件项目的可理解性。

pdf bib abs
Label-Enhanced Hierarchical Contextualized Representation for Sequential Metaphor Identification
Shuqun Li | Liang Yang | Weidong He | Shiqi Zhang | Jingjie Zeng | Hongfei Lin
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent metaphor identification approaches mainly consider the contextual text features within a sentence or introduce external linguistic features to the model. But they usually ignore the extra information that the data can provide, such as the contextual metaphor information and broader discourse information. In this paper, we propose a model augmented with hierarchical contextualized representation to extract more information from both sentence-level and discourse-level. At the sentence level, we leverage the metaphor information of words that except the target word in the sentence to strengthen the reasoning ability of our model via a novel label-enhanced contextualized representation. At the discourse level, the position-aware global memory network is adopted to learn long-range dependency among the same words within a discourse. Finally, our model combines the representations obtained from these two parts. The experiment results on two tasks of the VUA dataset show that our model outperforms every other state-of-the-art method that also does not use any external knowledge except what the pre-trained language model contains.

pdf bib abs
Locality Preserving Sentence Encoding
Changrong Min | Yonghe Chu | Liang Yang | Bo Xu | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2021

Although researches on word embeddings have made great progress in recent years, many tasks in natural language processing are on the sentence level. Thus, it is essential to learn sentence embeddings. Recently, Sentence BERT (SBERT) is proposed to learn embeddings on the sentence level, and it uses the inner product (or, cosine similarity) to compute semantic similarity between sentences. However, this measurement cannot well describe the semantic structures among sentences. The reason is that sentences may lie on a manifold in the ambient space rather than distribute in an Euclidean space. Thus, cosine similarity cannot approximate distances on the manifold. To tackle the severe problem, we propose a novel sentence embedding method called Sentence BERT with Locality Preserving (SBERT-LP), which discovers the sentence submanifold from a high-dimensional space and yields a compact sentence representation subspace by locally preserving geometric structures of sentences. We compare the SBERT-LP with several existing sentence embedding approaches from three perspectives: sentence similarity, sentence classification and sentence clustering. Experimental results and case studies demonstrate that our method encodes sentences better in the sense of semantic structures.

2020

pdf bib abs
基于多粒度语义交互理解网络的幽默等级识别(A Multi-Granularity Semantic Interaction Understanding Network for Humor Level Recognition)
Jinhui Zhang (张瑾晖) | Shaowu Zhang (张绍武) | Xiaochao Fan (樊小超) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

幽默在人们日常交流中发挥着重要作用。随着人工智能的快速发展,幽默等级识别成为自然语言处理领域的热点研究问题之一。已有的幽默等级识别研究往往将幽默文本看作一个整体,忽视了幽默文本内部的语义关系。本文将幽默等级识别视为自然语言推理任务,将幽默文本划分为“铺垫”和“笑点”两个部分,分别对其语义和语义关系进行建模,提出了一种多粒度语义交互理解网络,从单词和子句两个粒度捕获幽默文本中语义的关联和交互。本文在Reddit公开幽默数据集上进行了实验,相比之前最优结果,模型在语料上的准确率提升了1.3%。实验表明,引入幽默内部的语义关系信息可以提高模型幽默识别的性能,而本文提出的模型也可以很好地建模这种语义关系。

In our daily life, metaphor is a common way of expression. To understand the meaning of a metaphor, we should recognize the metaphor words which play important roles. In the metaphor detection task, we design a sequence labeling model based on ALBERT-LSTM-softmax. By applying this model, we carry out a lot of experiments and compare the experimental results with different processing methods, such as with different input sentences and tokens, or the methods with CRF and softmax. Then, some tricks are adopted to improve the experimental results. Finally, our model achieves a 0.707 F1-score for the all POS subtask and a 0.728 F1-score for the verb subtask on the TOEFL dataset.

2019

2018

Homographic puns have a long history in human writing, widely used in written and spoken literature, which usually occur in a certain syntactic or stylistic structure. How to recognize homographic puns is an important research. However, homographic pun recognition does not solve very well in existing work. In this work, we first use WordNet to understand and expand word embedding for settling the polysemy of homographic puns, and then propose a WordNet-Encoded Collocation-Attention network model (WECA) which combined with the context weights for recognizing the puns. Our experiments on the SemEval2017 Task7 and Pun of the Day demonstrate that the proposed model is able to distinguish between homographic pun and non-homographic pun texts. We show the effectiveness of the model to present the capability of choosing qualitatively informative words. The results show that our model achieves the state-of-the-art performance on homographic puns recognition.

pdf bib abs
Construction of a Chinese Corpus for the Analysis of the Emotionality of Metaphorical Expressions
Dongyu Zhang | Hongfei Lin | Liang Yang | Shaowu Zhang | Bo Xu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Metaphors are frequently used to convey emotions. However, there is little research on the construction of metaphor corpora annotated with emotion for the analysis of emotionality of metaphorical expressions. Furthermore, most studies focus on English, and few in other languages, particularly Sino-Tibetan languages such as Chinese, for emotion analysis from metaphorical texts, although there are likely to be many differences in emotional expressions of metaphorical usages across different languages. We therefore construct a significant new corpus on metaphor, with 5,605 manually annotated sentences in Chinese. We present an annotation scheme that contains annotations of linguistic metaphors, emotional categories (joy, anger, sadness, fear, love, disgust and surprise), and intensity. The annotation agreement analyses for multiple annotators are described. We also use the corpus to explore and analyze the emotionality of metaphors. To the best of our knowledge, this is the first relatively large metaphor corpus with an annotation of emotions in Chinese.