2023
Just Like a Human Would, Direct Access to Sarcasm Augmented with Potential Result and Reaction
Changrong Min | Ximing Li | Liang Yang | Zhilin Wang | Bo Xu | Hongfei Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Sarcasm, a form of irony conveying mockery and contempt, is widespread on social media such as Twitter and Weibo, where sarcastic text is commonly characterized by an incongruity between a positive surface and a negative situation. Naturally, there is an urgent demand to identify sarcasm on social media automatically, so as to reveal people’s real views toward specific targets. In this paper, we develop a novel sarcasm detection method, namely Sarcasm Detector with Augmentation of Potential Result and Reaction (SD-APRR). Inspired by the direct access view, we treat each sarcastic text as an incomplete version that lacks the latent content associated with implied negative situations, including the result and the human reaction caused by its observable content. To fill in the latent content, we estimate the potential result and human reaction for each training sample using the [xEffect] and [xReact] relations inferred by the pre-trained commonsense reasoning tool COMET, and integrate them with the sample to form an augmented one. We then use those augmented samples to train the sarcasm detector, whose encoder is a graph neural network with a denoising module. We conduct extensive empirical experiments to evaluate the effectiveness of SD-APRR. The results demonstrate that SD-APRR can outperform strong baselines on benchmark datasets.
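The augmentation step described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical `infer(text, relation)` callable standing in for a commonsense model such as COMET, and it joins the inferred phrases to the text as a plain string. The names `augment_sample` and `toy_infer`, the canned outputs, and the string-level integration are all illustrative assumptions; the paper itself feeds augmented samples to a graph neural network encoder, which is not reproduced here.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: (text, relation) -> inferred phrase.
# A real system would query a commonsense model here; this is an assumption.
Inferencer = Callable[[str, str], str]


def augment_sample(
    text: str,
    infer: Inferencer,
    relations: Sequence[str] = ("xEffect", "xReact"),
) -> Dict[str, str]:
    """Attach an inferred result and reaction to one training sample."""
    inferred = {rel: infer(text, rel) for rel in relations}
    # Tag each inferred phrase with its relation and append it to the text.
    parts = [text] + [f"[{rel}] {inferred[rel]}" for rel in relations]
    return {"text": text, **inferred, "augmented": " ".join(parts)}


def toy_infer(text: str, relation: str) -> str:
    """Canned stand-in for the commonsense model (illustrative only)."""
    canned = {"xEffect": "gets wet", "xReact": "annoyed"}
    return canned[relation]


sample = augment_sample(
    "I love walking home in the rain without an umbrella.", toy_infer
)
print(sample["augmented"])
```

In this toy case the positive surface ("I love") sits next to a negative inferred result and reaction, which is the kind of incongruity the abstract says a detector can exploit.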
Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks
Junyu Lu | Bo Xu | Xiaokun Zhang | Changrong Min | Liang Yang | Hongfei Lin
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly due to limited datasets. Existing datasets lack fine-grained annotations, such as the toxic type and expressions with indirect toxicity. These fine-grained annotations are crucial for accurately detecting the toxicity of posts that involve lexical knowledge, which has been a challenge for researchers. To tackle this problem, we facilitate the fine-grained detection of Chinese toxic language by building a new dataset with benchmark results. First, we devise Monitor Toxic Frame, a hierarchical taxonomy for analyzing toxic types and expressions. Then, we build a fine-grained dataset, ToxiCN, including both direct and indirect toxic samples. ToxiCN is based on an insulting vocabulary containing implicit profanity. We further propose a benchmark model, Toxic Knowledge Enhancement (TKE), which incorporates lexical features to detect toxic language. We demonstrate the usability of ToxiCN and the effectiveness of TKE through systematic quantitative and qualitative analysis.
基于动态常识推理与多维语义特征的幽默识别(Humor Recognition based on Dynamically Commonsense Reasoning and Multi-Dimensional Semantic Features)
Tuerxun Tunike | Hongfei Lin | Dongyu Zhang | Liang Yang | Changrong Min | 吐尔逊 吐妮可 | 鸿飞 林 | 冬瑜 张 | 亮 杨 | 昶荣 闵
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
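With the rapid development of social media, humor recognition has attracted wide attention from researchers in recent years. The goal of the task is to determine whether a given text expresses humor. Existing humor recognition methods, grounded in theories of humor generation, mainly use rules or neural network models to extract various humor-related features, such as incongruity features, sentiment features, and phonetic features. These methods show, on the one hand, the important role of sentiment information in modeling humor semantics and, on the other hand, that constructing humor semantics depends on features from multiple dimensions. However, they do not fully capture the sentiment features within a text and overlook the implicit sentiment expressions in humorous text, which harms recognition accuracy. To address this problem, this paper proposes CMSOR, a humor recognition method driven by dynamic commonsense and multi-dimensional semantic features. The method first uses external commonsense information to dynamically infer the speaker's implicit sentiment expression from the text, then introduces the external lexicon WordNet to compute word-level semantic distances within the text and thereby capture incongruity, and also computes ambiguity features of the text. Finally, it constructs humor semantics from these three feature dimensions to perform humor recognition. Experiments on three public datasets show that CMSOR clearly improves over current baseline models.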
2021
Locality Preserving Sentence Encoding
Changrong Min | Yonghe Chu | Liang Yang | Bo Xu | Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2021
Although research on word embeddings has made great progress in recent years, many tasks in natural language processing operate at the sentence level. Thus, it is essential to learn sentence embeddings. Recently, Sentence BERT (SBERT) was proposed to learn embeddings at the sentence level; it uses the inner product (or cosine similarity) to compute semantic similarity between sentences. However, this measurement cannot describe the semantic structures among sentences well. The reason is that sentences may lie on a manifold in the ambient space rather than being distributed in a Euclidean space. Thus, cosine similarity cannot approximate distances on the manifold. To tackle this problem, we propose a novel sentence embedding method called Sentence BERT with Locality Preserving (SBERT-LP), which discovers the sentence submanifold in a high-dimensional space and yields a compact sentence representation subspace by locally preserving the geometric structures of sentences. We compare SBERT-LP with several existing sentence embedding approaches from three perspectives: sentence similarity, sentence classification, and sentence clustering. Experimental results and case studies demonstrate that our method encodes sentences better in terms of semantic structure.
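As a rough illustration of the two ideas this abstract contrasts, the sketch below computes cosine similarity between embedding vectors and then builds a k-nearest-neighbor graph, a common starting point for locality-preserving methods in general. It is not SBERT-LP: the abstract does not state how the method builds its neighborhoods, so using cosine similarity to pick neighbors, the function names, and the toy 2-D vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def nearest_neighbors(
    embeddings: Sequence[Sequence[float]], k: int
) -> List[List[int]]:
    """For each vector, return the indices of its k most similar vectors."""
    result = []
    for i, u in enumerate(embeddings):
        scored = [
            (cosine_similarity(u, v), j)
            for j, v in enumerate(embeddings)
            if j != i
        ]
        # Highest similarity first; ties broken by lower index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result.append([j for _, j in scored[:k]])
    return result


# Toy "sentence embeddings": two tight pairs pointing in different directions.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
neighbors = nearest_neighbors(toy, k=1)
print(neighbors)
```

A locality-preserving projection would then look for a lower-dimensional representation in which each sentence stays close to the neighbors listed in such a graph, rather than relying on global similarity alone.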