Neural-based RST Parsing And Analysis In Persuasive Discourse
Jinfen Li | Lu Xiao
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Most of the existing studies of language use in social media content have focused on the surface-level linguistic features (e.g., function words and punctuation marks) and the semantic level aspects (e.g., the topics, sentiment, and emotions) of the comments. The writer’s strategies of constructing and connecting text segments have not been widely explored even though this knowledge is expected to shed light on how people reason in online environments. Contributing to this analysis direction for social media studies, we build an openly accessible neural RST parsing system that analyzes discourse relations in an online comment. Our experiments demonstrate that this system achieves comparable performance among all the neural RST parsing systems. To demonstrate the use of this tool in social media analysis, we apply it to identify the discourse relations in persuasive and non-persuasive comments and examine the relationships among the binary discourse tree depth, discourse relations, and the perceived persuasiveness of online comments. Our work demonstrates the potential of analyzing discourse structures of online comments with our system and the implications of these structures for understanding online communications.
syrapropa at SemEval-2020 Task 11: BERT-based Models Design for Propagandistic Technique and Span Detection
Jinfen Li | Lu Xiao
Proceedings of the Fourteenth Workshop on Semantic Evaluation
This paper describes the BERT-based models proposed for two subtasks in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. We first build the model for Span Identification (SI) based on SpanBERT, and facilitate the detection by a deeper model and a sentence-level representation. We then develop a hybrid model for the Technique Classification (TC). The hybrid model is composed of three submodels including two BERT models with different training methods, and a feature-based Logistic Regression model. We endeavor to deal with imbalanced dataset by adjusting cost function. We are in the seventh place in SI subtask (0.4711 of F1-measure), and in the third place in TC subtask (0.6783 of F1-measure) on the development set.
Tree Representations in Transition System for RST Parsing
Jinfen Li | Lu Xiao
Proceedings of the 28th International Conference on Computational Linguistics
The transition-based systems in the past studies propose a series of actions, to build a right-heavy binarized tree for the RST parsing. However, the nodes of the binary-nuclear relations (e.g., Contrast) have the same nuclear type with those of the multi-nuclear relations (e.g., Joint) in the binary tree structure. In addition, the reduce action only construct binary trees instead of multi-branch trees, which is the original RST tree structure. In our paper, we design a new nuclear type for the multi-nuclear relations, and a new action to construct a multi-branch tree. We enrich the feature set by extracting additional refined dependency feature of texts from the Bi-Affine model. We also compare the performance of two approaches for RST parsing in the transition-based system: a joint action of reduce-shift and nuclear type (i.e., Reduce-SN) vs a separate one that applies Reduce action first and then assigns nuclear type. We find that the new devised nuclear type and action are more capable of capturing the multi-nuclear relation and the joint action is more suitable than the separate one. Our multi-branch tree structure obtains the state-of-the-art performance for all the 18 coarse relations.
Detection of Propaganda Using Logistic Regression
Jinfen Li | Zhihao Ye | Lu Xiao
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda
Various propaganda techniques are used to manipulate peoples perspectives in order to foster a predetermined agenda such as by the use of logical fallacies or appealing to the emotions of the audience. In this paper, we develop a Logistic Regression-based tool that automatically classifies whether a sentence is propagandistic or not. We utilize features like TF-IDF, BERT vector, sentence length, readability grade level, emotion feature, LIWC feature and emphatic content feature to help us differentiate these two categories. The linguistic and semantic features combination results in 66.16% of F1 score, which outperforms the baseline hugely.