Lu Xiao

2022

Identifying Tension in Holocaust Survivors’ Interview: Code-switching/Code-mixing as Cues
Xinyuan Xia | Lu Xiao | Kun Yang | Yueyue Wang
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this study, we strive to find out how code-switching and code-mixing (CS/CM) as a linguistic phenomenon could be a sign of tension in Holocaust survivors’ interviews. We first created an interview corpus (a total of 39 interviews) that contains manually annotated CS/CM codes (a total of 802 quotations). We then compared our annotations with the tension places in the corpus, which were identified by a computational tool. We found that most of our annotations were captured in the tension places, indicating relatively strong performance. The finding implies that CS/CM can be appropriate cues for detecting tension in this communication context. Our CS/CM annotated interview corpus is openly accessible. Aside from annotating and examining CS/CM occurrences, we annotated silence situations in this open corpus. Silence is shown to be an indicator of tension in interpersonal communications. By making this corpus openly accessible, we call for more research endeavors on tension detection.

2021

Neural-based RST Parsing And Analysis In Persuasive Discourse
Jinfen Li | Lu Xiao
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Most of the existing studies of language use in social media content have focused on the surface-level linguistic features (e.g., function words and punctuation marks) and the semantic level aspects (e.g., the topics, sentiment, and emotions) of the comments. The writer’s strategies of constructing and connecting text segments have not been widely explored even though this knowledge is expected to shed light on how people reason in online environments. Contributing to this analysis direction for social media studies, we build an openly accessible neural RST parsing system that analyzes discourse relations in an online comment. Our experiments demonstrate that this system achieves comparable performance among all the neural RST parsing systems. To demonstrate the use of this tool in social media analysis, we apply it to identify the discourse relations in persuasive and non-persuasive comments and examine the relationships among the binary discourse tree depth, discourse relations, and the perceived persuasiveness of online comments. Our work demonstrates the potential of analyzing discourse structures of online comments with our system and the implications of these structures for understanding online communications.

2020

Tree Representations in Transition System for RST Parsing
Jinfen Li | Lu Xiao
Proceedings of the 28th International Conference on Computational Linguistics

The transition-based systems in past studies propose a series of actions to build a right-heavy binarized tree for RST parsing. However, the nodes of the binary-nuclear relations (e.g., Contrast) have the same nuclear type as those of the multi-nuclear relations (e.g., Joint) in the binary tree structure. In addition, the reduce action only constructs binary trees instead of multi-branch trees, which are the original RST tree structure. In our paper, we design a new nuclear type for the multi-nuclear relations, and a new action to construct a multi-branch tree. We enrich the feature set by extracting additional refined dependency features of texts from the Bi-Affine model. We also compare the performance of two approaches for RST parsing in the transition-based system: a joint action of reduce-shift and nuclear type (i.e., Reduce-SN) vs. a separate one that applies the Reduce action first and then assigns the nuclear type. We find that the newly devised nuclear type and action are more capable of capturing the multi-nuclear relation and that the joint action is more suitable than the separate one. Our multi-branch tree structure obtains state-of-the-art performance for all 18 coarse relations.
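
As a rough illustration of the baseline this abstract starts from, the sketch below applies shift and reduce actions to build a binarized tree over elementary discourse units (EDUs). It is a minimal sketch, not the authors' system: the `Node` class, the `parse` function, the action tuples, and the nuclearity codes ("NS", "NN") are all hypothetical names chosen for illustration, and the paper's new nuclear type and multi-branch action are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A discourse tree node (hypothetical structure for illustration)."""
    label: str                       # EDU text for leaves, relation name otherwise
    nuclearity: str = ""             # e.g. "NS", "SN", "NN"; empty for leaves
    children: list = field(default_factory=list)


def parse(edus, actions):
    """Apply a sequence of shift/reduce actions and return the root node.

    Each action is ("shift",) or ("reduce", nuclearity, relation).
    A reduce always merges the top two stack items, so the result is binary.
    """
    stack = []
    queue = list(edus)
    for action in actions:
        if action[0] == "shift":
            # Move the next EDU from the queue onto the stack as a leaf.
            stack.append(Node(label=queue.pop(0)))
        elif action[0] == "reduce":
            _, nuclearity, relation = action
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(relation, nuclearity, [left, right]))
        else:
            raise ValueError(f"unknown action: {action!r}")
    if queue or len(stack) != 1:
        raise ValueError("action sequence did not produce a single tree")
    return stack[0]


# Shifting all EDUs first and reducing afterwards yields a right-heavy tree.
tree = parse(
    ["e1", "e2", "e3"],
    [("shift",), ("shift",), ("shift",),
     ("reduce", "NN", "Joint"),
     ("reduce", "NS", "Elaboration")],
)
```

In this sketch a three-way Joint over `e1`, `e2`, `e3` could only be encoded as nested binary nodes, which is the limitation the abstract describes.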

A Lexicon-Based Approach for Detecting Hedges in Informal Text
Jumayel Islam | Lu Xiao | Robert E. Mercer
Proceedings of the Twelfth Language Resources and Evaluation Conference

Hedging is a commonly used strategy in conversational management to show the speaker’s lack of commitment to what they communicate, which may signal problems between the speakers. Our project is interested in examining the presence of hedging words and phrases in identifying the tension between an interviewer and interviewee during a survivor interview. While there have been studies on hedging detection in the natural language processing literature, all existing work has focused on structured texts and formal communications. Our project thus investigated a corpus of eight unstructured conversational interviews about the Rwanda Genocide and identified hedging patterns in the interviewees’ responses. Our work produced three manually constructed lists of hedge words, booster words, and hedging phrases. Leveraging these lexicons, we developed a rule-based algorithm that detects sentence-level hedges in informal conversations such as survivor interviews. Our work also produced a dataset of 3000 sentences having the categories Hedge and Non-hedge annotated by three researchers. With experiments on this annotated dataset, we verify the efficacy of our proposed algorithm. Our work contributes to the further development of tools that identify hedges from informal conversations and discussions.
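
The general idea of a lexicon-driven, sentence-level hedge detector can be sketched as follows. This is only an illustrative sketch under stated assumptions: the tiny word lists are placeholders rather than the paper's lexicons, and the rule that a negated booster counts as a hedge is an assumption made here for illustration, not a rule confirmed by the abstract.

```python
import re

# Placeholder lexicons (assumptions for illustration, not the published lists).
HEDGE_WORDS = {"maybe", "perhaps", "probably", "possibly"}
HEDGE_PHRASES = ("i think", "i guess", "sort of", "kind of")
BOOSTER_WORDS = {"sure", "certain", "definitely", "certainly"}
NEGATIONS = {"not", "never"}


def is_hedge(sentence: str) -> bool:
    """Return True if the sentence matches any of the illustrative rules."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    normalized = " ".join(tokens)

    # Rule 1: a multi-word hedging phrase occurs in the sentence.
    for phrase in HEDGE_PHRASES:
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
            return True

    # Rule 2: a single hedge word occurs in the sentence.
    if any(token in HEDGE_WORDS for token in tokens):
        return True

    # Rule 3 (assumed): a booster directly preceded by a negation, e.g. "not sure".
    for i in range(1, len(tokens)):
        if tokens[i] in BOOSTER_WORDS and tokens[i - 1] in NEGATIONS:
            return True

    return False
```

For example, `is_hedge("I think we left in the morning.")` matches the phrase rule, while `is_hedge("We left in the morning.")` matches none of the rules.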

TV-AfD: An Imperative-Annotated Corpus from The Big Bang Theory and Wikipedia’s Articles for Deletion Discussions
Yimin Xiao | Zong-Ying Slaton | Lu Xiao
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this study, we created an imperative corpus with speech conversations from dialogues in The Big Bang Theory and with the written comments in Wikipedia’s Articles for Deletion discussions. For the TV show data, 59 episodes containing 25,076 statements are used. We manually annotated imperatives based on the annotation guideline adapted from Condoravdi and Lauer’s study (2012) and used the retrieved data to assess the performance of syntax-based classification rules. For the Wikipedia AfD comments data, we first developed and leveraged a syntax-based classifier to extract 10,624 statements that may be imperative, and we manually examined the statements and then identified true positives. With this corpus, we also examined the performance of the rule-based imperative detection tool. Our result shows different outcomes for speech (dialogue) and written data. The rule-based classification achieves higher precision on the written data (0.80) than on the speech data (0.44). Also, the rule-based classification has low overall performance for speech data, with a precision of 0.44, recall of 0.41, and F1 measure of 0.42. This finding implies the syntax-based model may need to be adjusted for a speech dataset because imperatives in oral communication have greater syntactic variety and are highly context-dependent.
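
The reported speech-data scores are mutually consistent: the F1 measure is the harmonic mean of precision and recall. A small check, where the function name `f1_score` is just an illustrative choice:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the speech (dialogue) data.
speech_f1 = f1_score(0.44, 0.41)
print(round(speech_f1, 2))  # 0.42
```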

Effects of Anonymity on Comment Persuasiveness in Wikipedia Articles for Deletion Discussions
Yimin Xiao | Lu Xiao
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

It has been shown that anonymity affects various aspects of online communications such as message credibility, the trust among communicators, and the participants’ accountability and reputation. Anonymity influences social interactions in online communities in these many ways, which can lead to influences on opinion change and the persuasiveness of a message. Prior studies also suggest that the effect of anonymity can vary in different online communication contexts and online communities. In this study, we focus on Wikipedia Articles for Deletion (AfD) discussions as an example of online collaborative communities to study the relationship between anonymity and persuasiveness in this context. We find that in Wikipedia AfD discussions, more identifiable users tend to be more persuasive. The higher persuasiveness can be related to multiple aspects, including linguistic features of the comments, the user’s motivation to participate, persuasive skills the user learns over time, and the user’s identity and credibility established in the community through participation.

syrapropa at SemEval-2020 Task 11: BERT-based Models Design for Propagandistic Technique and Span Detection
Jinfen Li | Lu Xiao
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the BERT-based models proposed for two subtasks in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. We first build the model for Span Identification (SI) based on SpanBERT, and facilitate the detection by a deeper model and a sentence-level representation. We then develop a hybrid model for the Technique Classification (TC). The hybrid model is composed of three submodels, including two BERT models with different training methods and a feature-based Logistic Regression model. We endeavor to deal with the imbalanced dataset by adjusting the cost function. We placed seventh in the SI subtask (F1-measure of 0.4711) and third in the TC subtask (F1-measure of 0.6783) on the development set.
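
The abstract does not say how the cost function is adjusted. One common option, shown here purely as an assumption and not as the authors' method, is to weight each example's cross-entropy term by the inverse frequency of its class. The function names and the toy labels below are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted mean negative log-likelihood.

    probs is a list of dicts mapping class name to predicted probability.
    """
    weight_sum = sum(weights[y] for y in labels)
    loss = -sum(weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return loss / weight_sum


# Toy example: class "b" is rare, so its errors count more.
labels = ["a", "a", "a", "b"]
weights = inverse_frequency_weights(labels)
probs = [{"a": 0.9, "b": 0.1}] * 4   # a model that always favors "a"
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights the single misclassified "b" example carries as much total weight as the three "a" examples together, so a model that ignores the rare class is penalized more than under an unweighted loss.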

2019

Detection of Propaganda Using Logistic Regression
Jinfen Li | Zhihao Ye | Lu Xiao
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Various propaganda techniques are used to manipulate people’s perspectives in order to foster a predetermined agenda, such as the use of logical fallacies or appeals to the emotions of the audience. In this paper, we develop a Logistic Regression-based tool that automatically classifies whether a sentence is propagandistic or not. We utilize features like TF-IDF, BERT vector, sentence length, readability grade level, emotion feature, LIWC feature and emphatic content feature to help us differentiate these two categories. The combination of linguistic and semantic features yields an F1 score of 66.16%, which substantially outperforms the baseline.

Multi-Channel Convolutional Neural Network for Twitter Emotion and Sentiment Recognition
Jumayel Islam | Robert E. Mercer | Lu Xiao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The advent of micro-blogging sites has paved the way for researchers to collect and analyze huge volumes of data in recent years. Twitter, being one of the leading social networking sites worldwide, provides a great opportunity for its users to express their states of mind via short messages, which are called tweets. The urgency of identifying emotions and sentiments conveyed through tweets has led to several research works. Such identification provides a great way to understand human psychology, but it poses a challenge to researchers who want to analyze tweet content easily. In this paper, we propose a novel use of a multi-channel convolutional neural architecture which can effectively use different emotion and sentiment indicators such as hashtags, emoticons and emojis that are present in the tweets and improve the performance of emotion and sentiment identification. We also investigate the incorporation of different lexical features in the neural network model and its effect on the emotion and sentiment identification task. We analyze our model on some standard datasets and compare its effectiveness with existing techniques.
