2019
pdf
bib
abs
eventAI at SemEval-2019 Task 7: Rumor Detection on Social Media by Exploiting Content, User Credibility and Propagation Information
Quanzhi Li
|
Qiong Zhang
|
Luo Si
Proceedings of the 13th International Workshop on Semantic Evaluation
This paper describes our system for SemEval 2019 RumorEval: Determining rumor veracity and support for rumors (SemEval 2019 Task 7). This track has two tasks: Task A is to determine a user’s stance towards the source rumor, and Task B is to detect the veracity of the rumor: true, false or unverified. For stance classification, a neural network model with language features is utilized. For rumor verification, our approach exploits information from different dimensions: rumor content, source credibility, user credibility, user stance, event propagation path, etc. We use an ensemble approach in both tasks, which includes neural network models as well as the traditional classification algorithms. Our system is ranked 1st place in the rumor verification task by both the macro F1 measure and the RMSE measure.
pdf
bib
abs
Rumor Detection by Exploiting User Credibility Information, Attention and Multi-task Learning
Quanzhi Li
|
Qiong Zhang
|
Luo Si
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
In this study, we propose a new multi-task learning approach for rumor detection and stance classification tasks. This neural network model has a shared layer and two task specific layers. We incorporate the user credibility information into the rumor detection layer, and we also apply attention mechanism in the rumor detection process. The attended information include not only the hidden states in the rumor detection layer, but also the hidden states from the stance detection layer. The experiments on two datasets show that our proposed model outperforms the state-of-the-art rumor detection approaches.
pdf
bib
abs
Uncover Sexual Harassment Patterns from Personal Stories by Joint Key Element Extraction and Categorization
Yingchi Liu
|
Quanzhi Li
|
Marika Cifor
|
Xiaozhong Liu
|
Qiong Zhang
|
Luo Si
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
The number of personal stories about sexual harassment shared online has increased exponentially in recent years. This is in part inspired by the #MeToo and #TimesUp movements. Safecity is an online forum for people who experienced or witnessed sexual harassment to share their personal experiences. It has collected >10,000 stories so far. Sexual harassment occurred in a variety of situations, and categorization of the stories and extraction of their key elements will provide great help for the related parties to understand and address sexual harassment. In this study, we manually annotated those stories with labels in the dimensions of location, time, and harassers’ characteristics, and marked the key elements related to these dimensions. Furthermore, we applied natural language processing technologies with joint learning schemes to automatically categorize these stories in those dimensions and extract key elements at the same time. We also uncovered significant patterns from the categorized sexual harassment stories. We believe our annotated data set, proposed algorithms, and analysis will help people who have been harassed, authorities, researchers and other related parties in various ways, such as automatically filling reports, enlightening the public in order to prevent future harassment, and enabling more effective, faster action to be taken.
pdf
bib
abs
Rumor Detection on Social Media: Datasets, Methods and Opportunities
Quanzhi Li
|
Qiong Zhang
|
Luo Si
|
Yingchi Liu
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda
Social media platforms have been used for information and news gathering, and they are very valuable in many applications. However, they also lead to the spreading of rumors and fake news. Many efforts have been taken to detect and debunk rumors on social media by analyzing their content and social context using machine learning techniques. This paper gives an overview of the recent studies in the rumor detection field. It provides a comprehensive list of datasets used for rumor detection, and reviews the important studies based on what types of information they exploit and the approaches they take. And more importantly, we also present several new directions for future research.
2018
pdf
bib
abs
NAI-SEA at SemEval-2018 Task 5: An Event Search System
Yingchi Liu
|
Quanzhi Li
|
Luo Si
Proceedings of the 12th International Workshop on Semantic Evaluation
In this paper, we describe Alibaba’s participating system in the semEval-2018 Task5: Counting Events and Participants in the Long Tail. We designed and implemented a pipeline system that consists of components to extract question properties and document features, document event category classifications, document retrieval and document clustering. To retrieve the majority of the relevant documents, we carefully designed our system to extract key information from each question and document pair. After retrieval, we perform further document clustering to count the number of events. The task contains 3 subtasks, on which we achieved F1 score of 78.33, 50.52, 63.59 , respectively, for document level retrieval. Our system ranks first in all the three subtasks on document level retrieval, and it also ranks first in incident-level evaluation by RSME measure in subtask 3.
2017
pdf
bib
abs
Learning Stock Market Sentiment Lexicon and Sentiment-Oriented Word Vector from StockTwits
Quanzhi Li
|
Sameena Shah
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
Previous studies have shown that investor sentiment indicators can predict stock market change. A domain-specific sentiment lexicon and sentiment-oriented word embedding model would help the sentiment analysis in financial domain and stock market. In this paper, we present a new approach to learning stock market lexicon from StockTwits, a popular financial social network for investors to share ideas. It learns word polarity by predicting message sentiment, using a neural net-work. The sentiment-oriented word embeddings are learned from tens of millions of StockTwits posts, and this is the first study presenting sentiment-oriented word embeddings for stock market. The experiments of predicting investor sentiment show that our lexicon outperformed other lexicons built by the state-of-the-art methods, and the sentiment-oriented word vector was much better than the general word embeddings.
pdf
bib
abs
funSentiment at SemEval-2017 Task 4: Topic-Based Message Sentiment Classification by Exploiting Word Embeddings, Text Features and Target Contexts
Quanzhi Li
|
Armineh Nourbakhsh
|
Xiaomo Liu
|
Rui Fang
|
Sameena Shah
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
This paper describes the approach we used for SemEval-2017 Task 4: Sentiment Analysis in Twitter. Topic-based (target-dependent) sentiment analysis has become attractive and been used in some applications recently, but it is still a challenging research task. In our approach, we take the left and right context of a target into consideration when generating polarity classification features. We use two types of word embeddings in our classifiers: the general word embeddings learned from 200 million tweets, and sentiment-specific word embeddings learned from 10 million tweets using distance supervision. We also incorporate a text feature model in our algorithm. This model produces features based on text negation, tf.idf weighting scheme, and a Rocchio text classification method. We participated in four subtasks (B, C, D & E for English), all of which are about topic-based message polarity classification. Our team is ranked #6 in subtask B, #3 by MAEu and #9 by MAEm in subtask C, #3 using RAE and #6 using KLD in subtask D, and #3 in subtask E.
pdf
bib
abs
funSentiment at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs Using Word Vectors Built from StockTwits and Twitter
Quanzhi Li
|
Sameena Shah
|
Armineh Nourbakhsh
|
Rui Fang
|
Xiaomo Liu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
This paper describes the approach we used for SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs. We use three types of word embeddings in our algorithm: word embeddings learned from 200 million tweets, sentiment-specific word embeddings learned from 10 million tweets using distance supervision, and word embeddings learned from 20 million StockTwits messages. In our approach, we also take the left and right context of the target company into consideration when generating polarity prediction features. All the features generated from different word embeddings and contexts are integrated together to train our algorithm
2016
pdf
bib
Witness Identification in Twitter
Rui Fang
|
Armineh Nourbakhsh
|
Xiaomo Liu
|
Sameena Shah
|
Quanzhi Li
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media