Yongfeng Huang


2021

pdf bib
DA-Transformer: Distance-aware Transformer
Chuhan Wu | Fangzhao Wu | Yongfeng Huang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Transformer has achieved great success in the NLP field by composing various advanced models like BERT and GPT. However, Transformer and its existing variants may not be optimal in capturing token distances because the position or distance embeddings used by these methods usually cannot keep the precise information of real distances, which may not be beneficial for modeling the orders and relations of contexts. In this paper, we propose DA-Transformer, which is a distance-aware Transformer that can exploit the real distance. We propose to incorporate the real distances between tokens to re-scale the raw self-attention weights, which are computed by the relevance between attention query and key. Concretely, in different self-attention heads the relative distance between each pair of tokens is weighted by different learnable parameters, which control the different preferences on long- or short-term information of these heads. Since the raw weighted real distances may not be optimal for adjusting self-attention weights, we propose a learnable sigmoid function to map them into re-scaled coefficients that have proper ranges. We first clip the raw self-attention weights via the ReLU function to keep non-negativity and introduce sparsity, and then multiply them with the re-scaled coefficients to encode real distance information into self-attention. Extensive experiments on five benchmark datasets show that DA-Transformer can effectively improve the performance of many tasks and outperform the vanilla Transformer and its several variants.

pdf bib
Jointly Extracting Explicit and Implicit Relational Triples with Reasoning Pattern Enhanced Binary Pointer Network
Yubo Chen | Yunqi Zhang | Changran Hu | Yongfeng Huang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Relational triple extraction is a crucial task for knowledge graph construction. Existing methods mainly focused on explicit relational triples that are directly expressed, but usually suffer from ignoring implicit triples that lack explicit expressions. This will lead to serious incompleteness of the constructed knowledge graphs. Fortunately, other triples in the sentence provide supplementary information for discovering entity pairs that may have implicit relations. Also, the relation types between the implicitly connected entity pairs can be identified with relational reasoning patterns in the real world. In this paper, we propose a unified framework to jointly extract explicit and implicit relational triples. To explore entity pairs that may be implicitly connected by relations, we propose a binary pointer network to extract overlapping relational triples relevant to each word sequentially and retain the information of previously extracted triples in an external memory. To infer the relation types of implicit relational triples, we propose to introduce real-world relational reasoning patterns in our model and capture these patterns with a relation network. We conduct experiments on several benchmark datasets, and the results prove the validity of our method.

pdf bib
HieRec: Hierarchical User Interest Modeling for Personalized News Recommendation
Tao Qi | Fangzhao Wu | Chuhan Wu | Peiru Yang | Yang Yu | Xing Xie | Yongfeng Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

User interest modeling is critical for personalized news recommendation. Existing news recommendation methods usually learn a single user embedding for each user from their previous behaviors to represent their overall interest. However, user interest is usually diverse and multi-grained, which is difficult to be accurately modeled by a single user embedding. In this paper, we propose a news recommendation method with hierarchical user interest modeling, named HieRec. Instead of a single user embedding, in our method each user is represented in a hierarchical interest tree to better capture their diverse and multi-grained interest in news. We use a three-level hierarchy to represent 1) overall user interest; 2) user interest in coarse-grained topics like sports; and 3) user interest in fine-grained topics like football. Moreover, we propose a hierarchical user interest matching framework to match candidate news with different levels of user interest for more accurate user interest targeting. Extensive experiments on two real-world datasets validate our method can effectively improve the performance of user modeling for personalized news recommendation.

pdf bib
PP-Rec: News Recommendation with Personalized User Interest and Time-aware News Popularity
Tao Qi | Fangzhao Wu | Chuhan Wu | Yongfeng Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Personalized news recommendation methods are widely used in online news services. These methods usually recommend news based on the matching between news content and user interest inferred from historical behaviors. However, these methods usually have difficulties in making accurate recommendations to cold-start users, and tend to recommend similar news with those users have read. In general, popular news usually contain important information and can attract users with different interests. Besides, they are usually diverse in content and topic. Thus, in this paper we propose to incorporate news popularity information to alleviate the cold-start and diversity problems for personalized news recommendation. In our method, the ranking score for recommending a candidate news to a target user is the combination of a personalized matching score and a news popularity score. The former is used to capture the personalized user interest in news. The latter is used to measure time-aware popularity of candidate news, which is predicted based on news content, recency, and real-time CTR using a unified framework. Besides, we propose a popularity-aware user encoder to eliminate the popularity bias in user behaviors for accurate interest modeling. Experiments on two real-world datasets show our method can effectively improve the accuracy and diversity for news recommendation.

pdf bib
Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling
Chuhan Wu | Fangzhao Wu | Tao Qi | Yongfeng Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Transformer is important for text modeling. However, it has difficulty in handling long documents due to the quadratic complexity with input text length. In order to handle this problem, we propose a hierarchical interactive Transformer (Hi-Transformer) for efficient and effective long document modeling. Hi-Transformer models documents in a hierarchical way, i.e., first learns sentence representations and then learns document representations. It can effectively reduce the complexity and meanwhile capture global document context in the modeling of each sentence. More specifically, we first use a sentence Transformer to learn the representations of each sentence. Then we use a document Transformer to model the global document context from these sentence representations. Next, we use another sentence Transformer to enhance sentence modeling using the global document context. Finally, we use hierarchical pooling method to obtain document embedding. Extensive experiments on three benchmark datasets validate the efficiency and effectiveness of Hi-Transformer in long document modeling.

pdf bib
Provably Secure Generative Linguistic Steganography
Siyu Zhang | Zhongliang Yang | Jinshuai Yang | Yongfeng Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
Chuhan Wu | Fangzhao Wu | Yongfeng Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
Privacy-Preserving News Recommendation Model Learning
Tao Qi | Fangzhao Wu | Chuhan Wu | Yongfeng Huang | Xing Xie
Findings of the Association for Computational Linguistics: EMNLP 2020

News recommendation aims to display news articles to users based on their personal interest. Existing news recommendation methods rely on centralized storage of user behavior data for model training, which may lead to privacy concerns and risks due to the privacy-sensitive nature of user behaviors. In this paper, we propose a privacy-preserving method for news recommendation model training based on federated learning, where the user behavior data is locally stored on user devices. Our method can leverage the useful information in the behaviors of massive number users to train accurate news recommendation models and meanwhile remove the need of centralized storage of them. More specifically, on each user device we keep a local copy of the news recommendation model, and compute gradients of the local model based on the user behaviors in this device. The local gradients from a group of randomly selected users are uploaded to server, which are further aggregated to update the global model in the server. Since the model gradients may contain some implicit private information, we apply local differential privacy (LDP) to them before uploading for better privacy protection. The updated global model is then distributed to each user device for local model update. We repeat this process for multiple rounds. Extensive experiments on a real-world dataset show the effectiveness of our method in news recommendation model training with privacy protection.

pdf bib
PTUM: Pre-training User Model from Unlabeled User Behaviors via Self-supervision
Chuhan Wu | Fangzhao Wu | Tao Qi | Jianxun Lian | Yongfeng Huang | Xing Xie
Findings of the Association for Computational Linguistics: EMNLP 2020

User modeling is critical for many personalized web services. Many existing methods model users based on their behaviors and the labeled data of target tasks. However, these methods cannot exploit useful information in unlabeled user behavior data, and their performance may be not optimal when labeled data is scarce. Motivated by pre-trained language models which are pre-trained on large-scale unlabeled corpus to empower many downstream tasks, in this paper we propose to pre-train user models from large-scale unlabeled user behaviors data. We propose two self-supervision tasks for user model pre-training. The first one is masked behavior prediction, which can model the relatedness between historical behaviors. The second one is next K behavior prediction, which can model the relatedness between past and future behaviors. The pre-trained user models are finetuned in downstream tasks to learn task-specific user representations. Experimental results on two real-world datasets validate the effectiveness of our proposed user model pre-training method.

pdf bib
Attentive Pooling with Learnable Norms for Text Representation
Chuhan Wu | Fangzhao Wu | Tao Qi | Xiaohui Cui | Yongfeng Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pooling is an important technique for learning text representations in many neural NLP models. In conventional pooling methods such as average, max and attentive pooling, text representations are weighted summations of the L1 or L∞ norm of input features. However, their pooling norms are always fixed and may not be optimal for learning accurate text representations in different tasks. In addition, in many popular pooling methods such as max and attentive pooling some features may be over-emphasized, while other useful ones are not fully exploited. In this paper, we propose an Attentive Pooling with Learnable Norms (APLN) approach for text representation. Different from existing pooling methods that use a fixed pooling norm, we propose to learn the norm in an end-to-end manner to automatically find the optimal ones for text representation in different tasks. In addition, we propose two methods to ensure the numerical stability of the model training. The first one is scale limiting, which re-scales the input to ensure non-negativity and alleviate the risk of exponential explosion. The second one is re-formulation, which decomposes the exponent operation to avoid computing the real-valued powers of the input and further accelerate the pooling operation. Experimental results on four benchmark datasets show that our approach can effectively improve the performance of attentive pooling.

pdf bib
SentiRec: Sentiment Diversity-aware Neural News Recommendation
Chuhan Wu | Fangzhao Wu | Tao Qi | Yongfeng Huang
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Personalized news recommendation is important for online news services. Many news recommendation methods recommend news based on their relevance to users’ historical browsed news, and the recommended news usually have similar sentiment with browsed news. However, if browsed news is dominated by certain kinds of sentiment, the model may intensively recommend news with the same sentiment orientation, making it difficult for users to receive diverse opinions and news events. In this paper, we propose a sentiment diversity-aware neural news recommendation approach, which can recommend news with more diverse sentiment. In our approach, we propose a sentiment-aware news encoder, which is jointly trained with an auxiliary sentiment prediction task, to learn sentiment-aware news representations. We learn user representations from browsed news representations, and compute click scores based on user and candidate news representations. In addition, we propose a sentiment diversity regularization method to penalize the model by combining the overall sentiment orientation of browsed news as well as the click and sentiment scores of candidate news. Extensive experiments on real-world dataset show that our approach can effectively improve the sentiment diversity in news recommendation without performance sacrifice.

pdf bib
Named Entity Recognition in Multi-level Contexts
Yubo Chen | Chuhan Wu | Tao Qi | Zhigang Yuan | Yongfeng Huang
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Named entity recognition is a critical task in the natural language processing field. Most existing methods for this task can only exploit contextual information within a sentence. However, their performance on recognizing entities in limited or ambiguous sentence-level contexts is usually unsatisfactory. Fortunately, other sentences in the same document can provide supplementary document-level contexts to help recognize these entities. In addition, words themselves contain word-level contextual information since they usually have different preferences of entity type and relative position from named entities. In this paper, we propose a unified framework to incorporate multi-level contexts for named entity recognition. We use TagLM as our basic model to capture sentence-level contexts. To incorporate document-level contexts, we propose to capture interactions between sentences via a multi-head self attention network. To mine word-level contexts, we propose an auxiliary task to predict the type of each word to capture its type preference. We jointly train our model in entity recognition and the auxiliary classification task via multi-task learning. The experimental results on several benchmark datasets validate the effectiveness of our method.

pdf bib
Named Entity Recognition with Context-Aware Dictionary Knowledge
Chuhan Wu | Fangzhao Wu | Tao Qi | Yongfeng Huang
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Named entity recognition (NER) is an important task in the natural language processing field. Existing NER methods heavily rely on labeled data for model training, and their performance on rare entities is usually unsatisfactory. Entity dictionaries can cover many entities including both popular ones and rare ones, and are useful for NER. However, many entity names are context-dependent and it is not optimal to directly apply dictionaries without considering the context. In this paper, we propose a neural NER approach which can exploit dictionary knowledge with contextual information. We propose to learn context-aware dictionary knowledge by modeling the interactions between the entities in dictionaries and their contexts via context-dictionary attention. In addition, we propose an auxiliary term classification task to predict the types of the matched entity names, and jointly train it with the NER model to fuse both contexts and dictionary knowledge into NER. Extensive experiments on the CoNLL-2003 benchmark dataset validate the effectiveness of our approach in exploiting entity dictionaries to improve the performance of various NER models.

pdf bib
Clickbait Detection with Style-aware Title Modeling and Co-attention
Chuhan Wu | Fangzhao Wu | Tao Qi | Yongfeng Huang
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Clickbait is a form of web content designed to attract attention and entice users to click on specific hyperlinks. The detection of clickbaits is an important task for online platforms to improve the quality of web content and the satisfaction of users. Clickbait detection is typically formed as a binary classification task based on the title and body of a webpage, and existing methods are mainly based on the content of title and the relevance between title and body. However, these methods ignore the stylistic patterns of titles, which can provide important clues on identifying clickbaits. In addition, they do not consider the interactions between the contexts within title and body, which are very important for measuring their relevance for clickbait detection. In this paper, we propose a clickbait detection approach with style-aware title modeling and co-attention. Specifically, we use Transformers to learn content representations of title and body, and respectively compute two content-based clickbait scores for title and body based on their representations. In addition, we propose to use a character-level Transformer to learn a style-aware title representation by capturing the stylistic patterns of title, and we compute a title stylistic score based on this representation. Besides, we propose to use a co-attention network to model the relatedness between the contexts within title and body, and further enhance their representations by encoding the interaction information. We compute a title-body matching score based on the representations of title and body enhanced by their interactions. The final clickbait score is predicted by a weighted summation of the aforementioned four kinds of scores. Extensive experiments on two benchmark datasets show that our approach can effectively improve the performance of clickbait detection and consistently outperform many baseline methods.

2019

pdf bib
Neural News Recommendation with Heterogeneous User Behavior
Chuhan Wu | Fangzhao Wu | Mingxiao An | Tao Qi | Jianqiang Huang | Yongfeng Huang | Xing Xie
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

News recommendation is important for online news platforms to help users find interested news and alleviate information overload. Existing news recommendation methods usually rely on the news click history to model user interest. However, these methods may suffer from the data sparsity problem, since the news click behaviors of many users in online news platforms are usually very limited. Fortunately, some other kinds of user behaviors such as webpage browsing and search queries can also provide useful clues of users’ news reading interest. In this paper, we propose a neural news recommendation approach which can exploit heterogeneous user behaviors. Our approach contains two major modules, i.e., news representation and user representation. In the news representation module, we learn representations of news from their titles via CNN networks, and apply attention networks to select important words. In the user representation module, we propose an attentive multi-view learning framework to learn unified representations of users from their heterogeneous behaviors such as search queries, clicked news and browsed webpages. In addition, we use word- and record-level attentions to select informative words and behavior records. Experiments on a real-world dataset validate the effectiveness of our approach.

pdf bib
Reviews Meet Graphs: Enhancing User and Item Representations for Recommendation with Hierarchical Attentive Graph Neural Network
Chuhan Wu | Fangzhao Wu | Tao Qi | Suyu Ge | Yongfeng Huang | Xing Xie
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

User and item representation learning is critical for recommendation. Many of existing recommendation methods learn representations of users and items based on their ratings and reviews. However, the user-user and item-item relatedness are usually not considered in these methods, which may be insufficient. In this paper, we propose a neural recommendation approach which can utilize useful information from both review content and user-item graphs. Since reviews and graphs have different characteristics, we propose to use a multi-view learning framework to incorporate them as different views. In the review content-view, we propose to use a hierarchical model to first learn sentence representations from words, then learn review representations from sentences, and finally learn user/item representations from reviews. In addition, we propose to incorporate a three-level attention network into this view to select important words, sentences and reviews for learning informative user and item representations. In the graph-view, we propose a hierarchical graph neural network to jointly model the user-item, user-user and item-item relatedness by capturing the first- and second-order interactions between users and items in the user-item graph. In addition, we apply attention mechanism to model the importance of these interactions to learn informative user and item representations. Extensive experiments on four benchmark datasets validate the effectiveness of our approach.

pdf bib
Neural News Recommendation with Multi-Head Self-Attention
Chuhan Wu | Fangzhao Wu | Suyu Ge | Tao Qi | Yongfeng Huang | Xing Xie
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

News recommendation can help users find interested news and alleviate information overload. Precisely modeling news and users is critical for news recommendation, and capturing the contexts of words and news is important to learn news and user representations. In this paper, we propose a neural news recommendation approach with multi-head self-attention (NRMS). The core of our approach is a news encoder and a user encoder. In the news encoder, we use multi-head self-attentions to learn news representations from news titles by modeling the interactions between words. In the user encoder, we learn representations of users from their browsed news and use multi-head self-attention to capture the relatedness between the news. Besides, we apply additive attention to learn more informative news and user representations by selecting important words and news. Experiments on a real-world dataset validate the effectiveness and efficiency of our approach.

pdf bib
Hierarchical User and Item Representation with Three-Tier Attention for Recommendation
Chuhan Wu | Fangzhao Wu | Junxin Liu | Yongfeng Huang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Utilizing reviews to learn user and item representations is useful for recommender systems. Existing methods usually merge all reviews from the same user or for the same item into a long document. However, different reviews, sentences and even words usually have different informativeness for modeling users and items. In this paper, we propose a hierarchical user and item representation model with three-tier attention to learn user and item representations from reviews for recommendation. Our model contains three major components, i.e., a sentence encoder to learn sentence representations from words, a review encoder to learn review representations from sentences, and a user/item encoder to learn user/item representations from reviews. In addition, we incorporate a three-tier attention network in our model to select important words, sentences and reviews. Besides, we combine the user and item representations learned from the reviews with user and item embeddings based on IDs as the final representations to capture the latent factors of individual users and items. Extensive experiments on four benchmark datasets validate the effectiveness of our approach.

pdf bib
Detecting and Extracting of Adverse Drug Reaction Mentioning Tweets with Multi-Head Self Attention
Suyu Ge | Tao Qi | Chuhan Wu | Yongfeng Huang
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

This paper describes our system for the first and second shared tasks of the fourth Social Media Mining for Health Applications (SMM4H) workshop. We enhance tweet representation with a language model and distinguish the importance of different words with Multi-Head Self-Attention. In addition, transfer learning is exploited to make up for the data shortage. Our system achieved competitive results on both tasks with an F1-score of 0.5718 for task 1 and 0.653 (overlap) / 0.357 (strict) for task 2.

pdf bib
THU_NGN at SemEval-2019 Task 3: Dialog Emotion Classification using Attentional LSTM-CNN
Suyu Ge | Tao Qi | Chuhan Wu | Yongfeng Huang
Proceedings of the 13th International Workshop on Semantic Evaluation

With the development of the Internet, dialog systems are widely used in online platforms to provide personalized services for their users. It is important to understand the emotions through conversations to improve the quality of dialog systems. To facilitate the researches on dialog emotion recognition, the SemEval-2019 Task 3 named EmoContext is proposed. This task aims to classify the emotions of user utterance along with two short turns of dialogues into four categories. In this paper, we propose an attentional LSTM-CNN model to participate in this shared task. We use a combination of convolutional neural networks and long-short term neural networks to capture both local and long-distance contextual information in conversations. In addition, we apply attention mechanism to recognize and attend to important words within conversations. Besides, we propose to use ensemble strategies by combing the variants of our model with different pre-trained word embeddings via weighted voting. Our model achieved 0.7542 micro-F1 score in the final test data, ranking 15ˆth out of 165 teams.

pdf bib
THU_NGN at SemEval-2019 Task 12: Toponym Detection and Disambiguation on Scientific Papers
Tao Qi | Suyu Ge | Chuhan Wu | Yubo Chen | Yongfeng Huang
Proceedings of the 13th International Workshop on Semantic Evaluation

First name: Tao Last name: Qi Email: taoqi.qt@gmail.com Affiliation: Department of Electronic Engineering, Tsinghua University First name: Suyu Last name: Ge Email: gesy17@mails.tsinghua.edu.cn Affiliation: Department of Electronic Engineering, Tsinghua University First name: Chuhan Last name: Wu Email: wuch15@mails.tsinghua.edu.cn Affiliation: Department of Electronic Engineering, Tsinghua University First name: Yubo Last name: Chen Email: chen-yb18@mails.tsinghua.edu.cn Affiliation: Department of Electronic Engineering, Tsinghua University First name: Yongfeng Last name: Huang Email: yfhuang@mail.tsinghua.edu.cn Affiliation: Department of Electronic Engineering, Tsinghua University Toponym resolution is an important and challenging task in the neural language processing field, and has wide applications such as emergency response and social media geographical event analysis. Toponym resolution can be roughly divided into two independent steps, i.e., toponym detection and toponym disambiguation. In order to facilitate the study on toponym resolution, the SemEval 2019 task 12 is proposed, which contains three subtasks, i.e., toponym detection, toponym disambiguation and toponym resolution. In this paper, we introduce our system that participated in the SemEval 2019 task 12. For toponym detection, in our approach we use TagLM as the basic model, and explore the use of various features in this task, such as word embeddings extracted from pre-trained language models, POS tags and lexical features extracted from dictionaries. For toponym disambiguation, we propose a heuristics rule-based method using toponym frequency and population. Our systems achieved 83.03% strict macro F1, 74.50 strict micro F1, 85.92 overlap macro F1 and 78.47 overlap micro F1 in toponym detection subtask.

pdf bib
Neural News Recommendation with Topic-Aware News Representation
Chuhan Wu | Fangzhao Wu | Mingxiao An | Yongfeng Huang | Xing Xie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

News recommendation can help users find interested news and alleviate information overload. The topic information of news is critical for learning accurate news and user representations for news recommendation. However, it is not considered in many existing news recommendation methods. In this paper, we propose a neural news recommendation approach with topic-aware news representations. The core of our approach is a topic-aware news encoder and a user encoder. In the news encoder we learn representations of news from their titles via CNN networks and apply attention networks to select important words. In addition, we propose to learn topic-aware news representations by jointly training the news encoder with an auxiliary topic classification task. In the user encoder we learn the representations of users from their browsed news and use attention networks to select informative news for user representation learning. Extensive experiments on a real-world dataset validate the effectiveness of our approach.

2018

pdf bib
Neural Metaphor Detecting with CNN-LSTM Model
Chuhan Wu | Fangzhao Wu | Yubo Chen | Sixing Wu | Zhigang Yuan | Yongfeng Huang
Proceedings of the Workshop on Figurative Language Processing

Metaphors are figurative languages widely used in daily life and literatures. It’s an important task to detect the metaphors evoked by texts. Thus, the metaphor shared task is aimed to extract metaphors from plain texts at word level. We propose to use a CNN-LSTM model for this task. Our model combines CNN and LSTM layers to utilize both local and long-range contextual information for identifying metaphorical information. In addition, we compare the performance of the softmax classifier and conditional random field (CRF) for sequential labeling in this task. We also incorporated some additional features such as part of speech (POS) tags and word cluster to improve the performance of model. Our best model achieved 65.06% F-score in the all POS testing subtask and 67.15% in the verbs testing subtask.

pdf bib
Detecting Tweets Mentioning Drug Name and Adverse Drug Reaction with Hierarchical Tweet Representation and Multi-Head Self-Attention
Chuhan Wu | Fangzhao Wu | Junxin Liu | Sixing Wu | Yongfeng Huang | Xing Xie
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

This paper describes our system for the first and third shared tasks of the third Social Media Mining for Health Applications (SMM4H) workshop, which aims to detect the tweets mentioning drug names and adverse drug reactions. In our system we propose a neural approach with hierarchical tweet representation and multi-head self-attention (HTR-MSA) for both tasks. Our system achieved the first place in both the first and third shared tasks of SMM4H with an F-score of 91.83% and 52.20% respectively.

pdf bib
THU_NGN at SemEval-2018 Task 3: Tweet Irony Detection with Densely connected LSTM and Multi-task Learning
Chuhan Wu | Fangzhao Wu | Sixing Wu | Junxin Liu | Zhigang Yuan | Yongfeng Huang
Proceedings of The 12th International Workshop on Semantic Evaluation

Detecting irony is an important task to mine fine-grained information from social web messages. Therefore, the Semeval-2018 task 3 is aimed to detect the ironic tweets (subtask A) and their ironic types (subtask B). In order to address this task, we propose a system based on a densely connected LSTM network with multi-task learning strategy. In our dense LSTM model, each layer will take all outputs from previous layers as input. The last LSTM layer will output the hidden representations of texts, and they will be used in three classification task. In addition, we incorporate several types of features to improve the model performance. Our model achieved an F-score of 70.54 (ranked 2/43) in the subtask A and 49.47 (ranked 3/29) in the subtask B. The experimental results validate the effectiveness of our system.

pdf bib
THU_NGN at SemEval-2018 Task 1: Fine-grained Tweet Sentiment Intensity Analysis with Attention CNN-LSTM
Chuhan Wu | Fangzhao Wu | Junxin Liu | Zhigang Yuan | Sixing Wu | Yongfeng Huang
Proceedings of The 12th International Workshop on Semantic Evaluation

Traditional sentiment analysis approaches mainly focus on classifying the sentiment polarities or emotion categories of texts. However, they can’t exploit the sentiment intensity information. Therefore, the SemEval-2018 Task 1 is aimed to automatically determine the intensity of emotions or sentiment of tweets to mine fine-grained sentiment information. In order to address this task, we propose a system based on an attention CNN-LSTM model. In our model, LSTM is used to extract the long-term contextual information from texts. We apply attention techniques to selecting this information. A CNN layer with different size of kernels is used to extract local features. The dense layers take the pooled CNN feature maps and predict the intensity scores. Our system reaches average Pearson correlation score of 0.722 (ranked 12/48) in emotion intensity regression task, and 0.810 in valence regression task (ranked 15/38). It indicates that our system can be further extended.

pdf bib
THU_NGN at SemEval-2018 Task 2: Residual CNN-LSTM Network with Attention for English Emoji Prediction
Chuhan Wu | Fangzhao Wu | Sixing Wu | Zhigang Yuan | Junxin Liu | Yongfeng Huang
Proceedings of The 12th International Workshop on Semantic Evaluation

Emojis are widely used by social media and social network users when posting their messages. It is important to study the relationships between messages and emojis. Thus, in SemEval-2018 Task 2 an interesting and challenging task is proposed, i.e., predicting which emojis are evoked by text-based tweets. We propose a residual CNN-LSTM with attention (RCLA) model for this task. Our model combines CNN and LSTM layers to capture both local and long-range contextual information for tweet representation. In addition, attention mechanism is used to select important components. Besides, residual connection is applied to CNN layers to facilitate the training of neural networks. We also incorporated additional features such as POS tags and sentiment features extracted from lexicons. Our model achieved 30.25% macro-averaged F-score in the first subtask (i.e., emoji prediction in English), ranking 7th out of 48 participants.

pdf bib
THU_NGN at SemEval-2018 Task 10: Capturing Discriminative Attributes with MLP-CNN model
Chuhan Wu | Fangzhao Wu | Sixing Wu | Zhigang Yuan | Yongfeng Huang
Proceedings of The 12th International Workshop on Semantic Evaluation

Existing semantic models are capable of identifying the semantic similarity of words. However, it’s hard for these models to discriminate between a word and another similar word. Thus, the aim of SemEval-2018 Task 10 is to predict whether a word is a discriminative attribute between two concepts. In this task, we apply a multilayer perceptron (MLP)-convolutional neural network (CNN) model to identify whether an attribute is discriminative. The CNNs are used to extract low-level features from the inputs. The MLP takes both the flatten CNN maps and inputs to predict the labels. The evaluation F-score of our system on the test set is 0.629 (ranked 15th), which indicates that our system still needs to be improved. However, the behaviours of our system in our experiments provide useful information, which can help to improve the collective understanding of this novel task.

2017

pdf bib
Active Sentiment Domain Adaptation
Fangzhao Wu | Yongfeng Huang | Jun Yan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Domain adaptation is an important technology to handle domain dependence problem in sentiment analysis field. Existing methods usually rely on sentiment classifiers trained in source domains. However, their performance may heavily decline if the distributions of sentiment features in source and target domains have significant difference. In this paper, we propose an active sentiment domain adaptation approach to handle this problem. Instead of the source domain sentiment classifiers, our approach adapts the general-purpose sentiment lexicons to target domain with the help of a small number of labeled samples which are selected and annotated in an active learning mode, as well as the domain-specific sentiment similarities among words mined from unlabeled samples of target domain. A unified model is proposed to fuse different types of sentiment information and train sentiment classifier for target domain. Extensive experiments on benchmark datasets show that our approach can train accurate sentiment classifier with less labeled samples.

pdf bib
THU_NGN at IJCNLP-2017 Task 2: Dimensional Sentiment Analysis for Chinese Phrases with Deep LSTM
Chuhan Wu | Fangzhao Wu | Yongfeng Huang | Sixing Wu | Zhigang Yuan
Proceedings of the IJCNLP 2017, Shared Tasks

Predicting valence-arousal ratings for words and phrases is very useful for constructing affective resources for dimensional sentiment analysis. Since the existing valence-arousal resources of Chinese are mainly in word-level and there is a lack of phrase-level ones, the Dimensional Sentiment Analysis for Chinese Phrases (DSAP) task aims to predict the valence-arousal ratings for Chinese affective words and phrases automatically. In this task, we propose an approach using a densely connected LSTM network and word features to identify dimensional sentiment on valence and arousal for words and phrases jointly. We use word embedding as major feature and choose part of speech (POS) and word clusters as additional features to train the dense LSTM network. The evaluation results of our submissions (1st and 2nd in average performance) validate the effectiveness of our system to predict valence and arousal dimensions for Chinese words and phrases.

2016

pdf bib
Sentiment Domain Adaptation with Multiple Sources
Fangzhao Wu | Yongfeng Huang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)