Multimodal aspect-based sentiment analysis (MABSA) aims to extract the aspect terms from text and image pairs, and then analyze their corresponding sentiment. Recent studies typically use either a pipeline method or a unified transformer based on a cross-attention mechanism. However, these methods fail to explicitly and effectively incorporate the alignment between text and image. Supervised finetuning of the universal transformers for MABSA still requires a certain number of aligned image-text pairs. This study proposes a dual-encoder transformer with cross-modal alignment (DTCA). Two auxiliary tasks, including text-only extraction and text-patch alignment are introduced to enhance cross-attention performance. To align text and image, we propose an unsupervised approach which minimizes the Wasserstein distance between both modalities, forcing both encoders to produce more appropriate representations for the final extraction. Experimental results on two benchmarks demonstrate that DTCA consistently outperforms existing methods.
Named Entity Recognition (NER) is a fundamental task in information extraction that locates the mentions of named entities and classifies them in unstructured texts. Previous studies typically used hidden Markov model (HMM) and conditional random fields (CRF) for NER. To learn long-distance dependencies in text, recurrent neural networks, e.g., LSTM and GRU can extract the semantic features for each token with a sequential manner. Based on Transformers, this paper describes the contribution to ROCLING-2022 Share Task. This paper adopts a transformer-based model with focal Loss and regularization dropout. The focal loss is to overcome the uneven distribution of the label. The regularization dropout (r-drop) is to address the problem of vocabulary and descriptions that are too domain-specific. The ensemble learning is to improve the performance of the model. Comparative experiments were conducted on dev set to select the model with the best performance for submission. That is, BERT model with BiLSTM-CRF, focal loss and R-Drop has achieved the best F1-score of 0.7768 and rank the 4th place.
This paper will present the methods we use as the YNU-HPCC team in the SemEval-2022 Task 2, Multilingual Idiomaticity Detection and Sentence Embedding. We are involved in two subtasks, including four settings. In subtask B of sentence representation, we used novel approaches with ideas of contrastive learning to optimize model, where method of CoSENT was used in the pre-train setting, and triplet loss and multiple negatives ranking loss functions in fine-tune setting. We had achieved very competitive results on the final released test datasets. However, for subtask A of idiomaticity detection, we simply did a few explorations and experiments based on the xlm-RoBERTa model. Sentence concatenated with additional MWE as inputs did well in a one-shot setting. Sentences containing context had a poor performance on final released test data in zero-shot setting even if we attempted to extract effective information from CLS tokens of hidden layers.
This paper describes a system built in the SemEval-2022 competition. As participants in Task 4: Patronizing and Condescending Language Detection, we implemented the text sentiment classification system for two subtasks in English. Both subtasks involve determining emotions; subtask 1 requires us to determine whether the text belongs to the PCL category (single-label classification), and subtask 2 requires us to determine to which PCL category the text belongs (multi-label classification). Our system is based on the bidirectional encoder representations from transformers (BERT) model. For the single-label classification, our system applies a BertForSequenceClassification model to classify the input text. For the multi-label classification, we use the fine-tuned BERT model to extract the sentiment score of the text and a fully connected layer to classify the text into the PCL categories. Our system achieved relatively good results on the competition’s official leaderboard.
This paper describes our system used in the SemEval-2022 Task5 Multimedia Automatic Misogyny Identification (MAMI). This task is to use the provided text-image pairs to classify emotions. In this paper, We propose a multi-label emotion classification model based on pre-trained LXMERT. We use Faster-RCNN to extract visual representation and utilize LXMERT’s cross-attention for multi-modal alignment. Then we use the Bilinear-interaction layer to fuse these features. Our experimental results surpass the F1 score of baseline. For Sub-task A, our F1 score is 0.662 and Sub-task B’s F1 score is 0.633. The code of this study is available on GitHub.
In this paper, we (a YNU-HPCC team) describe the system we built in the SemEval-2022 competition. As participants in Task 6 (titled “iSarcasmEval: Intended Sarcasm Detection In English and Arabic”), we implement the sentiment system for all three subtasks in English and Arabic. All subtasks involve the detection of sarcasm (binary and multilabel classification) and the determination of the sarcastic text location (sentence pair classification). Our system primarily applies the sequence classification model of a bidirectional encoder representation from a transformer (BERT). The BERT is used to extract sentence information from both directions for downstream classification tasks. A single basic model is used for single-sentence and sentence-pair binary classification tasks. For the multilabel task, the Label-Powerset method and binary cross-entropy loss function with weights are used. Our system exhibits competitive performance, obtaining 12/43 (21/32), 11/22, and 3/16 (8/13) rankings in the three official rankings for English (Arabic).
This paper describes the system submitted by our team (YNU-HPCC) to SemEval-2022 Task 8: Multilingual news article similarity. This task requires participants to develop a system which could evaluate the similarity between multilingual news article pairs. We propose an approach that relies on Transformers to compute the similarity between pairs of news. We tried different models namely BERT, ALBERT, ELECTRA, RoBERTa, M-BERT and Compared their results. At last, we chose M-BERT as our System, which has achieved the best Pearson Correlation Coefficient score of 0.738.
Conditional computation algorithms, such as the early exiting (EE) algorithm, can be applied to accelerate the inference of pretrained language models (PLMs) while maintaining competitive performance on resource-constrained devices. However, this approach is only applied to the vertical architecture to decide which layers should be used for inference. Conversely, the operation of the horizontal perspective is ignored, and the determination of which tokens in each layer should participate in the computation fails, leading to a high redundancy for adaptive inference. To address this limitation, a unified horizontal and vertical multi-perspective early exiting (MPEE) framework is proposed in this study to accelerate the inference of transformer-based models. Specifically, the vertical architecture uses recycling EE classifier memory and weighted self-distillation to enhance the performance of the EE classifiers. Then, the horizontal perspective uses recycling class attention memory to emphasize the informative tokens. Conversely, the tokens with less information are truncated by weighted fusion and isolated from the following computation. Based on this, both horizontal and vertical EE are unified to obtain a better tradeoff between performance and efficiency. Extensive experimental results show that MPEE can achieve higher acceleration inference with competent performance than existing competitive methods.
The billions, and sometimes even trillions, of parameters involved in pre-trained language models significantly hamper their deployment in resource-constrained devices and real-time applications. Knowledge distillation (KD) can transfer knowledge from the original model (i.e., teacher) into a compact model (i.e., student) to achieve model compression. However, previous KD methods have usually frozen the teacher and applied its immutable output feature maps as soft labels to guide the student’s training. Moreover, the goal of the teacher is to achieve the best performance on downstream tasks rather than knowledge transfer. Such a fixed architecture may limit the teacher’s teaching and student’s learning abilities. Herein, a knowledge distillation method with reptile meta-learning is proposed to facilitate the transfer of knowledge from the teacher to the student. The teacher can continuously meta-learn the student’s learning objective to adjust its parameters for maximizing the student’s performance throughout the distillation process. In this way, the teacher learns to teach, produces more suitable soft labels, and transfers more appropriate knowledge to the student, resulting in improved performance. Unlike previous KD using meta-learning, the proposed method only needs to calculate the first-order derivatives to update the teacher, leading to lower computational cost but better convergence. Extensive experiments on the GLUE benchmark show the competitive performance achieved by the proposed method. For reproducibility, the code for this paper is available at: https://github.com/maxinge8698/ReptileDistil.
This paper describes the system we built as the YNU-HPCC team in the SemEval-2021 Task 11: NLPContributionGraph. This task involves first identifying sentences in the given natural language processing (NLP) scholarly articles that reflect research contributions through binary classification; then identifying the core scientific terms and their relation phrases from these contribution sentences by sequence labeling; and finally, these scientific terms and relation phrases are categorized, identified, and organized into subject-predicate-object triples to form a knowledge graph with the help of multiclass classification and multi-label classification. We developed a system for this task using a pre-trained language representation model called BERT that stands for Bidirectional Encoder Representations from Transformers, and achieved good results. The average F1-score for Evaluation Phase 2, Part 1 was 0.4562 and ranked 7th, and the average F1-score for Evaluation Phase 2, Part 2 was 0.6541, and also ranked 7th.
Toxic span detection requires the detection of spans that make a text toxic instead of simply classifying the text. In this paper, a transformer-based model with auxiliary information is proposed for SemEval-2021 Task 5. The proposed model was implemented based on the BERT-CRF architecture. It consists of three parts: a transformer-based model that can obtain the token representation, an auxiliary information module that combines features from different layers, and an output layer used for the classification. Various BERT-based models, such as BERT, ALBERT, RoBERTa, and XLNET, were used to learn contextual representations. The predictions of these models were assembled to improve the sequence labeling tasks by using a voting strategy. Experimental results showed that the introduced auxiliary information can improve the performance of toxic spans detection. The proposed model ranked 5th of 91 in the competition. The code of this study is available at https://github.com/Chenrj233/semeval2021_task5
In recent years, memes combining image and text have been widely used in social media, and memes are one of the most popular types of content used in online disinformation campaigns. In this paper, our study on the detection of persuasion techniques in texts and images in SemEval-2021 Task 6 is summarized. For propaganda technology detection in text, we propose a combination model of both ALBERT and Text CNN for text classification, as well as a BERT-based multi-task sequence labeling model for propaganda technology coverage span detection. For the meme classification task involved in text understanding and visual feature extraction, we designed a parallel channel model divided into text and image channels. Our method achieved a good performance on subtasks 1 and 3. The micro F1-scores of 0.492, 0.091, and 0.446 achieved on the test sets of the three subtasks ranked 12th, 7th, and 11th, respectively, and all are higher than the baseline model.
Data sharing restrictions are common in NLP datasets. The purpose of this task is to develop a model trained in a source domain to make predictions for a target domain with related domain data. To address the issue, the organizers provided the models that fine-tuned a large number of source domain data on pre-trained models and the dev data for participants. But the source domain data was not distributed. This paper describes the provided model to the NER (Name entity recognition) task and the ways to develop the model. As a little data provided, pre-trained models are suitable to solve the cross-domain tasks. The models fine-tuned by large number of another domain could be effective in new domain because the task had no change.
Aspect-level sentiment analysis(ASC) predicts each specific aspect term’s sentiment polarity in a given text or review. Recent studies used attention-based methods that can effectively improve the performance of aspect-level sentiment analysis. These methods ignored the syntactic relationship between the aspect and its corresponding context words, leading the model to focus on syntactically unrelated words mistakenly. One proposed solution, the graph convolutional network (GCN), cannot completely avoid the problem. While it does incorporate useful information about syntax, it assigns equal weight to all the edges between connected words. It may still incorrectly associate unrelated words to the target aspect through the iterations of graph convolutional propagation. In this study, a graph attention network with memory fusion is proposed to extend GCN’s idea by assigning different weights to edges. Syntactic constraints can be imposed to block the graph convolutional propagation of unrelated words. A convolutional layer and a memory fusion were applied to learn and exploit multiword relations and draw different weights of words to improve performance further. Experimental results on five datasets show that the proposed method yields better performance than existing methods.
This paper describes an ensemble model designed for Semeval-2020 Task 7. The task is based on the Humicroedit dataset that is comprised of news titles and one-word substitutions designed to make them humorous. We use BERT, FastText, Elmo, and Word2Vec to encode these titles then pass them to a bidirectional gated recurrent unit (BiGRU) with attention. Finally, we used XGBoost on the concatenation of the results of the different models to make predictions.
this paper proposed a parallel-channel model to process the textual and visual information in memes and then analyze the sentiment polarity of memes. In the shared task of identifying and categorizing memes, we preprocess the dataset according to the language behaviors on social media. Then, we adapt and fine-tune the Bidirectional Encoder Representations from Transformers (BERT), and two types of convolutional neural network models (CNNs) were used to extract the features from the pictures. We applied an ensemble model that combined the BiLSTM, BIGRU, and Attention models to perform cross domain suggestion mining. The officially released results show that our system performs better than the baseline algorithm
It is fairly common to use code-mixing on a social media platform to express opinions and emotions in multilingual societies. The purpose of this task is to detect the sentiment of code-mixed social media text. Code-mixed text poses a great challenge for the traditional NLP system, which currently uses monolingual resources to deal with the problem of multilingual mixing. This task has been solved in the past using lexicon lookup in respective sentiment dictionaries and using a long short-term memory (LSTM) neural network for monolingual resources. In this paper, we present a system that uses a bilingual vector gating mechanism for bilingual resources to complete the task. The model consists of two main parts: the vector gating mechanism, which combines the character and word levels, and the attention mechanism, which extracts the important emotional parts of the text. The results show that the proposed system outperforms the baseline algorithm. We achieved fifth place in Spanglish and 19th place in Hinglish.
This paper summarizes our studies on propaganda detection techniques for news articles in the SemEval-2020 task 11. This task is divided into the SI and TC subtasks. We implemented the GloVe word representation, the BERT pretraining model, and the LSTM model architecture to accomplish this task. Our approach achieved good results for both the SI and TC subtasks. The macro- F 1 - score for the SI subtask is 0.406, and the micro- F 1 - score for the TC subtask is 0.505. Our method significantly outperforms the officially released baseline method, and the SI and TC subtasks rank 17th and 22nd, respectively, for the test set. This paper also compares the performances of different deep learning model architectures, such as the Bi-LSTM, LSTM, BERT, and XGBoost models, on the detection of news promotion techniques.
In this study, we propose a multi-granularity ordinal classification method to address the problem of emphasis selection. In detail, the word embedding is learned from Embeddings from Language Model (ELMO) to extract feature vector representation. Then, the ordinal classifica-tions are implemented on four different multi-granularities to approximate the continuous em-phasize values. Comparative experiments were conducted to compare the model with baseline in which the problem is transformed to label distribution problem.
This paper describes our approach to the sentiment analysis of Twitter textual conversations based on deep learning. We analyze the syntax, abbreviations, and informal-writing of Twitter; and perform perfect data preprocessing on the data to convert them to normative text. We apply a multi-step ensemble strategy to solve the problem of extremely unbalanced data in the training set. This is achieved by taking the GloVe and Elmo word vectors as input into a combination model with four different deep neural networks. The experimental results from the development dataset demonstrate that the proposed model exhibits a strong generalization ability. For evaluation on the best dataset, we integrated the results using the stacking ensemble learning approach and achieved competitive results. According to the final official review, the results of our model ranked 10th out of 165 teams.
This paper describes our system designed for SemEval 2019 Task 5 “Shared Task on Multilingual Detection of Hate”.We only participate in subtask-A in English. To address this task, we present a stacked BiGRU model based on a capsule network system. In or- der to convert the tweets into corresponding vector representations and input them into the neural network, we use the fastText tools to get word representations. Then, the sentence representation is enriched by stacked Bidirectional Gated Recurrent Units (BiGRUs) and used as the input of capsule network. Our system achieves an average F1-score of 0.546 and ranks 3rd in the subtask-A in English.
This document describes the submission of team YNU-HPCC to SemEval-2019 for three Sub-tasks of Task 6: Sub-task A, Sub-task B, and Sub-task C. We have submitted four systems to identify and categorise offensive language. The first subsystem is an attention-based 2-layer bidirectional long short-term memory (BiLSTM). The second subsystem is a voting ensemble of four different deep learning architectures. The third subsystem is a stacking ensemble of four different deep learning architectures. Finally, the fourth subsystem is a bidirectional encoder representations from transformers (BERT) model. Among our models, in Sub-task A, our first subsystem performed the best, ranking 16th among 103 teams; in Sub-task B, the second subsystem performed the best, ranking 12th among 75 teams; in Sub-task C, the fourth subsystem performed best, ranking 4th among 65 teams.
This paper describes the system submitted to SemEval 2019 Task 6: OffensEval 2019. The task aims to identify and categorize offensive language in social media, we only participate in Sub-task A, which aims to identify offensive language. In order to address this task, we propose a system based on a K-max pooling convolutional neural network model, and use an argument for averaging as a valid meta-embedding technique to get a metaembedding. Finally, we also use a cyclic learning rate policy to improve model performance. Our model achieves a Macro F1-score of 0.802 (ranked 9/103) in the Sub-task A.
We propose a system that uses a long short-term memory with attention mechanism (LSTM-Attention) model to complete the task. The LSTM-Attention model uses two LSTM to extract the features of the question and answer pair. Then, each of the features is sequentially composed using the attention mechanism, concatenating the two vectors into one. Finally, the concatenated vector is used as input for the MLP and the MLP’s output layer uses the softmax function to classify the provided answers into three categories. This model is capable of extracting the features of the question and answer pair well. The results show that the proposed system outperforms the baseline algorithm.
In this paper we describe a deep-learning system that competed as SemEval 2019 Task 9-SubTask A: Suggestion Mining from Online Reviews and Forums. We use Word2Vec to learn the distributed representations from sentences. This system is composed of a Stacked Bidirectional Long-Short Memory Network (SBiLSTM) for enriching word representations before and after the sequence relationship with context. We perform an ensemble to improve the effectiveness of our model. Our official submission results achieve an F1-score 0.5659.
Consumer opinions towards commercial entities are generally expressed through online reviews, blogs, and discussion forums. These opinions largely express positive and negative sentiments towards a given entity,but also tend to contain suggestions for improving the entity. In this task, we extract suggestions from given the unstructured text, compared to the traditional opinion mining systems. Such suggestion mining is more applicability and extends capabilities.
Deep neural network models such as long short-term memory (LSTM) and tree-LSTM have been proven to be effective for sentiment analysis. However, sequential LSTM is a bias model wherein the words in the tail of a sentence are more heavily emphasized than those in the header for building sentence representations. Even tree-LSTM, with useful structural information, could not avoid the bias problem because the root node will be dominant and the nodes in the bottom of the parse tree will be less emphasized even though they may contain salient information. To overcome the bias problem, this study proposes a capsule tree-LSTM model, introducing a dynamic routing algorithm as an aggregation layer to build sentence representation by assigning different weights to nodes according to their contributions to prediction. Experiments on Stanford Sentiment Treebank (SST) for sentiment classification and EmoBank for regression show that the proposed method improved the performance of tree-LSTM and other neural network models. In addition, the deeper the tree structure, the bigger the improvement.
We implemented the sentiment system in all five subtasks for English and Spanish. All subtasks involve emotion or sentiment intensity prediction (regression and ordinal classification) and emotions determining (multi-labels classification). The useful BiLSTM (Bidirectional Long-Short Term Memory) model with attention mechanism was mainly applied for our system. We use BiLSTM in order to get word information extracted from both directions. The attention mechanism was used to find the contribution of each word for improving the scores. Furthermore, based on BiLSTMATT (BiLSTM with attention mechanism) a few deep-learning algorithms were employed for different subtasks. For regression and ordinal classification tasks we used domain adaptation and ensemble learning methods to leverage base model. While a single base model was used for multi-labels task.
This paper describes our approach to SemEval-2018 Task 2, which aims to predict the most likely associated emoji, given a tweet in English or Spanish. We normalized text-based tweets during pre-processing, following which we utilized a bi-directional gated recurrent unit with an attention mechanism to build our base model. Multi-models with or without class weights were trained for the ensemble methods. We boosted models without class weights, and only strong boost classifiers were identified. In our system, not only was a boosting method used, but we also took advantage of the voting ensemble method to enhance our final system result. Our method demonstrated an obvious improvement of approximately 3% of the macro F1 score in English and 2% in Spanish.
This paper describe the system we proposed to participate the first year of Irony detection in English tweets competition. Previous works demonstrate that LSTMs models have achieved remarkable performance in natural language processing; besides, combining multiple classification from various individual classifiers in general is more powerful than a single classification. In order to obtain more precision classification of irony detection, our system trained several individual neural network classifiers and combined their results according to the ensemble-learning algorithm.
This shared task is a typical question answering task. Compared with the normal question and answer system, it needs to give the answer to the question based on the text provided. The essence of the problem is actually reading comprehension. Typically, there are several questions for each text that correspond to it. And for each question, there are two candidate answers (and only one of them is correct). To solve this problem, the usual approach is to use convolutional neural networks (CNN) and recurrent neural network (RNN) or their improved models (such as long short-term memory (LSTM)). In this paper, an attention-based CNN-LSTM model is proposed for this task. By adding an attention mechanism and combining the two models, this experimental result has been significantly improved.
An argument is divided into two parts, the claim and the reason. To obtain a clearer conclusion, some additional explanation is required. In this task, the explanations are called warrants. This paper introduces a bi-directional long short term memory (Bi-LSTM) with an attention model to select a correct warrant from two to explain an argument. We address this question as a question-answering system. For each warrant, the model produces a probability that it is correct. Finally, the system chooses the highest correct probability as the answer. Ensemble learning is used to enhance the performance of the model. Among all of the participants, we ranked 15th on the test results.
Building a system to detect Chinese grammatical errors is a challenge for natural-language processing researchers. As Chinese learners are increasing, developing such a system can help them study Chinese more easily. This paper introduces a bi-directional long short-term memory (BiLSTM) - conditional random field (CRF) model to produce the sequences that indicate an error type for every position of a sentence, since we regard Chinese grammatical error diagnosis (CGED) as a sequence-labeling problem.
This paper describes our submission to IJCNLP 2017 shared task 4, for predicting the tags of unseen customer feedback sentences, such as comments, complaints, bugs, requests, and meaningless and undetermined statements. With the use of a neural network, a large number of deep learning methods have been developed, which perform very well on text classification. Our ensemble classification model is based on a bi-directional gated recurrent unit and an attention mechanism which shows a 3.8% improvement in classification accuracy. To enhance the model performance, we also compared it with several word-embedding models. The comparative results show that a combination of both word2vec and GloVe achieves the best performance.
A shared task is a typical question answering task that aims to test how accurately the participants can answer the questions in exams. Typically, for each question, there are four candidate answers, and only one of the answers is correct. The existing methods for such a task usually implement a recurrent neural network (RNN) or long short-term memory (LSTM). However, both RNN and LSTM are biased models in which the words in the tail of a sentence are more dominant than the words in the header. In this paper, we propose the use of an attention-based LSTM (AT-LSTM) model for these tasks. By adding an attention mechanism to the standard LSTM, this model can more easily capture long contextual information.
In this paper, we present a system that uses a convolutional neural network with long short-term memory (CNN-LSTM) model to complete the task. The CNN-LSTM model has two combined parts: CNN extracts local n-gram features within tweets and LSTM composes the features to capture long-distance dependency across tweets. Additionally, we used other three models (CNN, LSTM, BiLSTM) as baseline algorithms. Our introduced model showed good performance in the experimental results.
Word embeddings that can capture semantic and syntactic information from contexts have been extensively used for various natural language processing tasks. However, existing methods for learning context-based word embeddings typically fail to capture sufficient sentiment information. This may result in words with similar vector representations having an opposite sentiment polarity (e.g., good and bad), thus degrading sentiment analysis performance. Therefore, this study proposes a word vector refinement model that can be applied to any pre-trained word vectors (e.g., Word2vec and GloVe). The refinement model is based on adjusting the vector representations of words such that they can be closer to both semantically and sentimentally similar words and further away from sentimentally dissimilar words. Experimental results show that the proposed method can improve conventional word embeddings and outperform previously proposed sentiment embeddings for both binary and fine-grained classification on Stanford Sentiment Treebank (SST).
In this paper, we propose a multi-channel convolutional neural network-long short-term memory (CNN-LSTM) model that consists of two parts: multi-channel CNN and LSTM to analyze the sentiments of short English messages from Twitter. Un-like a conventional CNN, the proposed model applies a multi-channel strategy that uses several filters of different length to extract active local n-gram features in different scales. This information is then sequentially composed using LSTM. By combining both CNN and LSTM, we can consider both local information within tweets and long-distance dependency across tweets in the classification process. Officially released results show that our system outperforms the baseline algo-rithm.
Abstract Automatic grammatical error detection for Chinese has been a big challenge for NLP researchers. Due to the formal and strict grammar rules in Chinese, it is hard for foreign students to master Chinese. A computer-assisted learning tool which can automatically detect and correct Chinese grammatical errors is necessary for those foreign students. Some of the previous works have sought to identify Chinese grammatical errors using template- and learning-based methods. In contrast, this study introduced convolutional neural network (CNN) and long-short term memory (LSTM) for the shared task of Chinese Grammatical Error Diagnosis (CGED). Different from traditional word-based embedding, single word embedding was used as input of CNN and LSTM. The proposed single word embedding can capture both semantic and syntactic information to detect those four type grammatical error. In experimental evaluation, the recall and f1-score of our submitted results Run1 of the TOCFL testing data ranked the fourth place in all submissions in detection-level.