Text simplification is the process of splitting and rephrasing a sentence to a sequence of sentences making it easier to read and understand while preserving the content and approximating the original meaning. Text simplification has been exploited in NLP applications like machine translation, summarization, semantic role labeling, and information extraction, opening a broad avenue for its exploitation in comprehension-based question-answering downstream tasks. In this work, we investigate the effect of text simplification in the task of question-answering using a comprehension context. We release Simple-SQuAD, a simplified version of the widely-used SQuAD dataset. Firstly, we outline each step in the dataset creation pipeline, including style transfer, thresholding of sentences showing correct transfer, and offset finding for each answer. Secondly, we verify the quality of the transferred sentences through various methodologies involving both automated and human evaluation. Thirdly, we benchmark the newly created corpus and perform an ablation study for examining the effect of the simplification process in the SQuAD-based question answering task. Our experiments show that simplification leads to up to 2.04% and 1.74% increase in Exact Match and F1, respectively. Finally, we conclude with an analysis of the transfer process, investigating the types of edits made by the model, and the effect of sentence length on the transfer model.
In this paper, we present the results obtained by BERT, BiLSTM and SVM classifiers on the shared task on Sarcasm Detection held as part of The Second Workshop on Figurative Language Processing. The shared task required the use of conversational context to detect sarcasm. We experimented by varying the amount of context used along with the response (response is the text to be classified). The amount of context used includes (i) zero context, (ii) last one, two or three utterances, and (iii) all utterances. It was found that including the last utterance in the dialogue along with the response improved the performance of the classifier for the Twitter data set. On the other hand, the best performance for the Reddit data set was obtained when using only the response without any contextual information. The BERT classifier obtained F-score of 0.743 and 0.658 for the Twitter and Reddit data set respectively.
In this paper, we present a multimodal architecture to determine the emotion expressed in a meme. This architecture utilizes both textual and visual information present in a meme. To extract image features we experimented with pre-trained VGG-16 and Inception-V3 classifiers and to extract text features we used LSTM and BERT classifiers. Both FastText and GloVe embeddings were experimented with for the LSTM classifier. The best F1 scores our classifier obtained on the official analysis results are 0.3309, 0.4752, and 0.2897 for Task A, B, and C respectively in the Memotion Analysis task (Task 8) organized as part of International Workshop on Semantic Evaluation 2020 (SemEval 2020). In our study, we found that combining both textual and visual information expressed in a meme improves the performance of the classifier as opposed to using standalone classifiers that use only text or visual data.
In this paper, we present the results that the team IIITG-ADBU (codalab username ‘abaruah’) obtained in the SentiMix task (Task 9) of the International Workshop on Semantic Evaluation 2020 (SemEval 2020). This task required the detection of sentiment in code-mixed Hindi-English tweets. Broadly, we performed two sets of experiments for this task. The first experiment was performed using the multilingual BERT classifier and the second set of experiments was performed using SVM classifiers. The character-based SVM classifier obtained the best F1 score of 0.678 in the test set with a rank of 21 among 62 participants. The performance of the multilingual BERT classifier was quite comparable with the SVM classifier on the development set. However, on the test set it obtained an F1 score of 0.342.
Task 12 of SemEval 2020 consisted of 3 subtasks, namely offensive language identification (Subtask A), categorization of offense type (Subtask B), and offense target identification (Subtask C). This paper presents the results our classifiers obtained for the English language in the 3 subtasks. The classifiers used by us were BERT and BiLSTM. On the test set, our BERT classifier obtained macro F1 score of 0.90707 for subtask A, and 0.65279 for subtask B. The BiLSTM classifier obtained macro F1 score of 0.57565 for subtask C. The paper also performs an analysis of the errors made by our classifiers. We conjecture that the presence of few misleading instances in the dataset is affecting the performance of the classifiers. Our analysis also discusses the need of temporal context and world knowledge to determine the offensiveness of few comments.
This paper presents the results of the classifiers we developed for the shared tasks in aggression identification and misogynistic aggression identification. These two shared tasks were held as part of the second workshop on Trolling, Aggression and Cyberbullying (TRAC). Both the subtasks were held for English, Hindi and Bangla language. In our study, we used English BERT (En-BERT), RoBERTa, DistilRoBERTa, and SVM based classifiers for English language. For Hindi and Bangla language, multilingual BERT (M-BERT), XLM-RoBERTa and SVM classifiers were used. Our best performing models are EN-BERT for English Subtask A (Weighted F1 score of 0.73, Rank 5/16), SVM for English Subtask B (Weighted F1 score of 0.87, Rank 2/15), SVM for Hindi Subtask A (Weighted F1 score of 0.79, Rank 2/10), XLMRoBERTa for Hindi Subtask B (Weighted F1 score of 0.87, Rank 2/10), SVM for Bangla Subtask A (Weighted F1 score of 0.81, Rank 2/10), and SVM for Bangla Subtask B (Weighted F1 score of 0.93, Rank 4/8). It is seen that the superior performance of the SVM classifier was achieved mainly because of its better prediction of the majority class. BERT based classifiers were found to predict the minority classes better.
In this paper, we present the results obtained using bi-directional long short-term memory (BiLSTM) with and without attention and Logistic Regression (LR) models for SemEval-2019 Task 5 titled ”HatEval: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter”. This paper presents the results obtained for Subtask A for English language. The results of the BiLSTM and LR models are compared for two different types of preprocessing. One with no stemming performed and no stopwords removed. The other with stemming performed and stopwords removed. The BiLSTM model without attention performed the best for the first test, while the LR model with character n-grams performed the best for the second test. The BiLSTM model obtained an F1 score of 0.51 on the test set and obtained an official ranking of 8/71.