Coreference resolution is an NLP task to find out whether the set of referring expressions belong to the same concept in discourse. A multi-pass sieve is a deterministic coreference model that implements several layers of sieves, where each sieve takes a pair of correlated mentions from a collection of non-coherent mentions. The multi-pass sieve is based on the principle of high precision, followed by increased recall in each sieve. In this work, we examine the portability of the multi-pass sieve coreference resolution model to the Indonesian language. We conduct the experiment on 201 Wikipedia documents and the multi-pass sieve system yields 72.74% of MUC F-measure and 52.18% of BCUBED F-measure.
This paper describes our system, UI, for task A: Sentiment Classification in SemEval-2020 Task 8 Memotion Analysis. We use a common traditional machine learning, which is SVM, by utilizing the combination of text and images features. The data consist text that extracted from memes and the images of memes. We employ n-gram language model for text features and pre-trained model,VGG-16,for image features. After obtaining both features from text and images in form of 2-dimensional arrays, we concatenate and classify the final features using SVM. The experiment results show SVM achieved 35% for its F1 macro, which is 0.132 points or 13.2% above the baseline model.
In this paper, we present the result of our experiment with a variant of 1 Dimensional Convolutional Neural Network (Conv1D) hyper-parameters value. We describe the system entered by the team of Information Retrieval Lab. Universitas Indonesia (3218IR) in the SemEval 2020 Task 11 Sub Task 1 about propaganda span identification in news articles. The best model obtained an F1 score of 0.24 in the development set and 0.23 in the test set. We show that there is a potential for performance improvement through the use of models with appropriate hyper-parameters. Our system uses a combination of Conv1D and GloVe as Word Embedding to detect propaganda in the fragment text level.
In this paper, we present our approach and the results of our participation in OffensEval 2020. There are three sub-tasks in OffensEval 2020 namely offensive language identification (sub-task A), automatic categorization of offense types (sub-task B), and offense target identification (sub-task C). We participated in sub-task A of English OffensEval 2020. Our approach emphasizes on how the emoji affects offensive language identification. Our model used LSTM combined with GloVe pre-trained word vectors to identify offensive language on social media. The best model obtained macro F1-score of 0.88428.
Hate speech and abusive language spreading on social media need to be detected automatically to avoid conflict between citizen. Moreover, hate speech has a target, category, and level that also needs to be detected to help the authority in prioritizing which hate speech must be addressed immediately. This research discusses multi-label text classification for abusive language and hate speech detection including detecting the target, category, and level of hate speech in Indonesian Twitter using machine learning approach with Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) classifier and Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) as the data transformation method. We used several kinds of feature extractions which are term frequency, orthography, and lexicon features. Our experiment results show that in general RFDT classifier using LP as the transformation method gives the best accuracy with fast computational time.