Indra Budi


2021

pdf bib
A Multi-Pass Sieve Coreference Resolution for Indonesian
Valentina Kania Prameswara Artari | Rahmad Mahendra | Meganingrum Arista Jiwanggi | Adityo Anggraito | Indra Budi
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Coreference resolution is an NLP task to find out whether the set of referring expressions belong to the same concept in discourse. A multi-pass sieve is a deterministic coreference model that implements several layers of sieves, where each sieve takes a pair of correlated mentions from a collection of non-coherent mentions. The multi-pass sieve is based on the principle of high precision, followed by increased recall in each sieve. In this work, we examine the portability of the multi-pass sieve coreference resolution model to the Indonesian language. We conduct the experiment on 201 Wikipedia documents and the multi-pass sieve system yields 72.74% of MUC F-measure and 52.18% of BCUBED F-measure.

2020

pdf bib
Aspect-based Sentiment Analysis on Indonesia’s Tourism Destinations Based on Google Maps User Code-Mixed Reviews (Study Case: Borobudur and Prambanan Temples)
Dian Arianto | Indra Budi
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

pdf bib
UI at SemEval-2020 Task 8: Text-Image Fusion for Sentiment Classification
Andi Suciati | Indra Budi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes our system, UI, for task A: Sentiment Classification in SemEval-2020 Task 8 Memotion Analysis. We use a common traditional machine learning, which is SVM, by utilizing the combination of text and images features. The data consist text that extracted from memes and the images of memes. We employ n-gram language model for text features and pre-trained model,VGG-16,for image features. After obtaining both features from text and images in form of 2-dimensional arrays, we concatenate and classify the final features using SVM. The experiment results show SVM achieved 35% for its F1 macro, which is 0.132 points or 13.2% above the baseline model.

pdf bib
3218IR at SemEval-2020 Task 11: Conv1D and Word Embedding in Propaganda Span Identification at News Articles
Dimas Sony Dewantara | Indra Budi | Muhammad Okky Ibrohim
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present the result of our experiment with a variant of 1 Dimensional Convolutional Neural Network (Conv1D) hyper-parameters value. We describe the system entered by the team of Information Retrieval Lab. Universitas Indonesia (3218IR) in the SemEval 2020 Task 11 Sub Task 1 about propaganda span identification in news articles. The best model obtained an F1 score of 0.24 in the development set and 0.23 in the test set. We show that there is a potential for performance improvement through the use of models with appropriate hyper-parameters. Our system uses a combination of Conv1D and GloVe as Word Embedding to detect propaganda in the fragment text level.

pdf bib
IR3218-UI at SemEval-2020 Task 12: Emoji Effects on Offensive Language IdentifiCation
Sandy Kurniawan | Indra Budi | Muhammad Okky Ibrohim
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present our approach and the results of our participation in OffensEval 2020. There are three sub-tasks in OffensEval 2020 namely offensive language identification (sub-task A), automatic categorization of offense types (sub-task B), and offense target identification (sub-task C). We participated in sub-task A of English OffensEval 2020. Our approach emphasizes on how the emoji affects offensive language identification. Our model used LSTM combined with GloVe pre-trained word vectors to identify offensive language on social media. The best model obtained macro F1-score of 0.88428.

2019

pdf bib
Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter
Muhammad Okky Ibrohim | Indra Budi
Proceedings of the Third Workshop on Abusive Language Online

Hate speech and abusive language spreading on social media need to be detected automatically to avoid conflict between citizen. Moreover, hate speech has a target, category, and level that also needs to be detected to help the authority in prioritizing which hate speech must be addressed immediately. This research discusses multi-label text classification for abusive language and hate speech detection including detecting the target, category, and level of hate speech in Indonesian Twitter using machine learning approach with Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) classifier and Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) as the data transformation method. We used several kinds of feature extractions which are term frequency, orthography, and lexicon features. Our experiment results show that in general RFDT classifier using LP as the transformation method gives the best accuracy with fast computational time.