2020
Speech-Emotion Detection in an Indonesian Movie
Fahmi Fahmi | Meganingrum Arista Jiwanggi | Mirna Adriani
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
The growing demand for automatic emotion recognition systems in the Human-Computer Interaction field has pushed research in speech emotion detection. Although this research is growing, there is still little work on automatic speech emotion detection in Bahasa Indonesia. Another issue is the lack of a standard corpus for this research area in Bahasa Indonesia. This study proposed several approaches to detect speech emotion in the dialogs of an Indonesian movie by classifying them into 4 emotion classes, i.e., happiness, sadness, anger, and neutral. Two different speech data representations are used in this study, i.e., statistical and temporal/sequence representations. This study used an Artificial Neural Network (ANN), a Recurrent Neural Network (RNN) with the Long Short-Term Memory (LSTM) variation, word embedding, and also a hybrid of the three to perform the classification task. The best accuracies given by the one-vs-rest scenario for each emotion class with speech-transcript pairs, using a hybrid of the non-temporal and embedding approaches, are 1) happiness: 76.31%; 2) sadness: 86.46%; 3) anger: 82.14%; and 4) neutral: 68.51%. The multiclass classification resulted in 64.66% precision, 66.79% recall, and 64.83% F1-score.
2019
Normalization of Indonesian-English Code-Mixed Twitter Data
Anab Maulana Barik | Rahmad Mahendra | Mirna Adriani
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Twitter is an excellent source of data for NLP research as it offers a tremendous amount of textual data. However, processing tweets to extract meaningful information is very challenging, for at least two reasons: (i) the use of nonstandard words and an informal writing manner, and (ii) code-mixing, i.e., the combination of multiple languages in a single tweet conversation. Most previous work has addressed the two issues in isolation, as separate tasks. In this study, we work on the normalization task for code-mixed Twitter data, more specifically for the Indonesian-English language pair. We propose a pipeline that consists of four modules, i.e., tokenization, language identification, lexical normalization, and translation. Another contribution is a gold standard of Indonesian-English code-mixed data for each module.
2018
Cross-Lingual and Supervised Learning Approach for Indonesian Word Sense Disambiguation Task
Rahmad Mahendra | Heninggar Septiantri | Haryo Akbarianto Wibowo | Ruli Manurung | Mirna Adriani
Proceedings of the 9th Global Wordnet Conference
Ambiguity is a problem we frequently face in Natural Language Processing. Word Sense Disambiguation (WSD) is the task of determining the correct sense of an ambiguous word. However, research on WSD for Indonesian is still scarce. English-Indonesian parallel corpora and the WordNets available for both languages can be used to obtain training data for WSD by applying a cross-lingual WSD method. This training data is used as input to build a model using supervised machine learning algorithms. Our research also examines the use of word embedding features to build the WSD model.
2014
Automatically Building a Corpus for Sentiment Analysis on Indonesian Tweets
Alfan Farizki Wicaksono | Clara Vania | Bayu Distiawan | Mirna Adriani
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing
2012
Predicting Answer Location Using Shallow Semantic Analogical Reasoning in a Factoid Question Answering System
Hapnes Toba | Mirna Adriani | Hisar Maruli Manurung
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation