Sonal Kumari
2020
EmpLite: A Lightweight Sequence Labeling Model for Emphasis Selection of Short Texts
Vibhav Agarwal | Sourav Ghosh | Kranti Ch | Bharath Challa | Sonal Kumari | Harshavardhana | Barath Raj Kandur Raja
Proceedings of the Workshop on Joint NLP Modelling for Conversational AI @ ICON 2020
Word emphasis in textual content aims at conveying the desired intention by changing the size, color, typeface, style (bold, italic, etc.), and other typographical features. The emphasized words are extremely helpful in drawing the readers’ attention to specific information that the authors wish to highlight. However, applying such emphasis using a soft keyboard for social media interactions is time-consuming and has an associated learning curve. In this paper, we propose a novel approach to automate emphasis word detection in short written texts. To the best of our knowledge, this work presents the first lightweight deep learning approach for smartphone deployment of emphasis selection. Experimental results show that our approach achieves comparable accuracy with a much smaller model size than existing models. Our best lightweight model has a memory footprint of 2.82 MB with a matching score of 0.716 on the SemEval-2020 public benchmark dataset.
Sonal.kumari at SemEval-2020 Task 12: Social Media Multilingual Offensive Text Identification and Categorization Using Neural Network Models
Sonal Kumari
Proceedings of the Fourteenth Workshop on Semantic Evaluation
In this paper, we present our approaches and results for SemEval-2020 Task 12, Multilingual Offensive Language Identification in Social Media (OffensEval 2020). OffensEval 2020 had three subtasks: A) identifying tweets as offensive (OFF) or non-offensive (NOT) in Arabic, Danish, English, Greek, and Turkish; B) detecting whether an offensive tweet is targeted (TIN) or untargeted (UNT) in English; and C) categorizing targeted offensive tweets into three classes, individual (IND), group (GRP), or other (OTH), in English. We participate in all three subtasks. In our solution, we first use the pre-trained BERT model for subtasks A, B, and C, and then apply a BiLSTM model with an attention mechanism (Attn-BiLSTM) to the same subtasks. Our results demonstrate that the pre-trained model does not give good results for all languages and is compute- and memory-intensive, whereas the Attn-BiLSTM model is fast and gives good accuracy with fewer resources. The Attn-BiLSTM model gives better accuracy for Arabic and Greek, where the pre-trained model is not able to capture the complete context of these languages due to its smaller vocabulary size.