MUCS@TechDOfication using FineTuned Vectors and n-grams

Fazlourrahman Balouchzahi, M D Anusha, H L Shashirekha


Abstract
The increasing number of domain-specific text processing applications is creating demand for tools and techniques for domain-specific Text Classification (TC), which may benefit many downstream applications such as Machine Translation, Summarization, and Question Answering. Further, many TC algorithms have been applied to globally recognized languages such as English, giving less importance to local languages, particularly Indian languages. To boost research on technical domains and text processing in Indian languages, a shared task named "TechDOfication2020" was organized by ICON'20. The objective of this shared task is to automatically identify the technical domain of a given text at two levels of granularity, coarse-grained technical domains and fine-grained subdomains, in eight languages. To tackle this challenge, we, team MUCS, propose three models: DL-FineTuned, applied to all subtasks, and VC-FineTuned and VC-ngrams, applied only to some subtasks. The proposed models use n-grams and fine-tuned word embeddings as features, with machine learning and deep learning algorithms as classifiers. The proposed models performed well in most subtasks and obtained first rank in subTask1b (Bangla) and subTask1e (Malayalam), with F1 scores of 0.8353 and 0.3851, respectively, both using the DL-FineTuned model.
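The abstract states only that n-grams serve as features for machine learning classifiers in the VC-ngrams model. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation. It assumes character n-grams, a multinomial Naive Bayes classifier for each n-gram size, and reads "VC" as a hard-voting ensemble over those classifiers. None of these choices, nor the names `char_ngrams`, `NGramNaiveBayes`, and `vote`, come from the paper, and the toy training data is invented for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return the character n-grams of `text`, padded with one space per side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams of one fixed size n.

    Hypothetical stand-in for an n-gram based ML classifier; the paper's
    actual classifiers and n-gram ranges are not given in the abstract.
    """

    def __init__(self, n, alpha=1.0):
        self.n = n
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per label
        self.gram_counts = defaultdict(Counter)    # n-gram counts per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        # Total number of n-gram tokens seen for each label
        self.totals = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of every n-gram in the text
            score = math.log(self.class_counts[label] / n_docs)
            for g in char_ngrams(text, self.n):
                count = self.gram_counts[label][g]
                score += math.log((count + self.alpha) / (self.totals[label] + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def vote(models, text):
    """Hard majority vote over the models' predictions (ties go to the first seen)."""
    preds = [m.predict(text) for m in models]
    return Counter(preds).most_common(1)[0][0]


if __name__ == "__main__":
    # Invented toy data: two coarse "technical domains"
    train_texts = [
        "the compiler parses source code",
        "kernel threads and process scheduling",
        "protein folding in the cell",
        "gene expression and dna replication",
    ]
    train_labels = ["cse", "cse", "bio", "bio"]
    # One classifier per n-gram size, combined by voting
    models = [NGramNaiveBayes(n).fit(train_texts, train_labels) for n in (2, 3, 4)]
    print(vote(models, "dna and protein in the cell"))
```

Training one classifier per n-gram size and voting is only one way to combine n-gram views; a single classifier over a concatenated n-gram range, or an ensemble of different learners on the same features, would be equally plausible readings of the abstract.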
Anthology ID:
2020.icon-techdofication.1
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task
Month:
December
Year:
2020
Address:
Patna, India
Editors:
Dipti Misra Sharma, Asif Ekbal, Karunesh Arora, Sudip Kumar Naskar, Dipankar Ganguly, Sobha L, Radhika Mamidi, Sunita Arora, Pruthwik Mishra, Vandan Mujadia
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
1–5
URL:
https://aclanthology.org/2020.icon-techdofication.1
Cite (ACL):
Fazlourrahman Balouchzahi, M D Anusha, and H L Shashirekha. 2020. MUCS@TechDOfication using FineTuned Vectors and n-grams. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task, pages 1–5, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
MUCS@TechDOfication using FineTuned Vectors and n-grams (Balouchzahi et al., ICON 2020)
PDF:
https://aclanthology.org/2020.icon-techdofication.1.pdf