Fine-grained domain classification using Transformers

Akshat Gahoi, Akshat Chhajer, Dipti Mishra Sharma


Abstract
The introduction of transformers in 2017, and subsequently BERT in 2018, brought about a revolution in the field of natural language processing. Such models are pretrained on vast amounts of data and can easily be extended to a wide variety of tasks through transfer learning. Continued work on transformer-based architectures has led to a variety of new models with state-of-the-art results. RoBERTa (CITATION) is one such model, which makes a series of changes to the BERT architecture and is capable of producing better-quality embeddings at the expense of functionality. In this paper, we address the well-known text classification task of fine-grained domain classification using BERT and RoBERTa and compare the two models. We also evaluate the impact of data preprocessing, especially in the context of fine-grained domain classification. Our results outperformed all other models at the ICON TechDOfication 2020 (subtask-2a) fine-grained domain classification task and ranked first, demonstrating the effectiveness of our approach.
Anthology ID:
2020.icon-techdofication.7
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task
Month:
December
Year:
2020
Address:
Patna, India
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
31–34
URL:
https://aclanthology.org/2020.icon-techdofication.7
Cite (ACL):
Akshat Gahoi, Akshat Chhajer, and Dipti Mishra Sharma. 2020. Fine-grained domain classification using Transformers. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task, pages 31–34, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
Fine-grained domain classification using Transformers (Gahoi et al., ICON 2020)
PDF:
https://aclanthology.org/2020.icon-techdofication.7.pdf