Enhancing Aspect Extraction for Hindi

Arghya Bhattacharya, Alok Debnath, Manish Shrivastava


Abstract
Aspect extraction is not a well-explored topic in Hindi, with only one corpus having been developed for the task. In this paper, we discuss the merits of the existing corpus in terms of quality, size, sparsity, and performance in aspect extraction tasks using established models. To provide a better baseline corpus for aspect extraction, we translate the SemEval 2014 aspect-based sentiment analysis dataset and annotate the aspects in that data. We provide rigorous guidelines and a replicable methodology for this task. We quantitatively evaluate the translations and annotations using inter-annotator agreement scores. We also evaluate our dataset using state-of-the-art neural aspect extraction models in both monolingual and multilingual settings and show that the models perform far better on our corpus than on the existing Hindi dataset. With this, we establish our corpus as the gold-standard aspect extraction dataset in Hindi.
Anthology ID:
2021.ecnlp-1.17
Volume:
Proceedings of the 4th Workshop on e-Commerce and NLP
Month:
August
Year:
2021
Address:
Online
Editors:
Shervin Malmasi, Surya Kallumadi, Nicola Ueffing, Oleg Rokhlenko, Eugene Agichtein, Ido Guy
Venue:
ECNLP
Publisher:
Association for Computational Linguistics
Pages:
140–149
URL:
https://aclanthology.org/2021.ecnlp-1.17
DOI:
10.18653/v1/2021.ecnlp-1.17
Cite (ACL):
Arghya Bhattacharya, Alok Debnath, and Manish Shrivastava. 2021. Enhancing Aspect Extraction for Hindi. In Proceedings of the 4th Workshop on e-Commerce and NLP, pages 140–149, Online. Association for Computational Linguistics.
Cite (Informal):
Enhancing Aspect Extraction for Hindi (Bhattacharya et al., ECNLP 2021)
PDF:
https://aclanthology.org/2021.ecnlp-1.17.pdf