T20NGD: Annotated corpus for news headlines classification in low resource language, Telugu

Mallikarjuna Chindukuri, Sivanesan Sangeetha


Abstract
News classification allows analysts and researchers to study trends over time. Based on classification, news platforms can recommend related articles to readers, and many digital news platforms and apps use it to offer personalized content to their users. While numerous resources are available for news classification in various Indian languages, there is still no extensive benchmark dataset specifically for Telugu. Our paper presents and describes the Telugu20news group dataset, a collection of news headlines gathered from various online Telugu news channels. We describe in detail how the proposed news headlines dataset was collected and annotated. In addition, we conducted extensive experiments on the dataset to provide solid baselines for future work.
Anthology ID:
2023.icon-1.35
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
Jyoti D. Pawar, Sobha Lalitha Devi
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
423–432
URL:
https://aclanthology.org/2023.icon-1.35
Cite (ACL):
Mallikarjuna Chindukuri and Sivanesan Sangeetha. 2023. T20NGD: Annotated corpus for news headlines classification in low resource language, Telugu. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 423–432, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
T20NGD: Annotated corpus for news headlines classification in low resource language, Telugu (Chindukuri & Sangeetha, ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.35.pdf