Mallikarjuna Chindukuri


2023

T20NGD: Annotated corpus for news headlines classification in low resource language, Telugu.
Mallikarjuna Chindukuri | Sivanesan Sangeetha
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

News classification allows analysts and researchers to study trends over time. Based on classification, news platforms can provide readers with related articles. Many digital news platforms and apps use classification to offer personalized content to their users. While numerous resources are available for news classification in various Indian languages, there is still no extensive benchmark dataset specifically for the Telugu language. Our paper presents and describes the Telugu20news group dataset, where news has been collected from various online Telugu news channels. We describe in detail the collection and annotation of the proposed news headlines dataset. In addition, we conducted extensive experiments on the proposed dataset to provide solid baselines for future work.