Praveenraj Dhandapani


2021

Tamil Lyrics Corpus: Analysis and Experiments
Dhivya Chinnappa | Praveenraj Dhandapani
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

In this paper, we present a new Tamil lyrics corpus extracted from Tamil movies spanning 65 years (1954 to 2019). We present a detailed corpus analysis showing the nature of Tamil lyrics with respect to the lyricists and the year in which the lyrics were written. We also present similarity scores across different lyricists based on their song lyrics. We present experimental results based on the SOTA BERT Tamil models to identify the lyricist of a song. Finally, we present future research directions, encouraging researchers to pursue Tamil NLP research.