Tamil Lyrics Corpus: Analysis and Experiments

Dhivya Chinnappa, Praveenraj Dhandapani


Abstract
In this paper, we present a new Tamil lyrics corpus extracted from Tamil movies spanning 65 years (1954 to 2019). We present a detailed corpus analysis showing the nature of Tamil lyrics with respect to lyricists and the year in which the songs were written. We also present similarity scores across different lyricists based on their song lyrics. We present experimental results based on state-of-the-art (SOTA) Tamil BERT models to identify the lyricist of a song. Finally, we present future research directions, encouraging researchers to pursue Tamil NLP research.
Anthology ID:
2021.dravidianlangtech-1.1
Volume:
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month:
April
Year:
2021
Address:
Kyiv
Venues:
DravidianLangTech | EACL
Publisher:
Association for Computational Linguistics
Pages:
1–9
URL:
https://aclanthology.org/2021.dravidianlangtech-1.1
PDF:
https://aclanthology.org/2021.dravidianlangtech-1.1.pdf
Software:
 2021.dravidianlangtech-1.1.Software.zip
Dataset:
 2021.dravidianlangtech-1.1.Dataset.zip