Tamil Lyrics Corpus: Analysis and Experiments

Dhivya Chinnappa; Praveenraj Dhandapani

Abstract

In this paper, we present a new Tamil lyrics corpus extracted from Tamil movies spanning 65 years (1954 to 2019). We present a detailed corpus analysis showing the nature of Tamil lyrics with respect to lyricists and the year in which each song was written. We also present similarity scores across different lyricists based on their song lyrics. We present experimental results based on the SOTA BERT Tamil models to identify the lyricist of a song. Finally, we present future research directions, encouraging researchers to pursue Tamil NLP research.

Anthology ID: 2021.dravidianlangtech-1.1
Volume: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month: April
Year: 2021
Address: Kyiv
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Venue: DravidianLangTech
Publisher: Association for Computational Linguistics
Pages: 1–9
URL: https://aclanthology.org/2021.dravidianlangtech-1.1/
Cite (ACL): Dhivya Chinnappa and Praveenraj Dhandapani. 2021. Tamil Lyrics Corpus: Analysis and Experiments. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 1–9, Kyiv. Association for Computational Linguistics.
Cite (Informal): Tamil Lyrics Corpus: Analysis and Experiments (Chinnappa & Dhandapani, DravidianLangTech 2021)
PDF: https://aclanthology.org/2021.dravidianlangtech-1.1.pdf
Software: 2021.dravidianlangtech-1.1.Software.zip
Dataset: 2021.dravidianlangtech-1.1.Dataset.zip
