Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics

Kanika Gupta, Monojit Choudhury, Kalika Bali


Abstract
This paper describes a method to mine Hindi-English transliteration pairs from online Hindi song lyrics. The technique is based on the observation that lyrics are transliterated word by word, preserving the exact word order. The mining task is nevertheless challenging because Hindi lyrics and their transliterations are usually available from different, often unrelated, websites, so matching Hindi lyrics to their transliterated counterparts is non-trivial. Moreover, lyrics data contain various types of noise that need to be handled appropriately before songs can be aligned at the word level. The mined data, 30823 unique Hindi-English transliteration pairs with an accuracy of more than 92%, is publicly available. Although the present work reports mining of Hindi-English word pairs, the same technique can be easily adapted to other languages for which song lyrics are available online in both native and Roman scripts.
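The word-order observation in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the punctuation set, and the rule of skipping any line whose token counts differ are assumptions made for illustration, and the sketch presumes the Hindi and Roman versions of a song have already been matched line by line.

```python
import re

# Hypothetical punctuation set, including the Devanagari danda (an assumption).
_SPLIT = re.compile(r"[\s,.;:!?।\-]+")

def tokenize(line):
    """Split a lyric line into word tokens, dropping punctuation."""
    return [tok for tok in _SPLIT.split(line.strip()) if tok]

def align_line(hindi_line, roman_line):
    """Pair words positionally; return [] when the token counts differ."""
    hindi = tokenize(hindi_line)
    roman = [w.lower() for w in tokenize(roman_line)]
    if len(hindi) != len(roman):
        # Assumed noise filter: a count mismatch means the line is unsafe to align.
        return []
    return list(zip(hindi, roman))

def mine_pairs(hindi_lines, roman_lines):
    """Collect unique (Hindi, Roman) word pairs from line-matched lyrics."""
    pairs = set()
    for h_line, r_line in zip(hindi_lines, roman_lines):
        pairs.update(align_line(h_line, r_line))
    return pairs
```

For example, `align_line("तुम पास आए", "Tum paas aaye")` pairs each Devanagari word with the Roman word in the same position. Lines with mismatched token counts are discarded here rather than repaired, which trades recall for precision.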
Anthology ID:
L12-1179
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2459–2465
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/365_Paper.pdf
Cite (ACL):
Kanika Gupta, Monojit Choudhury, and Kalika Bali. 2012. Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2459–2465, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics (Gupta et al., LREC 2012)