A Seed Corpus of Hindu Temples in India

Priya Radhakrishnan


Abstract
Temples are an integral part of the culture and heritage of India and are centers of religious practice for practicing Hindus. A scientific study of temples can reveal valuable insights into Indian culture and heritage. However, to the best of our knowledge, learning resources that aid such a study are either not publicly available or non-existent. In this paper, we present our initial efforts to create a corpus of Hindu temples in India, along with a simple, reusable platform that builds a temple corpus from web text on temples. Curation is improved using classifiers trained on textual data from Wikipedia articles on Hindu temples. The training data is verified by human volunteers. The temple corpus consists of 4933 high-accuracy facts about 573 temples. We make the corpus and the platform freely available. We also test the reusability of the platform by creating a corpus of museums in India. We believe the temple corpus will aid the scientific study of temples and the platform will aid in the construction of similar corpora. We believe both will contribute significantly to promoting research on the culture and heritage of a region.
Anthology ID:
2020.lrec-1.32
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
254–258
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.32
Cite (ACL):
Priya Radhakrishnan. 2020. A Seed Corpus of Hindu Temples in India. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 254–258, Marseille, France. European Language Resources Association.
Cite (Informal):
A Seed Corpus of Hindu Temples in India (Radhakrishnan, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.32.pdf