IndoUKC: A Concept-Centered Indian Multilingual Lexical Resource

Nandu Chandran Nair, Rajendran S. Velayuthan, Yamini Chandrashekar, Gábor Bella, Fausto Giunchiglia


Abstract
We introduce the IndoUKC, a new multilingual lexical database covering eighteen Indian languages, with a focus on formally capturing words and word meanings specific to Indian languages and cultures. The IndoUKC reuses content from the existing IndoWordNet resource while providing a new model for the cross-lingual mapping of lexical meanings that allows for a richer, diversity-aware representation. Accordingly, in addition to a thorough syntactic and semantic cleaning, the IndoWordNet lexical content has been remodeled to allow a more precise expression of language-specific meaning. The resulting database is available both for browsing through a graphical web interface and for download through the LiveLanguage data catalogue.
Anthology ID:
2022.lrec-1.303
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2833–2840
URL:
https://aclanthology.org/2022.lrec-1.303
Cite (ACL):
Nandu Chandran Nair, Rajendran S. Velayuthan, Yamini Chandrashekar, Gábor Bella, and Fausto Giunchiglia. 2022. IndoUKC: A Concept-Centered Indian Multilingual Lexical Resource. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2833–2840, Marseille, France. European Language Resources Association.
Cite (Informal):
IndoUKC: A Concept-Centered Indian Multilingual Lexical Resource (Chandran Nair et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.303.pdf