Accessing and standardizing Wiktionary lexical entries for the translation of labels in Cultural Heritage taxonomies

Thierry Declerck, Karlheinz Mörth, Piroska Lendvai


Abstract
We describe the usefulness of Wiktionary, the freely available web-based lexical resource, in providing multilingual extensions to catalogues that serve content-based indexing of folktales and related narratives. We develop conversion tools between Wiktionary and TEI, using ISO standards (LMF, MAF), to make such resources available to both the Digital Humanities community and the Language Resources community. The converted data can be queried via a web interface, and the tools of the workflow are to be released under an open source license. We report on the current state and functionality of our tools and analyse some shortcomings of Wiktionary, as well as potential domains of application.
Anthology ID:
L12-1487
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2511–2514
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/820_Paper.pdf
Cite (ACL):
Thierry Declerck, Karlheinz Mörth, and Piroska Lendvai. 2012. Accessing and standardizing Wiktionary lexical entries for the translation of labels in Cultural Heritage taxonomies. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2511–2514, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Accessing and standardizing Wiktionary lexical entries for the translation of labels in Cultural Heritage taxonomies (Declerck et al., LREC 2012)