Joep Kruijsen


2006

A Unified Structure for Dutch Dialect Dictionary Data
Folkert de Vriend | Lou Boves | Henk van den Heuvel | Roeland van Hout | Joep Kruijsen | Jos Swanenberg
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The traditional dialect vocabulary of the Netherlands and Flanders is recorded and researched in several Dutch and Belgian research institutes and universities. Most of these distributed dictionary creation and research projects collaborate in the “Permanent Overlegorgaan Regionale Woordenboeken” (ReWo). In the project “digital databases and digital tools for WBD and WLD” (D-square), the dialect data published by two of these dictionary projects (Woordenboek van de Brabantse Dialecten and Woordenboek van de Limburgse Dialecten) is being digitised. One of the additional goals of the D-square project is the development of an infrastructure for electronic access to all the dialect dictionaries of the projects collaborating in the ReWo. In this paper we first reconsider the nature of the core data types (form, sense and location) present in the different dialect dictionaries and the ways these data types are further classified. Next we focus on the problems encountered when trying to unify the dictionary data and their classifications, and suggest solutions. Finally we look at several implementation issues regarding a specific encoding for the dictionaries.