Lou Boves

Also published as: Louis Boves

The traditional dialect vocabulary of the Netherlands and Flanders is recorded and researched in several Dutch and Belgian research institutes and universities. Most of these distributed dictionary creation and research projects collaborate in the Permanent Overlegorgaan Regionale Woordenboeken (ReWo). In the project digital databases and digital tools for WBD and WLD (D-square) the dialect data published by two of these dictionary projects (Woordenboek van de Brabantse Dialecten and Woordenboek van de Limburgse Dialecten) is being digitised. One of the additional goals of the D-square project is the development of an infrastructure for electronic access to all dialect dictionaries collaborating in the ReWo. In this paper we will firstly reconsider the nature of the core data types - form, sense and location - present in the different dialect dictionaries and the ways these data types are further classified. Next we will focus on the problems encountered when trying to unify this dictionary data and their classifications and suggest solutions. Finally we will look at several implementation issues regarding a specific encoding for the dictionaries.

pdf bib abs

Data for question answering: The case of why
Suzan Verberne | Lou Boves | Nelleke Oostdijk | Peter-Arno Coppen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

For research and development of an approach for automatically answering why-questions (why-QA) a data collection was created. The data set was obtained by way of elicitation and comprises a total of 395 why-questions. For each question, the data set includes the source document and one or two user-formulated answers. In addition, for a subset of the questions, user-formulated paraphrases are available. All question-answer pairs have been annotated with information on topic and semantic answer type. The resulting data set is of importance not only for our research, but we expect it to contribute to and stimulate other research in the field of why-QA.

pdf bib abs

User requirements analysis for the design of a reference corpus of written Dutch
Nelleke Oostdijk | Lou Boves
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The Dutch Language Corpus Initiative (D-Coi) project aims to specify the design of a 500-million-word reference corpus of written Dutch, and to put the tools and procedures in place that are needed to actually construct such a corpus. One of the tasks in the project is to conduct a user requirements study that should provide the basis for the eventual design of the 500-million-word reference corpus. The present paper outlines the user requirements analysis and reports the results so far.

2004

pdf bib abs

Using Large Multi-purpose Corpora for Specific Research Questions: Discourse Phenomena Related to Wh-questions in the Spoken Dutch Corpus
Nelleke Oostdijk | Lou Boves
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

In this paper, we investigate whether a dataset derived from a multi-purpose corpus such as the Spoken Dutch Corpus may be considered appropriate for developing a taxonomy of wh-questions, and a model of the way in which these questions are integrated in spoken discourse. We compare the results obtained from the Spoken Dutch Corpus with a similar analysis of a large random collection of FAQs from the internet. We find substantial differences between the questions in spoken discourse and FAQs. Therefore, it may not be trivial to use a general purpose corpus as a starting point for developing models for human-computer interaction.

pdf bib

Improving Automatic Phonetic Transcription of Spontaneous Speech Through Variant-Based Pronunciation Variation Modelling
Diana Binnenpoorte | Catia Cucchiarini | Helmer Strik | Lou Boves
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)