Eline Westerhout
2009
Extraction of Definitions Using Grammar-Enhanced Machine Learning
Eline Westerhout
Proceedings of the Student Research Workshop at EACL 2009
Definition Extraction using Linguistic and Structural Features
Eline Westerhout
Proceedings of the 1st Workshop on Definition Extraction
2008
Creating Glossaries Using Pattern-Based and Machine Learning Techniques
Eline Westerhout | Paola Monachesi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
One of the aims of the Language Technology for eLearning project is to show that Natural Language Processing techniques can be employed to enhance the learning process. To this end, one of the functionalities that have been developed is a pattern-based glossary candidate detector capable of extracting definitions in eight languages. To improve the results obtained with the pattern-based approach, machine learning techniques are applied to the Dutch results to filter out incorrectly extracted definitions. In this paper, we discuss the machine learning techniques used and present the results of the quantitative evaluation. We also discuss the integration of the tool into the Learning Management System ILIAS.
2006
A pilot study for a Corpus of Dutch Aphasic Speech (CoDAS)
Eline Westerhout | Paola Monachesi
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In this paper, a pilot study for the development of a corpus of Dutch Aphasic Speech (CoDAS) is presented. Given the lack of resources of this kind, not only for Dutch but also for other languages, CoDAS will be able to set standards and will contribute to future research in this area. Given the special character of the speech contained in CoDAS, we cannot simply carry over the design and annotation protocols of existing corpora, such as the Corpus Gesproken Nederlands (CGN) or CHILDES. However, they have been taken as a starting point. We have investigated whether and how the CGN procedures and protocols for annotation (part-of-speech tagging) and transcription (orthographic and phonetic) should be adapted in order to annotate and transcribe aphasic speech properly. In addition, we have established the basic requirements with respect to text types, metadata, and annotation levels that CoDAS should fulfill.