2016
AfriBooms: An Online Treebank for Afrikaans
Liesbeth Augustinus
|
Peter Dirix
|
Daniel van Niekerk
|
Ineke Schuurman
|
Vincent Vandeghinste
|
Frank Van Eynde
|
Gerhard van Huyssteen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Compared to well-resourced languages such as English and Dutch, natural language processing (NLP) tools for Afrikaans are still not abundant. In the context of the AfriBooms project, KU Leuven and the North-West University collaborated to develop a first, small treebank, a dependency parser, and an easy-to-use online linguistic search engine for Afrikaans for use by researchers and students in the humanities and social sciences. The search tool is based on a similar development for Dutch, i.e. GrETEL, a user-friendly search engine which allows users to query a treebank by means of a natural language example instead of a formal search instruction.
2014
The Development of Dutch and Afrikaans Language Resources for Compound Boundary Analysis.
Menno van Zaanen
|
Gerhard van Huyssteen
|
Suzanne Aussems
|
Chris Emmery
|
Roald Eiselen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In most languages, new words can be created through the process of compounding, which combines two or more words into a new lexical unit. Whereas in languages such as English the components that make up a compound are separated by a space, in languages such as Finnish, German, Afrikaans and Dutch these components are concatenated into one word. Compounding is very productive and leads to practical problems in developing machine translators and spelling checkers, as newly formed compounds cannot be found in existing lexicons. The Automatic Compound Processing (AuCoPro) project deals with the analysis of compounds in two closely related languages, Afrikaans and Dutch. In this paper, we present the development and evaluation of two datasets, one for each language, that contain compound words with annotated compound boundaries. Such datasets can be used to train classifiers to identify the compound components in novel compounds. We describe the process of annotation and provide an overview of the annotation guidelines as well as global properties of the datasets. The inter-rater agreements between the annotators are considered highly reliable. Furthermore, we show the usability of these datasets by building an initial automatic compound boundary detection system, which assigns compound boundaries with approximately 90% accuracy.
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)
Ben Verhoeven
|
Walter Daelemans
|
Menno van Zaanen
|
Gerhard van Huyssteen
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)
Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch
Ben Verhoeven
|
Menno van Zaanen
|
Walter Daelemans
|
Gerhard van Huyssteen
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)
A Taxonomy for Afrikaans and Dutch Compounds
Gerhard van Huyssteen
|
Ben Verhoeven
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)
2013
More Than Only Noun-Noun Compounds: Towards an Annotation Scheme for the Semantic Modelling of Other Noun Compound Types
Ben Verhoeven
|
Gerhard B. van Huyssteen
Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation
2012
Aspects of a Legal Framework for Language Resource Management
Aditi Sharma Grover
|
Annamart Nieman
|
Gerhard Van Huyssteen
|
Justus Roux
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The management of language resources requires several legal aspects to be taken into consideration. In this paper we discuss a number of these aspects, which lead towards the formation of a legal framework for a language resources management agency. The legal framework entails an examination of the agency's stakeholders and the relationships that exist amongst them; the privacy and intellectual property rights that exist around the language resources offered by the agency; and the external (e.g. laws, acts, policies) and internal (e.g. end user licence agreements) legal instruments required for the agency's operation.
2010
Automatic Extraction of Constructional Schemas
Gerhard van Huyssteen
|
Marelie Davel
Proceedings of the NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics
The South African Human Language Technologies Audit
Aditi Sharma Grover
|
Gerhard B. van Huyssteen
|
Marthinus W. Pretorius
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Human language technologies (HLT) can play a vital role in bridging the digital divide, and thus the HLT field has been recognised as a priority area by the South African government. We present our work on conducting a technology audit on the South African HLT landscape across the country's eleven official languages. The process and the instruments employed in conducting the audit are described, and an overview of the various complementary approaches used in the results analysis is provided. We find that a number of HLT language resources (LRs) are available in SA, but they are of a very basic and exploratory nature. Lessons learnt in conducting a technology audit in a young and multilingual context are also discussed.
2009
Prototype-based Active Learning for Lemmatization
Walter Daelemans
|
Hendrik J. Groenewald
|
Gerhard B. van Huyssteen
Proceedings of the International Conference RANLP-2009
2005
Teaching Language Technology at the North-West University
Suléne Pilon
|
Gerhard B van Huyssteen
|
Bertus van Rooy
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL