Menno van Zaanen - ACL Anthology

Menno van Zaanen

Also published as: Menno van Zannen, Menno Van Zaanen

2024

Annotating Mystery Novels: Guidelines and Adaptations
Nuette Heyns | Menno Van Zaanen
Proceedings of the 6th Workshop on Narrative Understanding

To understand how stories are structured, we would like to be able to analyze the architecture of narratives. This article reviews and compares existing annotation guidelines for scene and narrative level annotation. We propose new guidelines, based on existing ones, and show how these can be effectively extended from general-purpose to specialized contexts, such as mystery novels, which feature unique narrative elements like red herrings and plot twists. This provides a controlled environment for examining genre-specific event structuring. Additionally, we present a newly annotated genre-specific dataset of mystery novels, offering valuable resources for training and evaluating models in narrative understanding. This study aims to enhance annotation practices and advance the development of computational models for narrative analysis.

Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Rooweither Mabuya | Muzi Matfunjwa | Mmasibidi Setaka | Menno van Zaanen

Adapting Nine Traditional Text Readability Measures into Sesotho
Johannes Sibeko | Menno van Zaanen
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024

This article discusses the adaptation of traditional English readability measures into Sesotho, a Southern African indigenous low-resource language. We use a translated readability corpus to extract textual features from the Sesotho texts and readability levels from the English translations. We look at the correlation between the different features to ensure that non-competing features are used in the readability metrics. Next, through linear regression analyses, we examine the impact of the text features from the Sesotho texts on the overall readability levels (which are gauged from the English translations). Starting from the structure of the traditional English readability measures, linear regression models identify coefficients and intercepts for the different variables considered in the readability formulas for Sesotho. In the end, we propose ten readability formulas for Sesotho (one more than the initial nine; we provide two formulas based on the structure of the Gunning Fog index). We also introduce intercepts for the Gunning Fog index, the Läsbarhets index and the Readability index (which do not have intercepts in the English variants) in the Sesotho formulas.

2023

Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)
Rooweither Mabuya | Don Mthobela | Mmasibidi Setaka | Menno Van Zaanen

2022

Detecting Multiple Transitions in Literary Texts
Nuette Heyns | Menno van Zaanen
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Identifying the high-level structure of texts provides important information when performing distant reading analysis. The structure of texts is not necessarily linear, as transitions, such as changes in the scenery or flashbacks, can be present. As a first step in identifying this structure, we aim to identify transitions in texts. Previous work (Heyns and van Zaanen, 2021) proposed a system that can successfully identify one transition in literary texts. The text is split into snippets and LDA is applied, resulting in a sequence of topics. A transition is introduced at the point that separates the topics (before and after the point) best. In this article, we extend the existing system such that it can detect multiple transitions. Additionally, we introduce a new system that inherently handles multiple transitions in texts. The new system also relies on LDA information, but is more robust than the previous system. We apply these systems to texts with known transitions (as they are constructed by concatenating text snippets stemming from different source texts) and evaluate both systems on texts with one transition and texts with two transitions. As both systems rely on LDA to identify transitions between snippets, we also show the impact of varying the number of LDA topics on the results. The new system consistently outperforms the previous system, not only on texts with multiple transitions, but also on single boundary texts.

2020

A Process-oriented Dataset of Revisions during Writing
Rianne Conijn | Emily Dux Speltz | Menno van Zaanen | Luuk Van Waes | Evgeny Chukharev-Hudilainen
Proceedings of the Twelfth Language Resources and Evaluation Conference

Revision plays a major role in writing and the analysis of writing processes. Revisions can be analyzed using a product-oriented approach (focusing on a finished product, the text that has been produced) or a process-oriented approach (focusing on the process that the writer followed to generate this product). Although several language resources exist for the product-oriented approach to revisions, there are hardly any resources available yet for an in-depth analysis of the process of revisions. Therefore, we provide an extensive dataset on revisions made during writing (accessible via https://hdl.handle.net/10411/VBDYGX). This dataset is based on keystroke data and eye tracking data of 65 students from a variety of backgrounds (undergraduate and graduate English as a first language and English as a second language students) and a variety of tasks (argumentative text and academic abstract). In total, 7,120 revisions were identified in the dataset. For each revision, 18 features have been manually annotated and 31 features have been automatically extracted. As a case study, we show two potential use cases of the dataset. In addition, future uses of the dataset are described.

Proceedings of the first workshop on Resources for African Indigenous Languages
Rooweither Mabuya | Phathutshedzo Ramukhadi | Mmasibidi Setaka | Valencia Wagner | Menno van Zaanen

2018

A Multilingual Wikified Data Set of Educational Material
Iris Hendrickx | Eirini Takoulidou | Thanasis Naskos | Katia Lida Kermanidis | Vilelmini Sosoni | Hugo de Vos | Maria Stasimioti | Menno van Zaanen | Panayota Georgakopoulou | Valia Kordoni | Maja Popovic | Markus Egg | Antal van den Bosch
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content
Vilelmini Sosoni | Katia Lida Kermanidis | Maria Stasimioti | Thanasis Naskos | Eirini Takoulidou | Menno van Zaanen | Sheila Castilho | Panayota Georgakopoulou | Valia Kordoni | Markus Egg
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Improving Machine Translation of Educational Content via Crowdsourcing
Maximiliana Behnke | Antonio Valerio Miceli Barone | Rico Sennrich | Vilelmini Sosoni | Thanasis Naskos | Eirini Takoulidou | Maria Stasimioti | Menno van Zaanen | Sheila Castilho | Federico Gaspari | Panayota Georgakopoulou | Valia Kordoni | Markus Egg | Katia Lida Kermanidis
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The Influence of Context on the Learning of Metrical Stress Systems Using Finite-State Machines
Cesko Voeten | Menno van Zaanen
Computational Linguistics, Volume 44, Issue 2 - June 2018

Languages vary in the way stress is assigned to syllables within words. This article investigates the learnability of stress systems in a wide range of languages. The stress systems can be described using finite-state automata with symbols indicating levels of stress (primary, secondary, or no stress). Finite-state automata have been the focus of research in the area of grammatical inference for some time now. It has been shown that finite-state machines are learnable from examples using state-merging. One such approach, which aims to learn k-testable languages, has been applied to stress systems with some success. The family of k-testable languages has been shown to be efficiently learnable (in polynomial time). Here, we extend this approach to k, l-local languages by taking not only left context, but also right context, into account. We consider empirical results testing the performance of our learner using various amounts of context (corresponding to varying definitions of phonological locality). Our results show that our approach of learning stress patterns using state-merging is more reliant on left context than on right context. Additionally, some stress systems fail to be learned by our learner using either the left-context k-testable or the left-and-right-context k, l-local learning system. A more complex merging strategy, and hence grammar representation, is required for these stress systems.

2014

OpenSoNaR: user-driven development of the SoNaR corpus interfaces
Martin Reynaert | Matje van de Camp | Menno van Zaanen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

The Development of Dutch and Afrikaans Language Resources for Compound Boundary Analysis.
Menno van Zaanen | Gerhard van Huyssteen | Suzanne Aussems | Chris Emmery | Roald Eiselen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In most languages, new words can be created through the process of compounding, which combines two or more words into a new lexical unit. Whereas in languages such as English the components that make up a compound are separated by a space, in languages such as Finnish, German, Afrikaans and Dutch these components are concatenated into one word. Compounding is very productive and leads to practical problems in developing machine translators and spelling checkers, as newly formed compounds cannot be found in existing lexicons. The Automatic Compound Processing (AuCoPro) project deals with the analysis of compounds in two closely-related languages, Afrikaans and Dutch. In this paper, we present the development and evaluation of two datasets, one for each language, that contain compound words with annotated compound boundaries. Such datasets can be used to train classifiers to identify the compound components in novel compounds. We describe the process of annotation and provide an overview of the annotation guidelines as well as global properties of the datasets. The inter-rater agreements between the annotators are considered highly reliable. Furthermore, we show the usability of these datasets by building an initial automatic compound boundary detection system, which assigns compound boundaries with approximately 90% accuracy.

Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)
Ben Verhoeven | Walter Daelemans | Menno van Zaanen | Gerhard van Huyssteen

Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch
Ben Verhoeven | Menno van Zaanen | Walter Daelemans | Gerhard van Huyssteen
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)

2011

Formal and Empirical Grammatical Inference
Jeffrey Heinz | Colin de la Higuera | Menno van Zannen
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2009

Language Models for Contextual Error Detection and Correction
Herman Stehouwer | Menno van Zaanen
Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference

Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference
Menno van Zaanen | Colin de la Higuera

Grammatical Inference and Computational Linguistics
Menno van Zaanen | Colin de la Higuera
Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference

2007

Named Entity Recognition in Question Answering of Speech Data
Diego Mollá | Menno van Zaanen | Steve Cassidy
Proceedings of the Australasian Language Technology Workshop 2007

2006

Named Entity Recognition for Question Answering
Diego Mollá | Menno van Zaanen | Daniel Smith
Proceedings of the Australasian Language Technology Workshop 2006

2005

DEMOCRAT: Deciding between Multiple Outputs Created by Automatic Translation
Menno van Zaanen | Harold Somers
Proceedings of Machine Translation Summit X: Papers

Proceedings of the Australasian Language Technology Workshop 2005
Timothy Baldwin | James Curran | Menno van Zaanen

Learning of Graph Rules for Question Answering
Diego Molla | Menno van Zaanen
Proceedings of the Australasian Language Technology Workshop 2005

2000

ABL: Alignment-Based Learning
Menno van Zaanen
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Co-authors

Iris Hendrickx 4

Antal van den Bosch 4

Kostadin Cholakov 3

Maria Gialama 3

Rooweither Mabuya 3

Thanasis Naskos 3

Michael Papadopoulos 3

Mmasibidi Setaka 3

Maria Stasimioti 3

Eirini Takoulidou 3

Dimitrios Tsoumakos 3

Colin de la Higuera 3

Gerhard B. van Huyssteen 3

Sheila Castilho 2

Walter Daelemans 2

Federico Gaspari 2

Maja Popović 2

Rico Sennrich 2

Ben Verhoeven 2

Suzanne Aussems 1

Timothy Baldwin 1

Maximiliana Behnke 1

Steve Cassidy 1

Evgeny Chukharev-Hudilainen 1

Rianne Conijn 1

James R. Curran 1

Emily Dux Speltz 1

Roald Eiselen 1

Yota Georgakopolou 1

Jeffrey Heinz 1

Muzi Matfunjwa 1

Antonio Valerio Miceli-Barone 1

Joss Moorkens 1

Phathutshedzo Ramukhadi 1

Martin Reynaert 1

Johannes Sibeko 1

Harold Somers 1

Herman Stehouwer 1

Luuk Van Waes 1

Valencia Wagner 1

Matje van de Camp 1
