Gijs Wijnholds

2025

Dataset Creation for Visual Entailment using Generative AI
Rob Reijtenbach | Suzan Verberne | Gijs Wijnholds
Proceedings of the 5th Workshop on Natural Logic Meets Machine Learning (NALOMA)

In this paper we present and validate a new synthetic dataset for training visual entailment models. Existing datasets for visual entailment are small and sparse compared to datasets for textual entailment. Manually creating datasets is labor-intensive. We base our synthetic dataset on the SNLI dataset for textual entailment. We take the premise text from SNLI as input prompts in a generative image model, Stable Diffusion, creating an image to replace each textual premise. We evaluate our dataset both intrinsically and extrinsically. For extrinsic evaluation, we evaluate the validity of the generated images by using them as training data for a visual entailment classifier based on CLIP feature vectors. We find that synthetic training data only leads to a slight drop in quality on SNLI-VE, with an F-score 0.686 compared to 0.703 when trained on real data. We also compare the quality of our generated training data to original training data on another dataset: SICK-VTE. Again, there is only a slight drop in F-score: from 0.400 to 0.384. These results indicate that in settings with data sparsity, synthetic data can be a promising solution for training visual entailment models.

pdf bib abs

In the Mood for Inference: Logic-Based Natural Language Inference with Large Language Models
Bill Noble | Rasmus Blanck | Gijs Wijnholds
Proceedings of the 5th Workshop on Natural Logic Meets Machine Learning (NALOMA)

In this paper we explore challenging Natural Language Inference datasets with a logic-based approach and Large Language Models (LLMs), in order to assess the validity of this hybrid strategy. We report on an experiment which combines an LLM meta-prompting strategy, eliciting logical representations, and Prover9, a first-order logic theorem prover. In addition, we experiment with the inclusion of (logical) world knowledge. Our findings suggest that (i) broad performance is sometimes on par, (ii) formula generation is rather brittle, and (iii) world knowledge aids performance relative to data annotation. We argue that these results explicate the weaknesses of both approaches. As such, we consider this study a source of inspiration for future work in the field of neuro-symbolic reasoning.

2024

pdf bib abs

Tree Transformer’s Disambiguation Ability of Prepositional Phrase Attachment and Garden Path Effects
Lingling Zhou | Suzan Verberne | Gijs Wijnholds
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This work studies two types of ambiguity in natural language: prepositional phrase (PP) attachment ambiguity, and garden path constructions. Due to the different nature of these ambiguities – one being structural, the other incremental in nature – we pretrain and evaluate the Tree Transformer of Wang et al. (2019), an unsupervised Transformer model that induces tree representations internally. To assess PP attachment ambiguity we inspect the model’s induced parse trees against a newly prepared dataset derived from the PP attachment corpus (Ratnaparkhi et al., 1994). Measuring garden path effects is done by considering surprisal rates of the underlying language model on a number of dedicated test suites, following Futrell et al. (2019). For comparison we evaluate a pretrained supervised BiLSTM-based model trained on constituency parsing as sequence labelling (Gómez-Rodríguez and Vilares, 2018). Results show that the unsupervised Tree Transformer does exhibit garden path effects, but its parsing ability is far inferior to the supervised BiLSTM, and it is not as sensitive to lexical cues as other large LSTM models, suggesting that supervised parsers based on a pre-Transformer architecture may be the better choice in the presence of ambiguity.

2023

pdf bib abs

Assessing Monotonicity Reasoning in Dutch through Natural Language Inference
Gijs Wijnholds
Findings of the Association for Computational Linguistics: EACL 2023

In this paper we investigate monotonicity reasoning in Dutch, through a novel Natural Language Inference dataset. Monotonicity reasoning shows to be highly challenging for Transformer-based language models in English and here, we corroborate those findings using a parallel Dutch dataset, obtained by translating the Monotonicity Entailment Dataset of Yanaka et al. (2019). After fine-tuning two Dutch language models BERTje and RobBERT on the Dutch NLI dataset SICK-NL, we find that performance severely drops on the monotonicity reasoning dataset, indicating poor generalization capacity of the models. We provide a detailed analysis of the test results by means of the linguistic annotations in the dataset. We find that models struggle with downward entailing contexts, and argue that this is due to a poor understanding of negation. Additionally, we find that the choice of monotonicity context affects model performance on conjunction and disjunction. We hope that this new resource paves the way for further research in generalization of neural reasoning models in Dutch, and contributes to the development of better language technology for Natural Language Inference, specifically for Dutch.

pdf bib abs

Improving BERT Pretraining with Syntactic Supervision
Georgios Tziafas | Konstantinos Kogkalidis | Gijs Wijnholds | Michael Moortgat
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)

Bidirectional masked Transformers have become the core theme in the current NLP landscape. Despite their impressive benchmarks, a recurring theme in recent research has been to question such models’ capacity for syntactic generalization. In this work, we seek to address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network’s training dynamics. Our approach is straightforward to implement, induces a marginal computational overhead and is general enough to adapt to a variety of settings. We apply our methodology on Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being one order of magnitude smaller than commonly used corpora.

pdf bib abs

Structural Ambiguity and its Disambiguation in Language Model Based Parsers: the Case of Dutch Clause Relativization
Gijs Wijnholds | Michael Moortgat
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

This paper addresses structural ambiguity in Dutch relative clauses. By investigating the task of disambiguation by grounding, we study how the presence of a prior sentence can resolve relative clause ambiguities. We apply this method to two parsing architectures in an attempt to demystify the parsing and language model components of two present-day neural parsers. Results show that a neurosymbolic parser, based on proof nets, is more open to data bias correction than an approach based on universal dependencies, although both set-ups suffer from a comparable initial data bias.

2022

pdf bib abs

Discontinuous Constituency and BERT: A Case Study of Dutch
Konstantinos Kogkalidis | Gijs Wijnholds
Findings of the Association for Computational Linguistics: ACL 2022

In this paper, we set out to quantify the syntactic capacity of BERT in the evaluation regime of non-context free patterns, as occurring in Dutch. We devise a test suite based on a mildly context-sensitive formalism, from which we derive grammars that capture the linguistic phenomena of control verb nesting and verb raising. The grammars, paired with a small lexicon, provide us with a large collection of naturalistic utterances, annotated with verb-subject pairings, that serve as the evaluation test bed for an attention-based span selection probe. Our results, backed by extensive analysis, suggest that the models investigated fail in the implicit acquisition of the dependencies examined.

2021

pdf bib abs

SICK-NL: A Dataset for Dutch Natural Language Inference
Gijs Wijnholds | Michael Moortgat
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We present SICK-NL (read: signal), a dataset targeting Natural Language Inference in Dutch. SICK-NL is obtained by translating the SICK dataset of (Marelli et al., 2014) from English into Dutch. Having a parallel inference dataset allows us to compare both monolingual and multilingual NLP models for English and Dutch on the two tasks. In the paper, we motivate and detail the translation process, perform a baseline evaluation on both the original SICK dataset and its Dutch incarnation SICK-NL, taking inspiration from Dutch skipgram embeddings and contextualised embedding models. In addition, we encapsulate two phenomena encountered in the translation to formulate stress tests and verify how well the Dutch models capture syntactic restructurings that do not affect semantics. Our main finding is all models perform worse on SICK-NL than on SICK, indicating that the Dutch dataset is more challenging than the English original. Results on the stress tests show that models don’t fully capture word order freedom in Dutch, warranting future systematic studies.

2020

pdf bib abs

Representation Learning for Type-Driven Composition
Gijs Wijnholds | Mehrnoosh Sadrzadeh | Stephen Clark
Proceedings of the 24th Conference on Computational Natural Language Learning

This paper is about learning word representations using grammatical type information. We use the syntactic types of Combinatory Categorial Grammar to develop multilinear representations, i.e. maps with n arguments, for words with different functional types. The multilinear maps of words compose with each other to form sentence representations. We extend the skipgram algorithm from vectors to multi- linear maps to learn these representations and instantiate it on unary and binary maps for transitive verbs. These are evaluated on verb and sentence similarity and disambiguation tasks and a subset of the SICK relatedness dataset. Our model performs better than previous type- driven models and is competitive with state of the art representation learning methods such as BERT and neural sentence encoders.

pdf bib abs

A toy distributional model for fuzzy generalised quantifiers
Mehrnoosh Sadrzadeh | Gijs Wijnholds
Proceedings of the Probability and Meaning Conference (PaM 2020)

Recent work in compositional distributional semantics showed how bialgebras model generalised quantifiers of natural language. That technique requires working with vector space over power sets of bases, and therefore is computationally costly. It is possible to overcome the computational hurdles by working with fuzzy generalised quantifiers. In this paper, we show that the compositional notion of semantics of natural language, guided by a grammar, extends from a binary to a many valued setting and instantiate in it the fuzzy computations. We import vector representations of words and predicates, learnt from large scale compositional distributional semantics, interpret them as fuzzy sets, and analyse their performance on a toy inference dataset.

2019

pdf bib abs

Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings
Gijs Wijnholds | Mehrnoosh Sadrzadeh
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Ellipsis is a natural language phenomenon where part of a sentence is missing and its information must be recovered from its surrounding context, as in “Cats chase dogs and so do foxes.”. Formal semantics has different methods for resolving ellipsis and recovering the missing information, but the problem has not been considered for distributional semantics, where words have vector embeddings and combinations thereof provide embeddings for sentences. In elliptical sentences these combinations go beyond linear as copying of elided information is necessary. In this paper, we develop different models for embedding VP-elliptical sentences. We extend existing verb disambiguation and sentence similarity datasets to ones containing elliptical phrases and evaluate our models on these datasets for a variety of non-linear combinations and their linear counterparts. We compare results of these compositional models to state of the art holistic sentence encoders. Our results show that non-linear addition and a non-linear tensor-based composition outperform the naive non-compositional baselines and the linear models, and that sentence encoders perform well on sentence similarity, but not on verb disambiguation.

Co-authors

Venues

PaM1