Carl Vogel

2023

The Proof is in the Pudding: Using Automated Theorem Proving to Generate Cooking Recipes
Louis Mahon | Carl Vogel
Journal for Language Technology and Computational Linguistics, Vol. 36 No. 2

This paper presents FASTFOOD, a rule-based natural language generation (NLG) program for cooking recipes. We consider the representation of cooking recipes as discourse representation, because the meaning of each sentence needs to consider the context of the others. Our discourse representation system is based on states of affairs and transtions between states of affairs, and does not use discourse referents. Recipes are generated by using an automated theorem-proving procedure to select the ingredients and instructions, with ingredients corresponding to axioms and instructions to implications. FASTFOOD also contains a temporal optimization module which can rearrange the recipe to make it more time efficient for the user, e.g. the recipe specifies to chop the vegetables while the rice is boiling. The system is described in detail, including the decision to forgo discourse referents and how plausible representations of nouns and verbs emerge purely as a by-product of the practical requirements of efficiently representing recipe content. A comparison is then made with existing recipe generation systems, NLG systems more generally, and automated theorem provers.

2022

pdf bib abs

Mutual Gaze and Linguistic Repetition in a Multimodal Corpus
Anais Murat | Maria Koutsombogera | Carl Vogel
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper investigates the correlation between mutual gaze and linguistic repetition, a form of alignment, which we take as evidence of mutual understanding. We focus on a multimodal corpus made of three-party conversations and explore the question of whether mutual gaze events correspond to moments of repetition or non-repetition. Our results, although mainly significant on word unigrams and bigrams, suggest positive correlations between the presence of mutual gaze and the repetitions of tokens, lemmas, or parts-of-speech, but negative correlations when it comes to paired levels of representation (tokens or lemmas associated with their part-of-speech). No compelling correlation is found with duration of mutual gaze. Results are strongest when ignoring punctuation as representations of pauses, intonation, etc. in counting aligned tokens.

pdf bib abs

Features and Categories of Hyperbole in Cyberbullying Discourse on Social Media
Simona Ignat | Carl Vogel
Proceedings of the Second International Workshop on Resources and Techniques for User Information in Abusive Language Analysis

Cyberbullying discourse is achieved with multiple linguistic conveyances. Hyperboles witnessed in a corpus of cyberbullying utterances are studied. Linguistic features of hyperbole using the traditional grammatical indications of exaggerations are analyzed. The method relies on data selected from a larger corpus of utterances identified and labelled as “bullying”, from Twitter, from October 2020 to March 2022. An outcome is a lexicon of 250 entries. A small number of lexical level features have been isolated, and chi-squared contingency tests applied to evaluating their information value in identifying hyperbole. Words or affixes indicating superlatives or extremes of scales, with positive but not negative valency items, interact with hyperbole classification in this data set. All utterances extracted has been considered exaggerations and the stylistic status of “hyperbole” has been commented within the frame of new meanings in the context of social media.

2021

pdf bib abs

English Machine Reading Comprehension Datasets: A Survey
Daria Dzendzik | Jennifer Foster | Carl Vogel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper surveys 60 English Machine Reading Comprehension datasets, with a view to providing a convenient resource for other researchers interested in this problem. We categorize the datasets according to their question and answer form and compare them across various dimensions including size, vocabulary, data source, method of creation, human performance level, and first question word. Our analysis reveals that Wikipedia is by far the most common data source and that there is a relative lack of why, when, and where questions across datasets.

2020

pdf bib abs

An Evaluation Method for Diachronic Word Sense Induction
Ashjan Alsulaimani | Erwan Moreau | Carl Vogel
Findings of the Association for Computational Linguistics: EMNLP 2020

The task of Diachronic Word Sense Induction (DWSI) aims to identify the meaning of words from their context, taking the temporal dimension into account. In this paper we propose an evaluation method based on large-scale time-stamped annotated biomedical data, and a range of evaluation measures suited to the task. The approach is applied to two recent DWSI systems, thus demonstrating its relevance and providing an in-depth analysis of the models.

pdf bib abs

Q. Can Knowledge Graphs be used to Answer Boolean Questions? A. It’s complicated!
Daria Dzendzik | Carl Vogel | Jennifer Foster
Proceedings of the First Workshop on Insights from Negative Results in NLP

In this paper we explore the problem of machine reading comprehension, focusing on the BoolQ dataset of Yes/No questions. We carry out an error analysis of a BERT-based machine reading comprehension model on this dataset, revealing issues such as unstable model behaviour and some noise within the dataset itself. We then experiment with two approaches for integrating information from knowledge graphs: (i) concatenating knowledge graph triples to text passages and (ii) encoding knowledge with a Graph Neural Network. Neither of these approaches show a clear improvement and we hypothesize that this may be due to a combination of inaccuracies in the knowledge graph, imprecision in entity linking, and the models’ inability to capture additional information from knowledge graphs.

2019

pdf bib abs

Is It Dish Washer Safe? Automatically Answering “Yes/No” Questions Using Customer Reviews
Daria Dzendzik | Carl Vogel | Jennifer Foster
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

It has become commonplace for people to share their opinions about all kinds of products by posting reviews online. It has also become commonplace for potential customers to do research about the quality and limitations of these products by posting questions online. We test the extent to which reviews are useful in question-answering by combining two Amazon datasets and focusing our attention on yes/no questions. A manual analysis of 400 cases reveals that the reviews directly contain the answer to the question just over a third of the time. Preliminary reading comprehension experiments with this dataset prove inconclusive, with accuracy in the range 50-66%.

pdf bib abs

MSO with tests and reducts
Tim Fernando | David Woods | Carl Vogel
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

Tests added to Kleene algebra (by Kozen and others) are considered within Monadic Second Order logic over strings, where they are likened to statives in natural language. Reducts are formed over tests and non-tests alike, specifying what is observable. Notions of temporal granularity are based on observable change, under the assumption that a finite set bounds what is observable (with the possibility of stretching such bounds by moving to a larger finite set). String projections at different granularities are conjoined by superpositions that provide another variant of concatenation for Booleans.

Carl Vogel

2023

2022

2021

2020

2019

2018

2017

2015

2014

2013

2012

2010

2009

1996

Co-authors

Venues