International Conference on Computational Semantics (2021)



pdf (full)
bib (full)
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

pdf bib
Proceedings of the 14th International Conference on Computational Semantics (IWCS)
Sina Zarrieß | Johan Bos | Rik van Noord | Lasha Abzianidze

pdf bib
Switching Contexts: Transportability Measures for NLP
Guy Marshall | Mokanarangan Thayaparan | Philip Osborne | André Freitas

This paper explores transportability as a sub-area of generalisability. By proposing metrics based on well-established statistics, we are able to estimate the change in performance of NLP models in new contexts. Defining a measure of transportability may allow for better estimation of NLP system performance in new domains, and such estimates are crucial when assessing NLP systems on new tasks and domains. Through several instances of increasing complexity, we demonstrate how lightweight domain similarity measures can be used as estimators of transportability in NLP applications. The proposed transportability measures are evaluated in the context of Named Entity Recognition and Natural Language Inference tasks.

pdf bib
Applied Temporal Analysis: A Complete Run of the FraCaS Test Suite
Jean-Philippe Bernardy | Stergios Chatzikyriakidis

In this paper, we propose an implementation of temporal semantics that translates syntax trees to logical formulas, suitable for consumption by the Coq proof assistant. The analysis supports a wide range of phenomena including temporal references, temporal adverbs, aspectual classes and progressives. The new semantics is built on top of a previous system handling all sections of the FraCaS test suite except the temporal reference section, and we obtain an accuracy of 81 percent overall and 73 percent for the problems explicitly marked as related to temporal reference. To the best of our knowledge, this is the best performance of a logical system on the whole of the FraCaS test suite.

pdf bib
CO-NNECT: A Framework for Revealing Commonsense Knowledge Paths as Explicitations of Implicit Knowledge in Texts
Maria Becker | Katharina Korfhage | Debjit Paul | Anette Frank

In this work we leverage commonsense knowledge in the form of knowledge paths to establish connections between sentences, as a form of explicitation of implicit knowledge. Such connections can be direct (singlehop paths) or require intermediate concepts (multihop paths). To construct such paths we combine two model types in a joint framework we call Co-nnect: a relation classifier that predicts direct connections between concepts; and a target prediction model that generates target or intermediate concepts given a source concept and a relation, which we use to construct multihop paths. Unlike prior work that relies exclusively on static knowledge sources, we leverage language models finetuned on knowledge stored in ConceptNet to dynamically generate knowledge paths, as explanations of implicit knowledge that connects sentences in texts. As a central contribution we design manual and automatic evaluation settings for assessing the quality of the generated paths. We conduct evaluations on two argumentative datasets and show that a combination of the two model types generates meaningful, high-quality knowledge paths between sentences that reveal implicit knowledge conveyed in text.

pdf bib
Computing All Quantifier Scopes with CCG
Miloš Stanojević | Mark Steedman

We present a method for computing all quantifier scopes that can be extracted from a single CCG derivation. To do that we build on the proposal of Steedman (1999, 2011) where all existential quantifiers are treated as Skolem functions. We extend the approach by introducing a better packed representation of all possible specifications that also includes node addresses where the specifications happen. These addresses are necessary for recovering all, and only, possible readings.

pdf bib
Encoding Explanatory Knowledge for Zero-shot Science Question Answering
Zili Zhou | Marco Valentino | Donal Landers | André Freitas

This paper describes N-XKT (Neural encoding based on eXplanatory Knowledge Transfer), a novel method for the automatic transfer of explanatory knowledge through neural encoding mechanisms. We demonstrate that N-XKT is able to improve accuracy and generalization on science Question Answering (QA). Specifically, by leveraging facts from background explanatory knowledge corpora, the N-XKT model shows a clear improvement on zero-shot QA. Furthermore, we show that N-XKT can be fine-tuned on a target QA dataset, enabling faster convergence and more accurate results. A systematic analysis is conducted to quantitatively analyze the performance of the N-XKT model and the impact of different categories of knowledge on the zero-shot generalization task.

pdf bib
Predicate Representations and Polysemy in VerbNet Semantic Parsing
James Gung | Martha Palmer

Despite recent advances in semantic role labeling propelled by pre-trained text encoders like BERT, performance lags behind when applied to predicates observed infrequently during training or to sentences in new domains. In this work, we investigate how role labeling performance on low-frequency predicates and out-of-domain data can be further improved by using VerbNet, a verb lexicon that groups verbs into hierarchical classes based on shared syntactic and semantic behavior and defines semantic representations describing relations between arguments. We find that VerbNet classes provide an effective level of abstraction, improving generalization on low-frequency predicates by allowing them to learn from the training examples of other predicates belonging to the same class. We also find that joint training of VerbNet role labeling and predicate disambiguation of VerbNet classes for polysemous verbs leads to improvements in both tasks, naturally supporting the extraction of VerbNet’s semantic representations.

pdf bib
Critical Thinking for Language Models
Gregor Betz | Christian Voigt | Kyle Richardson

This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models. We introduce a synthetic corpus of deductively valid arguments, and generate artificial argumentative texts to train CRiPT: a critical thinking intermediarily pre-trained transformer based on GPT-2. Significant transfer learning effects can be observed: Trained on three simple core schemes, CRiPT accurately completes conclusions of different, and more complex types of arguments, too. CRiPT generalizes the core argument schemes in a correct way. Moreover, we obtain consistent and promising results for NLU benchmarks. In particular, CRiPT’s zero-shot accuracy on the GLUE diagnostics exceeds GPT-2’s performance by 15 percentage points. The findings suggest that intermediary pre-training on texts that exemplify basic reasoning abilities (such as typically covered in critical thinking textbooks) might help language models to acquire a broad range of reasoning skills. The synthetic argumentative texts presented in this paper are a promising starting point for building such a “critical thinking curriculum for language models.”

pdf bib
Do Natural Language Explanations Represent Valid Logical Arguments? Verifying Entailment in Explainable NLI Gold Standards
Marco Valentino | Ian Pratt-Hartmann | André Freitas

An emerging line of research in Explainable NLP is the creation of datasets enriched with human-annotated explanations and rationales, used to build and evaluate models with step-wise inference and explanation generation capabilities. While human-annotated explanations are used as ground-truth for the inference, there is a lack of systematic assessment of their consistency and rigour. In an attempt to provide a critical quality assessment of Explanation Gold Standards (XGSs) for NLI, we propose a systematic annotation methodology, named Explanation Entailment Verification (EEV), to quantify the logical validity of human-annotated explanations. The application of EEV on three mainstream datasets reveals the surprising conclusion that a majority of the explanations, while appearing coherent on the surface, represent logically invalid arguments, ranging from being incomplete to containing clearly identifiable logical errors. This conclusion confirms that the inferential properties of explanations are still poorly formalised and understood, and that additional work on this line of research is necessary to improve the way Explanation Gold Standards are constructed.

pdf bib
Looking for a Role for Word Embeddings in Eye-Tracking Features Prediction: Does Semantic Similarity Help?
Lavinia Salicchi | Alessandro Lenci | Emmanuele Chersoni

Eye-tracking psycholinguistic studies have suggested that context-word semantic coherence and predictability influence language processing during the reading activity. In this study, we investigate the correlation between the cosine similarities computed with word embedding models (both static and contextualized) and eye-tracking data from two naturalistic reading corpora. We also study the correlations of surprisal scores computed with three state-of-the-art language models. Our results show a strong correlation for the scores computed with BERT and GloVe, suggesting that similarity can play an important role in modeling reading times.
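For orientation only (this is not the authors' code), the sketch below shows the generic recipe of correlating context-word cosine similarity with per-word reading times; the embedding vectors and reading-time values are hypothetical stand-ins for GloVe vectors and eye-tracking measures.

```python
# Illustrative sketch: correlate each word's cosine similarity to its preceding
# context with its reading time. All data below are hypothetical stand-ins.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
words = ["the", "cat", "sat", "on", "the", "mat"]
embeddings = {w: rng.normal(size=300) for w in words}          # stand-in for GloVe
reading_times = np.array([180., 220., 240., 190., 175., 230.])  # ms, hypothetical

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similarity of each word to the mean vector of the words preceding it.
sims = []
for i in range(1, len(words)):
    context = np.mean([embeddings[w] for w in words[:i]], axis=0)
    sims.append(cosine(embeddings[words[i]], context))

rho, p = spearmanr(sims, reading_times[1:])
print(f"Spearman rho={rho:.3f} (p={p:.3f})")
```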

pdf bib
Automatic Assignment of Semantic Frames in Disaster Response Team Communication Dialogues
Natalia Skachkova | Ivana Kruijff-Korbayova

We investigate frame semantics as a meaning representation framework for team communication in a disaster response scenario. We focus on the automatic frame assignment and retrain PAFIBERT, which is one of the state-of-the-art frame classifiers, on English and German disaster response team communication data, obtaining accuracy around 90%. We examine the performance of both models and discuss their adjustments, such as sampling of additional training instances from an unrelated domain and adding extra lexical and discourse features to input token representations. We show that sampling has some positive effect on the German frame classifier, discuss an unexpected impact of extra features on the models’ behaviour and perform a careful error analysis.

pdf bib
Implicit representations of event properties within contextual language models: Searching for “causativity neurons”
Esther Seyffarth | Younes Samih | Laura Kallmeyer | Hassan Sajjad

This paper addresses the question to which extent neural contextual language models such as BERT implicitly represent complex semantic properties. More concretely, the paper shows that the neuron activations obtained from processing an English sentence provide discriminative features that allow a simple linear classifier to predict the (non-)causativity of the event denoted by the verb. A layer-wise analysis reveals that the relevant properties are mostly learned in the higher layers. Moreover, further experiments show that approximately 10% of the neuron activations already suffice to predict causativity with relatively high accuracy.
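A minimal sketch of the general diagnostic-probe setup (assumptions mine, not the paper's implementation): train a linear classifier on neuron activations, then re-probe with only the top 10% of neurons by absolute weight. The activation matrix and labels are random placeholders.

```python
# Diagnostic linear probe sketch on hypothetical neuron activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 768))    # stand-in for per-sentence neuron activations
y = rng.integers(0, 2, size=500)   # stand-in (non-)causativity labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("full-probe accuracy:", probe.score(X_te, y_te))

# Keep only the ~10% of neurons with the largest absolute weights and re-probe.
k = int(0.1 * X.shape[1])
top = np.argsort(np.abs(probe.coef_[0]))[-k:]
probe_small = LogisticRegression(max_iter=1000).fit(X_tr[:, top], y_tr)
print("top-10% probe accuracy:", probe_small.score(X_te[:, top], y_te))
```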

pdf bib
Monotonicity Marking from Universal Dependency Trees
Zeming Chen | Qiyue Gao

Dependency parsing is a tool widely used in the field of natural language processing and computational linguistics. However, there is hardly any work that connects dependency parsing to monotonicity, which is an essential part of logic and linguistic semantics. In this paper, we present a system that automatically annotates monotonicity information based on Universal Dependency parse trees. Our system utilizes surface-level monotonicity facts about quantifiers, lexical items, and token-level polarity information. We compare our system’s performance with existing systems in the literature, including NatLog and ccg2mono, on a small evaluation dataset. Results show that our system outperforms NatLog and ccg2mono.
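As a toy illustration only (the authors' UD-based system is considerably richer), the sketch below shows the basic idea of polarity marking: a quantifier assigns upward or downward monotonicity to its restrictor and scope, and an outer downward-entailing context flips the marks. All names and the quantifier table are my own assumptions.

```python
# Toy polarity propagation for quantified sentences.
QUANTIFIER_POLARITY = {
    "every": {"restrictor": "down", "scope": "up"},
    "some":  {"restrictor": "up",   "scope": "up"},
    "no":    {"restrictor": "down", "scope": "down"},
}

def flip(polarity):
    return "down" if polarity == "up" else "up"

def mark(quantifier, restrictor_tokens, scope_tokens, outer="up"):
    """Assign a monotonicity mark to each token, flipping under a downward outer context."""
    pol = QUANTIFIER_POLARITY[quantifier]
    compose = (lambda p: p) if outer == "up" else flip
    marks = {t: compose(pol["restrictor"]) for t in restrictor_tokens}
    marks.update({t: compose(pol["scope"]) for t in scope_tokens})
    return marks

# "Every dog barks": 'dog' is downward monotone, 'barks' upward monotone.
print(mark("every", ["dog"], ["barks"]))
```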

pdf bib
Is that really a question? Going beyond factoid questions in NLP
Aikaterini-Lida Kalouli | Rebecca Kehlbeck | Rita Sevastjanova | Oliver Deussen | Daniel Keim | Miriam Butt

Research in NLP has mainly focused on factoid questions, with the goal of finding quick and reliable ways of matching a query to an answer. However, human discourse involves more than that: it contains non-canonical questions deployed to achieve specific communicative goals. In this paper, we investigate this under-studied aspect of NLP by introducing a targeted task, creating an appropriate corpus for the task and providing baseline models of diverse nature. With this, we are also able to generate useful insights on the task and open the way for future research in this direction.

pdf bib
New Domain, Major Effort? How Much Data is Necessary to Adapt a Temporal Tagger to the Voice Assistant Domain
Touhidul Alam | Alessandra Zarcone | Sebastian Padó

Reliable tagging of Temporal Expressions (TEs, e.g., Book a table at L’Osteria for Sunday evening) is a central requirement for Voice Assistants (VAs). However, there is a dearth of resources and systems for the VA domain, since publicly-available temporal taggers are trained only on substantially different domains, such as news and clinical text. Since the cost of annotating large datasets is prohibitive, we investigate the trade-off between in-domain data and performance in DA-Time, a hybrid temporal tagger for the English VA domain which combines a neural architecture for robust TE recognition with a parser-based TE normalizer. We find that transfer learning goes a long way even with as few as 25 in-domain sentences: DA-Time performs at the state of the art on the news domain, and substantially outperforms it on the VA domain.

pdf bib
Breeding Fillmore’s Chickens and Hatching the Eggs: Recombining Frames and Roles in Frame-Semantic Parsing
Gosse Minnema | Malvina Nissim

Frame-semantic parsers traditionally predict predicates, frames, and semantic roles in a fixed order. This paper explores the ‘chicken-or-egg’ problem of interdependencies between these components theoretically and practically. We introduce a flexible BERT-based sequence labeling architecture that allows for predicting frames and roles independently from each other or combining them in several ways. Our results show that our setups can approximate more complex traditional models’ performance, while allowing for a clearer view of the interdependencies between the pipeline’s components, and of how frame and role prediction models make different use of BERT’s layers.

pdf bib
Large-scale text pre-training helps with dialogue act recognition, but not without fine-tuning
Bill Noble | Vladislav Maraev

We use dialogue act recognition (DAR) to investigate how well BERT represents utterances in dialogue, and how fine-tuning and large-scale pre-training contribute to its performance. We find that while both standard BERT pre-training and pre-training on dialogue-like data are useful, task-specific fine-tuning is essential for good performance.

pdf bib
Builder, we have done it: Evaluating & Extending Dialogue-AMR NLU Pipeline for Two Collaborative Domains
Claire Bonial | Mitchell Abrams | David Traum | Clare Voss

We adopt, evaluate, and improve upon a two-step natural language understanding (NLU) pipeline that incrementally tames the variation of unconstrained natural language input and maps to executable robot behaviors. The pipeline first leverages Abstract Meaning Representation (AMR) parsing to capture the propositional content of the utterance, and second converts this into “Dialogue-AMR,” which augments standard AMR with information on tense, aspect, and speech acts. Several alternative approaches and training datasets are evaluated for both steps and corresponding components of the pipeline, some of which outperform the original. We extend the Dialogue-AMR annotation schema to cover a different collaborative instruction domain and evaluate on both domains. With very little training data, we achieve promising performance in the new domain, demonstrating the scalability of this approach.

pdf bib
A Transition-based Parser for Unscoped Episodic Logical Forms
Gene Kim | Viet Duong | Xin Lu | Lenhart Schubert

“Episodic Logic: Unscoped Logical Form” (EL-ULF) is a semantic representation capturing predicate-argument structure as well as more challenging aspects of language within the Episodic Logic formalism. We present the first learned approach for parsing sentences into ULFs, using a growing set of annotated examples. The results provide a strong baseline for future improvement. Our method learns a sequence-to-sequence model for predicting the transition action sequence within a modified cache transition system. We evaluate the efficacy of type grammar-based constraints, a word-to-symbol lexicon, and transition system state features in this task. Our system is available at https://github.com/genelkim/ulf-transition-parser. We also present the first official annotated ULF dataset at https://www.cs.rochester.edu/u/gkim21/ulf/resources/.

pdf bib
“Politeness, you simpleton!” retorted [MASK]: Masked prediction of literary characters
Eric Holgate | Katrin Erk

What is the best way to learn embeddings for entities, and what can be learned from them? We consider this question for the case of literary characters. We address the highly challenging task of guessing, from a sentence in the novel, which character is being talked about, and we probe the embeddings to see what information they encode about their literary characters. We find that when continuously trained, entity embeddings do well at the masked entity prediction task, and that they encode considerable information about the traits and characteristics of the entities.

pdf bib
Tuning Deep Active Learning for Semantic Role Labeling
Skatje Myers | Martha Palmer

Active learning has been shown to reduce annotation requirements for numerous natural language processing tasks, including semantic role labeling (SRL). SRL involves labeling argument spans for potentially multiple predicates in a sentence, which makes it challenging to aggregate the numerous decisions into a single score for determining new instances to annotate. In this paper, we apply two ways of aggregating scores across multiple predicates in order to choose query sentences with two methods of estimating model certainty: using the neural network’s outputs and using dropout-based Bayesian Active Learning by Disagreement. We compare these methods with three passive baselines — random sentence selection, random whole-document selection, and selecting sentences with the most predicates — and analyse the effect these strategies have on the learning curve with respect to reducing the number of annotated sentences and predicates to achieve high performance.
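A small sketch (hypothetical scores, not the authors' implementation) of the core aggregation step: each sentence yields several per-predicate label distributions, and their uncertainties must be collapsed into one sentence-level score before selecting what to annotate next.

```python
# Aggregate per-predicate uncertainty into sentence-level query scores.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

# Each sentence maps to a list of per-predicate label distributions (toy values).
sentence_predictions = {
    "s1": [np.array([0.9, 0.1]), np.array([0.5, 0.5])],
    "s2": [np.array([0.6, 0.4])],
}

# Two simple aggregation strategies: mean vs. max of per-predicate entropies.
mean_agg = {s: float(np.mean([entropy(d) for d in dists]))
            for s, dists in sentence_predictions.items()}
max_agg = {s: float(np.max([entropy(d) for d in dists]))
           for s, dists in sentence_predictions.items()}

# Query the most uncertain sentence under each strategy.
print(max(mean_agg, key=mean_agg.get), max(max_agg, key=max_agg.get))
```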

pdf bib
SemLink 2.0: Chasing Lexical Resources
Kevin Stowe | Jenette Preciado | Kathryn Conger | Susan Windisch Brown | Ghazaleh Kazeminejad | James Gung | Martha Palmer

The SemLink resource provides mappings between a variety of lexical semantic ontologies, each with their strengths and weaknesses. To take advantage of these differences, the ability to move between resources is essential. This work describes advances made to improve the usability of the SemLink resource: the automatic addition of new instances and mappings, manual corrections, sense-based vectors and collocation information, and architecture built to automatically update the resource when versions of the underlying resources change. These updates improve coverage, provide new tools to leverage the capabilities of these resources, and facilitate seamless updates, ensuring the consistency and applicability of these mappings in the future.

pdf bib
Variation in framing as a function of temporal reporting distance
Levi Remijnse | Marten Postma | Piek Vossen

In this paper, we measure variation in framing as a function of foregrounding and backgrounding in a co-referential corpus with a range of temporal distances. In one type of experiment, frame-annotated corpora grouped under event types were contrasted, resulting in a ranking of frames with typicality rates. When contrasting publication dates, a different ranking of frames emerged for documents that are close to or far from the event instance. In the second type of analysis, we trained a diagnostic classifier on frame occurrences in order to let it differentiate documents based on their temporal distance class (close to or far from the event instance). The classifier performs above chance and outperforms models trained on words.

pdf bib
Automatic Classification of Attributes in German Adjective-Noun Phrases
Neele Falk | Yana Strakatova | Eva Huber | Erhard Hinrichs

Adjectives such as heavy (as in heavy rain) and windy (as in windy day) provide possible values for the attributes intensity and climate, respectively. The attributes themselves are not overtly realized and are in this sense implicit. While these attributes can be easily inferred by humans, their automatic classification poses a challenging task for computational models. We present the following contributions: (1) We gain new insights into the attribute selection task for German. More specifically, we develop computational models for this task that are able to generalize to unseen data. Moreover, we show that classification accuracy depends, inter alia, on the degree of polysemy of the lexemes involved, on the generalization potential of the training data and on the degree of semantic transparency of the adjective-noun pairs in question. (2) We provide the first resource for computational and linguistic experiments with German adjective-noun pairs that can be used for attribute selection and related tasks. In order to safeguard against unwelcome memorization effects, we present an automatic data augmentation method based on a lexical resource that can increase the size of the training data to a large extent.


pdf (full)
bib (full)
Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

pdf bib
Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation
Harry Bunt

pdf bib
Developing a multilayer semantic annotation scheme based on ISO standards for the visualization of a newswire corpus
Purificação Silvano | António Leal | Fátima Silva | Inês Cantante | Fatima Oliveira | Alípio Mario Jorge

In this paper, we describe the process of developing a multilayer semantic annotation scheme designed for extracting information from a European Portuguese corpus of news articles at three levels: temporal, referential and semantic role labelling. The novelty of this scheme is the harmonization of parts 1, 4 and 9 of ISO 24617 (Language resource management - Semantic annotation framework). This annotation framework includes a set of entity structures (participants, events, times) and a set of links (temporal, aspectual, subordination, objectal and semantic roles) with several tags and attribute values that ensure adequate semantic and visual representations of news stories.

pdf bib
Towards the ISO 24617-2-compliant Typology of Metacognitive Events
Volha Petukhova | Hafiza Erum Manzoor

The paper presents ongoing efforts in the design of a typology of metacognitive events observed in multimodal dialogue. The typology will serve as a tool to identify relations between participants’ dispositions, dialogue actions and metacognitive indicators. It will be used to support an assessment of the metacognitive knowledge, experiences and strategies of dialogue participants. Based on the multidimensional dialogue model defined within the framework of Dynamic Interpretation Theory and the ISO 24617-2 annotation standard, the proposed approach provides a systematic analysis of metacognitive events in terms of dialogue acts, i.e. the concepts that the dialogue research community is used to operating with in dialogue modelling and system design tasks.

pdf bib
Annotating Quantified Phenomena in Complex Sentence Structures Using the Example of Generalising Statements in Literary Texts
Tillmann Dönicke | Luisa Gödeke | Hanna Varachkina

We present a tagset for the annotation of quantification which we currently use to annotate certain quantified statements in fictional works of literature. Literary texts feature a rich variety in expressing quantification, including a broad range of lexemes to express quantifiers and complex sentence structures to express the restrictor and the nuclear scope of a quantification. Our tagset consists of seven tags and covers all types of quantification that occur in natural language, including vague quantification and generic quantification. In the second part of the paper, we introduce our German corpus with annotations of generalising statements, which form a proper subset of quantified statements.

pdf bib
The ISA-17 Quantification Challenge: Background and introduction
Harry Bunt

This paper, intended for the ISA-17 Quantification Annotation track, provides background information for the shared quantification annotation task at the ISA-17 workshop, a.k.a. the Quantification Challenge. In particular, the roles of the abstract and concrete syntax of the QuantML markup language are explained, as well as the semantic interpretation of QuantML annotations in relation to the ISO principles of semantic annotation. Additionally, the choice of the test suite for the Quantification Challenge is motivated, along with the suggested markables for the sentences of the suite.

pdf bib
Discourse-based Argument Segmentation and Annotation
Ekaterina Saveleva | Volha Petukhova | Marius Mosbach | Dietrich Klakow

The paper presents a discourse-based approach to the analysis of argumentative texts, departing from the assumption that the coherence of a text should capture argumentation structure as well and that, therefore, existing discourse analysis tools can be successfully applied to argument segmentation and annotation tasks. We tested the widely used Penn Discourse Tree Bank full parser (Lin et al., 2010) and the state-of-the-art neural network NeuralEDUSeg (Wang et al., 2018) and XLNet (Yang et al., 2019) models on two-stage discourse segmentation and discourse relation recognition. The two-stage approach outperformed the PDTB parser by a broad margin, i.e. the best achieved F1 scores were 21.2% for the PDTB parser vs. 66.37% for the NeuralEDUSeg and XLNet models. The neural network models were fine-tuned and evaluated on the argumentative corpus, showing a promising accuracy of 60.22%. The complete argument structures were reconstructed for further argumentation mining tasks. The reference Dagstuhl argumentative corpus containing 2,222 elementary discourse unit pairs annotated with the top-level and fine-grained PDTB relations will be released to the research community.

pdf bib
Converting Multilayer Glosses into Semantic and Pragmatic forms with GENLIS
Rodolfo Delmonte | Serena Trolvi | Francesco Stiffoni

This paper presents work carried out to transform glosses of a fable in Italian Sign Language (LIS) into a text which is then read by a TTS synthesizer from an SSML-modified version of the same text. Whereas many systems exist that generate sign language from a text, we decided to do the reverse operation and generate text from LIS. For that purpose we used a version of the fable The Tortoise and the Hare, signed and made available on YouTube by ALBA cooperativa sociale, which was annotated manually by the second author for her master’s thesis. In order to achieve our goal, we converted the multilayer glosses into linear Prolog terms to be fed to the generator. In the paper we focus on the main problems encountered in the transformation of the glosses into a semantically and pragmatically consistent representation. The main problems were caused by the complexities of a text like a fable, which requires coreference mechanisms and speech acts, often unexpressed and constituting implicit information, to be implemented in the representation.

pdf bib
Unleashing annotations with TextAnnotator: Multimedia, multi-perspective document views for ubiquitous annotation
Giuseppe Abrami | Alexander Henlein | Andy Lücking | Attila Kett | Pascal Adeberg | Alexander Mehler

We argue that, mainly due to technical innovation in the landscape of annotation tools, a conceptual change in annotation models and processes is also on the horizon. We diagnose that these changes are bound up with the multi-media and multi-perspective facilities of annotation tools, in particular when considering virtual reality (VR) and augmented reality (AR) applications, their potential ubiquitous use, and the exploitation of externally trained natural language pre-processing methods. Such developments potentially lead to a dynamic and exploratory heuristic construction of the annotation process. With TextAnnotator we introduce an annotation suite which focuses on multi-mediality and multi-perspectivity with an interoperable set of task-specific annotation modules (e.g., for word classification, rhetorical structures, dependency trees, semantic roles, and more) and their linkage to VR and mobile implementations. The basic architecture and usage of TextAnnotator are described and related to the above-mentioned shifts in the field.


pdf (full)
bib (full)
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)

pdf bib
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)
Lucia Donatelli | Nikhil Krishnaswamy | Kenneth Lai | James Pustejovsky

pdf bib
What is Multimodality?
Letitia Parcalabescu | Nils Trost | Anette Frank

Recent years have seen rapid developments in the field of multimodal machine learning, combining e.g., vision, text or speech. In this position paper we explain how the field uses outdated definitions of multimodality that prove unfit for the machine learning era. We propose a new task-relative definition of (multi)modality in the context of multimodal machine learning that focuses on representations and information that are relevant for a given machine learning task. With our new definition of multimodality we aim to provide a missing foundation for multimodal research, an important component of language grounding and a crucial milestone towards NLU.

pdf bib
Are Gestures Worth a Thousand Words? An Analysis of Interviews in the Political Domain
Daniela Trotta | Sara Tonelli

Speaker gestures are semantically co-expressive with speech and serve different pragmatic functions to accompany oral modality. Therefore, gestures are an inseparable part of the language system: they may add clarity to discourse, can be employed to facilitate lexical retrieval and retain a turn in conversations, assist in verbalizing semantic content and facilitate speakers in coming up with the words they intend to say. This aspect is particularly relevant in political discourse, where speakers try to apply communication strategies that are both clear and persuasive using verbal and non-verbal cues. In this paper we investigate the co-speech gestures of several Italian politicians during face-to-face interviews using a multimodal linguistic approach. We first enrich an existing corpus with a novel annotation layer capturing the function of hand movements. Then, we perform an analysis of the corpus, focusing in particular on the relationship between hand movements and other information layers such as the political party or non-lexical and semi-lexical tags. We observe that the recorded differences pertain more to single politicians than to the party they belong to, and that hand movements tend to occur frequently with semi-lexical phenomena, supporting the lexical retrieval hypothesis.

pdf bib
Requesting clarifications with speech and gestures
Jonathan Ginzburg | Andy Luecking

In multimodal natural language interaction both speech and non-speech gestures are involved in the basic mechanism of grounding and repair. We discuss a couple of multimodal clarification requests and argue that gestures, as well as speech expressions, underlie comparable parallelism constraints. In order to make this precise, we slightly extend the formal dialogue framework KoS to cover also gestural counterparts of verbal locutionary propositions.

pdf bib
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu | Albert Gatt | Anette Frank | Iacer Calixto

We investigate the reasoning ability of pretrained vision and language (V&L) models in two tasks that require multimodal integration: (1) discriminating a correct image-sentence pair from an incorrect one, and (2) counting entities in an image. We evaluate three pretrained V&L models on these tasks: ViLBERT, ViLBERT 12-in-1 and LXMERT, in zero-shot and finetuned settings. Our results show that models solve task (1) very well, as expected, since all models are pretrained on task (1). However, none of the pretrained V&L models is able to adequately solve task (2), our counting probe, and they cannot generalise to out-of-distribution quantities. We propose a number of explanations for these findings: LXMERT (and to some extent ViLBERT 12-in-1) show some evidence of catastrophic forgetting on task (1). Concerning our results on the counting probe, we find evidence that all models are impacted by dataset bias, and also fail to individuate entities in the visual input. While a selling point of pretrained V&L models is their ability to solve complex tasks, our findings suggest that understanding their reasoning and grounding capabilities requires more targeted investigations on specific phenomena.

pdf bib
How Vision Affects Language: Comparing Masked Self-Attention in Uni-Modal and Multi-Modal Transformer
Nikolai Ilinykh | Simon Dobnik

The problem of interpreting the knowledge learned by multi-head self-attention in transformers has been one of the central questions in NLP. However, most work has focused on models trained for uni-modal tasks, e.g. machine translation. In this paper, we examine masked self-attention in a multi-modal transformer trained for the task of image captioning. In particular, we test whether the multi-modality of the task objective affects the learned attention patterns. Our visualisations of masked self-attention demonstrate that (i) it can learn general linguistic knowledge of the textual input, and (ii) its attention patterns incorporate artefacts from the visual modality even though it has never accessed it directly. We compare our transformer’s attention patterns with masked attention in distilgpt-2 tested for uni-modal text generation of image captions. Based on the maps of extracted attention weights, we argue that masked self-attention in the image captioning transformer seems to be enhanced with semantic knowledge from images, exemplifying joint language-and-vision information in its attention patterns.

pdf bib
EMISSOR: A platform for capturing multimodal interactions as Episodic Memories and Interpretations with Situated Scenario-based Ontological References
Selene Baez Santamaria | Thomas Baier | Taewoon Kim | Lea Krause | Jaap Kruijt | Piek Vossen

We present EMISSOR: a platform to capture multimodal interactions as recordings of episodic experiences with explicit referential interpretations that also yield an episodic Knowledge Graph (eKG). The platform stores streams of multiple modalities as parallel signals. Each signal is segmented and annotated independently with interpretations. Annotations are eventually mapped to explicit identities and relations in the eKG. As we ground signal segments from different modalities to the same instance representations, we also ground different modalities across each other. Unique to our eKG is that it accepts different interpretations across modalities, sources and experiences and supports reasoning over conflicting information and uncertainties that may result from multimodal experiences. EMISSOR can record and annotate experiments in virtual and real-world settings, combine data, and evaluate system behavior and performance against preset goals, but it can also model the accumulation of knowledge and interpretations in the Knowledge Graph as a result of these episodic experiences.

pdf bib
Annotating anaphoric phenomena in situated dialogue
Sharid Loáiciga | Simon Dobnik | David Schlangen

In recent years several corpora have been developed for vision and language tasks. With this paper, we intend to start a discussion on the annotation of referential phenomena in situated dialogue. We argue that there is still significant room for corpora that increase the complexity of both visual and linguistic domains and which capture different varieties of perceptual and conversational contexts. In addition, a rich annotation scheme covering a broad range of referential phenomena and compatible with the textual task of coreference resolution is necessary in order to take the most advantage of these corpora. Consequently, there are several open questions regarding the semantics of reference and annotation, and the extent to which standard textual coreference accounts for the situated dialogue genre. Working with two corpora on situated dialogue, we present our extension to the ARRAU (Uryupina et al., 2020) annotation scheme in order to start this discussion.

pdf bib
Incremental Unit Networks for Multimodal, Fine-grained Information State Representation
Casey Kennington | David Schlangen

We offer a fine-grained information state annotation scheme that follows directly from the Incremental Unit abstract model of dialogue processing when used within a multimodal, co-located, interactive setting. We explain the Incremental Unit model and give an example application using the Localized Narratives dataset, then offer avenues for future research.

pdf bib
Teaching Arm and Head Gestures to a Humanoid Robot through Interactive Demonstration and Spoken Instruction
Michael Brady | Han Du

We describe work in progress for training a humanoid robot to produce iconic arm and head gestures as part of task-oriented dialogue interaction. This involves the development and use of a multimodal dialog manager for non-experts to quickly ‘program’ the robot through speech and vision. Using this dialog manager, videos of gesture demonstrations are collected. Motor positions are extracted from these videos to specify motor trajectories where collections of motor trajectories are used to produce robot gestures following a Gaussian mixtures approach. Concluding discussion considers how learned representations may be used for gesture recognition by the robot, and how the framework may mature into a system to address language grounding and semantic representation.

pdf bib
Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki | Hitomi Yanaka | Koji Mineshima | Daisuke Bekki

This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action labels, and 1,942 action triplets of the form (subject, predicate, object) that can be easily translated into logical semantic representations. The dataset is expected to be useful for evaluating multimodal inference systems between videos and semantically complicated sentences including negation and quantification.


pdf (full)
bib (full)
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)

pdf bib
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)
Aikaterini-Lida Kalouli | Lawrence S. Moss

pdf bib
Learning General Event Schemas with Episodic Logic
Lane Lawley | Benjamin Kuehnert | Lenhart Schubert

We present a system for learning generalized, stereotypical patterns of events—or “schemas”—from natural language stories, and applying them to make predictions about other stories. Our schemas are represented with Episodic Logic, a logical form that closely mirrors natural language. By beginning with a “head start” set of protoschemas— schemas that a 1- or 2-year-old child would likely know—we can obtain useful, general world knowledge with very few story examples—often only one or two. Learned schemas can be combined into more complex, composite schemas, and used to make predictions in other stories where only partial information is available.

pdf bib
Applied Medical Code Mapping with Character-based Deep Learning Models and Word-based Logic
John Langton | Krishna Srihasam

Logical Observation Identifiers Names and Codes (LOINC) is a standard set of codes that enable clinicians to communicate about medical tests. Laboratories depend on LOINC to identify what tests a doctor orders for a patient. However, clinicians often use site-specific, custom codes in their medical records systems that can include shorthand, spelling mistakes, and invented acronyms. Software solutions must map from these custom codes to the LOINC standard to support data interoperability. A key challenge is that LOINC is comprised of six elements. Mapping requires not only extracting those elements, but also combining them according to LOINC logic. We found that character-based deep learning excels at extracting LOINC elements, while logic-based methods are more effective for combining those elements into complete LOINC values. In this paper, we present an ensemble of machine learning and logic that is currently used in several medical facilities to map from custom codes to LOINC.

pdf bib
Attentive Tree-structured Network for Monotonicity Reasoning
Zeming Chen

Many state-of-the-art neural models designed for monotonicity reasoning perform poorly on downward inference. To address this shortcoming, we developed an attentive tree-structured neural network. It consists of a tree-structured long short-term memory network (Tree-LSTM) with soft attention. It is designed to model the syntactic parse tree information from the sentence pair of a reasoning task. A self-attentive aggregator is used for aligning the representations of the premise and the hypothesis. We present our model and evaluate it using the Monotonicity Entailment Dataset (MED). We show that our model outperforms existing models on MED and attempt to explain why.

pdf bib
Transferring Representations of Logical Connectives
Aaron Traylor | Ellie Pavlick | Roman Feiman

In modern natural language processing pipelines, it is common practice to “pretrain” a generative language model on a large corpus of text, and then to “finetune” the created representations by continuing to train them on a discriminative textual inference task. However, it is not immediately clear whether the logical meaning necessary to model logical entailment is captured by language models in this paradigm. We examine this pretrain-finetune recipe with language models trained on a synthetic propositional language entailment task, and present results on test sets probing models’ knowledge of axioms of first order logic.

pdf bib
Monotonic Inference for Underspecified Episodic Logic
Gene Kim | Mandar Juvekar | Lenhart Schubert

We present a method of making natural logic inferences from Unscoped Logical Form of Episodic Logic. We establish a correspondence between inference rules of scope resolved Episodic Logic and the natural logic treatment by Sánchez Valencia (1991a), and hence demonstrate the ability to handle foundational natural logic inferences from prior literature as well as more general nested monotonicity inferences.

pdf bib
Supporting Context Monotonicity Abstractions in Neural NLI Models
Julia Rozanova | Deborah Ferreira | Mokanarangan Thayaparan | Marco Valentino | André Freitas

Natural language contexts display logical regularities with respect to substitutions of related concepts: these are captured in a functional order-theoretic property called monotonicity. We focus on a class of NLI problems where the resulting entailment label depends only on the context monotonicity and the relation between the substituted concepts, and build on previous techniques that aim to improve the performance of NLI models on these problems, since consistent performance across both upward and downward monotone contexts still seems difficult to attain even for state-of-the-art models. To this end, we reframe the problem of context monotonicity classification to make it compatible with transformer-based pre-trained NLI models and add this task to the training pipeline. Furthermore, we introduce a sound and complete simplified monotonicity logic formalism which describes our treatment of contexts as abstract units. Using the notions in our formalism, we adapt targeted challenge sets to investigate whether an intermediate context monotonicity classification task can aid NLI models’ performance on examples exhibiting monotonicity reasoning.

pdf bib
Bayesian Classification and Inference in a Probabilistic Type Theory with Records
Staffan Larsson | Robin Cooper

We propose a probabilistic account of semantic inference and classification formulated in terms of probabilistic type theory with records, building on Cooper et al. (2014) and Cooper et al. (2015). We suggest probabilistic type-theoretic formulations of Naive Bayes Classifiers and Bayesian Networks. A central element of these constructions is a type-theoretic version of a random variable. We illustrate this account with a simple language game combining probabilistic classification of perceptual input with probabilistic (semantic) inference.
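For readers unfamiliar with the classifier being recast, the sketch below shows only the standard Naive Bayes computation that the paper reformulates type-theoretically; the toy data and feature strings are my own assumptions, and the type-theoretic machinery is not represented here.

```python
# Standard Naive Bayes with add-one smoothing on toy perceptual features.
from collections import Counter, defaultdict

train = [("red round", "apple"), ("yellow long", "banana"), ("red small", "apple")]

class_counts = Counter(label for _, label in train)
feat_counts = defaultdict(Counter)
for feats, label in train:
    feat_counts[label].update(feats.split())

def posterior(feats, label, alpha=1.0):
    # Unnormalised P(label) * prod_i P(feat_i | label), with add-one smoothing.
    vocab = {f for counts in feat_counts.values() for f in counts}
    total = sum(feat_counts[label].values())
    p = class_counts[label] / sum(class_counts.values())
    for f in feats.split():
        p *= (feat_counts[label][f] + alpha) / (total + alpha * len(vocab))
    return p

scores = {c: posterior("red round", c) for c in class_counts}
print(max(scores, key=scores.get))  # -> 'apple'
```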

pdf bib
From compositional semantics to Bayesian pragmatics via logical inference
Julian Grove | Jean-Philippe Bernardy | Stergios Chatzikyriakidis

Formal semantics in the Montagovian tradition provides precise meaning characterisations, but usually without a formal theory of the pragmatics of contextual parameters and their sensitivity to background knowledge. Meanwhile, formal pragmatic theories make explicit predictions about meaning in context, but generally without a well-defined compositional semantics. We propose a combined framework for the semantic and pragmatic interpretation of sentences in the face of probabilistic knowledge. We do so by (1) extending a Montagovian interpretation scheme to generate a distribution over possible meanings, and (2) generating a posterior for this distribution using a variant of the Rational Speech Act (RSA) models, but generalised to arbitrary propositions. These aspects of our framework are tied together by evaluating entailment under probabilistic uncertainty. We apply our model to anaphora resolution and show that it provides expected biases under suitable assumptions about the distributions of lexical and world-knowledge. Further, we observe that the model’s output is robust to variations in its parameters within reasonable ranges.

pdf bib
A (Mostly) Symbolic System for Monotonic Inference with Unscoped Episodic Logical Forms
Gene Kim | Mandar Juvekar | Junis Ekmekciu | Viet Duong | Lenhart Schubert

We implement the formalization of natural logic-like monotonic inference using Unscoped Episodic Logical Forms (ULFs) by Kim et al. (2020). We demonstrate this system’s capacity to handle a variety of challenging semantic phenomena using the FraCaS dataset (Cooper et al., 1996). These results give empirical evidence for prior claims that ULF is an appropriate representation to mediate natural logic-like inferences.


pdf (full)
bib (full)
Proceedings of the 2021 Workshop on Semantic Spaces at the Intersection of NLP, Physics, and Cognitive Science (SemSpace)

pdf bib
Proceedings of the 2021 Workshop on Semantic Spaces at the Intersection of NLP, Physics, and Cognitive Science (SemSpace)
Martha Lewis | Mehrnoosh Sadrzadeh

pdf bib
Understanding the Semantic Space: How Word Meanings Dynamically Adapt in the Context of a Sentence
Nora Aguirre-Celis | Risto Miikkulainen

How do people understand the meaning of the word “small” when used to describe a mosquito, a church, or a planet? While humans have a remarkable ability to form meanings by combining existing concepts, modeling this process is challenging. This paper addresses that challenge through CEREBRA (Context-dEpendent meaning REpresentations in the BRAin) neural network model. CEREBRA characterizes how word meanings dynamically adapt in the context of a sentence by decomposing sentence fMRI into words and words into embodied brain-based semantic features. It demonstrates that words in different contexts have different representations and the word meaning changes in a way that is meaningful to human subjects. CEREBRA’s context-based representations can potentially be used to make NLP applications more human-like.

pdf bib
LinPP: a Python-friendly algorithm for Linear Pregroup Parsing
Irene Rizzo

We define a linear pregroup parser by applying some key modifications to the minimal parser defined in (Preller, 2007). These include handling words as separate blocks, and thus respecting their syntactic role in the sentence. We prove correctness of our algorithm with respect to parsing sentences in a subclass of pregroup grammars. The algorithm was specifically designed for a seamless implementation in Python. This facilitates its integration within the DisCoPy module for QNLP and vastly increases the applicability of pregroup grammars to parsing real-world text data.

pdf bib
A CCG-Based Version of the DisCoCat Framework
Richie Yeung | Dimitri Kartsaklis

While the DisCoCat model (Coecke et al., 2010) has proven to be a valuable tool for studying compositional aspects of language at the level of semantics, its strong dependency on pregroup grammars poses important restrictions: first, it prevents large-scale experimentation due to the absence of a pregroup parser; and second, it limits the expressibility of the model to context-free grammars. In this paper we solve these problems by reformulating DisCoCat as a passage from Combinatory Categorial Grammar (CCG) to a category of semantics. We start by showing that standard categorial grammars can be expressed as a biclosed category, where all rules emerge as currying/uncurrying the identity; we then proceed to model permutation-inducing rules by exploiting the symmetry of the compact closed category encoding the word meaning. We provide a proof of concept for our method, converting “Alice in Wonderland” into DisCoCat form, a corpus that we make available to the community.

pdf bib
Grammar equations
Bob Coecke | Vincent Wang

Diagrammatically speaking, grammatical calculi such as pregroups provide wires between words in order to elucidate their interactions, and this enables one to verify the grammatical correctness of phrases and sentences. In this paper we also provide wirings within words. This will enable us to identify grammatical constructs that we expect to be either equal or closely related. Hence, our work paves the way for a new theory of grammar that provides novel ‘grammatical truths’. We give a no-go theorem showing that our wirings for words make no sense for preordered monoids, the form which grammatical calculi usually take. Instead, they require diagrams, or equivalently, (free) monoidal categories.

pdf bib
On the Quantum-like Contextuality of Ambiguous Phrases
Daphne Wang | Mehrnoosh Sadrzadeh | Samson Abramsky | Victor Cervantes

Language is contextual, as the meanings of words depend on their contexts. Contextuality is, concomitantly, a well-defined concept in quantum mechanics, where it is considered a major resource for quantum computations. We investigate whether natural language exhibits any of quantum mechanics’ contextual features. We show that meaning combinations in ambiguous phrases can be modelled in the sheaf-theoretic framework for quantum contextuality, where they can become possibilistically contextual. Using the framework of Contextuality-by-Default (CbD), we explore the probabilistic variants of these and show that CbD-contextuality is also possible.

pdf bib
Conversational Negation using Worldly Context in Compositional Distributional Semantics
Benjamin Rodatz | Razin Shaikh | Lia Yeh

We propose a framework to model an operational conversational negation by applying worldly context (prior knowledge) to logical negation in compositional distributional semantics. Given a word, our framework can create its negation that is similar to how humans perceive negation. The framework corrects logical negation to weight meanings closer in the entailment hierarchy more than meanings further apart. The proposed framework is flexible to accommodate different choices of logical negations, compositions, and worldly context generation. In particular, we propose and motivate a new logical negation using matrix inverse. We validate the sensibility of our conversational negation framework by performing experiments, leveraging density matrices to encode graded entailment information. We conclude that the combination of subtraction negation and phaser in the basis of the negated word yields the highest Pearson correlation of 0.635 with human ratings.

pdf bib
Parsing conjunctions in DisCoCirc
Tiffany Duneau

In distributional compositional models of meaning, logical words require special interpretations that specify the way in which other words in the sentence interact with each other. So far, within the DisCoCat framework, conjunctions have been implemented by merging both conjuncts into a single output; however, in the new framework of DisCoCirc, merging between nouns is no longer possible. We provide an account of conjunction and an interpretation for the word ‘and’ that solves this, and moreover ensures that certain intuitively similar sentences can be given the same interpretations.

pdf bib
Should Semantic Vector Composition be Explicit? Can it be Linear?
Dominic Widdows | Kristen Howell | Trevor Cohen

Vector representations have become a central element in semantic language modelling, leading to mathematical overlaps with many fields including quantum theory. Compositionality is a core goal for such representations: given representations for ‘wet’ and ‘fish’, how should the concept ‘wet fish’ be represented? This position paper surveys this question from two points of view. The first considers the question of whether an explicit mathematical representation can be successful using only tools from within linear algebra, or whether other mathematical tools are needed. The second considers whether semantic vector composition should be explicitly described mathematically, or whether it can be a model-internal side-effect of training a neural network. A third and newer question is whether a compositional model can be implemented on a quantum computer. Given the fundamentally linear nature of quantum mechanics, we propose that these questions are related, and that this survey may help to highlight candidate operations for future quantum implementation.