Julia Rozanova


2024

Estimating the Causal Effects of Natural Logic Features in Transformer-Based NLI Models
Julia Rozanova | Marco Valentino | André Freitas
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Rigorous evaluation of the causal effects of semantic features on language model predictions can be hard to achieve for natural language reasoning problems. However, this is such a desirable form of analysis from both an interpretability and a model evaluation perspective that it is valuable to investigate specific patterns of reasoning with enough structure and regularity to identify and quantify systematic reasoning failures in widely-used models. In this vein, we pick a portion of the NLI task for which an explicit causal diagram can be systematically constructed: the case where, across two sentences (the premise and hypothesis), two related words/terms occur in a shared context. In this work, we apply causal effect estimation strategies to measure the effect of context interventions (whose effect on the entailment label is mediated by the semantic monotonicity characteristic) and interventions on the inserted word pair (whose effect on the entailment label is mediated by the relation between these words). Extending related work on causal analysis of NLP models in different settings, we perform an extensive interventional study on the NLI task to investigate the robustness of Transformers to irrelevant changes and their sensitivity to impactful changes. The results strongly support the claim that models with similar benchmark accuracy scores may exhibit very different behaviour. Moreover, our methodology reinforces previously suspected biases from a causal perspective, including a bias in favour of upward-monotone contexts and a tendency to ignore the effects of negation markers.
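A minimal sketch of the kind of quantity such an interventional study measures, assuming a hypothetical `model` callable that returns P(entailment) for a premise/hypothesis pair: the average change in that probability when each original input is replaced by its intervened counterpart (here, inserting a negation marker into the shared context). This is a plain difference-in-means estimate over text-level interventions, not necessarily the estimator used in the paper. Both toy models are hypothetical; the first ignores negation, so its estimated effect is zero even though the gold label flips from entailment to non-entailment.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical interface: maps (premise, hypothesis) to P(entailment).
Model = Callable[[str, str], float]
NLIPair = Tuple[str, str]


def average_intervention_effect(
    model: Model, pairs: List[Tuple[NLIPair, NLIPair]]
) -> float:
    """Mean change in P(entailment) from each original input to its intervened version."""
    return mean(model(*intervened) - model(*original) for original, intervened in pairs)


def negation_blind_model(premise: str, hypothesis: str) -> float:
    """Toy model (hypothetical): looks only at the word pair, never at negation."""
    return 0.9 if "dog" in premise and "animal" in hypothesis else 0.1


def negation_aware_model(premise: str, hypothesis: str) -> float:
    """Toy model (hypothetical): lowers P(entailment) when the premise is negated."""
    if " not " in f" {premise} ":
        return 0.1
    return negation_blind_model(premise, hypothesis)


# Context intervention: an upward-monotone context becomes downward-monotone.
pairs = [
    (("I saw a dog", "I saw an animal"),
     ("I did not see a dog", "I did not see an animal")),
]
print(average_intervention_effect(negation_blind_model, pairs))  # 0.0
print(average_intervention_effect(negation_aware_model, pairs))  # approximately -0.8
```

A model that responds correctly to this intervention should show a clearly negative effect, since the gold label changes; an effect near zero is one symptom of ignoring negation markers.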

2023

Interventional Probing in High Dimensions: An NLI Case Study
Julia Rozanova | Marco Valentino | Lucas Cordeiro | André Freitas
Findings of the Association for Computational Linguistics: EACL 2023

Probing strategies have been shown to detect the presence of various linguistic features in large language models; in particular, semantic features intermediate to the “natural logic” fragment of the Natural Language Inference task (NLI). In the case of natural logic, the relation between the intermediate features and the entailment label is explicitly known: as such, this provides a ripe setting for interventional studies on the NLI models’ representations, allowing for stronger causal conjectures and a deeper critical analysis of interventional probing methods. In this work, we carry out new and existing representation-level interventions to investigate the effect of these semantic features on NLI classification: we perform amnesic probing (which removes features as directed by learned linear probes) and introduce the mnestic probing variation (which forgets all dimensions except the probe-selected ones). Furthermore, we delve into the limitations of these methods and outline some pitfalls that have been obscuring the effectiveness of interventional probing studies.
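A minimal sketch contrasting the two interventions for a single linear probe direction `w`, assuming that "probe-selected" means the subspace spanned by the probe's weights. Amnesic probing is usually implemented with iterative nullspace projection over several probe directions, and the paper's mnestic variant may select dimensions differently; the vectors and function names here are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def probe_component(x: Vector, w: Vector) -> Vector:
    """Orthogonal projection of x onto the probe direction w."""
    scale = dot(x, w) / dot(w, w)
    return [scale * wi for wi in w]


def amnesic(x: Vector, w: Vector) -> Vector:
    """Forget what the probe uses: project x onto the nullspace of w."""
    return [xi - ci for xi, ci in zip(x, probe_component(x, w))]


def mnestic(x: Vector, w: Vector) -> Vector:
    """Forget everything except what the probe uses: keep only the component along w."""
    return probe_component(x, w)


# Hypothetical 2-d representation and probe direction.
x = [3.0, 4.0]
w = [1.0, 0.0]
print(amnesic(x, w))  # [0.0, 4.0]
print(mnestic(x, w))  # [3.0, 0.0]
```

The two outputs are complementary: they sum back to the original vector, so comparing downstream predictions on each isolates what the probed subspace contributes.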

2022

Decomposing Natural Logic Inferences for Neural NLI
Julia Rozanova | Deborah Ferreira | Mokanarangan Thayaparan | Marco Valentino | Andre Freitas
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

In the interest of interpreting neural NLI models and their reasoning strategies, we carry out a systematic probing study which investigates whether these models capture the crucial semantic features central to natural logic: monotonicity and concept inclusion. Correctly identifying valid inferences in downward-monotone contexts is a known stumbling block for NLI performance, subsuming linguistic phenomena such as negation scope and generalized quantifiers. To understand this difficulty, we emphasize monotonicity as a property of a context and examine the extent to which models capture relevant monotonicity information in the vector representations which are intermediate to their decision making process. Drawing on the recent advancement of the probing paradigm, we compare the presence of monotonicity features across various models. We find that monotonicity information is notably weak in the representations of popular NLI models which achieve high scores on benchmarks, and observe that previous improvements to these models based on fine-tuning strategies have introduced stronger monotonicity features together with their improved performance on challenge sets.
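A minimal probing sketch, assuming hypothetical two-dimensional "context representations" labelled by monotonicity (1 = downward, 0 = upward). A perceptron stands in here for the linear (typically logistic-regression) probes used in probing studies, and for brevity accuracy is measured on the training points; a real study would train on a model's hidden states and evaluate on a held-out split with appropriate controls.

```python
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (representation, label: 1 = downward, 0 = upward)


def predict(w: Vector, b: float, x: Vector) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def train_linear_probe(data: List[Example], epochs: int = 10) -> Tuple[Vector, float]:
    """Perceptron stand-in for a linear probe over fixed representations."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            pred = predict(w, b, x)
            if pred != y:
                mistakes += 1
                sign = y - pred  # +1 or -1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
                b += sign
        if mistakes == 0:  # converged on linearly separable data
            break
    return w, b


def probe_accuracy(w: Vector, b: float, data: List[Example]) -> float:
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


# Hypothetical representations: the first dimension encodes monotonicity.
toy_data = [([1.0, 0.2], 1), ([2.0, -0.5], 1), ([-1.0, 0.3], 0), ([-1.5, -0.2], 0)]
w, b = train_linear_probe(toy_data)
print(probe_accuracy(w, b, toy_data))  # 1.0
```

A weak monotonicity signal, as reported above for popular NLI models, would show up as probe accuracy close to chance on held-out data.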

To be or not to be an Integer? Encoding Variables for Mathematical Text
Deborah Ferreira | Mokanarangan Thayaparan | Marco Valentino | Julia Rozanova | Andre Freitas
Findings of the Association for Computational Linguistics: ACL 2022

The application of Natural Language Inference (NLI) methods over large textual corpora can facilitate scientific discovery, reducing the gap between current research and the available large-scale scientific knowledge. However, contemporary NLI models are still limited in interpreting mathematical knowledge written in natural language, even though mathematics is an integral part of scientific argumentation for many disciplines. One of the fundamental requirements towards mathematical language understanding is the creation of models able to meaningfully represent variables. This problem is particularly challenging since the meaning of a variable should be assigned exclusively from its defining type, i.e., the representation of a variable should come from its context. Recent research has formalised the variable typing task, a benchmark for the understanding of abstract mathematical types and variables in a sentence. In this work, we propose VarSlot, a Variable Slot-based approach, which not only delivers state-of-the-art results in the task of variable typing, but is also able to create context-based representations for variables.

Systematicity, Compositionality and Transitivity of Deep NLP Models: a Metamorphic Testing Perspective
Edoardo Manino | Julia Rozanova | Danilo Carvalho | Andre Freitas | Lucas Cordeiro
Findings of the Association for Computational Linguistics: ACL 2022

Metamorphic testing has recently been used to check the safety of neural NLP models. Its main advantage is that it does not rely on a ground truth to generate test cases. However, existing studies are mostly concerned with robustness-like metamorphic relations, limiting the scope of linguistic properties they can test. We propose three new classes of metamorphic relations, which address the properties of systematicity, compositionality and transitivity. Unlike robustness, our relations are defined over multiple source inputs, thus increasing the number of test cases that we can produce by a polynomial factor. With them, we test the internal consistency of state-of-the-art NLP models, and show that they do not always behave according to their expected linguistic properties. Lastly, we introduce a novel graphical notation that efficiently summarises the inner structure of metamorphic relations.
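A minimal sketch of one transitivity-style metamorphic relation, assuming a hypothetical `model` that returns a label string: whenever the model predicts that a entails b and that b entails c, the follow-up test (a, c) should also be predicted as entailment. No ground-truth label is needed, only the model's own predictions on the source inputs. Because each follow-up test combines two source predictions, enumerating ordered triples over n sentences yields n(n-1)(n-2) candidate tests. The relations and notation in the paper may differ; the lookup-table model is hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Tuple

# Hypothetical interface: maps (premise, hypothesis) to a label string.
NLIModel = Callable[[str, str], str]


def transitivity_violations(
    model: NLIModel, sentences: List[str]
) -> List[Tuple[str, str, str]]:
    """Triples (a, b, c) where the model predicts a => b and b => c, but not a => c."""
    violations = []
    for a, b, c in permutations(sentences, 3):
        if model(a, b) == "entailment" and model(b, c) == "entailment":
            if model(a, c) != "entailment":
                violations.append((a, b, c))
    return violations


# Hypothetical model that memorised two entailments but fails to chain them.
KNOWN = {("A poodle barks", "A dog barks"), ("A dog barks", "An animal barks")}


def lookup_model(premise: str, hypothesis: str) -> str:
    return "entailment" if (premise, hypothesis) in KNOWN else "neutral"


sentences = ["A poodle barks", "A dog barks", "An animal barks"]
print(transitivity_violations(lookup_model, sentences))
# [('A poodle barks', 'A dog barks', 'An animal barks')]
```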

Diff-Explainer: Differentiable Convex Optimization for Explainable Multi-hop Inference
Mokanarangan Thayaparan | Marco Valentino | Deborah Ferreira | Julia Rozanova | André Freitas
Transactions of the Association for Computational Linguistics, Volume 10

This paper presents Diff-Explainer, the first hybrid framework for explainable multi-hop inference that integrates explicit constraints with neural architectures through differentiable convex optimization. Specifically, Diff-Explainer allows for the fine-tuning of neural representations within a constrained optimization framework to answer and explain multi-hop questions in natural language. To demonstrate the efficacy of the hybrid framework, we combine existing ILP-based solvers for multi-hop Question Answering (QA) with Transformer-based representations. An extensive empirical evaluation on scientific and commonsense QA tasks demonstrates that the integration of explicit constraints in an end-to-end differentiable framework can significantly improve the performance of non-differentiable ILP solvers (8.91%–13.3%). Moreover, additional analysis reveals that Diff-Explainer is able to achieve strong performance when compared to standalone Transformers and previous multi-hop approaches while still providing structured explanations in support of its predictions.

2021

Does My Representation Capture X? Probe-Ably
Deborah Ferreira | Julia Rozanova | Mokanarangan Thayaparan | Marco Valentino | André Freitas
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

Probing (or diagnostic classification) has become a popular strategy for investigating whether a given set of intermediate features is present in the representations of neural models. Naive probing studies may have misleading results, but various recent works have suggested more reliable methodologies that compensate for the possible pitfalls of probing. However, these best practices are numerous and fast-evolving. To simplify the process of running a set of probing experiments in line with suggested methodologies, we introduce Probe-Ably: an extendable probing framework which supports and automates the application of probing methods to the user’s inputs.

Supporting Context Monotonicity Abstractions in Neural NLI Models
Julia Rozanova | Deborah Ferreira | Mokanarangan Thayaparan | Marco Valentino | André Freitas
Proceedings of the 1st and 2nd Workshops on Natural Logic Meets Machine Learning (NALOMA)

Natural language contexts display logical regularities with respect to substitutions of related concepts: these are captured in a functional order-theoretic property called monotonicity. For a certain class of NLI problems where the resulting entailment label depends only on the context monotonicity and the relation between the substituted concepts, we build on previous techniques that aim to improve the performance of NLI models for these problems, as consistent performance across both upward and downward monotone contexts still seems difficult to attain even for state-of-the-art models. To this end, we reframe the problem of context monotonicity classification to make it compatible with transformer-based pre-trained NLI models and add this task to the training pipeline. Furthermore, we introduce a sound and complete simplified monotonicity logic formalism which describes our treatment of contexts as abstract units. Using the notions in our formalism, we adapt targeted challenge sets to investigate whether an intermediate context monotonicity classification task can aid NLI models’ performance on examples exhibiting monotonicity reasoning.
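A minimal sketch of how the label can depend only on context monotonicity and the relation between the substituted concepts, assuming the premise is a context filled with term x and the hypothesis is the same context filled with term y. The binary labels and the three relation types are simplifying assumptions; finer natural logic distinctions (such as equivalence or exclusion) are omitted, and the names are hypothetical rather than taken from the paper's formalism.

```python
from enum import Enum


class Monotonicity(Enum):
    UPWARD = "upward"      # e.g. "I saw a ___"
    DOWNWARD = "downward"  # e.g. "I did not see a ___"


class Relation(Enum):
    FORWARD = "forward"  # premise term is more specific: dog -> animal
    REVERSE = "reverse"  # premise term is more general: animal -> dog
    OTHER = "other"      # no inclusion relation: dog -> car


def entailment_label(context: Monotonicity, relation: Relation) -> str:
    """Label for premise = context[x] and hypothesis = context[y]."""
    # Upward contexts preserve specific-to-general substitutions.
    if context is Monotonicity.UPWARD and relation is Relation.FORWARD:
        return "entailment"
    # Downward contexts reverse the direction of valid substitutions.
    if context is Monotonicity.DOWNWARD and relation is Relation.REVERSE:
        return "entailment"
    return "non-entailment"


print(entailment_label(Monotonicity.UPWARD, Relation.FORWARD))    # entailment
print(entailment_label(Monotonicity.DOWNWARD, Relation.FORWARD))  # non-entailment
print(entailment_label(Monotonicity.DOWNWARD, Relation.REVERSE))  # entailment
```

For instance, "I saw a dog" entails "I saw an animal", while "I did not see a dog" does not entail "I did not see an animal"; the reverse substitution in the negated context ("I did not see an animal" to "I did not see a dog") is valid.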