Joshua Bensemann


2022

Eye Gaze and Self-attention: How Humans and Transformers Attend Words in Sentences
Joshua Bensemann | Alex Peng | Diana Benavides-Prado | Yang Chen | Neset Tan | Paul Michael Corballis | Patricia Riddle | Michael Witbrock
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Attention describes cognitive processes that are important to many human phenomena, including reading. The term is also used to describe the way in which transformer neural networks perform natural language processing. While attention appears to be very different in these two contexts, this paper presents an analysis of the correlations between transformer attention and overt human attention during reading tasks. An extensive analysis of human eye tracking datasets showed that the dwell times of human eye movements were strongly correlated with the attention patterns occurring in the early layers of pre-trained transformers such as BERT. Additionally, the strength of the correlation was not related to the number of parameters within a transformer. This suggests that something about the transformers’ architecture determined how closely the two measures were correlated.

AbductionRules: Training Transformers to Explain Unexpected Inputs
Nathan Young | Qiming Bao | Joshua Bensemann | Michael Witbrock
Findings of the Association for Computational Linguistics: ACL 2022

Transformers have recently been shown to be capable of reliably performing logical reasoning over facts and rules expressed in natural language, but abductive reasoning (inference to the best explanation of an unexpected observation) has been underexplored despite significant applications to scientific discovery, common-sense reasoning, and model interpretability. This paper presents AbductionRules, a group of natural language datasets designed to train and test generalisable abduction over natural-language knowledge bases. We use these datasets to finetune pretrained Transformers and discuss their performance, finding that our models learned generalisable abductive techniques but also learned to exploit the structure of our data. Finally, we discuss the viability of this approach to abductive reasoning and ways in which it may be improved in future work.