Jesse Roberts


2024

Large Language Model Recall Uncertainty is Modulated by the Fan Effect
Jesse Roberts | Kyle Moore | Douglas Fisher | Oseremhen Ewaleifoh | Thao Pham
Proceedings of the 28th Conference on Computational Natural Language Learning

This paper evaluates whether large language models (LLMs) exhibit cognitive fan effects, similar to those discovered by Anderson in humans, after being pre-trained on human textual data. We conduct two sets of in-context recall experiments designed to elicit fan effects. Consistent with human results, we find that LLM recall uncertainty, measured via token probability, is influenced by the fan effect. Our results show that removing uncertainty disrupts the observed effect. The experiments suggest the fan effect is consistent whether the fan value is induced in-context or in the pre-training data. Finally, these findings provide in-silico evidence that fan effects and typicality are expressions of the same phenomenon.

The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance
Kyle Moore | Jesse Roberts | Thao Pham | Oseremhen Ewaleifoh | Douglas Fisher
Findings of the Association for Computational Linguistics: EMNLP 2024

Cloze testing is a common method for measuring the behavior of large language models on a number of benchmark tasks. Using the MMLU dataset, we show that the base-rate probability (BRP) differences across answer tokens are significant and affect task performance, i.e., guess A if uncertain. We find that counterfactual prompting does sufficiently mitigate the BRP effect. The BRP effect is found to have a similar effect to test-taking strategies employed by humans, leading to the conflation of task performance and test-taking ability. We propose the Nvr-X-MMLU task, a variation of MMLU, which helps to disambiguate test-taking ability from task performance and reports the latter.

Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis
Ray Umphrey | Jesse Roberts | Lindsey Roberts
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

This study explores the potential of large language models (LLMs) for identifying and examining intertextual relationships within biblical Koine Greek texts. By evaluating the performance of LLMs on various intertextuality scenarios, the study demonstrates that these models can detect direct quotations, allusions, and echoes between texts. The LLM’s ability to generate novel intertextual observations and connections highlights its potential to uncover new insights. However, the model also struggles with long query passages and the inclusion of false intertextual dependencies, emphasizing the importance of expert evaluation. The expert-in-the-loop methodology presented offers a scalable approach for intertextual research into the complex web of intertextuality within and beyond the biblical corpus.