Alexander Hoyle

2024

pdf bib abs
A SMART Mnemonic Sounds like “Glue Tonic”: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
Nishant Balepur | Matthew Shu | Alexander Hoyle | Alison Robey | Shi Feng | Seraphina Goldfarb-Tarrant | Jordan Lee Boyd-Graber
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Keyword mnemonics are memorable explanations that link new terms to simpler keywords.Prior work generates mnemonics for students, but they do not train models using mnemonics students prefer and aid learning.We build SMART, a mnemonic generator trained on feedback from real students learning new terms.To train SMART, we first fine-tune LLaMA-2 on a curated set of user-written mnemonics.We then use LLM alignment to enhance SMART: we deploy mnemonics generated by SMART in a flashcard app to find preferences on mnemonics students favor.We gather 2684 preferences from 45 students across two types: **expressed** (inferred from ratings) and **observed** (inferred from student learning), yielding three key findings.First, expressed and observed preferences disagree; what students *think* is helpful does not always capture what is *truly* helpful.Second, Bayesian models can synthesize complementary data from multiple preference types into a single effectiveness signal.SMART is tuned via Direct Preference Optimization on this signal, which resolves ties and missing labels in the typical method of pairwise comparisons, augmenting data for LLM output quality gains. Third, mnemonic experts assess SMART as matching GPT-4 at much lower deployment costs, showing the utility of capturing diverse student feedback to align LLMs in education.

pdf bib abs
TopicGPT: A Prompt-based Topic Modeling Framework
Chau Minh Pham | Alexander Hoyle | Simeng Sun | Philip Resnik | Mohit Iyyer
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require “reading the tea leaves” to interpret; additionally, they offer users minimal control over the formatting and specificity of resulting topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics in a text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also more interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling.

pdf bib abs
A First Step towards Measuring Interdisciplinary Engagement in Scientific Publications: A Case Study on NLP + CSS Research
Alexandria Leto | Shamik Roy | Alexander Hoyle | Daniel Acuna | Maria Leonor Pacheco
Proceedings of the Sixth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS 2024)

With the rise in the prevalence of cross-disciplinary research, there is a need to develop methods to characterize its practices. Current computational methods to evaluate interdisciplinary engagement—such as affiliation diversity, keywords, and citation patterns—are insufficient to model the degree of engagement between disciplines, as well as the way in which the complementary expertise of co-authors is harnessed. In this paper, we propose an automated framework to address some of these issues on a large scale. Our framework tracks interdisciplinary citations in scientific articles and models: 1) the section and position in which they appear, and 2) the argumentative role that they play in the writing. To showcase our framework, we perform a preliminary analysis of interdisciplinary engagement in published work at the intersection of natural language processing and computational social science in the last decade.

2023

pdf bib abs
Revisiting Automated Topic Model Evaluation with Large Language Models
Dominik Stammbach | Vilém Zouhar | Alexander Hoyle | Mrinmaya Sachan | Elliott Ash
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Topic models help us make sense of large text collections. Automatically evaluating their output and determining the optimal number of topics are both longstanding challenges, with no effective automated solutions to date. This paper proposes using large language models (LLMs) for these tasks. We find that LLMs appropriately assess the resulting topics, correlating more strongly with human judgments than existing automated metrics. However, the setup of the evaluation task is crucial — LLMs perform better on coherence ratings of word sets than on intrustion detection. We find that LLMs can also assist us in guiding us towards a reasonable number of topics. In actual applications, topic models are typically used to answer a research question related to a collection of texts. We can incorporate this research question in the prompt to the LLM, which helps estimating the optimal number of topics.

pdf bib abs
Natural Language Decompositions of Implicit Content Enable Better Text Representations
Alexander Hoyle | Rupak Sarkar | Pranav Goel | Philip Resnik
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

When people interpret text, they rely on inferences that go beyond the observed language itself. Inspired by this observation, we introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed, then validate the plausibility of the generated content via human judgments. Incorporating these explicit representations of implicit content proves useful in multiple problem settings that involve the human interpretation of utterances: assessing the similarity of arguments, making sense of a body of opinion data, and modeling legislative behavior. Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP and particularly its applications to social science.