Piper Vasicek


2024

This Reference Does Not Exist: An Exploration of LLM Citation Accuracy and Relevance
Courtni Byun | Piper Vasicek | Kevin Seppi
Proceedings of the Third Workshop on Bridging Human-Computer Interaction and Natural Language Processing

Citations are a fundamental and indispensable part of research writing: they support claims and lend credibility to research findings. Recent GPT-fueled interest in large language models (LLMs) has shone a spotlight on the capabilities and limitations of these models when generating relevant citations for a document. Recent work has focused largely on title and author accuracy; we build on this effort and expand it with a preliminary exploration of the relevance of model-recommended citations. We define three citation-recommendation tasks, and we collect and annotate a dataset of model-recommended citations for those tasks. We find that GPT-4 largely outperforms earlier models on both author and title accuracy in two markedly different CS venues, CHI and EMNLP, but may not recommend references that are more relevant than those recommended by the earlier models. All models appear to perform better at recommending EMNLP papers than CHI papers.
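As a rough illustration of the title-accuracy side of this kind of evaluation, the sketch below checks model-recommended titles against an index of real titles using fuzzy string matching, so minor formatting differences are not counted as hallucinations. The function name, threshold, and toy titles are illustrative assumptions, not the paper's actual protocol or data; the recommendations themselves would come from prompting an LLM such as GPT-4.

```python
# Hypothetical sketch: measure how many model-recommended citation titles
# correspond to real papers. Fuzzy matching (difflib) tolerates small
# capitalization/formatting differences. Names and data are illustrative.
from difflib import SequenceMatcher

def title_accuracy(recommended, known_titles, threshold=0.9):
    """Fraction of recommended titles that fuzzily match a known real title."""
    hits = 0
    for title in recommended:
        best = max(
            SequenceMatcher(None, title.lower(), real.lower()).ratio()
            for real in known_titles
        )
        if best >= threshold:
            hits += 1
    return hits / len(recommended)

# Toy usage with placeholder titles (not the paper's data):
known = [
    "Attention Is All You Need",
    "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
]
recommended = [
    "Attention is all you need",        # real, minor formatting difference
    "A Reference That Does Not Exist",  # hallucinated
]
print(title_accuracy(recommended, known))  # 0.5: one real, one hallucinated
```

Author accuracy could be checked analogously against the matched paper's author list; relevance, as the abstract notes, calls for human annotation rather than string matching.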

It is a Truth Individually Acknowledged: Cross-references On Demand
Piper Vasicek | Courtni Byun | Kevin Seppi
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

Cross-references link source passages of text to other passages that elucidate the source passage in some way and can deepen human understanding. Despite their usefulness, good cross-references are hard to find, and extensive sets of cross-references exist only for the few most highly studied books, such as the Bible, for which scholars have been collecting cross-references for hundreds of years. We therefore propose a new task: generate cross-references for user-selected text on demand. We define a metric, coverage, to evaluate task performance. We adapt several models to generate cross-references, including an Anchor Words topic model, SBERT SentenceTransformers, and ChatGPT, and evaluate their coverage in both English and German on existing cross-reference datasets. While ChatGPT outperforms the other models on these datasets, this is likely due to data contamination. We hand-evaluate performance on the well-known works of Jane Austen and the lesser-known science fiction series Sons of the Starfarers by Joe Vasicek, finding that ChatGPT does not perform as well on these works; sentence embeddings perform best. We experiment with newer LLMs and large context windows, and suggest that future work should focus on deploying cross-references on demand with readers to determine their effectiveness in the wild.
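As a minimal sketch of the sentence-embedding approach, assuming a SentenceTransformers model (the specific model, corpus, and k below are illustrative choices, and the paper's coverage metric is not reproduced here), one can rank all other passages in a corpus by cosine similarity to the user-selected passage and return the top k as candidate cross-references:

```python
# Hypothetical sketch: recommend cross-references for a user-selected passage
# by cosine similarity over sentence embeddings. Model choice and corpus are
# illustrative, not necessarily those used in the paper.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

corpus = [
    "It is a truth universally acknowledged, that a single man in possession "
    "of a good fortune, must be in want of a wife.",
    "Mr. Bennet was among the earliest of those who waited on Mr. Bingley.",
    "She is tolerable; but not handsome enough to tempt me.",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True)  # embed corpus once

def cross_references(selected_text, k=2):
    """Return the k corpus passages most similar to the selected text."""
    query_emb = model.encode(selected_text, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, corpus_emb)[0]
    top = scores.argsort(descending=True)[:k].tolist()
    return [(corpus[i], float(scores[i])) for i in top]

for passage, score in cross_references("A wealthy bachelor must be looking for a wife."):
    print(f"{score:.2f}  {passage}")
```

An on-demand deployment of this kind would embed the corpus once up front, so each user selection costs only a single query-time encoding and similarity search.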