It is a Truth Individually Acknowledged: Cross-references On Demand

Piper Vasicek; Courtni Byun; Kevin Seppi

doi:10.18653/v1/2024.nlp4dh-1.7

Abstract

Cross-references link a source passage of text to other passages that elucidate it in some way, and they can deepen human understanding. Despite their usefulness, good cross-references are hard to find, and extensive sets of cross-references exist only for a few of the most highly studied books, such as the Bible, for which scholars have been collecting cross-references for hundreds of years. We therefore propose a new task: generating cross-references for user-selected text on demand. We define a metric, coverage, to evaluate task performance. We adapt several models to generate cross-references, including an Anchor Words topic model, SBERT SentenceTransformers, and ChatGPT, and evaluate their coverage in both English and German on existing cross-reference datasets. While ChatGPT outperforms the other models on these datasets, this is likely due to data contamination. We hand-evaluate performance on the well-known works of Jane Austen and on a lesser-known science fiction series, Sons of the Starfarers by Joe Vasicek, finding that ChatGPT does not perform as well on these works; sentence embeddings perform best. We experiment with newer LLMs and large context windows, and suggest that future work should focus on deploying on-demand cross-references with readers to determine their effectiveness in the wild.

Anthology ID: 2024.nlp4dh-1.7
Volume: Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month: November
Year: 2024
Address: Miami, USA
Editors: Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venues: NLP4DH | WS
Publisher: Association for Computational Linguistics
Pages: 63–74
URL: https://aclanthology.org/2024.nlp4dh-1.7/
DOI: 10.18653/v1/2024.nlp4dh-1.7
Cite (ACL): Piper Vasicek, Courtni Byun, and Kevin Seppi. 2024. It is a Truth Individually Acknowledged: Cross-references On Demand. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 63–74, Miami, USA. Association for Computational Linguistics.
Cite (Informal): It is a Truth Individually Acknowledged: Cross-references On Demand (Vasicek et al., NLP4DH 2024)
PDF: https://aclanthology.org/2024.nlp4dh-1.7.pdf
