Tracing the Genealogies of Ideas with Sentence Embeddings

Lucian Li

Abstract

Detecting intellectual influence in unstructured text is an important problem for many fields, including intellectual history, social science, and bibliometrics. Previous studies in computational social science and digital humanities have attempted to address it with dictionary-, embedding-, and language-model-based methods. I introduce an approach that uses a sentence embedding index to efficiently search for similar ideas in a large historical corpus. This method remains robust under the high OCR error rates found in real mass-digitized historical corpora, which disrupt previously published methods, while also capturing paraphrase and indirect influence. I evaluate this method on a large corpus of 250,000 nonfiction texts from the 19th century and find that the discovered influence is in line with the history of science literature. By expanding the search for influence and the origins of ideas beyond traditional structured corpora and canonical works and figures, we can gain a more nuanced perspective on influence and idea dissemination, one that can encompass epistemically marginalized groups.
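The general idea of searching a sentence embedding index can be illustrated with a minimal sketch. The abstract does not name the encoder or the index structure used in the paper, so everything below is an assumption made for illustration: `toy_embed` is a hypothetical stand-in that hashes character trigrams instead of calling a trained sentence encoder, the index is a plain list searched by brute force, and the three-sentence corpus is invented. It is not the author's implementation.

```python
import hashlib
import math
from typing import List, Tuple

DIM = 256  # hypothetical vector size for this toy example


def toy_embed(sentence: str, dim: int = DIM) -> List[float]:
    """Stand-in for a real sentence encoder (assumption, not the paper's model):
    hashed character-trigram counts, L2-normalized."""
    vec = [0.0] * dim
    text = " ".join(sentence.lower().split())
    for i in range(max(len(text) - 2, 0)):
        gram = text[i:i + 3]
        bucket = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def build_index(sentences: List[str]) -> List[List[float]]:
    """Embed every corpus sentence once, so that queries only need dot products."""
    return [toy_embed(s) for s in sentences]


def search(index: List[List[float]], query: str, k: int = 3) -> List[Tuple[int, float]]:
    """Return the k most similar corpus sentences as (position, cosine similarity)."""
    q = toy_embed(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scores = [(i, sum(a * b for a, b in zip(q, row))) for i, row in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Invented example corpus; the query imitates OCR damage ("chanqe", "tirne").
corpus = [
    "Species change over time through natural selection.",
    "The price of corn rose sharply in the spring.",
    "Railways have transformed commerce between towns.",
]
index = build_index(corpus)
hits = search(index, "Species chanqe over tirne through natural selection.", k=2)
```

In a realistic setting the stand-in embedding would be replaced by a trained sentence encoder, and the brute-force scan by an approximate nearest-neighbor index, since a linear scan does not scale to a corpus of 250,000 texts.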

Anthology ID: 2024.nlp4dh-1.2
Volume: Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month: November
Year: 2024
Address: Miami, USA
Editors: Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venue: NLP4DH
Publisher: Association for Computational Linguistics
Pages: 9–16
URL: https://aclanthology.org/2024.nlp4dh-1.2
Cite (ACL): Lucian Li. 2024. Tracing the Genealogies of Ideas with Sentence Embeddings. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 9–16, Miami, USA. Association for Computational Linguistics.
Cite (Informal): Tracing the Genealogies of Ideas with Sentence Embeddings (Li, NLP4DH 2024)
PDF: https://aclanthology.org/2024.nlp4dh-1.2.pdf
