Pierre Nugues

2025

Matching and Linking Entries in Historical Swedish Encyclopedias
Simon Börjesson | Erik Ersmark | Pierre Nugues
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)

The Nordisk familjebok is a Swedish encyclopedia from the 19th and 20th centuries. It was written by a team of experts and aimed to be an intellectual reference, stressing precision and accuracy. This encyclopedia had four main editions remarkable by their size, ranging from 20 to 38 volumes. As a consequence, the Nordisk familjebok had a considerable influence in universities, schools, the media, and society overall. As new editions were released, the selection of entries and their content evolved, reflecting intellectual changes in Sweden.In this paper, we used digitized versions from Project Runeberg. We first resegmented the raw text into entries and matched pairs of entries between the first and second editions using semantic sentence embeddings. We then extracted the geographical entries from both editions using a transformer-based classifier and linked them to Wikidata. This enabled us to identify geographic trends and possible shifts between the first and second editions, written between 1876–1899 and 1904–1926, respectively.Interpreting the results, we observe a small but significant shift in geographic focus away from Europe and towards North America, Africa, Asia, Australia, and northern Scandinavia from the first to the second edition, confirming the influence of the First World War and the rise of new powers. The code and data are available on GitHub at https://github.com/sibbo/nordisk-familjebok.

2024

pdf bib abs

Mapping the Past: Geographically Linking an Early 20th Century Swedish Encyclopedia with Wikidata
Axel Ahlin | Alfred Myrne Blåder | Pierre Nugues
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we describe the extraction of all the location entries from a prominent Swedish encyclopedia from the early 20th century, the Nordisk Familjebok ‘Nordic Family Book’, focusing on the second edition called Uggleupplagan. This edition comprises 38 volumes and over 182,000 articles, making it one of the most extensive Swedish encyclopedia editions. Using a classifier, we first determined the category of the entities. We found that approximately 22 percent of the encyclopedia entries were locations. We applied a named entity recognition to these entries and we linked them to Wikidata. Wikidata enabled us to extract their precise geographic locations resulting in almost 18,000 valid coordinates. We then analyzed the distribution of these locations and the entry selection process. It showed a concentration within Sweden, Germany, and the United Kingdom. The paper sheds light on the selection and representation of geographic information in the Nordisk Familjebok, providing insights into historical and societal perspectives. It also paves the way for future investigations into entry selection in different time periods and comparative analyses among various encyclopedias.

pdf bib abs

Linking Named Entities in Diderot’s Encyclopédie to Wikidata
Pierre Nugues
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Diderot’s Encyclopédie is a reference work from XVIIIth century in Europe that aimed at collecting the knowledge of its era. Wikipedia has the same ambition with a much greater scope. However, the lack of digital connection between the two encyclopedias may hinder their comparison and the study of how knowledge has evolved. A key element of Wikipedia is Wikidata that backs the articles with a graph of structured data. In this paper, we describe the annotation of more than 9,100 of the Encyclopédie entries with Wikidata identifiers enabling us to connect these entries to the graph. We considered geographic and human entities. The Encyclopédie does not contain biographic entries as they mostly appear as subentries of locations. We extracted all the geographic entries and we completely annotated all the entries containing a description of human entities. This represents more than 2,600 links referring to locations or human entities. In addition, we annotated more than 8,300 entries having a geographic content only. We describe the annotation process as well as application examples. This resource is available at https://github.com/pnugues/encyclopedie_1751.

Pierre Nugues

2025

2024

2022

2020

2019

2018

2017

2016

2015

2014

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

Co-authors

Venues