The Early Modern Dutch Mediascape. Detecting Media Mentions in Chronicles Using Word Embeddings and CRF

Alie Lassche, Roser Morante


Abstract
While the production of information in the European early modern period is a well-researched topic, the question how people were engaging with the information explosion that occurred in early modern Europe, is still underexposed. This paper presents the annotations and experiments aimed at exploring whether we can automatically extract media related information (source, perception, and receiver) from a corpus of early modern Dutch chronicles in order to get insight in the mediascape of early modern middle class people from a historic perspective. In a number of classification experiments with Conditional Random Fields, three categories of features are tested: (i) raw and binary word embedding features, (ii) lexicon features, and (iii) character features. Overall, the classifier that uses raw embeddings performs slightly better. However, given that the best F-scores are around 0.60, we conclude that the machine learning approach needs to be combined with a close reading approach for the results to be useful to answer history research questions.
Anthology ID:
2021.latechclfl-1.1
Volume:
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic (online)
Venues:
CLFL | EMNLP | LaTeCH | LaTeCHCLfL
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Note:
Pages:
1–10
Language:
URL:
https://aclanthology.org/2021.latechclfl-1.1
DOI:
Bibkey:
Copy Citation:
PDF:
https://aclanthology.org/2021.latechclfl-1.1.pdf
Code
 awlassche/media-mentions-latech