Automatic identification of unknown names with specific roles

Samia Touileb, Truls Pedersen, Helle Sjøvaag


Abstract
Automatically identifying persons in a particular role within a large corpus can be difficult, especially when it is not known in advance who to look for. Resources compiling names of persons may be available, but no exhaustive lists exist. Moreover, such lists usually contain known names that are “visible” in the national public sphere, and tend to ignore marginal and international ones. In this article we propose a method for automatically suggesting names that are found in a corpus of Norwegian news articles, that “naturally” belong with a given initial list of members, and that were not known (compiled in a list) beforehand. The approach is based, in part, on the assumption that surface-level syntactic features reveal parts of the underlying semantic content and can help uncover the structure of the language.
Anthology ID:
W18-4517
Volume:
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico
Editors:
Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue:
LaTeCH
Publisher:
Association for Computational Linguistics
Pages:
150–158
URL:
https://aclanthology.org/W18-4517
Cite (ACL):
Samia Touileb, Truls Pedersen, and Helle Sjøvaag. 2018. Automatic identification of unknown names with specific roles. In Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 150–158, Santa Fe, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Automatic identification of unknown names with specific roles (Touileb et al., LaTeCH 2018)
PDF:
https://aclanthology.org/W18-4517.pdf