Learning and Evaluating Character Representations in Novels

Naoya Inoue, Charuta Pethe, Allen Kim, Steven Skiena


Abstract
We address the problem of learning fixed-length vector representations of characters in novels. Recent advances in word embeddings have proven successful in learning entity representations from short texts, but fall short on longer documents because they do not capture full book-level information. To overcome the weakness of such text-based embeddings, we propose two novel methods for representing characters: (i) graph neural network-based embeddings from a full corpus-based character network; and (ii) low-dimensional embeddings constructed from the occurrence pattern of characters in each novel. We test the quality of these character embeddings using a new benchmark suite to evaluate character representations, encompassing 12 different tasks. We show that our representation techniques combined with text-based embeddings lead to the best character representations, outperforming text-based embeddings in four tasks. Our dataset and evaluation script will be made publicly available to stimulate additional work in this area.
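The abstract does not say how the occurrence-pattern embeddings in (ii) are constructed. As an illustration only, and not the authors' method, one simple reading is a positional histogram: split a novel into a fixed number of equal segments and record what fraction of a character's mentions falls in each. The function name `occurrence_embeddings`, the `num_bins` parameter, and the exact-token name matching are all assumptions made for this sketch.

```python
def occurrence_embeddings(tokens, characters, num_bins=10):
    """Hypothetical sketch: map each character name to a num_bins-long
    histogram describing where in the book the name is mentioned."""
    counts = {name: [0] * num_bins for name in characters}
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok in counts:
            # Relative position of the token in the book selects the bin.
            counts[tok][i * num_bins // n] += 1
    embeddings = {}
    for name, hist in counts.items():
        total = sum(hist)
        # Normalize so frequent and rare characters are comparable;
        # characters that never appear get an all-zero vector.
        embeddings[name] = [c / total for c in hist] if total else [0.0] * num_bins
    return embeddings


# Toy "novel" of 12 tokens, split into 4 segments of 3 tokens each.
tokens = "Alice met Bob . Alice left . Bob stayed . Bob slept".split()
emb = occurrence_embeddings(tokens, ["Alice", "Bob"], num_bins=4)
print(emb["Alice"])  # [0.5, 0.5, 0.0, 0.0]
```

Every character gets a vector of the same length regardless of book size, which is the fixed-length property the abstract asks for. A real pipeline would also need coreference and alias resolution, which this sketch leaves out.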
Anthology ID: 2022.findings-acl.81
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Venues: ACL | Findings
Publisher: Association for Computational Linguistics
Pages: 1008–1019
URL: https://aclanthology.org/2022.findings-acl.81
DOI: 10.18653/v1/2022.findings-acl.81
Cite (ACL): Naoya Inoue, Charuta Pethe, Allen Kim, and Steven Skiena. 2022. Learning and Evaluating Character Representations in Novels. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1008–1019, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Learning and Evaluating Character Representations in Novels (Inoue et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-acl.81.pdf
Code: naoya-i/charembench