Automatic Alignment and Annotation Projection for Literary Texts

Uli Steinbach, Ines Rehbein


Abstract
This paper presents a modular NLP pipeline for the creation of a parallel literature corpus, followed by annotation transfer from the source to the target language. The test case we use to evaluate our pipeline is the automatic transfer of quote and speaker mention annotations from English to German. We evaluate the different components of the pipeline and discuss challenges specific to literary texts. Our experiments show that, after applying a reasonable amount of semi-automatic postprocessing, we can obtain high-quality aligned and annotated resources for a new language.
Anthology ID: W19-2505
Volume: Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month: June
Year: 2019
Address: Minneapolis, USA
Editors: Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue: LaTeCH
SIG: SIGHUM
Publisher: Association for Computational Linguistics
Pages: 35–45
URL: https://aclanthology.org/W19-2505
DOI: 10.18653/v1/W19-2505
Cite (ACL): Uli Steinbach and Ines Rehbein. 2019. Automatic Alignment and Annotation Projection for Literary Texts. In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 35–45, Minneapolis, USA. Association for Computational Linguistics.
Cite (Informal): Automatic Alignment and Annotation Projection for Literary Texts (Steinbach & Rehbein, LaTeCH 2019)
PDF: https://aclanthology.org/W19-2505.pdf