Semantic Indexing of Multilingual Corpora and its Application on the History Domain

Alessandro Raganato, Jose Camacho-Collados, Antonio Raganato, Yunseo Joung


Abstract
The increasing number of multilingual text collections available in different domains makes their automatic processing essential for the development of a given field. However, standard processing techniques based on statistical clues and keyword searches have clear limitations. Instead, we propose a knowledge-based processing pipeline which overcomes most of the limitations of these techniques. This, in turn, enables direct comparison across texts in different languages without the need for translation. In this paper we show the potential of this approach for semantically indexing multilingual text collections in the history domain. In our experiments we used a version of the Bible translated into four different languages, evaluating the precision of our semantic indexing pipeline and showing its reliability on the cross-lingual text retrieval task.
Anthology ID:
W16-4019
Volume:
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Erhard Hinrichs, Marie Hinrichs, Thorsten Trippel
Venue:
LT4DH
Publisher:
The COLING 2016 Organizing Committee
Pages:
140–147
URL:
https://aclanthology.org/W16-4019
Cite (ACL):
Alessandro Raganato, Jose Camacho-Collados, Antonio Raganato, and Yunseo Joung. 2016. Semantic Indexing of Multilingual Corpora and its Application on the History Domain. In Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pages 140–147, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Semantic Indexing of Multilingual Corpora and its Application on the History Domain (Raganato et al., LT4DH 2016)
PDF:
https://aclanthology.org/W16-4019.pdf