Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts

Ayuki Katayama, Yusuke Sakai, Shohei Higashiyama, Hiroki Ouchi, Ayano Takeuchi, Ryo Bando, Yuta Hashimoto, Toshinobu Ogiso, Taro Watanabe


Abstract
Automatic extraction of geographic information, including Location Referring Expressions (LREs), can aid humanities research in analyzing large collections of historical texts. In this study, to investigate how accurately pretrained Transformer language models (LMs) can extract LREs from historical texts, we evaluate two representative types of LMs, namely masked language models and causal language models, using early modern and contemporary Japanese datasets. Our experimental results demonstrate the potential of contemporary LMs for historical texts, but also suggest the need for further model enhancement, such as pretraining on historical texts.
Anthology ID:
2024.nlp4dh-1.33
Volume:
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month:
November
Year:
2024
Address:
Miami, USA
Editors:
Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venue:
NLP4DH
Publisher:
Association for Computational Linguistics
Pages:
331–338
URL:
https://aclanthology.org/2024.nlp4dh-1.33
Cite (ACL):
Ayuki Katayama, Yusuke Sakai, Shohei Higashiyama, Hiroki Ouchi, Ayano Takeuchi, Ryo Bando, Yuta Hashimoto, Toshinobu Ogiso, and Taro Watanabe. 2024. Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 331–338, Miami, USA. Association for Computational Linguistics.
Cite (Informal):
Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts (Katayama et al., NLP4DH 2024)
PDF:
https://aclanthology.org/2024.nlp4dh-1.33.pdf