Domain-specific Evaluation of Word Embeddings for Philosophical Text using Direct Intrinsic Evaluation

Goya van Boven, Jelke Bloem


Abstract
We perform a direct intrinsic evaluation of word embeddings trained on the works of a single philosopher. Six models are compared to human judgements elicited using two tasks: a synonym detection task and a coherence task. We apply a method that elicits judgements based on explicit knowledge from experts, as the linguistic intuition of non-expert participants might differ from that of the philosopher. We find that an in-domain SVD model produces the best 1-nearest neighbours for target terms, while the transfer-learning-based Nonce2Vec model performs better for low-frequency target terms.
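For context, the 1-nearest-neighbour comparison mentioned in the abstract amounts to retrieving, for each target term, the vocabulary word whose embedding has the highest cosine similarity to the target's embedding. The following minimal sketch is not from the paper; the function name and the toy vectors are illustrative assumptions.

```python
import numpy as np

def nearest_neighbour(embeddings: dict[str, np.ndarray], target: str) -> str:
    """Return the 1-nearest neighbour of `target` by cosine similarity.

    `embeddings` maps each vocabulary word to its embedding vector;
    the target word itself is excluded from the candidates.
    """
    t = embeddings[target]
    t = t / np.linalg.norm(t)
    best_word, best_sim = None, -1.0
    for word, vec in embeddings.items():
        if word == target:
            continue
        sim = float(np.dot(t, vec / np.linalg.norm(vec)))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# Toy usage with made-up 3-dimensional vectors (hypothetical example).
vocab = {
    "idea":   np.array([0.9, 0.1, 0.0]),
    "notion": np.array([0.8, 0.2, 0.1]),
    "stone":  np.array([0.0, 0.1, 0.9]),
}
print(nearest_neighbour(vocab, "idea"))  # -> "notion"
```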
Anthology ID:
2022.nlp4dh-1.14
Volume:
Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities
Month:
November
Year:
2022
Address:
Taipei, Taiwan
Editors:
Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter
Venue:
NLP4DH
Publisher:
Association for Computational Linguistics
Pages:
101–107
URL:
https://aclanthology.org/2022.nlp4dh-1.14
Cite (ACL):
Goya van Boven and Jelke Bloem. 2022. Domain-specific Evaluation of Word Embeddings for Philosophical Text using Direct Intrinsic Evaluation. In Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities, pages 101–107, Taipei, Taiwan. Association for Computational Linguistics.
Cite (Informal):
Domain-specific Evaluation of Word Embeddings for Philosophical Text using Direct Intrinsic Evaluation (van Boven & Bloem, NLP4DH 2022)
PDF:
https://aclanthology.org/2022.nlp4dh-1.14.pdf