We present a systematic procedure for interrater disagreement resolution. The procedure is general, but of particular use in multiple-annotator tasks geared towards ground truth construction. We motivate our proposal by arguing that, barring cases in which the researchers’ goal is to elicit different viewpoints, interrater disagreement is a sign of poor quality in the design or the description of a task. Consensus among annotators, we maintain, should be striven for, through a systematic procedure for disagreement resolution such as the one we describe.
Computational linguistic research on language change through distributional semantic (DS) models has inspired researchers from fields such as philosophy and literary studies, who use these methods for the exploration and comparison of comparatively small datasets traditionally analyzed by close reading. Research on methods for small data is still in early stages and it is not clear which methods achieve the best results. We investigate the possibilities and limitations of using distributional semantic models for analyzing philosophical data by means of a realistic use-case. We provide a ground truth for evaluation created by philosophy experts and a blueprint for using DS models in a sound methodological setup. We compare three methods for creating specialized models from small datasets. Though the models do not perform well enough to directly support philosophers yet, we find that models designed for small data yield promising directions for future work.
We present a novel, domain expert-controlled, replicable procedure for the construction of concept-modeling ground truths with the aim of evaluating the application of word embeddings. In particular, our method is designed to evaluate the application of word and paragraph embeddings in concept-focused textual domains, where a generic ontology does not provide enough information. We illustrate the procedure, and validate it by describing the construction of an expert ground truth, QuiNE-GT. QuiNE-GT is built to answer research questions concerning the concept of naturalized epistemology in QUINE, a 2-million-token, single-author, 20th-century English philosophy corpus of outstanding quality, cleaned up and enriched for the purpose. To the best of our ken, expert concept-modeling ground truths are extremely rare in current literature, nor has the theoretical methodology behind their construction ever been explicitly conceptualised and properly systematised. Expert-controlled concept-modeling ground truths are however essential to allow proper evaluation of word embeddings techniques, and increase their trustworthiness in specialised domains in which the detection of concepts through their expression in texts is important. We highlight challenges, requirements, and prospects for future work.
We address the problem of creating and evaluating quality Neo-Latin word embeddings for the purpose of philosophical research, adapting the Nonce2Vec tool to learn embeddings from Neo-Latin sentences. This distributional semantic modeling tool can learn from tiny data incrementally, using a larger background corpus for initialization. We conduct two evaluation tasks: definitional learning of Latin Wikipedia terms, and learning consistent embeddings from 18th century Neo-Latin sentences pertaining to the concept of mathematical method. Our results show that consistent Neo-Latin word embeddings can be learned from this type of data. While our evaluation results are promising, they do not reveal to what extent the learned models match domain expert knowledge of our Neo-Latin texts. Therefore, we propose an additional evaluation method, grounded in expert-annotated data, that would assess whether learned representations are conceptually sound in relation to the domain of study.