Zachary Pardos


2020

pdf bib
Lexical Relation Mining in Neural Word Embeddings
Aishwarya Jadhav | Yifat Amir | Zachary Pardos
Proceedings of the 28th International Conference on Computational Linguistics

Work with neural word embeddings and lexical relations has largely focused on confirmatory experiments which use human-curated examples of semantic and syntactic relations to validate against. In this paper, we explore the degree to which lexical relations, such as those found in popular validation sets, can be derived and extended from a variety of neural embeddings using classical clustering methods. We show that the Word2Vec space of word-pairs (i.e., offset vectors) significantly outperforms other more contemporary methods, even in the presence of a large number of noisy offsets. Moreover, we show that via a simple nearest neighbor approach in the offset space, new examples of known relations can be discovered. Our results speak to the amenability of offset vectors from non-contextual neural embeddings to find semantically coherent clusters. This simple approach has implications for the exploration of emergent regularities and their examples, such as emerging trends on social media and their related posts.

pdf bib
Understanding the Source of Semantic Regularities in Word Embeddings
Hsiao-Yu Chiang | Jose Camacho-Collados | Zachary Pardos
Proceedings of the 24th Conference on Computational Natural Language Learning

Semantic relations are core to how humans understand and express concepts in the real world using language. Recently, there has been a thread of research aimed at modeling these relations by learning vector representations from text corpora. Most of these approaches focus strictly on leveraging the co-occurrences of relationship word pairs within sentences. In this paper, we investigate the hypothesis that examples of a lexical relation in a corpus are fundamental to a neural word embedding’s ability to complete analogies involving the relation. Our experiments, in which we remove all known examples of a relation from training corpora, show only marginal degradation in analogy completion performance involving the removed relation. This finding enhances our understanding of neural word embeddings, showing that co-occurrence information of a particular semantic relation is not the main source of their structural regularity.