Andrew Schneider
2022
COIN – an Inexpensive and Strong Baseline for Predicting Out of Vocabulary Word Embeddings
Andrew Schneider | Lihong He | Zhijia Chen | Arjun Mukherjee | Eduard Dragut
Proceedings of the 29th International Conference on Computational Linguistics
Social media is the ultimate challenge for many natural language processing tools. The constant emergence of linguistic constructs challenges even the most sophisticated NLP tools. Predicting word embeddings for out-of-vocabulary (OOV) words is one of those challenges. Word embedding models include only terms that occur a sufficient number of times in their training corpora, and word embedding vector models cannot directly provide any useful information about a word outside their vocabularies. We propose a fast method for predicting vectors for OOV terms that uses the terms surrounding the unknown term and the hidden context layer of the word2vec model. We propose this method as a strong baseline in the sense that 1) while it does not surpass all state-of-the-art methods, it surpasses several techniques for vector prediction on benchmark tasks, 2) even when it underperforms, the margin is very small, so it remains competitive in downstream tasks, and 3) it is inexpensive to compute, requiring no additional training stage. We also show that our technique can be incorporated into existing methods to achieve a new state of the art on the word vector prediction problem.
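The abstract describes the prediction step only at a high level. One plausible reading, sketched here as an assumption rather than as the paper's exact formulation, is to average the word2vec context (output-layer) vectors of the in-vocabulary words that appear around the unknown term. Skip-gram training pushes a word's vector toward high dot products with its neighbours' context vectors, so this average is a cheap estimate that needs no extra training. The names `collect_context`, `predict_oov_vector`, `context_matrix`, and the `window` parameter are hypothetical, and a plain dictionary stands in for a trained model.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def collect_context(sentences: Sequence[Sequence[str]], target: str, window: int = 2) -> List[str]:
    """Gather the tokens within `window` positions of every occurrence of `target`."""
    context: List[str] = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == target:
                context.extend(tokens[max(0, i - window):i])  # left neighbours
                context.extend(tokens[i + 1:i + 1 + window])  # right neighbours
    return context


def predict_oov_vector(context_words: Sequence[str], context_matrix: Dict[str, Vector]) -> Vector:
    """Average the context-layer vectors of the in-vocabulary neighbours (assumed scheme)."""
    known = [context_matrix[w] for w in context_words if w in context_matrix]
    if not known:
        raise ValueError("none of the surrounding words are in the vocabulary")
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


if __name__ == "__main__":
    # Toy 2-dimensional context vectors standing in for a trained word2vec model.
    C = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [1.0, 1.0]}
    sents = [["the", "zorb", "sat"], ["a", "zorb", "cat"]]
    ctx = collect_context(sents, "zorb", window=1)
    print(ctx)                         # ['the', 'sat', 'a', 'cat']
    print(predict_oov_vector(ctx, C))  # about [0.667, 0.667]; "a" is skipped as OOV
```

Words missing from the context matrix are simply skipped, which keeps the estimate defined as long as at least one neighbour is known.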
2018
DebugSL: An Interactive Tool for Debugging Sentiment Lexicons
Andrew Schneider | John Male | Saroja Bhogadhi | Eduard Dragut
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
We introduce DebugSL, a visual (Web) debugging tool for sentiment lexicons (SLs). Its core component implements our algorithms for the automatic detection of polarity inconsistencies in SLs. An inconsistency is a set of words and/or word senses whose polarity assignments cannot all be satisfied simultaneously. DebugSL finds small inconsistencies in SLs and has a rich user interface that helps users correct them. The project source code is available at https://github.com/atschneid/DebugSL. A screencast of DebugSL can be viewed at https://cis.temple.edu/~edragut/DebugSL.webm
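The definition of an inconsistency can be illustrated with a small brute-force search. The sketch assumes a simplified constraint chosen for illustration, not necessarily the one DebugSL implements: every sense receives one polarity, and each word's polarity in the lexicon must match the polarity of at least one of its senses. `is_consistent`, `small_inconsistencies`, `POLARITIES`, and the toy lexicon are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence, Set, Tuple

POLARITIES = ("pos", "neg", "neu")


def is_consistent(words: Sequence[str], lexicon: Dict[str, str], senses: Dict[str, Set[str]]) -> bool:
    """Return True if some polarity per sense lets every word match one of its senses
    (assumed constraint). A word with no senses can never be satisfied."""
    sense_ids = sorted({s for w in words for s in senses[w]})
    for assignment in product(POLARITIES, repeat=len(sense_ids)):
        polarity_of = dict(zip(sense_ids, assignment))
        if all(any(polarity_of[s] == lexicon[w] for s in senses[w]) for w in words):
            return True
    return False


def small_inconsistencies(
    lexicon: Dict[str, str], senses: Dict[str, Set[str]], max_size: int = 3
) -> List[Tuple[str, ...]]:
    """Enumerate minimal inconsistent word sets with at most `max_size` words."""
    found: List[Tuple[str, ...]] = []
    for k in range(1, max_size + 1):
        for subset in combinations(sorted(lexicon), k):
            if any(set(prev) <= set(subset) for prev in found):
                continue  # supersets of a known inconsistency are not minimal
            if not is_consistent(subset, lexicon, senses):
                found.append(subset)
    return found


if __name__ == "__main__":
    # Toy lexicon: two words share one sense but carry opposite polarities.
    lexicon = {"cheap": "neg", "inexpensive": "pos", "pricey": "neg"}
    senses = {"cheap": {"sense_1"}, "inexpensive": {"sense_1"}, "pricey": {"sense_2"}}
    print(small_inconsistencies(lexicon, senses))  # [('cheap', 'inexpensive')]
```

The search is exponential in the number of senses involved, which is why this sketch only looks for small inconsistent sets.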
2015
Towards Debugging Sentiment Lexicons
Andrew Schneider | Eduard Dragut
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Co-authors
- Eduard Dragut 3
- Saroja Bhogadhi 1
- Zhijia Chen 1
- Lihong He 1
- John Male 1