Domain Adapted Word Embeddings for Improved Sentiment Classification

Prathusha Kameswara Sarma, Yingyu Liang, Bill Sethares


Abstract
Generic word embeddings are trained on large-scale generic corpora; Domain Specific (DS) word embeddings are trained only on data from a domain of interest. This paper proposes a method to combine the breadth of generic embeddings with the specificity of domain-specific embeddings. The resulting embeddings, called Domain Adapted (DA) word embeddings, are formed by first aligning corresponding word vectors using Canonical Correlation Analysis (CCA) or the related nonlinear Kernel CCA (KCCA) and then combining them via convex optimization. Evaluation on sentiment classification tasks shows that the DA embeddings substantially outperform both generic and DS embeddings when used as input features to standard or state-of-the-art sentence encoding algorithms for classification.
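
The two-stage idea in the abstract (align with CCA, then combine) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cca_align` and `combine`, the regularizer `reg`, and the random toy data are all hypothetical, and the mixing weights `alpha` and `beta` are fixed by hand as a stand-in for the convex optimization step, whose objective is not stated in this abstract. Only linear CCA is shown; KCCA is omitted.

```python
import numpy as np

def cca_align(G, D, d, reg=1e-8):
    """Linear CCA on row-aligned matrices: row i of G (generic) and
    row i of D (domain specific) are vectors for the same word.
    Returns both projections into a shared d-dim space and the
    top-d canonical correlations."""
    G = G - G.mean(axis=0)
    D = D - D.mean(axis=0)
    n = G.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cgg = G.T @ G / n + reg * np.eye(G.shape[1])
    Cdd = D.T @ D / n + reg * np.eye(D.shape[1])
    Cgd = G.T @ D / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Kg, Kd = inv_sqrt(Cgg), inv_sqrt(Cdd)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Kg @ Cgd @ Kd)
    A = Kg @ U[:, :d]      # projection for generic vectors
    B = Kd @ Vt.T[:, :d]   # projection for domain specific vectors
    return G @ A, D @ B, s[:d]

def combine(G_proj, D_proj, alpha=0.5, beta=0.5):
    """Convex combination of the aligned vectors (assumed weights;
    alpha and beta must be non-negative and sum to 1)."""
    assert alpha >= 0 and beta >= 0 and abs(alpha + beta - 1.0) < 1e-12
    return alpha * D_proj + beta * G_proj

# Toy data: 50 shared vocabulary words, 6-dim generic vectors,
# 4-dim domain specific vectors that are partly correlated with them.
rng = np.random.default_rng(0)
G = rng.normal(size=(50, 6))
D = G[:, :4] @ rng.normal(size=(4, 4)) + 0.1 * rng.normal(size=(50, 4))

G_proj, D_proj, corrs = cca_align(G, D, d=3)
DA = combine(G_proj, D_proj)   # one 3-dim "domain adapted" vector per word
```

In this sketch the DA vector for each word is simply the weighted average of its two aligned projections; a learned choice of `alpha` and `beta` would replace the hand-set values.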
Anthology ID: W18-3407
Volume: Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Month: July
Year: 2018
Address: Melbourne
Editors: Reza Haffari, Colin Cherry, George Foster, Shahram Khadivi, Bahar Salehi
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 51–59
URL: https://aclanthology.org/W18-3407/
DOI: 10.18653/v1/W18-3407
Cite (ACL): Prathusha Kameswara Sarma, Yingyu Liang, and Bill Sethares. 2018. Domain Adapted Word Embeddings for Improved Sentiment Classification. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, pages 51–59, Melbourne. Association for Computational Linguistics.
Cite (Informal): Domain Adapted Word Embeddings for Improved Sentiment Classification (Kameswara Sarma et al., ACL 2018)
PDF: https://aclanthology.org/W18-3407.pdf
Note: W18-3407.Notes.pdf