Searching for the X-Factor: Exploring Corpus Subjectivity for Word Embeddings

Maksim Tkachenko, Chong Cher Chia, Hady Lauw


Abstract
We explore the notion of subjectivity and hypothesize that word embeddings learnt from input corpora of varying levels of subjectivity behave differently on natural language processing tasks such as classifying a sentence by sentiment, subjectivity, or topic. Through systematic comparative analyses, we establish that this is indeed the case. Moreover, based on the discovery of the outsized role that sentiment words play in subjectivity-sensitive tasks such as sentiment classification, we develop a novel word embedding, SentiVec, which is infused with sentiment information from a lexical resource and is shown to outperform baselines on such tasks.
Anthology ID:
P18-1112
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1212–1221
URL:
https://aclanthology.org/P18-1112
DOI:
10.18653/v1/P18-1112
Cite (ACL):
Maksim Tkachenko, Chong Cher Chia, and Hady Lauw. 2018. Searching for the X-Factor: Exploring Corpus Subjectivity for Word Embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1212–1221, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Searching for the X-Factor: Exploring Corpus Subjectivity for Word Embeddings (Tkachenko et al., ACL 2018)
PDF:
https://aclanthology.org/P18-1112.pdf
Presentation:
 P18-1112.Presentation.pdf
Video:
 https://aclanthology.org/P18-1112.mp4