A Document Descriptor using Covariance of Word Vectors

Marwan Torki


Abstract
In this paper, we address the problem of finding a novel document descriptor based on the covariance matrix of the word vectors of a document. Our descriptor has a fixed length, which makes it easy to use in many supervised and unsupervised applications. We tested our descriptor on several tasks in both supervised and unsupervised settings. Our evaluation shows that the document covariance descriptor suits these different tasks and performs competitively against state-of-the-art methods.
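The abstract's central idea is that a covariance matrix of word vectors has the same size for every document, however many words the document contains. The sketch below illustrates that idea under several assumptions that are not taken from the paper. It assumes the word vectors have already been looked up from some pretrained embedding. It uses the unbiased (n - 1) sample covariance. Because the matrix is symmetric, it keeps only the upper triangle including the diagonal, flattened row by row. The function name `covariance_descriptor` is hypothetical. The code uses only the standard library, so it runs without NumPy.

```python
from typing import List, Sequence


def covariance_descriptor(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical sketch: fixed-length document descriptor from word vectors.

    Returns the upper triangle (including the diagonal) of the sample
    covariance matrix of the document's word vectors, flattened row by row.
    For d-dimensional vectors the output always has d * (d + 1) / 2 entries,
    independent of the number of words in the document.
    """
    n = len(word_vectors)
    if n < 2:
        # Sample covariance with (n - 1) normalization needs two or more vectors.
        raise ValueError("need at least two word vectors")
    d = len(word_vectors[0])

    # Mean word vector of the document.
    mean = [sum(v[k] for v in word_vectors) / n for k in range(d)]
    # Subtract the mean from every word vector.
    centered = [[v[k] - mean[k] for k in range(d)] for v in word_vectors]

    descriptor: List[float] = []
    for i in range(d):
        for j in range(i, d):
            # Assumed unbiased sample covariance between dimensions i and j.
            cov_ij = sum(c[i] * c[j] for c in centered) / (n - 1)
            descriptor.append(cov_ij)
    return descriptor


if __name__ == "__main__":
    # Toy document of three words with 2-dimensional embeddings.
    doc = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]
    print(covariance_descriptor(doc))  # [1.0, -1.0, 1.0]
```

For the toy document, the mean vector is (2, 1). The two dimensions vary in exactly opposite directions, so each variance is 1.0 and the covariance is -1.0. Any other document embedded in the same two-dimensional space would also produce a vector of length 3. Because of this fixed length, the descriptor can be passed directly to standard classifiers or clustering methods.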
Anthology ID:
P18-2084
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
527–532
URL:
https://aclanthology.org/P18-2084
DOI:
10.18653/v1/P18-2084
Cite (ACL):
Marwan Torki. 2018. A Document Descriptor using Covariance of Word Vectors. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 527–532, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
A Document Descriptor using Covariance of Word Vectors (Torki, ACL 2018)
PDF:
https://aclanthology.org/P18-2084.pdf
Poster:
P18-2084.Poster.pdf
Data:
IMDb Movie Reviews, SICK