Empirical Linguistic Study of Sentence Embeddings

Katarzyna Krasnowska-Kieraś, Alina Wróblewska


Abstract
The purpose of the research is to answer the question whether linguistic information is retained in vector representations of sentences. We introduce a method of analysing the content of sentence embeddings based on universal probing tasks, along with the classification datasets for two contrasting languages. We perform a series of probing and downstream experiments with different types of sentence embeddings, followed by a thorough analysis of the experimental results. Aside from dependency parser-based embeddings, linguistic information is retained best in the recently proposed LASER sentence embeddings.
Anthology ID:
P19-1573
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5729–5739
URL:
https://aclanthology.org/P19-1573
DOI:
10.18653/v1/P19-1573
Cite (ACL):
Katarzyna Krasnowska-Kieraś and Alina Wróblewska. 2019. Empirical Linguistic Study of Sentence Embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5729–5739, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Empirical Linguistic Study of Sentence Embeddings (Krasnowska-Kieraś & Wróblewska, ACL 2019)
PDF:
https://aclanthology.org/P19-1573.pdf
Video:
https://aclanthology.org/P19-1573.mp4
Data:
SentEval, Universal Dependencies