Meta-Embedding Sentence Representation for Textual Similarity

Amir Hazem, Nicolas Hernandez


Abstract
Word embedding models are now widely used in most NLP applications. Despite their effectiveness, there is no clear consensus on which model is most appropriate: the choice often depends on the nature of the task and on the quality and size of the data sets used. The same holds for bottom-up sentence embedding models, yet no straightforward investigation of this question has been conducted so far. In this paper, we propose a systematic study of the impact of the main word embedding models on sentence representation. By contrasting in-domain and pre-trained embedding models, we show under which conditions they can be jointly used for bottom-up sentence embeddings. Finally, we propose the first bottom-up meta-embedding representation at the sentence level for textual similarity. Significant improvements are observed on several tasks, including question-to-question similarity, paraphrasing, and next-utterance ranking.
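To make the abstract's idea concrete, here is a minimal sketch of a bottom-up meta-embedding sentence representation: each word embedding model yields a sentence vector by averaging its word vectors, and the per-model sentence vectors are concatenated into a single meta-embedding that is scored with cosine similarity. The toy vectors, the two stand-in models, and the averaging-then-concatenation scheme are illustrative assumptions, not the paper's exact method.

import numpy as np

# Toy word vectors standing in for two embedding models (e.g. an
# in-domain model and a pre-trained one). The values are made up
# purely for illustration and are not from the paper.
model_a = {"how": np.array([0.1, 0.3]),
           "are": np.array([0.2, 0.1]),
           "you": np.array([0.4, 0.2])}
model_b = {"how": np.array([0.3, 0.1, 0.2]),
           "are": np.array([0.1, 0.2, 0.1]),
           "you": np.array([0.2, 0.4, 0.3])}

def sentence_vector(tokens, model):
    """Bottom-up sentence embedding: average one model's word vectors."""
    vecs = [model[t] for t in tokens if t in model]
    return np.mean(vecs, axis=0)

def meta_embedding(tokens, models):
    """Concatenate per-model sentence vectors into one meta-embedding."""
    return np.concatenate([sentence_vector(tokens, m) for m in models])

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

s1 = meta_embedding("how are you".split(), [model_a, model_b])
s2 = meta_embedding("how are you".split(), [model_a, model_b])
print(cosine(s1, s2))  # 1.0 for identical sentences

In practice the component models would be trained (or pre-loaded) embeddings rather than toy dictionaries, and the similarity score would feed a textual-similarity task such as question-to-question ranking.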
Anthology ID:
R19-1055
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
465–473
URL:
https://aclanthology.org/R19-1055
DOI:
10.26615/978-954-452-056-4_055
Cite (ACL):
Amir Hazem and Nicolas Hernandez. 2019. Meta-Embedding Sentence Representation for Textual Similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 465–473, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Meta-Embedding Sentence Representation for Textual Similarity (Hazem & Hernandez, RANLP 2019)
PDF:
https://aclanthology.org/R19-1055.pdf