Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction

Federico Bianchi, Ciro Greco, Jacopo Tagliabue


Abstract
We investigate grounded language learning through real-world data by modelling teacher-learner dynamics through the natural interactions between users and search engines. In particular, we explore the emergence of semantic generalization from unsupervised dense representations outside of synthetic environments. A grounding domain, a denotation function and a composition function are learned from user data only. We show that the resulting semantics for noun phrases exhibits compositional properties while being fully learnable without any explicit labelling. We benchmark our grounded semantics on compositionality and zero-shot inference tasks, and we show that it yields better results and generalizes better than state-of-the-art non-grounded models such as word2vec and BERT.
Anthology ID:
2021.naacl-main.348
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4409–4415
URL:
https://aclanthology.org/2021.naacl-main.348
DOI:
10.18653/v1/2021.naacl-main.348
PDF:
https://aclanthology.org/2021.naacl-main.348.pdf