Challenges in including extra-linguistic context in pre-trained language models

Ionut Sorodoc, Laura Aina, Gemma Boleda


Abstract
To successfully account for language, computational models need to consider both the linguistic context (the content of the utterances) and the extra-linguistic context (for instance, the participants in a dialogue). We focus on a referential task that asks models to link entity mentions in a TV show to the corresponding characters, and design an architecture that attempts to account for both kinds of context. In particular, our architecture combines a previously proposed specialized module (an “entity library”) for character representation with transfer learning from a pre-trained language model. We find that, although the model does improve linguistic contextualization, it fails to successfully integrate extra-linguistic information about the participants in the dialogue. Our work shows that it is very challenging to incorporate extra-linguistic information into pre-trained language models.
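The abstract describes an architecture that combines a pre-trained language model with an “entity library” of character representations for linking mentions to TV-show characters. The sketch below illustrates one way such a combination could be wired up; it is a minimal illustration under assumptions, not the authors' implementation: the class name, the choice of a BERT encoder, the learnable embedding table standing in for the entity library, and the dot-product scoring of mentions against characters are all assumptions.

```python
# Minimal sketch (not the authors' code): a pre-trained encoder contextualizes
# the dialogue (linguistic context), and an "entity library" of learnable
# character vectors (extra-linguistic context) is used to score each mention
# against the candidate characters.
import torch
import torch.nn as nn
from transformers import AutoModel


class EntityLibraryLinker(nn.Module):
    def __init__(self, num_characters: int, encoder_name: str = "bert-base-uncased"):
        super().__init__()
        # Pre-trained LM providing contextualized token representations.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One learnable vector per character of the show (the "entity library").
        self.entity_library = nn.Embedding(num_characters, hidden)

    def forward(self, input_ids, attention_mask, mention_positions):
        # Contextualized representations of the dialogue tokens.
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Pick out the token representation of each mention (e.g., a pronoun or name).
        batch_idx = torch.arange(hidden_states.size(0))
        mentions = hidden_states[batch_idx, mention_positions]
        # Score each mention against every character in the library.
        logits = mentions @ self.entity_library.weight.T
        return logits  # (batch, num_characters); a cross-entropy loss is assumed
```

In the paper's setup the mentions come from TV-show dialogue and the targets are the show's characters; the training objective and span-handling details above are simplifying assumptions for illustration only.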
Anthology ID:
2022.insights-1.18
Volume:
Proceedings of the Third Workshop on Insights from Negative Results in NLP
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Shabnam Tafreshi, João Sedoc, Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Arjun Akula
Venue:
insights
Publisher:
Association for Computational Linguistics
Pages:
134–138
URL:
https://aclanthology.org/2022.insights-1.18
DOI:
10.18653/v1/2022.insights-1.18
Cite (ACL):
Ionut Sorodoc, Laura Aina, and Gemma Boleda. 2022. Challenges in including extra-linguistic context in pre-trained language models. In Proceedings of the Third Workshop on Insights from Negative Results in NLP, pages 134–138, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Challenges in including extra-linguistic context in pre-trained language models (Sorodoc et al., insights 2022)
PDF:
https://aclanthology.org/2022.insights-1.18.pdf
Video:
https://aclanthology.org/2022.insights-1.18.mp4