New or Old? Exploring How Pre-Trained Language Models Represent Discourse Entities

Sharid Loáiciga, Anne Beyer, David Schlangen


Abstract
Recent research shows that pre-trained language models, built to generate text conditioned on some context, learn to encode syntactic knowledge to a certain degree. This has motivated researchers to move beyond the sentence level and look into the models' ability to encode less-studied discourse-level phenomena. In this paper, we add to the body of probing research by investigating discourse entity representations in large pre-trained language models in English. Motivated by early theories of discourse and key pieces of previous work, we focus on the information status of entities as discourse-new or discourse-old. We present two probing models, one based on binary classification and the other on sequence labeling. The results of our experiments show that pre-trained language models do encode information on whether or not an entity has been introduced before in the discourse. However, this information alone is not sufficient to find the entities in a discourse, opening up interesting questions about the definition of entities for future work.
Anthology ID:
2022.coling-1.73
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
875–886
URL:
https://aclanthology.org/2022.coling-1.73
Cite (ACL):
Sharid Loáiciga, Anne Beyer, and David Schlangen. 2022. New or Old? Exploring How Pre-Trained Language Models Represent Discourse Entities. In Proceedings of the 29th International Conference on Computational Linguistics, pages 875–886, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
New or Old? Exploring How Pre-Trained Language Models Represent Discourse Entities (Loáiciga et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.73.pdf
Code:
clp-research/new-old-discourse-entities