Speakers Fill Lexical Semantic Gaps with Context

Tiago Pimentel, Rowan Hall Maudslay, Damian Blasi, Ryan Cotterell


Abstract
Lexical ambiguity is widespread in language, allowing for the reuse of economical word forms and therefore making language more efficient. If ambiguous words cannot be disambiguated from context, however, this gain in efficiency might make language less clear—resulting in frequent miscommunication. For a language to be clear and efficiently encoded, we posit that the lexical ambiguity of a word type should correlate with how much information context provides about it, on average. To investigate whether this is the case, we operationalise the lexical ambiguity of a word as the entropy of meanings it can take, and provide two ways to estimate this—one which requires human annotation (using WordNet), and one which does not (using BERT), making it readily applicable to a large number of languages. We validate these measures by showing that, on six high-resource languages, there are significant Pearson correlations between our BERT-based estimate of ambiguity and the number of synonyms a word has in WordNet (e.g. 𝜌 = 0.40 in English). We then test our main hypothesis—that a word’s lexical ambiguity should negatively correlate with its contextual uncertainty—and find significant correlations on all 18 typologically diverse languages we analyse. This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
Anthology ID:
2020.emnlp-main.328
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
4004–4015
Language:
URL:
https://aclanthology.org/2020.emnlp-main.328
DOI:
10.18653/v1/2020.emnlp-main.328
Bibkey:
Cite (ACL):
Tiago Pimentel, Rowan Hall Maudslay, Damian Blasi, and Ryan Cotterell. 2020. Speakers Fill Lexical Semantic Gaps with Context. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4004–4015, Online. Association for Computational Linguistics.
Cite (Informal):
Speakers Fill Lexical Semantic Gaps with Context (Pimentel et al., EMNLP 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.emnlp-main.328.pdf
Video:
 https://slideslive.com/38938746
Code
 tpimentelms/lexical-ambiguity-in-context