Lexical Generalization Improves with Larger Models and Longer Training

Elron Bandel, Yoav Goldberg, Yanai Elazar


Abstract
While fine-tuned language models perform well on many language tasks, they have also been shown to rely on superficial surface features such as lexical overlap. Over-reliance on such heuristics can lead to failure on challenging inputs. We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension (using a novel contrastive dataset), and find that larger models are much less susceptible to adopting lexical overlap heuristics. We also find that longer training leads models to abandon lexical overlap heuristics. Finally, we provide evidence that the disparity between model sizes has its source in the pre-trained model.
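For readers unfamiliar with the term, here is a minimal sketch of the kind of lexical overlap heuristic the paper probes, in the spirit of HANS-style NLI diagnostics; the function names and example sentences are illustrative, not taken from the paper's released software or dataset.

```python
def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also appear in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    return sum(t in premise_tokens for t in hypothesis_tokens) / len(hypothesis_tokens)


def overlap_heuristic_label(premise: str, hypothesis: str, threshold: float = 1.0) -> str:
    """Predict 'entailment' whenever overlap is high -- the shortcut a robust
    model should not rely on, since high overlap does not guarantee entailment."""
    if lexical_overlap(premise, hypothesis) >= threshold:
        return "entailment"
    return "non-entailment"


# A contrastive pair: identical vocabulary, but only the first is entailed.
print(overlap_heuristic_label("The doctor saw the lawyer", "The doctor saw the lawyer"))
# -> "entailment" (correct)
print(overlap_heuristic_label("The doctor saw the lawyer", "The lawyer saw the doctor"))
# -> "entailment" (the heuristic fails on word-order contrasts)
```

A model that tracks this shortcut will label both examples identically; the paper's contrastive evaluations measure how often fine-tuned models do exactly that.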
Anthology ID:
2022.findings-emnlp.323
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4398–4410
URL:
https://aclanthology.org/2022.findings-emnlp.323
DOI:
10.18653/v1/2022.findings-emnlp.323
Cite (ACL):
Elron Bandel, Yoav Goldberg, and Yanai Elazar. 2022. Lexical Generalization Improves with Larger Models and Longer Training. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4398–4410, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Lexical Generalization Improves with Larger Models and Longer Training (Bandel et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-emnlp.323.pdf
Software:
2022.findings-emnlp.323.software.zip
Dataset:
2022.findings-emnlp.323.dataset.zip