On the Role of Corpus Ordering in Language Modeling

Ameeta Agrawal, Suresh Singh, Lauren Schneider, Michael Samuels


Abstract
Language models pretrained on vast corpora of unstructured text using a self-supervised learning framework are used in numerous natural language understanding and generation tasks. Many studies show that language acquisition in humans follows a structured simple-to-complex pattern. Guided by this intuition, curriculum learning, which trains computational models on samples in a meaningful order, such as easy before hard, has been shown to potentially reduce training time. The question remains whether curriculum learning can benefit the pretraining of language models. In this work, we perform comprehensive experiments involving multiple curriculum strategies, varying both the complexity criteria and the training schedules. Empirical results from training transformer language models on an English corpus, evaluated intrinsically as well as after fine-tuning across eight tasks from the GLUE benchmark, show consistent gains over conventional vanilla training. Interestingly, in our experiments, when evaluated after one epoch, the best model, which follows a document-level hard-to-easy curriculum, outperforms the vanilla model by 1.7 points (average GLUE score), and the vanilla model needs twice as many training steps to reach comparable performance.
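The curriculum idea in the abstract amounts to sorting training documents by some complexity criterion before feeding them to the model. A minimal sketch of that ordering step, assuming document length as a stand-in difficulty metric (the paper itself explores multiple criteria, so this proxy is purely illustrative):

```python
# Illustrative curriculum ordering, NOT the authors' exact method.
# Assumption: token count serves as the document difficulty proxy.

def difficulty(doc: str) -> int:
    """Hypothetical difficulty score: number of whitespace-separated tokens."""
    return len(doc.split())

def curriculum_order(docs, hard_to_easy=True):
    """Sort documents by difficulty to form a curriculum schedule.

    hard_to_easy=True mirrors the document-level hard-to-easy
    curriculum that performed best in the paper's experiments.
    """
    return sorted(docs, key=difficulty, reverse=hard_to_easy)

corpus = [
    "short text",
    "a somewhat longer training document",
    "mid length doc",
]

# Train on this ordered sequence instead of a random shuffle.
schedule = curriculum_order(corpus, hard_to_easy=True)
```

In practice the schedule would feed a standard transformer pretraining loop; only the sampling order changes, not the model or loss.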
Anthology ID:
2021.sustainlp-1.15
Volume:
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing
Month:
November
Year:
2021
Address:
Virtual
Venues:
EMNLP | sustainlp
Publisher:
Association for Computational Linguistics
Pages:
142–154
URL:
https://aclanthology.org/2021.sustainlp-1.15
DOI:
10.18653/v1/2021.sustainlp-1.15
PDF:
https://aclanthology.org/2021.sustainlp-1.15.pdf
Data:
GLUE | QNLI | WikiText-103