Better Language Model with Hypernym Class Prediction

He Bai, Tong Wang, Alessandro Sordoni, Peng Shi


Abstract
Class-based language models (LMs) have long been devised to address context sparsity in n-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that have a common WordNet hypernym to the same class and train large neural LMs by gradually annealing from predicting the class to token prediction during training. Empirically, this curriculum learning strategy consistently improves perplexity over various large, highly performant state-of-the-art Transformer-based models on two datasets, WikiText-103 and ARXIV. Our analysis shows that the performance improvement is achieved without sacrificing performance on rare words. Finally, we document other attempts that failed to yield empirical gains, and discuss future directions for the adoption of class-based LMs on a larger scale.
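The curriculum described in the abstract — training targets that gradually anneal from hypernym-class prediction to token prediction — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the linear schedule, the `<feline>`/`<canine>` class symbols, and the tiny token-to-class mapping are all assumptions standing in for the paper's WordNet-derived classes and exact annealing curve.

```python
import random


def class_prediction_prob(step, total_steps, start=1.0, end=0.0):
    """Probability of using the hypernym-class target at a given
    training step, linearly annealed from `start` to `end`.
    The linear shape is illustrative; the paper's schedule may differ."""
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac


def pick_target(token, token_to_class, step, total_steps, rng=random):
    """With the annealed probability, replace the token target by its
    hypernym class; fall back to the token itself if it has no class
    (or once annealing has finished)."""
    if rng.random() < class_prediction_prob(step, total_steps):
        return token_to_class.get(token, token)
    return token


# Hypothetical WordNet-style mapping: words sharing a hypernym share a class.
token_to_class = {"cat": "<feline>", "lion": "<feline>", "dog": "<canine>"}

# Early in training the model is supervised on classes, later on tokens.
early = pick_target("cat", token_to_class, step=0, total_steps=100)
late = pick_target("cat", token_to_class, step=100, total_steps=100)
```

Early in training `early` is the class symbol `<feline>`; after annealing completes, `late` is the surface token `cat`, so the loss smoothly shifts from class-level to token-level supervision.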
Anthology ID:
2022.acl-long.96
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1352–1362
URL:
https://aclanthology.org/2022.acl-long.96
DOI:
10.18653/v1/2022.acl-long.96
Cite (ACL):
He Bai, Tong Wang, Alessandro Sordoni, and Peng Shi. 2022. Better Language Model with Hypernym Class Prediction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1352–1362, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Better Language Model with Hypernym Class Prediction (Bai et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.96.pdf
Software:
 2022.acl-long.96.software.zip
Code:
 richardbaihe/robustlm
Data:
WikiText-103, WikiText-2