Woojin Chung
Also published as: WooJin Chung
2024
Stable Language Model Pre-training by Reducing Embedding Variability
Woojin Chung | Jiwoo Hong | Na Min An | James Thorne | Se-Young Yun
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Stable pre-training is essential for achieving better-performing language models. However, tracking pre-training stability is impractical due to high computational costs. We study Token Embedding Variability as a simple proxy to estimate pre-training stability. We theoretically and empirically demonstrate that Multi-head Low-Rank Attention acts as a fundamental approach to reducing instability. This is supported by empirical findings on variants of GPT-2, which show improved stability and lower perplexity, even at deeper layer counts.
2018
The Lifted Matrix-Space Model for Semantic Composition
WooJin Chung | Sheng-Fu Wang | Samuel Bowman
Proceedings of the 22nd Conference on Computational Natural Language Learning
Tree-structured neural network architectures for sentence encoding draw inspiration from the approach to semantic composition generally seen in formal linguistics, and have shown empirical improvements over comparable sequence models by doing so. Moreover, adding multiplicative interaction terms to the composition functions in these models can yield significant further improvements. However, existing compositional approaches that adopt such a powerful composition function scale poorly, with parameter counts exploding as model dimension or vocabulary size grows. We introduce the Lifted Matrix-Space model, which uses a global transformation to map vector word embeddings to matrices, which can then be composed via an operation based on matrix-matrix multiplication. Its composition function effectively transmits a larger number of activations across layers with relatively few model parameters. We evaluate our model on the Stanford NLI corpus, the Multi-Genre NLI corpus, and the Stanford Sentiment Treebank and find that it consistently outperforms TreeLSTM (Tai et al., 2015), the previous best known composition function for tree-structured models.
Co-authors
- Jiwoo Hong 1
- Na Min An 1
- James Thorne 1
- Se-Young Yun 1
- Sheng-Fu Wang 1