Bin Liu
2024
Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection
Tianxiang Chen | Zhentao Tan | Tao Gong | Yue Wu | Qi Chu | Bin Liu | Jieping Ye | Nenghai Yu
Findings of the Association for Computational Linguistics: EMNLP 2024
As a means of augmenting pretrained large language models (LLMs), knowledge injection is critical for developing vertical-domain large models and has been widely studied. Most current approaches, including parameter-efficient fine-tuning (PEFT) and block expansion methods, apply knowledge uniformly across all LLM layers, which raises the question: are all layers equally crucial for knowledge injection? We evaluate the importance of each layer to locate the optimal layer range for knowledge injection. Intuitively, more important layers should play more critical roles in knowledge injection and deserve denser injection. We observe performance dips on question-answering benchmarks after removing or expanding the shallow layers, and the degradation shrinks as the layer gets deeper, indicating that the shallow layers hold the key to knowledge injection. This insight leads us to propose the S strategy, a post-pretraining strategy that selectively enhances shallow layers while pruning the less effective deep ones. Based on this strategy, we introduce Llama SLayer 8B. We experiment on a code and math corpus and demonstrate the effectiveness of our strategy. Further experiments with a different LLM, Mistral-7B, and with a legal corpus confirm the approach’s general applicability.
2020
Learning distributed sentence vectors with bi-directional 3D convolutions
Bin Liu | Liang Wang | Guosheng Yin
Proceedings of the 28th International Conference on Computational Linguistics
We propose to learn distributed sentence representations using the text’s visual features as input. Unlike existing methods that render the words or characters of a sentence into images separately, we further fold these images into a 3-dimensional sentence tensor. Multiple 3-dimensional convolutions with different lengths (the third dimension) are then applied to the sentence tensor, acting jointly as bi-gram, tri-gram, quad-gram, and even five-gram detectors. Similar to a Bi-LSTM, these n-gram detectors learn both forward and backward distributional semantic knowledge from the sentence tensor. That is, the proposed model uses bi-directional convolutions to learn text embeddings according to the semantic order of words. The feature maps from the two directions are concatenated to learn the final sentence embedding. Our model involves only a single layer of convolution, which makes it easy and fast to train. Finally, we evaluate the sentence embeddings on several downstream Natural Language Processing (NLP) tasks, on which the proposed model performs surprisingly well.
2014
Cross-lingual Opinion Analysis via Negative Transfer Detection
Lin Gui | Ruifeng Xu | Qin Lu | Jun Xu | Jian Xu | Bin Liu | Xiaolong Wang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Co-authors
- Liang Wang 1
- Guosheng Yin 1
- Tianxiang Chen 1
- Zhentao Tan 1
- Tao Gong 1