Xiaowei Yuan
2024
On the In-context Generation of Language Models
Zhongtao Jiang | Yuanzhe Zhang | Kun Luo | Xiaowei Yuan | Jun Zhao | Kang Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have been found capable of in-context generation (ICG): when fed an in-context prompt that concatenates a few somewhat similar examples, they implicitly recognize the shared pattern and complete the prompt in the same pattern. ICG is curious, since language models are usually not explicitly trained on inputs shaped like the in-context prompt, and the distribution of examples in the prompt differs from that of sequences in the pretraining corpora. This paper provides a systematic study of the ICG ability of language models, discussing its source and the factors that influence it, from both theoretical and empirical perspectives. Concretely, we first propose a plausible latent variable model of the pretraining corpus distribution, and then formalize ICG as a problem of next topic prediction. Within this framework, we prove theoretically that the repetitive nature of a few topics ensures the ICG ability on them. We then use this controllable pretraining distribution to generate several medium-scale synthetic datasets (token scale: 2.1B-3.9B) and experiment with different settings of Transformer architectures (parameter scale: 4M-234M). Our experimental results offer further insights into how the data and model architectures influence ICG.
Improving Zero-shot LLM Re-Ranker with Risk Minimization
Xiaowei Yuan | Zhao Yang | Yequan Wang | Jun Zhao | Kang Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy Constraint
Xiaowei Yuan | Zhao Yang | Yequan Wang | Shengping Liu | Jun Zhao | Kang Liu
Findings of the Association for Computational Linguistics: ACL 2024
Large language models (LLMs) internalize enormous parametric knowledge during pre-training. Concurrently, realistic applications necessitate external contextual knowledge to aid models on the underlying tasks. This raises a crucial dilemma known as knowledge conflicts, where the contextual knowledge clashes with the parametric knowledge. However, existing decoding methods are specialized for resolving knowledge conflicts and can inadvertently degrade performance in the absence of conflicts. In this paper, we propose an adaptive decoding method, termed contextual information-entropy constraint decoding (COIECD), to discern whether knowledge conflicts occur and to resolve them. It improves the model’s faithfulness to conflicting context while maintaining high performance on non-conflicting context. Our experiments show that COIECD exhibits strong performance and robustness under knowledge conflicts on realistic datasets.
Co-authors
- Jun Zhao 3
- Kang Liu 3
- Zhao Yang 2
- Yequan Wang 2
- Zhongtao Jiang 1