Lu Yaojie
2024
Pattern Shifting or Knowledge Losing? A Forgetting Perspective for Understanding the Effect of Instruction Fine-Tuning
Zhang Chunkang
|
Cao Boxi
|
Lu Yaojie
|
Lin Hongyu
|
Cao Liu
|
Zeng Ke
|
Wan Guanglu
|
Cai Xunliang
|
Han Xianpei
|
Sun Le
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
“Instruction Fine-Tuning(IFT) emerges as an essential step of training large language models torobustly carry out tasks of interest. However, there lacks a systematic investigation about theunderlying mechanisms of instruction fine-tuning, particularly on the forgetting phenomenonafter IFT, known as alignment tax. Therefore, to understand the mechanism of IFT from theforgetting perspective, we investigate the alternation of the text pattern and knowledge withinmodels throughout the entire IFT process. Specifically, we restore fine-tuned models to their baseversion by training them on the data sharing a similar distribution with the pre-training corpusand compare their results Our experiment indicates that there is a stage transition of forgettingduring IFT process: (1) Pseudo Forgetting: in this stage, models mainly shift their familiar textpattern away from pre-training data format while the world knowledge is preserved. Consequently,models will recover to their original performance when they are restored to the base version. (2)Actual Forgetting: in this stage, models forget the acquired knowledge as well. Therefore, theyfail to reach the original performance even if they are restored to the base version.”
2023
Document Information Extraction via Global Tagging
He Shaojie
|
Wang Tianshu
|
Lu Yaojie
|
Lin Hongyu
|
Han Xianpei
|
Sun Yingfei
|
Sun Le
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
“Document Information Extraction (DIE) is a crucial task for extracting key information fromvisually-rich documents. The typical pipeline approach for this task involves Optical Charac-ter Recognition (OCR), serializer, Semantic Entity Recognition (SER), and Relation Extraction(RE) modules. However, this pipeline presents significant challenges in real-world scenariosdue to issues such as unnatural text order and error propagation between different modules. Toaddress these challenges, we propose a novel tagging-based method – Global TaggeR (GTR),which converts the original sequence labeling task into a token relation classification task. Thisapproach globally links discontinuous semantic entities in complex layouts, and jointly extractsentities and relations from documents. In addition, we design a joint training loss and a jointdecoding strategy for SER and RE tasks based on GTR. Our experiments on multiple datasetsdemonstrate that GTR not only mitigates the issue of text in the wrong order but also improvesRE performance. Introduction”
Search
Fix data
Co-authors
- Lin Hongyu 2
- Sun Le 2
- Han Xianpei 2
- Cao Boxi 1
- Zhang Chunkang 1
- show all...
Venues
- ccl2