Beiming Yu


2025

MoKA: Parameter Efficiency Fine-Tuning via Mixture of Kronecker Product Adaption
Beiming Yu | Zhenfei Yang | Xiushuang Yi
Proceedings of the 31st International Conference on Computational Linguistics

With the rapid development of large language models (LLMs), traditional full-parameter fine-tuning has become increasingly expensive in both computational resources and time. Parameter-efficient fine-tuning (PEFT) methods have emerged to address this. Among them, Low-Rank Adaptation (LoRA) is one of the most popular PEFT methods and is widely used with large language models. However, LoRA's low-rank update mechanism limits how closely it can approximate full-parameter fine-tuning during training. In this paper, we propose a novel PEFT framework, MoKA (Mixture of Kronecker Product Adaptation), which combines the Kronecker product with the Mixture-of-Experts (MoE) approach. By replacing the low-rank decomposition of the weight update matrix with Kronecker products and employing a sparse MoE architecture, MoKA achieves both parameter efficiency and better model performance. Additionally, we design an efficient routing module to further reduce the number of trainable parameters. We conduct extensive experiments on the GLUE benchmark, the E2E NLG Challenge, and instruction-tuning tasks for LLMs. The results demonstrate that MoKA outperforms existing PEFT methods.
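
The abstract describes the core idea (weight updates built from Kronecker products and mixed by a sparse top-k router) but not the exact layer design, so the following is only a minimal PyTorch sketch of that idea under assumed shapes and naming: KroneckerExpert, MoKALayer, the (16, 48) factorization, and the linear top-k router are illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KroneckerExpert(nn.Module):
    """One expert: the weight update is a Kronecker product A ⊗ B.

    With A of shape (a1, a2) and B of shape (b1, b2), the update has shape
    (a1*b1, a2*b2), so a d_out x d_in update costs only a1*a2 + b1*b2 parameters.
    """

    def __init__(self, a1, a2, b1, b2):
        super().__init__()
        self.A = nn.Parameter(torch.randn(a1, a2) * 0.02)
        self.B = nn.Parameter(torch.zeros(b1, b2))  # zero-init: the update starts at 0

    def delta_weight(self):
        return torch.kron(self.A, self.B)  # (a1*b1, a2*b2)


class MoKALayer(nn.Module):
    """Frozen pre-trained linear layer plus a top-k mixture of Kronecker experts (sketch)."""

    def __init__(self, base_linear, factor_out, factor_in, num_experts=4, top_k=2):
        super().__init__()
        d_out, d_in = base_linear.weight.shape
        assert factor_out[0] * factor_out[1] == d_out
        assert factor_in[0] * factor_in[1] == d_in
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # only the experts and the router are trained
        self.experts = nn.ModuleList(
            KroneckerExpert(factor_out[0], factor_in[0], factor_out[1], factor_in[1])
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_in, num_experts, bias=False)  # assumed routing module
        self.top_k = top_k

    def forward(self, x):
        gate_logits = self.router(x)                                # (..., E)
        topk_vals, topk_idx = gate_logits.topk(self.top_k, dim=-1)
        gates = torch.zeros_like(gate_logits).scatter_(
            -1, topk_idx, F.softmax(topk_vals, dim=-1))             # sparse per-token gates
        # For readability every expert is evaluated here; a real sparse MoE would
        # dispatch only the tokens routed to each expert.
        delta_out = torch.stack(
            [x @ e.delta_weight().T for e in self.experts], dim=-1)  # (..., d_out, E)
        return self.base(x) + (delta_out * gates.unsqueeze(-2)).sum(dim=-1)


# Example: adapt a 768x768 projection; each expert stores 16*16 + 48*48 = 2,560
# parameters instead of 768*768 = 589,824 for a dense update.
layer = MoKALayer(nn.Linear(768, 768), factor_out=(16, 48), factor_in=(16, 48))
y = layer(torch.randn(2, 10, 768))
```

The factorization is the design lever: the smaller the Kronecker factors, the fewer trainable parameters per expert, while the mixture lets several cheap experts cover different update directions.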

2024

BERT-BC: A Unified Alignment and Interaction Model over Hierarchical BERT for Response Selection
Zhenfei Yang | Beiming Yu | Yuan Cui | Shi Feng | Daling Wang | Yifei Zhang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recently, Cross-Encoder based models have achieved a significant performance boost on the dialogue response selection task. However, such models feed the concatenation of context and response directly into the pre-trained model for interactive inference, neglecting comprehensive, independent representation modeling of the context and the response. Moreover, randomly sampling negative responses from other dialogue contexts is simplistic, and models trained this way generalize poorly in realistic scenarios. In this paper, we propose a response selection model called BERT-BC that combines a representation-based Bi-Encoder with an interaction-based Cross-Encoder. Three contrastive learning methods are devised for the Bi-Encoder to align context and response and obtain better semantic representations. Meanwhile, according to the alignment difficulty of context and response semantics, harder samples are dynamically selected from the same batch at negligible cost and sent to the Cross-Encoder to enhance the model’s interactive reasoning ability. Experimental results show that BERT-BC achieves state-of-the-art performance on three benchmark datasets for multi-turn response selection.
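
The mechanism of selecting harder in-batch samples by alignment difficulty and handing them to the Cross-Encoder can be illustrated as below. This is only a sketch under assumptions: select_hard_negatives, the cosine-similarity criterion, and k are chosen for exposition and are not the paper's exact selection rule.

```python
import torch
import torch.nn.functional as F


def select_hard_negatives(ctx_emb, resp_emb, k=1):
    """Pick, for each context, the k most confusable in-batch responses.

    ctx_emb, resp_emb: (batch, dim) Bi-Encoder embeddings, where row i of each
    corresponds to the gold (context, response) pair i.
    """
    sim = F.cosine_similarity(ctx_emb.unsqueeze(1), resp_emb.unsqueeze(0), dim=-1)  # (B, B)
    sim.fill_diagonal_(float("-inf"))        # exclude each context's own gold response
    hard_idx = sim.topk(k, dim=-1).indices   # responses the Bi-Encoder almost confused
    return hard_idx                          # (B, k): pair these with their contexts for the Cross-Encoder


ctx_emb = torch.randn(8, 768)    # Bi-Encoder context embeddings for one batch
resp_emb = torch.randn(8, 768)   # Bi-Encoder response embeddings (row i = gold for context i)
hard = select_hard_negatives(ctx_emb, resp_emb, k=2)  # indices of hard in-batch negatives
```

Because the candidates come from the same batch, the selection reuses embeddings that the Bi-Encoder has already computed, which is why the extra cost is negligible.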