Yifan Yang


2024

pdf bib
AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning
Yifan Yang | Kai Zhen | Ershad Banijamali | Athanasios Mouchtaris | Zheng Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks, yet it demands more and more memory as model sizes keep growing. To address this issue, the recently proposed Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoiding the need for a backpropagation graph. However, significant performance drops and a high risk of divergence have limited their widespread adoption. In this paper, we propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of the ZO methods. To enhance dimension-dependent ZO estimation accuracy, we introduce a fast-forward, low-parameter tensorized adapter. To tackle the frequently observed divergence issue in large-scale ZO fine-tuning tasks, we propose an adaptive query number schedule that guarantees convergence. Detailed theoretical analysis and extensive experimental results on Roberta-Large and Llama-2-7B models substantiate the efficacy of our AdaZeta framework in terms of accuracy, memory efficiency, and convergence speed.

pdf bib
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Pengyu Cheng | Yifan Yang | Jian Li | Yong Dai | Tianhao Hu | Peixin Cao | Nan Du | Xiaolong Li
Findings of the Association for Computational Linguistics: ACL 2024

Human preference alignment is essential to improve the interaction quality of large language models (LLMs). Existing alignment methods depend on manually annotated preference data to guide the LLM optimization directions. However, continuously updating LLMs for alignment raises a distribution gap between model-generated samples and human-annotated responses, hindering training effectiveness. To mitigate this issue, previous methods require additional preference annotation on newly generated samples to adapt to the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an Adversarial Preference Optimization (APO) framework, in which the LLM and the reward model update alternatively via a min-max game. Through adversarial training, the reward model can adapt to the shifted generation distribution of the LLM without any additional annotation. With comprehensive experiments, we find the proposed adversarial training framework further enhances existing alignment baselines in terms of LLM helpfulness and harmlessness. The code is at https://github.com/Linear95/APO.

pdf bib
LoRASC: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
Siwei Li | Yifan Yang | Yifei Shen | Fangyun Wei | Zongqing Lu | Lili Qiu | Yuqing Yang
Findings of the Association for Computational Linguistics: EMNLP 2024

Efficient fine-tuning plays a fundamental role in modern large models, with low-rank adaptation emerging as a particularly promising approach. However, the existing variants of LoRA are hampered by limited expressiveness, a tendency to overfit, and sensitivity to hyperparameter settings. This paper presents LoRA Slow Cascade Learning (LoRASC), an innovative technique designed to enhance LoRA’s expressiveness and generalization capabilities while preserving its training efficiency. Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model’s ability to capture complex patterns. Additionally, we introduce a slow-fast update mechanism and cascading noisy tuning to bolster generalization. The extensive experiments on various language and vision datasets, as well as robustness benchmarks, demonstrate that the proposed method not only significantly outperforms existing baselines, but also mitigates overfitting, enhances model stability, and improves OOD robustness.

pdf bib
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
Yifan Yang | Jiajun Zhou | Ngai Wong | Zheng Zhang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance. However, existing PEFT methods are still limited by the growing number of trainable parameters with the rapid deployment of Large Language Models (LLMs). To address this challenge, we present LoRETTA, an ultra-parameter-efficient framework that significantly reduces trainable parameters through tensor-train decomposition. Specifically, we propose two methods, named LoRETTA_adp and LoRETTA_rep. The former employs tensorized adapters, offering a high-performance yet lightweight approach for the fine-tuning of LLMs. The latter emphasizes fine-tuning via weight reparameterization with a set of small tensor factors. LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to 100× fewer parameters on the LLaMA-2-7B models. Furthermore, empirical results demonstrate that the proposed methods exhibit remarkable anti-overfitting capability, effectively improve training efficiency, and enjoy better multi-task learning performance. Plug-and-play loretta library built upon the Huggingface framework and PEFT library are provided.

2023

pdf bib
An Empirical Study of Sentiment-Enhanced Pre-Training for Aspect-Based Sentiment Analysis
Yice Zhang | Yifan Yang | Bin Liang | Shiwei Chen | Bing Qin | Ruifeng Xu
Findings of the Association for Computational Linguistics: ACL 2023

Aspect-Based Sentiment Analysis (ABSA) aims to recognize fine-grained opinions and sentiments of users, which is an important problem in sentiment analysis. Recent work has shown that Sentiment-enhanced Pre-Training (SPT) can substantially improve the performance of various ABSA tasks. However, there is currently a lack of comprehensive evaluation and fair comparison of existing SPT approaches. Therefore, this paper performs an empirical study to investigate the effectiveness of different SPT approaches. First, we develop an effective knowledge-mining method and leverage it to build a large-scale knowledge-annotated SPT corpus. Second, we systematically analyze the impact of integrating sentiment knowledge and other linguistic knowledge in pre-training. For each type of sentiment knowledge, we also examine and compare multiple integration methods. Finally, we conduct extensive experiments on a wide range of ABSA tasks to see how much SPT can facilitate the understanding of aspect-level sentiments.

pdf bib
Target-to-Source Augmentation for Aspect Sentiment Triplet Extraction
Yice Zhang | Yifan Yang | Meng Li | Bin Liang | Shiwei Chen | Ruifeng Xu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Aspect Sentiment Triplet Extraction (ASTE) is an important task in sentiment analysis, aiming to extract aspect-level opinions and sentiments from user-generated reviews. The fine-grained nature of ASTE incurs a high annotation cost, while the scarcity of annotated data limits the performance of existing methods. This paper exploits data augmentation to address this issue. Traditional augmentation methods typically modify the input sentences of existing samples via heuristic rules or language models, which have shown success in text classification tasks. However, applying these methods to fine-grained tasks like ASTE poses challenges in generating diverse augmented samples while maintaining alignment between modified sentences and origin labels. Therefore, this paper proposes a target-to-source augmentation approach for ASTE. Our approach focuses on learning a generator that can directly generate new sentences based on labels and syntactic templates. With this generator, we can generate a substantial number of diverse augmented samples by mixing labels and syntactic templates from different samples. Besides, to ensure the quality of the generated sentence, we introduce fluency and alignment discriminators to provide feedback on the generated sentence and then use this feedback to optimize the generator via a reinforcement learning framework. Experiments demonstrate that our approach significantly enhances the performance of existing ASTE models.

2022

pdf bib
Boundary-Driven Table-Filling for Aspect Sentiment Triplet Extraction
Yice Zhang | Yifan Yang | Yihui Li | Bin Liang | Shiwei Chen | Yixue Dang | Min Yang | Ruifeng Xu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Aspect Sentiment Triplet Extraction (ASTE) aims to extract the aspect terms along with the corresponding opinion terms and the expressed sentiments in the review, which is an important task in sentiment analysis. Previous research efforts generally address the ASTE task in an end-to-end fashion through the table-filling formalization, in which the triplets are represented by a two-dimensional (2D) table of word-pair relations. Under this formalization, a term-level relation is decomposed into multiple independent word-level relations, which leads to relation inconsistency and boundary insensitivity in the face of multi-word aspect terms and opinion terms. To overcome these issues, we propose Boundary-Driven Table-Filling (BDTF), which represents each triplet as a relation region in the 2D table and transforms the ASTE task into detection and classification of relation regions. We also notice that the quality of the table representation greatly affects the performance of BDTF. Therefore, we develop an effective relation representation learning approach to learn the table representation, which can fully exploit both word-to-word interactions and relation-to-relation interactions. Experiments on several public benchmarks show that the proposed approach achieves state-of-the-art performances.

pdf bib
HITSZ-HLT at SemEval-2022 Task 10: A Span-Relation Extraction Framework for Structured Sentiment Analysis
Yihui Li | Yifan Yang | Yice Zhang | Ruifeng Xu
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our system that participated in the SemEval-2022 Task 10: Structured Sentiment Analysis, which aims to extract opinion tuples from texts.A full opinion tuple generally contains an opinion holder, an opinion target, the sentiment expression, and the corresponding polarity. The complex structure of the opinion tuple makes the task challenging. To address this task, we formalize it as a span-relation extraction problem and propose a two-stage extraction framework accordingly. In the first stage, we employ the span module to enumerate spans and then recognize the type of every span. In the second stage, we employ the relation module to determine the relation between spans. Our system achieves competitive results and ranks among the top-10 systems in almost subtasks.

2021

pdf bib
PRGC: Potential Relation and Global Correspondence Based Joint Relational Triple Extraction
Hengyi Zheng | Rui Wen | Xi Chen | Yifan Yang | Yunyan Zhang | Ziheng Zhang | Ningyu Zhang | Bin Qin | Xu Ming | Yefeng Zheng
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Joint extraction of entities and relations from unstructured texts is a crucial task in information extraction. Recent methods achieve considerable performance but still suffer from some inherent limitations, such as redundancy of relation prediction, poor generalization of span-based extraction and inefficiency. In this paper, we decompose this task into three subtasks, Relation Judgement, Entity Extraction and Subject-object Alignment from a novel perspective and then propose a joint relational triple extraction framework based on Potential Relation and Global Correspondence (PRGC). Specifically, we design a component to predict potential relations, which constrains the following entity extraction to the predicted relation subset rather than all relations; then a relation-specific sequence tagging component is applied to handle the overlapping problem between subjects and objects; finally, a global correspondence component is designed to align the subject and object into a triple with low-complexity. Extensive experiments show that PRGC achieves state-of-the-art performance on public benchmarks with higher efficiency and delivers consistent performance gain on complex scenarios of overlapping triples. The source code has been submitted as the supplementary material and will be made publicly available after the blind review.