2024
pdf
bib
abs
Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones?
Zhe Yang
|
Yichang Zhang
|
Tianyu Liu
|
Jian Yang
|
Junyang Lin
|
Chang Zhou
|
Zhifang Sui
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have demonstrated impressive capabilities, but still suffer from inconsistency issues (e.g. LLMs can react differently to disturbances like rephrasing or inconsequential order change). In addition to these inconsistencies, we also observe that LLMs, while capable of solving hard problems, can paradoxically fail at easier ones. To evaluate this hard-to-easy inconsistency, we develop the ConsisEval benchmark, where each entry comprises a pair of questions with a strict order of difficulty. Furthermore, we introduce the concept of consistency score to quantitatively measure this inconsistency and analyze the potential for improvement in consistency by relative consistency score. Based on comprehensive experiments across a variety of existing models, we find: (1) GPT-4 achieves the highest consistency score of 92.2% but is still inconsistent to specific questions due to distraction by redundant information, misinterpretation of questions, etc.; (2) models with stronger capabilities typically exhibit higher consistency, but exceptions also exist; (3) hard data enhances consistency for both fine-tuning and in-context learning. Our data and code will be publicly available on GitHub.
pdf
bib
abs
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
|
Zhe Yang
|
Qingxiu Dong
|
Peiyi Wang
|
Yongqi Li
|
Tao Ge
|
Tianyu Liu
|
Wenjie Li
|
Zhifang Sui
Findings of the Association for Computational Linguistics: ACL 2024
To mitigate the high inference latency stemming from autoregressive decoding in Large Language Models (LLMs), Speculative Decoding has emerged as a novel decoding paradigm for LLM inference. In each decoding step, this method first drafts several future tokens efficiently and then verifies them in parallel. Unlike autoregressive decoding, Speculative Decoding facilitates the simultaneous decoding of multiple tokens per step, thereby accelerating inference. This paper presents a comprehensive overview and analysis of this promising decoding paradigm. We begin by providing a formal definition and formulation of Speculative Decoding. Then, we organize in-depth discussions on its key facets, such as drafter selection and verification strategies. Furthermore, we present a comparative analysis of leading methods under third-party testing environments. We aim for this work to serve as a catalyst for further research on Speculative Decoding, ultimately contributing to more efficient LLM inference.
pdf
bib
abs
LLM as a metric critic for low resource relation identification
Zhe Yang
|
Yi Huang
|
Yaqin Chen
|
Xiaoting Wu
|
Junlan Feng
|
Chao Deng
Findings of the Association for Computational Linguistics: EMNLP 2024
In extremely low resource relation identification scenario, small language models (SLMs) incline to overfit, which significantly diminishes their accuracy. Recently, large language models (LLMs) are gradually applied to classification tasks with converting original objective into the generation task via in-context learning. However, abundance of the classifier categories poses challenges in selecting demonstrations. Moreover, the mapping between category labels and textual descriptions requires expensive expert knowledge, thereby constraining the efficacy of in-context learning for LLMs. We uphold that SLM is optimal for handling classification tasks, and its shortcomings in the low resource setting can be mitigated by leveraging LLM. Hence, we propose a co-evolution strategy on SLM & LLM for relation identification. Specifically, LLM provides essential background knowledge to assist training process of the SLM classifier, while evaluation metrics from the classifier, in turn, offer valuable insights to refine the generation prompts of the LLM. We conduct experiments on several datasets which demonstrates preponderance of the proposed model.
2023
pdf
bib
abs
Learning to Leverage High-Order Medical Knowledge Graph for Joint Entity and Relation Extraction
Zhe Yang
|
Yi Huang
|
Junlan Feng
Findings of the Association for Computational Linguistics: ACL 2023
Automatic medical entity and relation extraction is essential for daily electronic medical record (EMR) analysis, and has attracted a lot of academic attention. Tremendous progress has been made in recent years. However, medical terms are difficult to understand, and their relations are more complicated than general ones. Based on this situation, domain knowledge gives better background and contexts for medical terms. Despite the benefits of medical domain knowledge, the utilization way of it for joint entity and relation extraction is inadequate. To foster this line of research, in this work, we propose to leverage the medical knowledge graph for extracting entities and relations for Chinese Medical Texts in a collective way. Specifically, we propose to construct a high-order heterogeneous graph based on medical knowledge graph, which is linked to the entity mentions in the text. In this way, neighbors from the high-order heterogeneous graph can pass the message to each other for better global context representations. Our experiments on real Chinese Medical Texts show that our method is more effective than state-of-the-art methods.
pdf
bib
abs
Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning
Zhe Yang
|
Damai Dai
|
Peiyi Wang
|
Zhifang Sui
Findings of the Association for Computational Linguistics: EMNLP 2023
Large Language Models (LLMs) have recently gained the In-Context Learning (ICL) ability with the models scaling up, allowing them to quickly adapt to downstream tasks with only a few demonstration examples prepended in the input sequence. Nonetheless, the current practice of ICL treats all demonstration examples equally, which still warrants improvement, as the quality of examples is usually uneven. In this paper, we investigate how to determine approximately optimal weights for demonstration examples and how to apply them during ICL. To assess the quality of weights in the absence of additional validation data, we design a masked self-prediction (MSP) score that exhibits a strong correlation with the final ICL performance. To expedite the weight-searching process, we discretize the continuous weight space and adopt beam search. With approximately optimal weights obtained, we further propose two strategies to apply them to demonstrations at different model positions. Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin. Our code are publicly available at https:github.com/Zhe-Young/WICL.
2022
pdf
bib
abs
Low-resource Neural Machine Translation with Cross-modal Alignment
Zhe Yang
|
Qingkai Fang
|
Yang Feng
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
How to achieve neural machine translation with limited parallel data? Existing techniques often rely on large-scale monolingual corpus, which is impractical for some low-resource languages. In this paper, we turn to connect several low-resource languages to a particular high-resource one by additional visual modality. Specifically, we propose a cross-modal contrastive learning method to learn a shared space for all languages, where both a coarse-grained sentence-level objective and a fine-grained token-level one are introduced. Experimental results and further analysis show that our method can effectively learn the cross-modal and cross-lingual alignment with a small amount of image-text pairs, and achieves significant improvements over the text-only baseline under both zero-shot and few-shot scenarios.
2020
pdf
bib
abs
A Joint Model for Aspect-Category Sentiment Analysis with Shared Sentiment Prediction Layer
Yuncong Li
|
Zhe Yang
|
Cunxiang Yin
|
Xu Pan
|
Lunan Cui
|
Qiang Huang
|
Ting Wei
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Aspect-category sentiment analysis (ACSA) aims to predict the aspect categories mentioned in texts and their corresponding sentiment polarities. Some joint models have been proposed to address this task. Given a text, these joint models detect all the aspect categories mentioned in the text and predict the sentiment polarities toward them at once. Although these joint models obtain promising performances, they train separate parameters for each aspect category and therefore suffer from data deficiency of some aspect categories. To solve this problem, we propose a novel joint model which contains a shared sentiment prediction layer. The shared sentiment prediction layer transfers sentiment knowledge between aspect categories and alleviates the problem caused by data deficiency. Experiments conducted on SemEval-2016 Datasets demonstrate the effectiveness of our model.