Yizhe Yang


2024

pdf bib
Speaker Verification in Agent-generated Conversations
Yizhe Yang | Palakorn Achananuparp | Heyan Huang | Jing Jiang | Ee-Peng Lim
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The recent success of large language models (LLMs) has attracted widespread interest to develop role-playing conversational agents personalized to the characteristics and styles of different speakers to enhance their abilities to perform both general and special purpose dialogue tasks. However, the ability to personalize the generated utterances to speakers, whether conducted by human or LLM, has not been well studied. To bridge this gap, our study introduces a novel evaluation challenge: speaker verification in agent-generated conversations, which aimed to verify whether two sets of utterances originate from the same speaker. To this end, we assemble a large dataset collection encompassing thousands of speakers and their utterances. We also develop and evaluate speaker verification models under experiment setups. We further utilize the speaker verification models to evaluate the personalization abilities of LLM-based role-playing models. Comprehensive experiments suggest that the current role-playing models fail in accurately mimicking speakers, primarily due to their inherent linguistic characteristics.

pdf bib
Fundamental Capabilities of Large Language Models and their Applications in Domain Scenarios: A Survey
Jiawei Li | Yizhe Yang | Yu Bai | Xiaofeng Zhou | Yinghao Li | Huashan Sun | Yuhang Liu | Xingpeng Si | Yuhao Ye | Yixiao Wu | 林一冠 林一冠 | Bin Xu | Ren Bowen | Chong Feng | Yang Gao | Heyan Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) demonstrate significant value in domain-specific applications, benefiting from their fundamental capabilities. Nevertheless, it is still unclear which fundamental capabilities contribute to success in specific domains. Moreover, the existing benchmark-based evaluation cannot effectively reflect the performance of real-world applications. In this survey, we review recent advances of LLMs in domain applications, aiming to summarize the fundamental capabilities and their collaboration. Furthermore, we establish connections between fundamental capabilities and specific domains, evaluating the varying importance of different capabilities. Based on our findings, we propose a reliable strategy for domains to choose more robust backbone LLMs for real-world applications.

pdf bib
PSST: A Benchmark for Evaluation-driven Text Public-Speaking Style Transfer
Huashan Sun | Yixiao Wu | Yizhe Yang | Yinghao Li | Jiawei Li | Yuhao Ye | Yang Gao
Findings of the Association for Computational Linguistics: EMNLP 2024

Language style is necessary for AI systems to accurately understand and generate diverse human language. However, previous text style transfer primarily focused on sentence-level data-driven approaches, limiting exploration of potential problems in large language models (LLMs) and the ability to meet complex application needs. To overcome these limitations, we introduce a novel task called Public-Speaking Style Transfer (PSST), which aims to simulate humans to transform passage-level, official texts into a public-speaking style. Grounded in the analysis of real-world data from a linguistic perspective, we decompose public-speaking style into key sub-styles to pose challenges and quantify the style modeling capability of LLMs. For such intricate text style transfer, we further propose a fine-grained evaluation framework to analyze the characteristics and identify the problems of stylized texts. Comprehensive experiments suggest that current LLMs struggle to generate public speaking texts that align with human preferences, primarily due to excessive stylization and loss of semantic information. We will release our data, code, and model upon acceptance.

pdf bib
Bit_numeval at SemEval-2024 Task 7: Enhance Numerical Sensitivity and Reasoning Completeness for Quantitative Understanding
Xinyue Liang | Jiawei Li | Yizhe Yang | Yang Gao
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we describe the methods used for Quantitative Natural Language Inference (QNLI), and Quantitative Question Answering (QQA) in task1 of Semeval2024 NumEval. The challenge’s focus is to enhance the model’s quantitative understanding consequently improving its performance on certain tasks. We accomplish this task from two perspectives: (1) By integrating real-world numerical comparison data during the supervised fine-tuning (SFT) phase, we enhanced the model’s numerical sensitivity. (2) We develop an innovative reward model scoring mechanism, leveraging reinforcement learning from human feedback (RLHF) techniques to improve the model’s reasoning completeness.

2023

pdf bib
Graph vs. Sequence: An Empirical Study on Knowledge Forms for Knowledge-Grounded Dialogue
Yizhe Yang | Heyan Huang | Yuhang Liu | Yang Gao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Knowledge-grounded dialogue is a task of gener- ating an informative response based on both the dialogue history and external knowledge source. In general, there are two forms of knowledge: manu- ally annotated knowledge graphs and knowledge text from website. From various evaluation viewpoints, each type of knowledge has advantages and downsides. To further distinguish the principles and determinants from the intricate factors, we conduct a thorough experiment and study on the task to answer three essential questions. The ques- tions involve the choice of appropriate knowledge form, the degree of mutual effects between knowl- edge and the model selection, and the few-shot performance of knowledge. Supported by statistical shreds of evidence, we offer conclusive solutions and sensible suggestions for directions and standards of future research.