Huashan Sun


2024

PSST: A Benchmark for Evaluation-driven Text Public-Speaking Style Transfer
Huashan Sun | Yixiao Wu | Yizhe Yang | Yinghao Li | Jiawei Li | Yuhao Ye | Yang Gao
Findings of the Association for Computational Linguistics: EMNLP 2024

Language style is essential for AI systems to accurately understand and generate diverse human language. However, previous work on text style transfer has focused primarily on sentence-level, data-driven approaches, which limits both the exploration of potential problems in large language models (LLMs) and the ability to meet complex application needs. To overcome these limitations, we introduce a novel task called Public-Speaking Style Transfer (PSST), which aims to simulate how humans transform passage-level, official texts into a public-speaking style. Grounded in an analysis of real-world data from a linguistic perspective, we decompose public-speaking style into key sub-styles in order to challenge and quantify the style-modeling capability of LLMs. For such intricate text style transfer, we further propose a fine-grained evaluation framework to analyze the characteristics of stylized texts and identify their problems. Comprehensive experiments suggest that current LLMs struggle to generate public-speaking texts that align with human preferences, primarily due to excessive stylization and loss of semantic information. We will release our data, code, and model upon acceptance.

How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment
Heyan Huang | Yinghao Li | Huashan Sun | Yu Bai | Yang Gao
Findings of the Association for Computational Linguistics: EMNLP 2024

Recent studies have demonstrated that In-Context Learning (ICL), through the use of specific demonstrations, can align Large Language Models (LLMs) with human preferences, a phenomenon known as In-Context Alignment (ICA), indicating that models can comprehend human instructions without requiring parameter adjustments. However, exploration of the mechanism and applicability of ICA remains limited. In this paper, we begin by dividing the context text used in ICA into three categories: format, system prompt, and example. Through ablation experiments, we investigate the effectiveness of each part in enabling ICA to function well. We then examine how variations in these parts affect the model’s alignment performance. Our findings indicate that the example part is crucial to enhancing the model’s alignment capabilities, with changes in examples significantly affecting alignment performance. We also conduct a comprehensive evaluation of ICA’s zero-shot capabilities on various alignment tasks. The results indicate that, compared to parameter fine-tuning methods, ICA performs better on knowledge-based and tool-use tasks, but it still exhibits limitations in areas such as multi-turn dialogue and instruction following. Source code and scripts are available at https://github.com/li-aolong/how-far-can-ica-go.

Fundamental Capabilities of Large Language Models and their Applications in Domain Scenarios: A Survey
Jiawei Li | Yizhe Yang | Yu Bai | Xiaofeng Zhou | Yinghao Li | Huashan Sun | Yuhang Liu | Xingpeng Si | Yuhao Ye | Yixiao Wu | 林一冠 | Bin Xu | Bowen Ren | Chong Feng | Yang Gao | Heyan Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) demonstrate significant value in domain-specific applications, benefiting from their fundamental capabilities. Nevertheless, it remains unclear which fundamental capabilities contribute to success in specific domains. Moreover, existing benchmark-based evaluation cannot effectively reflect performance in real-world applications. In this survey, we review recent advances of LLMs in domain applications, aiming to summarize the fundamental capabilities and how they work in combination. Furthermore, we establish connections between fundamental capabilities and specific domains, evaluating the varying importance of different capabilities. Based on our findings, we propose a reliable strategy for choosing more robust backbone LLMs for real-world domain applications.