Lingfeng Shen


2023

Sen2Pro: A Probabilistic Perspective to Sentence Embedding from Pre-trained Language Model
Lingfeng Shen | Haiyun Jiang | Lemao Liu | Shuming Shi
Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)

Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency
Lingfeng Shen | Weiting Tan | Boyuan Zheng | Daniel Khashabi
Findings of the Association for Computational Linguistics: EMNLP 2023

As large language models grow more capable, prompting has become the dominant way to access them. This has motivated the development of strategies for automatically selecting effective language prompts. In this paper, we introduce **pFlat** (prompt flatness), a new metric to quantify the expected utility of a language prompt. This metric is inspired by *flatness* regularization in statistical learning, which quantifies the robustness of a model to perturbations of its parameters. We provide theoretical foundations for this metric and its relationship with other prompt selection metrics, providing a comprehensive understanding of existing methods. Empirically, we show that combining **pFlat** with existing metrics improves both performance and sample efficiency. Our metric outperforms previous prompt selection metrics with an average increase of 10% in Pearson correlation across 6 classification benchmarks, and the prompt selected by our metric achieves 5% higher accuracy than prompts selected by previous metrics across the benchmarks.

2022

On the Evaluation Metrics for Paraphrase Generation
Lingfeng Shen | Lemao Liu | Haiyun Jiang | Shuming Shi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In this paper, we revisit automatic metrics for paraphrase evaluation and obtain two findings that run counter to conventional wisdom: (1) reference-free metrics achieve better performance than their reference-based counterparts; (2) most commonly used metrics do not align well with human annotation. We explore the underlying reasons behind these findings through additional experiments and in-depth analyses. Based on the experiments and analyses, we propose ParaScore, a new evaluation metric for paraphrase generation. It possesses the merits of both reference-based and reference-free metrics and explicitly models lexical divergence. Based on our analysis and improvements, our proposed reference-based metric outperforms reference-free metrics. Experimental results demonstrate that ParaScore significantly outperforms existing metrics.