Hongkun Hao


2024

pdf bib
CLEANEVAL: Clean Evaluation on Contaminated Large Language Models
Wenhong Zhu | Hongkun Hao | Zhiwei He | Yun-Ze Song | Jiao Yueyang | Yumeng Zhang | Hanxu Hu | Yiran Wei | Rui Wang | Hongyuan Lu
Findings of the Association for Computational Linguistics: NAACL 2024

We are currently in an era of fierce competition among various large language models (LLMs), continuously pushing the boundaries of benchmark performance. However, genuinely assessing the capabilities of these LLMs has become a challenging and critical issue due to potential data contamination. In this paper, we propose a novel and valuable method, Clean-Eval, which mitigates the issue of data contamination and evaluates the LLMs more cleanly. Clean-Eval employs a neural-based model to paraphrase and back-translate the contaminated data into a candidate set, generating expressions with the same meaning but in different surface forms. A semantic detector is then used to filter those generated low-quality samples to narrow down this candidate set. Candidates with moderate BLEURT scores against the original samples are selected as the final evaluation set. According to human assessment, this set is almost semantically equivalent to the original contamination set but expressed differently. We conduct experiments on 20 existing benchmarks across diverse tasks, and results demonstrate that Clean-Eval substantially restores the actual evaluation results on contaminated LLMs under both few-shot learning and fine-tuning scenarios.

2023

pdf bib
Penalty Decoding: Well Suppress the Self-Reinforcement Effect in Open-Ended Text Generation
Wenhong Zhu | Hongkun Hao | Rui Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

The decoding algorithm is critical for open-ended text generation, transforming latent representations into coherent and meaningful outputs. This paper investigates the self-reinforcement effect in text generation and the effectiveness of a repetition penalty to mitigate it. However, determining the optimal repetition penalty value is challenging. To tackle this, we propose a forgetting mechanism that disregards distant tokens, reducing the burden of penalty selection. In addition, we introduce a length penalty to address overly short sentences caused by excessive penalties. Our penalty decoding approach incorporating three strategies helps resolve issues with sampling methods deviating from factual information. Experimental results demonstrate the efficacy of our approach in generating high-quality sentences resembling human output.

pdf bib
Rethinking Translation Memory Augmented Neural Machine Translation
Hongkun Hao | Guoping Huang | Lemao Liu | Zhirui Zhang | Shuming Shi | Rui Wang
Findings of the Association for Computational Linguistics: ACL 2023

This paper rethinks translation memory augmented neural machine translation (TM-augmented NMT) from two perspectives, i.e., a probabilistic view of retrieval and the variance-bias decomposition principle. The finding demonstrates that TM-augmented NMT is good at the ability of fitting data (i.e., lower bias) but is more sensitive to the fluctuations in the training data (i.e., higher variance), which provides an explanation to a recently reported contradictory phenomenon on the same translation task: TM-augmented NMT substantially advances NMT without TM under the high resource scenario whereas it fails under the low resource scenario. Then this paper proposes a simple yet effective TM-augmented NMT model to promote the variance and address the contradictory phenomenon. Extensive experiments show that the proposed TM-augmented NMT achieves consistent gains over both conventional NMT and existing TM-augmented NMT under two variance-preferable (low resource and plug-and-play) scenarios as well as the high resource scenario.