Hao An


2024

Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup
Xuxin Cheng | Ziyu Yao | Yifei Xin | Hao An | Hongxiang Li | Yaowei Li | Yuexian Zou
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal machine translation (MMT) aims to improve the performance of machine translation with the help of visual information, and it has received widespread attention recently. It has been verified that visual information brings greater performance gains when the textual information is limited. However, most previous works fail to take advantage of the complete textual inputs and the limited textual inputs at the same time, which limits the overall performance. To solve this issue, we propose a mixup method termed Soul-Mix to enhance MMT by using visual information more effectively. We mix the predicted translations of the complete textual inputs and the limited textual inputs. Experimental results on the Multi30K dataset for three translation directions show that Soul-Mix significantly outperforms existing approaches and achieves new state-of-the-art performance with fewer parameters than some previous models. Moreover, the strength of Soul-Mix is more evident on the more challenging MSCOCO dataset, which includes more out-of-domain instances with many ambiguous verbs.

Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood
Yang Xu | Yu Wang | Hao An | Zhichen Liu | Yongyuan Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, this is becoming increasingly difficult as language models’ capabilities of generating human-like texts keep evolving. This study provides a new perspective by using relative likelihood values instead of absolute ones and extracting useful features from the spectrum view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, one supervised and one heuristic-based, which achieves performance competitive with previous zero-shot detection methods and a new state of the art on short-text detection. Our method can also reveal subtle differences between human and model languages, which find theoretical roots in psycholinguistics studies.

Knowledge-enhanced Prompt Tuning for Dialogue-based Relation Extraction with Trigger and Label Semantic
Hao An | Zhihong Zhu | Xuxin Cheng | Zhiqi Huang | Yuexian Zou
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Dialogue-based relation extraction (DRE), which aims to determine the semantic relation of a given pair of arguments from a piece of dialogue, has received increasing attention. Due to the low information density of dialogue text, it is difficult for a model to focus on key information. To this end, we propose a Knowledge-Enhanced Prompt-Tuning (KEPT) method to effectively enhance DRE models by exploiting trigger and label semantics. Specifically, we propose two beneficial tasks, masked trigger prediction and verbalizer representation learning, to inject trigger knowledge and label semantic knowledge, respectively. Furthermore, we convert the DRE task to a masked language modeling task to unify the format of knowledge injection and utilization, aiming to better promote DRE performance. Experimental results on the DialogRE dataset show that KEPT achieves state-of-the-art performance in F1 and F1c scores. Detailed analyses demonstrate the effectiveness and efficiency of our proposed approach. Code is available at https://github.com/blackbookay/KEPT.

Zero-Shot Spoken Language Understanding via Large Language Models: A Preliminary Study
Zhihong Zhu | Xuxin Cheng | Hao An | Zhichang Wang | Dongsheng Chen | Zhiqi Huang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Zero-shot Spoken Language Understanding (SLU) aims to enable task-oriented dialogue systems to understand user needs without training data. Challenging but worthwhile, zero-shot SLU reduces the time and effort that data labeling takes. Recent advancements in large language models (LLMs), such as GPT-3.5 and ChatGPT, have shown promising results in zero-shot settings, which motivates us to explore prompt-based methods. In this study, we investigate whether strong SLU models can be constructed by directly prompting LLMs. Specifically, we propose a simple yet effective two-stage framework dubbed GPT-SLU, which transforms the SLU task into a question-answering problem. Powered by multi-stage mutual guided prompts, GPT-SLU can leverage the correlations between the two subtasks in SLU to achieve better predictions; these correlations have been extensively explored in the traditional fine-tuning paradigm. Experimental results on three SLU benchmark datasets demonstrate the significant potential of LLMs for zero-shot SLU. Comprehensive analyses validate the effectiveness of our proposed framework and also indicate that there is still room for further improvement of LLMs in SLU scenarios.