Yilun Liu


Leveraging Multilingual Knowledge Graph to Boost Domain-specific Entity Translation of ChatGPT
Min Zhang | Limin Liu | Zhao Yanqing | Xiaosong Qiao | Su Chang | Xiaofeng Zhao | Junhao Zhu | Ming Zhu | Song Peng | Yinglu Li | Yilun Liu | Wenbing Ma | Mengyao Piao | Shimin Tao | Hao Yang | Yanfei Jiang
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

Recently, ChatGPT has shown promising results for Machine Translation (MT) in general domains and is becoming a new paradigm for translation. In this paper, we focus on how to apply ChatGPT to domain-specific translation and propose to leverage a Multilingual Knowledge Graph (MKG) to help ChatGPT improve the translation quality of domain entities. To achieve this, we extract bilingual entity pairs from the MKG for the domain entities recognized in source sentences. We then introduce these pairs into translation prompts, instructing ChatGPT to use the correct translations of the domain entities. To evaluate the MKG method for ChatGPT, we conduct comparative experiments on three Chinese-English (zh-en) test datasets constructed from three specific domains: one from biomedical science and two from the Information and Communications Technology (ICT) industry, namely Visible Light Communication (VLC) and wireless. Experimental results demonstrate that both the overall translation quality of ChatGPT (+6.21, +3.13 and +11.25 BLEU) and the translation accuracy of domain entities (+43.2, +30.2 and +37.9 absolute percentage points) are significantly improved with the MKG on the three test datasets.
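
The prompt-construction step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary MKG_ZH_EN is a hypothetical stand-in for bilingual entity pairs extracted from an MKG, entity recognition is approximated by a naive longest-first substring match, and the prompt wording is an assumption.

```python
from typing import Dict, List

# Hypothetical stand-in for zh->en entity pairs extracted from an MKG.
# A real system would query the knowledge graph instead of a hard-coded dict.
MKG_ZH_EN: Dict[str, str] = {
    "可见光通信": "visible light communication",
    "基站": "base station",
}


def recognize_entities(sentence: str, lexicon: Dict[str, str]) -> List[str]:
    """Naive entity recognition: longest-first substring match against the lexicon.

    A term is skipped if it is contained in a longer term already found,
    so nested sub-terms do not produce duplicate hints.
    """
    found: List[str] = []
    for term in sorted(lexicon, key=len, reverse=True):
        if term in sentence and not any(term in f for f in found):
            found.append(term)
    return found


def build_prompt(sentence: str, lexicon: Dict[str, str]) -> str:
    """Build a translation prompt that lists the bilingual pairs of recognized entities."""
    pairs = [(zh, lexicon[zh]) for zh in recognize_entities(sentence, lexicon)]
    lines = ["Translate the following Chinese sentence into English."]
    if pairs:
        lines.append("Use these translations for domain entities:")
        lines.extend(f"{zh} -> {en}" for zh, en in pairs)
    lines.append(f"Sentence: {sentence}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("可见光通信系统需要基站。", MKG_ZH_EN))
```

The returned string would then be sent to ChatGPT as the translation request; a practical system would replace the substring matcher with a proper domain entity recognizer.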

HW-TSC at SemEval-2023 Task 7: Exploring the Natural Language Inference Capabilities of ChatGPT and Pre-trained Language Model for Clinical Trial
Xiaofeng Zhao | Min Zhang | Miaomiao Ma | Chang Su | Yilun Liu | Minghan Wang | Xiaosong Qiao | Jiaxin Guo | Yinglu Li | Wenbing Ma
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

In this paper, we describe our multi-strategy system for SemEval-2023 Task 7. This task aims to determine whether a given statement is supported by one or two clinical trial reports, and to identify the evidence that supports the statement, which requires strong natural language inference capabilities. In Subtask 1, we compare our strategy, based on prompt learning and ChatGPT, with a BERT-based baseline in a zero-shot setting, and validate the effectiveness of our strategy. In Subtask 2, we fine-tune DeBERTaV3 for classification without relying on the results of Subtask 1, and we observe that early stopping effectively prevents the model from overfitting, which contributes to its good performance in Subtask 2. We did not use any ensemble strategies. Ultimately, we achieved 10th place in Subtask 1 and 2nd place in Subtask 2.

Empowering a Metric with LLM-assisted Named Entity Annotation: HW-TSC’s Submission to the WMT23 Metrics Shared Task
Zhanglin Wu | Yilun Liu | Min Zhang | Xiaofeng Zhao | Junhao Zhu | Ming Zhu | Xiaosong Qiao | Jingfei Zhang | Ma Miaomiao | Zhao Yanqing | Song Peng | Shimin Tao | Hao Yang | Yanfei Jiang
Proceedings of the Eighth Conference on Machine Translation

This paper presents the submission of Huawei Translation Service Center (HW-TSC) to the WMT23 Metrics Shared Task, to which we submit two metrics: KG-BERTScore and HWTSC-EE-Metric. KG-BERTScore is our primary reference-free submission and provides both segment-level and system-level scores, while HWTSC-EE-Metric is our primary reference-based submission and provides only system-level scores. Overall, our metrics show relatively high correlations with MQM scores on the metrics tasks of previous years. On system-level scoring tasks in particular, our metrics achieve new state-of-the-art results in many language pairs.


Partial Could Be Better than Whole. HW-TSC 2022 Submission for the Metrics Shared Task
Yilun Liu | Xiaosong Qiao | Zhanglin Wu | Su Chang | Min Zhang | Yanqing Zhao | Song Peng | Shimin Tao | Hao Yang | Ying Qin | Jiaxin Guo | Minghan Wang | Yinglu Li | Peng Li | Xiaofeng Zhao
Proceedings of the Seventh Conference on Machine Translation (WMT)

In this paper, we present the contribution of HW-TSC to the WMT 2022 Metrics Shared Task. We propose one reference-based metric, HWTSC-EE-BERTScore*, and four reference-free metrics: HWTSC-Teacher-Sim, HWTSC-TLM, KG-BERTScore and CROSS-QE. Among these metrics, HWTSC-Teacher-Sim and CROSS-QE are supervised, whereas HWTSC-EE-BERTScore*, HWTSC-TLM and KG-BERTScore are unsupervised. We use these metrics in both the segment-level and system-level tracks. Overall, our systems achieve strong results for all language pairs on previous test sets and new state-of-the-art results in many system-level test cases.

Part Represents Whole: Improving the Evaluation of Machine Translation System Using Entropy Enhanced Metrics
Yilun Liu | Shimin Tao | Chang Su | Min Zhang | Yanqing Zhao | Hao Yang
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022

Machine translation (MT) metrics often correlate poorly with human assessments. In MT system evaluation, most metrics pay equal attention to every sample in an evaluation set, whereas in human evaluation, difficult sentences often make candidate systems distinguishable through notable fluctuations in human scores, especially when the systems are competitive. We find that samples with high entropy values, although they usually make up less than 5% of the set, tend to play a key role in MT evaluation: when the evaluation set is shrunk to only its high-entropy portion, correlations with human assessments actually improve. In this paper, we therefore propose a fast, unsupervised approach that enhances MT metrics using entropy, expanding the dimensions of evaluation by introducing sentence-level difficulty. A translation hypothesis with a significantly high entropy value is considered difficult and receives a large weight when segment scores are aggregated into a system-level score. Experimental results on five sub-tracks of the WMT19 Metrics Shared Task show that our proposed method significantly enhances the performance of commonly used MT metrics in terms of system-level correlation with human assessments, even outperforming existing SOTA metrics. In particular, all enhanced metrics remain stable in their correlations with human assessments when only competitive MT systems are included, whereas the corresponding vanilla metrics fail to correlate with human assessments.
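
The entropy-based weighting can be sketched in Python as follows. This is an illustrative assumption, not the paper's exact formulation: the abstract does not state how entropy is computed or how large the weights are, so here per-segment entropy values are taken as given (with a helper for the Shannon entropy of one distribution), the mean + z * std threshold and the boost weight are hypothetical parameters, and the system-level score is a weighted average of segment scores.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weights(entropies: Sequence[float], z: float = 1.0,
                    boost: float = 5.0) -> List[float]:
    """Hypothetical rule: segments with entropy above mean + z * std count as
    difficult and get weight `boost`; all other segments get weight 1."""
    threshold = mean(entropies) + z * pstdev(entropies)
    return [boost if h > threshold else 1.0 for h in entropies]


def system_score(seg_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Aggregate segment-level metric scores into a weighted system-level score."""
    return sum(s * w for s, w in zip(seg_scores, weights)) / sum(weights)


if __name__ == "__main__":
    weights = entropy_weights([0.1, 0.2, 0.15, 2.0])
    print(weights, system_score([0.8, 0.8, 0.8, 0.2], weights))
```

With segment entropies [0.1, 0.2, 0.15, 2.0], only the last segment passes the threshold, so a system that scores 0.2 on that hard segment and 0.8 elsewhere drops from an unweighted mean of 0.65 to a weighted score of 0.425. This is how difficult sentences can separate otherwise close systems.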

HwTscSU’s Submissions on WAT 2022 Shared Task
Yilun Liu | Zhen Zhang | Shimin Tao | Junhui Li | Hao Yang
Proceedings of the 9th Workshop on Asian Translation

In this paper, we describe our submission to the NICT–SAP shared task of the 9th Workshop on Asian Translation (WAT 2022) under the team name "HwTscSU". The tasks involve translation from five languages into English and vice versa in two domains: the IT domain and the Wikinews domain. The purpose is to determine the feasibility of multilingualism, domain adaptation or document-level knowledge when little to no clean parallel data is available for training. Our approach for all translation tasks mainly focused on pre-training NMT models on general datasets and fine-tuning them on domain-specific datasets. Because of the small amount of parallel data, we collected and cleaned data from OPUS, including three IT-domain corpora: GNOME, KDE4 and Ubuntu. We then trained Transformer models on the collected dataset and fine-tuned them on the corresponding dev sets. Our BLEU scores improved substantially compared with those of other systems. Our submission ranked 1st in all IT-domain tasks and in one of the eight ALT (Wikinews) domain tasks.