2024
pdf
bib
abs
HW-TSC 2024 Submission for the Quality Estimation Shared Task
Weiqiao Shan
|
Ming Zhu
|
Yuang Li
|
Mengyao Piao
|
Xiaofeng Zhao
|
Chang Su
|
Min Zhang
|
Hao Yang
|
Yanfei Jiang
Proceedings of the Ninth Conference on Machine Translation
Quality estimation (QE) is a crucial technique for evaluating the quality of machine translations without the need for reference translations. This paper focuses on Huawei Translation Services Center’s (HW-TSC’s) submission to the sentence-level QE shared task, named LLMs-enhanced-CrossQE. Our system builds upon the CrossQE architecture from our submission from last year, which consists of a multilingual base model and a task-specific downstream layer. The model input is a concatenation of the source and the translated sentences. To enhance performance, we fine-tuned and ensembled multiple base models, including XLM-R, InfoXLM, RemBERT, and CometKiwi. Specifically, we employed two pseudo-data generation methods: 1) a diverse pseudo-data generation method based on the corruption-based data augmentation technique introduced last year, and 2) a pseudo-data generation method that simulates machine translation errors using large language models (LLMs). Our results demonstrate that the system achieves outstanding performance on sentence-level QE test sets.
pdf
bib
abs
CB-Whisper: Contextual Biasing Whisper Using Open-Vocabulary Keyword-Spotting
Yuang Li
|
Yinglu Li
|
Min Zhang
|
Chang Su
|
Jiawei Yu
|
Mengyao Piao
|
Xiaosong Qiao
|
Miaomiao Ma
|
Yanqing Zhao
|
Hao Yang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare name entities, such as personal names, organizations and terminologies that are not frequently encountered in the training data. This paper presents Contextual Biasing Whisper (CB-Whisper), a novel ASR system based on OpenAI’s Whisper model that can recognize user-defined name entities by performing open-vocabulary keyword-spotting (KWS) before the decoder. The KWS module leverages text-to-speech (TTS) techniques and a convolutional neural network (CNN) classifier to match the features between the entities and the utterances. To integrate the recognized entities into the Whipser decoder and avoid hallucinations, we carefully crafted multiple prompts with spoken form hints. Experiments show that the KWS module based on Whisper encoder’s features can recognize unseen user-defined keywords effectively. More importantly, the proposed CB-Whisper substantially improves the mixed-error-rate (MER) and entity recall compared to the original Whisper model on three internal datasets and two publicly available datasets including Aishell and ACL datasets that cover English-only, Chinese-only, and code-switching scenarios.
pdf
bib
abs
HW-TSC 2024 Submission for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR)
Mengyao Piao
|
Su Chang
|
Yuang Li
|
Xiaosong Qiao
|
Xiaofeng Zhao
|
Yinglu Li
|
Min Zhang
|
Hao Yang
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
The degree of semantic relatedness of two units of language has long been considered fundamental to understanding meaning. In this paper, we present the system of Huawei Translation Services Center (HW-TSC) for Task 1 of SemEval 2024, which aims to automatically measure the semantic relatedness of sentence pairs in African and Asian languages. The task dataset for this task covers about 14 different languages, These languages originate from five distinct language families and are predominantly spoken in Africa and Asia. For this shared task, we describe our proposed solutions, including ideas and the implementation steps of the task, as well as the outcomes of each experiment on the development dataset. To enhance the performance, we leverage these experimental outcomes and construct an ensemble one. Our results demonstrate that our system achieves impressive performance on test datasets in unsupervised track B and ranked first place for the Punjabi language pair.
pdf
bib
abs
HW-TSC at SemEval-2024 Task 5: Self-Eval? A Confident LLM System for Auto Prediction and Evaluation for the Legal Argument Reasoning Task
Xiaofeng Zhao
|
Xiaosong Qiao
|
Kaiwen Ou
|
Min Zhang
|
Su Chang
|
Mengyao Piao
|
Yuang Li
|
Yinglu Li
|
Ming Zhu
|
Yilun Liu
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
In this article, we present an effective system for semeval-2024 task 5. The task involves assessing the feasibility of a given solution in civil litigation cases based on relevant legal provisions, issues, solutions, and analysis. This task demands a high level of proficiency in U.S. law and natural language reasoning. In this task, we designed a self-eval LLM system that simultaneously performs reasoning and self-assessment tasks. We created a confidence interval and a prompt instructing the LLM to output the answer to a question along with its confidence level. We designed a series of experiments to prove the effectiveness of the self-eval mechanism. In order to avoid the randomness of the results, the final result is obtained by voting on three results generated by the GPT-4. Our submission was conducted under zero-resource setting, and we achieved first place in the task with an F1-score of 0.8231 and an accuracy of 0.8673.
2023
pdf
bib
abs
HW-TSC 2023 Submission for the Quality Estimation Shared Task
Yuang Li
|
Chang Su
|
Ming Zhu
|
Mengyao Piao
|
Xinglin Lyu
|
Min Zhang
|
Hao Yang
Proceedings of the Eighth Conference on Machine Translation
Quality estimation (QE) is an essential technique to assess machine translation quality without reference translations. In this paper, we focus on Huawei Translation Services Center’s (HW-TSC’s) submission to the sentence-level QE shared task, named Ensemble-CrossQE. Our system uses CrossQE, the same model architecture as our last year’s submission, which consists of a multilingual base model and a task-specific downstream layer. The input is the concatenation of the source and the translated sentences. To enhance the performance, we finetuned and ensembled multiple base models such as XLM-R, InfoXLM, RemBERT and CometKiwi. Moreover, we introduce a new corruption-based data augmentation method, which generates deletion, substitution and insertion errors in the original translation and uses a reference-based QE model to obtain pseudo scores. Results show that our system achieves impressive performance on sentence-level QE test sets and ranked the first place for three language pairs: English-Hindi, English-Tamil and English-Telegu. In addition, we participated in the error span detection task. The submitted model outperforms the baseline on Chinese-English and Hebrew-English language pairs.
pdf
bib
abs
Leveraging Multilingual Knowledge Graph to Boost Domain-specific Entity Translation of ChatGPT
Min Zhang
|
Limin Liu
|
Zhao Yanqing
|
Xiaosong Qiao
|
Su Chang
|
Xiaofeng Zhao
|
Junhao Zhu
|
Ming Zhu
|
Song Peng
|
Yinglu Li
|
Yilun Liu
|
Wenbing Ma
|
Mengyao Piao
|
Shimin Tao
|
Hao Yang
|
Yanfei Jiang
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track
Recently, ChatGPT has shown promising results for Machine Translation (MT) in general domains and is becoming a new paradigm for translation. In this paper, we focus on how to apply ChatGPT to domain-specific translation and propose to leverage Multilingual Knowledge Graph (MKG) to help ChatGPT improve the domain entity translation quality. To achieve this, we extract the bilingual entity pairs from MKG for the domain entities that are recognized from source sentences. We then introduce these pairs into translation prompts, instructing ChatGPT to use the correct translations of the domain entities. To evaluate the novel MKG method for ChatGPT, we conduct comparative experiments on three Chinese-English (zh-en) test datasets constructed from three specific domains, of which one domain is from biomedical science, and the other two are from the Information and Communications Technology (ICT) industry — Visible Light Communication (VLC) and wireless domains. Experimental results demonstrate that both the overall translation quality of ChatGPT (+6.21, +3.13 and +11.25 in BLEU scores) and the translation accuracy of domain entities (+43.2%, +30.2% and +37.9% absolute points) are significantly improved with MKG on the three test datasets.