Jiayi Wang


pdf bib
Distinguishing Non-natural from Natural Adversarial Samples for More Robust Pre-trained Language Model
Jiayi Wang | Rongzhou Bao | Zhuosheng Zhang | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2022

Recently, the problem of robustness of pre-trained language models (PrLMs) has received increasing research interest. Latest studies on adversarial attacks achieve high attack success rates against PrLMs, claiming that PrLMs are not robust. However, we find that the adversarial samples that PrLMs fail are mostly non-natural and do not appear in reality. We question the validity of the current evaluation of robustness of PrLMs based on these non-natural adversarial samples and propose an anomaly detector to evaluate the robustness of PrLMs with more natural adversarial samples. We also investigate two applications of the anomaly detector: (1) In data augmentation, we employ the anomaly detector to force generating augmented data that are distinguished as non-natural, which brings larger gains to the accuracy of PrLMs. (2) We apply the anomaly detector to a defense framework to enhance the robustness of PrLMs. It can be used to defend all types of attacks and achieves higher accuracy on both adversarial samples and compliant samples than other defense frameworks.


pdf bib
QEMind: Alibaba’s Submission to the WMT21 Quality Estimation Shared Task
Jiayi Wang | Ke Wang | Boxing Chen | Yu Zhao | Weihua Luo | Yuqi Zhang
Proceedings of the Sixth Conference on Machine Translation

Quality Estimation, as a crucial step of quality control for machine translation, has been explored for years. The goal is to to investigate automatic methods for estimating the quality of machine translation results without reference translations. In this year’s WMT QE shared task, we utilize the large-scale XLM-Roberta pre-trained model and additionally propose several useful features to evaluate the uncertainty of the translations to build our QE system, named QEMind . The system has been applied to the sentence-level scoring task of Direct Assessment and the binary score prediction task of Critical Error Detection. In this paper, we present our submissions to the WMT 2021 QE shared task and an extensive set of experimental results have shown us that our multilingual systems outperform the best system in the Direct Assessment QE task of WMT 2020.

pdf bib
Defending Pre-trained Language Models from Adversarial Word Substitution Without Performance Sacrifice
Rongzhou Bao | Jiayi Wang | Hai Zhao
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Beyond Glass-Box Features: Uncertainty Quantification Enhanced Quality Estimation for Neural Machine Translation
Ke Wang | Yangbin Shi | Jiayi Wang | Yuqi Zhang | Yu Zhao | Xiaolin Zheng
Findings of the Association for Computational Linguistics: EMNLP 2021

Quality Estimation (QE) plays an essential role in applications of Machine Translation (MT). Traditionally, a QE system accepts the original source text and translation from a black-box MT system as input. Recently, a few studies indicate that as a by-product of translation, QE benefits from the model and training data’s information of the MT system where the translations come from, and it is called the “glass-box QE”. In this paper, we extend the definition of “glass-box QE” generally to uncertainty quantification with both “black-box” and “glass-box” approaches and design several features deduced from them to blaze a new trial in improving QE’s performance. We propose a framework to fuse the feature engineering of uncertainty quantification into a pre-trained cross-lingual language model to predict the translation quality. Experiment results show that our method achieves state-of-the-art performances on the datasets of WMT 2020 QE shared task.


pdf bib
Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing
Ke Wang | Jiayi Wang | Niyu Ge | Yangbin Shi | Yu Zhao | Kai Fan
Findings of the Association for Computational Linguistics: EMNLP 2020

With the advent of neural machine translation, there has been a marked shift towards leveraging and consuming the machine translation results. However, the gap between machine translation systems and human translators needs to be manually closed by post-editing. In this paper, we propose an end-to-end deep learning framework of the quality estimation and automatic post-editing of the machine translation output. Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model. To imitate the behavior of human translators, we design three efficient delegation modules – quality estimation, generative post-editing, and atomic operation post-editing and construct a hierarchical model based on them. We examine this approach with the English–German dataset from WMT 2017 APE shared task and our experimental results can achieve the state-of-the-art performance. We also verify that the certified translators can significantly expedite their post-editing processing with our model in human evaluation.

pdf bib
Alibaba’s Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT
Jiayi Wang | Ke Wang | Kai Fan | Yuqi Zhang | Jun Lu | Xin Ge | Yangbin Shi | Yu Zhao
Proceedings of the Fifth Conference on Machine Translation

The goal of Automatic Post-Editing (APE) is basically to examine the automatic methods for correcting translation errors generated by an unknown machine translation (MT) system. This paper describes Alibaba’s submissions to the WMT 2020 APE Shared Task for the English-German language pair. We design a two-stage training pipeline. First, a BERT-like cross-lingual language model is pre-trained by randomly masking target sentences alone. Then, an additional neural decoder on the top of the pre-trained model is jointly fine-tuned for the APE task. We also apply an imitation learning strategy to augment a reasonable amount of pseudo APE training data, potentially preventing the model to overfit on the limited real training data and boosting the performance on held-out data. To verify our proposed model and data augmentation, we examine our approach with the well-known benchmarking English-German dataset from the WMT 2017 APE task. The experiment results demonstrate that our system significantly outperforms all other baselines and achieves the state-of-the-art performance. The final results on the WMT 2020 test dataset show that our submission can achieve +5.56 BLEU and -4.57 TER with respect to the official MT baseline.


pdf bib
Alibaba Submission for WMT18 Quality Estimation Task
Jiayi Wang | Kai Fan | Bo Li | Fengming Zhou | Boxing Chen | Yangbin Shi | Luo Si
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The goal of WMT 2018 Shared Task on Translation Quality Estimation is to investigate automatic methods for estimating the quality of machine translation results without reference translations. This paper presents the QE Brain system, which proposes the neural Bilingual Expert model as a feature extractor based on conditional target language model with a bidirectional transformer and then processes the semantic representations of source and the translation output with a Bi-LSTM predictive model for automatic quality estimation. The system has been applied to the sentence-level scoring and ranking tasks as well as the word-level tasks for finding errors for each word in translations. An extensive set of experimental results have shown that our system outperformed the best results in WMT 2017 Quality Estimation tasks and obtained top results in WMT 2018.