Jie Hao


2022

Overcoming Catastrophic Forgetting During Domain Adaptation of Seq2seq Language Generation
Dingcheng Li | Zheng Chen | Eunah Cho | Jie Hao | Xiaohu Liu | Fan Xing | Chenlei Guo | Yang Liu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Seq2seq language generation models that are trained offline on multiple domains in a sequential fashion often suffer from catastrophic forgetting. Lifelong learning has been proposed to handle this problem. However, existing approaches such as experience replay or elastic weight consolidation require memory space that grows incrementally. In this work, we propose an innovative framework, RMR_DSE, which leverages a recall optimization mechanism to selectively memorize important parameters of previous tasks via regularization, and uses a domain drift estimation algorithm to compensate for the drift between different domains in the embedding space. These designs enable the model to be trained on the current task while keeping the memory of previous tasks, without requiring much additional data storage. Furthermore, RMR_DSE can be combined with existing lifelong learning approaches. Our experiments on two seq2seq language generation tasks, paraphrase and dialog response generation, show that RMR_DSE outperforms SOTA models by a considerable margin and greatly reduces forgetting.

2021

The Mininglamp Machine Translation System for WMT21
Shiyu Zhao | Xiaopu Li | Minghui Wu | Jie Hao
Proceedings of the Sixth Conference on Machine Translation

This paper describes the Mininglamp neural machine translation systems for the WMT2021 news translation task. We participated in eight translation directions for news text: Chinese to/from English, Hausa to/from English, German to/from English, and French to/from German. Our base system was built on the Transformer architecture, with wider or smaller configurations for different news translation directions. We mainly used back-translation, knowledge distillation, and fine-tuning to boost single models, and used ensembling to combine them. Our final submission ranked first in the English to/from Hausa task.

Personalized Search-based Query Rewrite System for Conversational AI
Eunah Cho | Ziyan Jiang | Jie Hao | Zheng Chen | Saurabh Gupta | Xing Fan | Chenlei Guo
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Query rewrite (QR) is an emerging component in conversational AI systems that reduces user defects. User defects have various causes, such as errors in the spoken dialogue system, users’ slips of the tongue, or their abridged language. Many user defects stem from personalized factors, such as a user’s speech patterns, dialect, or preferences. In this work, we propose a personalized search-based QR framework that focuses on automatically reducing user defects. We build a personalized index for each user, which encompasses diverse affinity layers to reflect that user’s personal preferences in the conversational AI. Our personalized QR system contains retrieval and ranking layers. Because learning is driven by user feedback, training our models does not require hand-annotated data. Experiments on a personalized test set show that our personalized QR system is able to correct systematic and user errors by utilizing phonetic and semantic inputs.

Contextual Rephrase Detection for Reducing Friction in Dialogue Systems
Zhuoyi Wang | Saurabh Gupta | Jie Hao | Xing Fan | Dingcheng Li | Alexander Hanbo Li | Chenlei Guo
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

For voice assistants like Alexa, Google Assistant, and Siri, correctly interpreting users’ intentions is of utmost importance. However, users sometimes experience friction with these assistants, caused by errors from different system components or by user errors such as slips of the tongue. Users tend to rephrase their queries until they get a satisfactory response. Rephrase detection is used to identify these rephrases and has long been treated as a task with pairwise input, which does not fully utilize contextual information (e.g., users’ implicit feedback). To this end, we propose a contextual rephrase detection model, ContReph, to automatically identify rephrases from multi-turn dialogues. We showcase how to leverage the dialogue context and user-agent interaction signals, including the user’s implicit feedback and the time gap between turns, which help the model significantly outperform pairwise rephrase detection models.

RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging
Jie Hao | Linfeng Song | Liwei Wang | Kun Xu | Zhaopeng Tu | Dong Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context. Existing models for this task suffer from a robustness issue, i.e., performance drops dramatically when they are tested on a different dataset. We address this robustness issue by proposing a novel sequence-tagging-based model, which significantly reduces the search space while still covering the core of the task. As with most tagging models for text generation, the model’s outputs may lack fluency. To alleviate this issue, we inject a loss signal from BLEU or GPT-2 under a REINFORCE framework. Experiments show large improvements of our model over current state-of-the-art systems when transferring to another dataset.

2020

OPPO’s Machine Translation System for the IWSLT 2020 Open Domain Translation Task
Qian Zhang | Xiaopu Li | Dawei Dang | Tingxun Shi | Di Ai | Zhengshan Xue | Jie Hao
Proceedings of the 17th International Conference on Spoken Language Translation

In this paper, we describe our machine translation system for the Chinese-Japanese bidirectional translation task (a.k.a. the open domain translation task) at IWSLT 2020. Our model is based on the Transformer (Vaswani et al., 2017), supported by many popular data preprocessing and augmentation methods whose effectiveness has been widely demonstrated. Experiments show that these methods improve the baseline model steadily and significantly.

XLP at SemEval-2020 Task 9: Cross-lingual Models with Focal Loss for Sentiment Analysis of Code-Mixing Language
Yili Ma | Liang Zhao | Jie Hao
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present an approach for sentiment analysis of code-mixed language on Twitter, as defined in SemEval-2020 Task 9. Our team (referred to as LiangZhao) employs different multilingual models with a weighted loss that focuses on the complexity of code-mixing in each sentence; our best model achieved an F1 score of 0.806 and ranked first in the Sentimix Spanglish subtask. We analyze the performance of the method and demonstrate each component of our architecture.

OPPO’s Machine Translation Systems for WMT20
Tingxun Shi | Shiyu Zhao | Xiaopu Li | Xiaoxue Wang | Qian Zhang | Di Ai | Dawei Dang | Xue Zhengshan | Jie Hao
Proceedings of the Fifth Conference on Machine Translation

In this paper, we describe OPPO’s machine translation systems for all 22 language pairs of the WMT20 Shared Task on News Translation. We first give an overview of the aspects common to all systems, in two parts: the data preprocessing part shows how the data are preprocessed and filtered, and the system part shows our model architectures and the techniques we followed. Detailed information, such as the training hyperparameters and the results produced by each technique, is given in the corresponding subsections. Our final submissions ranked top in 6 directions (English Czech, English Russian, French German and Tamil English), third in 2 directions (English German, English Japanese), and fourth in 2 directions (English Pashto and English Tamil).

2019

Multi-Granularity Self-Attention for Neural Machine Translation
Jie Hao | Xing Wang | Shuming Shi | Jinfeng Zhang | Zhaopeng Tu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases produces substantial improvements, suggesting that NMT performance could be improved by explicitly modeling phrases. In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalisms. Moreover, we exploit interactions among phrases to enhance the strength of structure modeling, a commonly cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show that the proposed approach consistently improves performance. Targeted linguistic analysis reveals that Mg-Sa indeed captures useful phrase information at various levels of granularity.

Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons
Jie Hao | Xing Wang | Shuming Shi | Jinfeng Zhang | Zhaopeng Tu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks (RNNs) outperforms both individual architectures, while not much is known about why the hybrid models work. Believing that modeling hierarchical structure is an essential way in which SANs and RNNs complement each other, we propose to further enhance the strength of hybrid models with an advanced variant of RNNs, Ordered Neurons LSTM (ON-LSTM), which introduces a syntax-oriented inductive bias to perform tree-like composition. Experimental results on the benchmark machine translation task show that the proposed approach outperforms both individual architectures and a standard hybrid model. Further analyses on targeted linguistic evaluation and logical inference tasks demonstrate that the proposed approach indeed benefits from better modeling of hierarchical structure.

OPPO NMT System for IWSLT 2019
Xiaopu Li | Zhengshan Xue | Jie Hao
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes OPPO's submission to the IWSLT 2019 text translation task. Our system is based on the Transformer architecture. We also study the effect of model ensembling. On the IWSLT 2019 dev sets, our system reaches a BLEU score of 19.94.

Modeling Recurrence for Transformer
Jie Hao | Xing Wang | Baosong Yang | Longyue Wang | Jinfeng Zhang | Zhaopeng Tu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recently, the Transformer model, which is based solely on attention mechanisms, has advanced the state of the art on various machine translation tasks. However, recent studies reveal that the lack of recurrence modeling hinders further improvement of its translation capacity. In response to this problem, we propose to directly model recurrence for the Transformer with an additional recurrence encoder. In addition to the standard recurrent neural network, we introduce a novel attentive recurrent network to leverage the strengths of both attention models and recurrent networks. Experimental results on the widely used WMT14 English⇒German and WMT17 Chinese⇒English translation tasks demonstrate the effectiveness of the proposed approach. Our studies also reveal that the proposed model benefits from a short-cut that bridges the source and target sequences with a single recurrent layer, which outperforms its deep counterpart.

2015

Well-Formed Dependency to String translation with BTG Grammar
Xiaoqing Li | Kun Wang | Dakun Zhang | Jie Hao
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

Query Lattice for Translation Retrieval
Meiping Dong | Yong Cheng | Yang Liu | Jia Xu | Maosong Sun | Tatsuya Izuha | Jie Hao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers