Yu Sun


2021

pdf bib
ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora
Xuan Ouyang | Shuohuan Wang | Chao Pang | Yu Sun | Hao Tian | Hua Wu | Haifeng Wang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance in downstream cross-lingual tasks. This improvement benefits from learning a large amount of monolingual and parallel corpora. Although it is generally acknowledged that parallel corpora are critical for improving the model performance, existing methods are often constrained by the size of parallel corpora, especially for low-resource languages. In this paper, we propose Ernie-M, a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance. Our key insight is to integrate back-translation into the pre-training process. We generate pseudo-parallel sentence pairs on a monolingual corpus to enable the learning of semantic alignments between different languages, thereby enhancing the semantic modeling of cross-lingual models. Experimental results show that Ernie-M outperforms existing cross-lingual models and delivers new state-of-the-art results in various cross-lingual downstream tasks. The codes and pre-trained models will be made publicly available.

pdf bib
Correcting Chinese Spelling Errors with Phonetic Pre-training
Ruiqing Zhang | Chao Pang | Chuanqiang Zhang | Shuohuan Wang | Zhongjun He | Yu Sun | Hua Wu | Haifeng Wang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Latent Reasoning for Low-Resource Question Generation
Xinting Huang | Jianzhong Qi | Yu Sun | Rui Zhang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
CVAE-based Re-anchoring for Implicit Discourse Relation Classification
Zujun Dou | Yu Hong | Yu Sun | Guodong Zhou
Findings of the Association for Computational Linguistics: EMNLP 2021

Training implicit discourse relation classifiers suffers from data sparsity. Variational AutoEncoder (VAE) appears to be the proper solution. It is because ideally VAE is capable of generating inexhaustible varying samples, and this facilitates selective data augmentation. However, our experiments show that coupling VAE with the RoBERTa-based classifier results in severe performance degradation. We ascribe the unusual phenomenon to erroneous sampling that would happen when VAE pursued variations. To overcome the problem, we develop a re-anchoring strategy, where Conditional VAE (CVAE) is used for estimating the risk of erroneous sampling, and meanwhile migrating the anchor to reduce the risk. The test results on PDTB v2.0 illustrate that, compared to the RoBERTa-based baseline, re-anchoring yields substantial improvements. Besides, we observe that re-anchoring can cooperate with other auxiliary strategies (transfer learning and interactive attention mechanism) to further improve the baseline, obtaining the F-scores of about 55%, 63%, 80% and 44% for the four main relation types (Comparison, Contingency, Expansion, Temporality) in the binary classification (Yes/No) scenario.

pdf bib
Alpha at SemEval-2021 Task 6: Transformer Based Propaganda Classification
Zhida Feng | Jiji Tang | Jiaxiang Liu | Weichong Yin | Shikun Feng | Yu Sun | Li Chen
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes our system participated in Task 6 of SemEval-2021: the task focuses on multimodal propaganda technique classification and it aims to classify given image and text into 22 classes. In this paper, we propose to use transformer based architecture to fuse the clues from both image and text. We explore two branches of techniques including fine-tuning the text pretrained transformer with extended visual features, and fine-tuning the multimodal pretrained transformers. For the visual features, we have tested both grid features based on ResNet and salient region features from pretrained object detector. Among the pretrained multimodal transformers, we choose ERNIE-ViL, a two-steam cross-attended transformers pretrained on large scale image-caption aligned data. Fine-tuing ERNIE-ViL for our task produce a better performance due to general joint multimodal representation for text and image learned by ERNIE-ViL. Besides, as the distribution of the classification labels is very unbalanced, we also make a further attempt on the loss function and the experiment result shows that focal loss would perform better than cross entropy loss. Last we have won first for subtask C in the final competition.

pdf bib
abcbpc at SemEval-2021 Task 7: ERNIE-based Multi-task Model for Detecting and Rating Humor and Offense
Chao Pang | Xiaoran Fan | Weiyue Su | Xuyi Chen | Shuohuan Wang | Jiaxiang Liu | Xuan Ouyang | Shikun Feng | Yu Sun
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes our system participated in Task 7 of SemEval-2021: Detecting and Rating Humor and Offense. The task is designed to detect and score humor and offense which are influenced by subjective factors. In order to obtain semantic information from a large amount of unlabeled data, we applied unsupervised pre-trained language models. By conducting research and experiments, we found that the ERNIE 2.0 and DeBERTa pre-trained models achieved impressive performance in various subtasks. Therefore, we applied the above pre-trained models to fine-tune the downstream neural network. In the process of fine-tuning the model, we adopted multi-task training strategy and ensemble learning method. Based on the above strategy and method, we achieved RMSE of 0.4959 for subtask 1b, and finally won the first place.

pdf bib
ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding
Dongling Xiao | Yu-Kun Li | Han Zhang | Yu Sun | Hao Tian | Hua Wu | Haifeng Wang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Coarse-grained linguistic information, such as named entities or phrases, facilitates adequately representation learning in pre-training. Previous works mainly focus on extending the objective of BERT’s Masked Language Modeling (MLM) from masking individual tokens to contiguous sequences of n tokens. We argue that such contiguously masking method neglects to model the intra-dependencies and inter-relation of coarse-grained linguistic information. As an alternative, we propose ERNIE-Gram, an explicitly n-gram masking method to enhance the integration of coarse-grained information into pre-training. In ERNIE-Gram, n-grams are masked and predicted directly using explicit n-gram identities rather than contiguous sequences of n tokens. Furthermore, ERNIE-Gram employs a generator model to sample plausible n-gram identities as optional n-gram masks and predict them in both coarse-grained and fine-grained manners to enable comprehensive n-gram prediction and relation modeling. We pre-train ERNIE-Gram on English and Chinese text corpora and fine-tune on 19 downstream tasks. Experimental results show that ERNIE-Gram outperforms previous pre-training models like XLNet and RoBERTa by a large margin, and achieves comparable results with state-of-the-art methods. The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE.

pdf bib
Parallel sentences mining with transfer learning in an unsupervised setting
Yu Sun | Shaolin Zhu | Feng Yifan | Chenggang Mi
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

The quality and quantity of parallel sentences are known as very important training data for constructing neural machine translation (NMT) systems. However, these resources are not available for many low-resource language pairs. Many existing methods need strong supervision are not suitable. Although several attempts at developing unsupervised models, they ignore the language-invariant between languages. In this paper, we propose an approach based on transfer learning to mine parallel sentences in the unsupervised setting.With the help of bilingual corpora of rich-resource language pairs, we can mine parallel sentences without bilingual supervision of low-resource language pairs. Experiments show that our approach improves the performance of mined parallel sentences compared with previous methods. In particular, we achieve excellent results at two real-world low-resource language pairs.

pdf bib
ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
SiYu Ding | Junyuan Shang | Shuohuan Wang | Yu Sun | Hao Tian | Hua Wu | Haifeng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying the sparse attention mechanism will incur the context fragmentation problem or lead to an inferior modeling capability against comparable model sizes. In this paper, we propose ERNIE-Doc, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, namely the retrospective feed mechanism and the enhanced recurrence mechanism, enable ERNIE-Doc, which has a much longer effective context length, to capture the contextual information of a complete document. We pretrain ERNIE-Doc to explicitly learn the relationships among segments with an additional document-aware segment-reordering objective. Various experiments were conducted on both English and Chinese document-level tasks. ERNIE-Doc improved the state-of-the-art language modeling result of perplexity to 16.8 on WikiText-103. Moreover, it outperformed competitive pretraining models by a large margin on most language understanding tasks, such as text classification and question answering.

2020

pdf bib
Kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification
Jiaxiang Liu | Xuyi Chen | Shikun Feng | Shuohuan Wang | Xuan Ouyang | Yu Sun | Zhengjie Huang | Weiyue Su
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Code switching is a linguistic phenomenon which may occur within a multilingual setting where speakers share more than one language. With the increasing communication between groups with different languages, this phenomenon is more and more popular. However, there are little research and data in this area, especially in code-mixing sentiment classification. In this work, the domain transfer learning from state-of-the-art uni-language model ERNIE is tested on the code-mixing dataset, and surprisingly, a strong baseline is achieved. And further more, the adversarial training with a multi-lingual model is used to achieved 1st place of SemEval-2020 Task9 Hindi-English sentiment classification competition.

pdf bib
Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification Using Pre-trained Language Models
Shuohuan Wang | Jiaxiang Liu | Xuan Ouyang | Yu Sun
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes Galileo’s performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language Models, ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A - Offensive Language Identification, we ranked first in terms of average F1 scores in all languages. We are also the only team which ranked among the top three across all languages. We also took the first place in Sub-task B - Automatic Categorization of Offense Types and Sub-task C - Offence Target Identification.

pdf bib
ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model
Zhengjie Huang | Shikun Feng | Weiyue Su | Xuyi Chen | Shuohuan Wang | Jiaxiang Liu | Xuan Ouyang | Yu Sun
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the system designed by ERNIE Team which achieved the first place in SemEval-2020 Task 10: Emphasis Selection For Written Text in Visual Media. Given a sentence, we are asked to find out the most important words as the suggestion for automated design. We leverage the unsupervised pre-training model and finetune these models on our task. After our investigation, we found that the following models achieved an excellent performance in this task: ERNIE 2.0, XLM-ROBERTA, ROBERTA and ALBERT. We combine a pointwise regression loss and a pairwise ranking loss which is more close to the final Match m metric to finetune our models. And we also find that additional feature engineering and data augmentation can help improve the performance. Our best model achieves the highest score of 0.823 and ranks first for all kinds of metrics.

pdf bib
Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation
Xinting Huang | Jianzhong Qi | Yu Sun | Rui Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Dialogue policy optimization often obtains feedback until task completion in task-oriented dialogue systems. This is insufficient for training intermediate dialogue turns since supervision signals (or rewards) are only provided at the end of dialogues. To address this issue, reward learning has been introduced to learn from state-action pairs of an optimal policy to provide turn-by-turn rewards. This approach requires complete state-action annotations of human-to-human dialogues (i.e., expert demonstrations), which is labor intensive. To overcome this limitation, we propose a novel reward learning approach for semi-supervised policy learning. The proposed approach learns a dynamics model as the reward function which models dialogue progress (i.e., state-action sequences) based on expert demonstrations, either with or without annotations. The dynamics model computes rewards by predicting whether the dialogue progress is consistent with expert demonstrations. We further propose to learn action embeddings for a better generalization of the reward function. The proposed approach outperforms competitive policy learning baselines on MultiWOZ, a benchmark multi-domain dataset.

pdf bib
Generalizable and Explainable Dialogue Generation via Explicit Action Learning
Xinting Huang | Jianzhong Qi | Yu Sun | Rui Zhang
Findings of the Association for Computational Linguistics: EMNLP 2020

Response generation for task-oriented dialogues implicitly optimizes two objectives at the same time: task completion and language quality. Conditioned response generation serves as an effective approach to separately and better optimize these two objectives. Such an approach relies on system action annotations which are expensive to obtain. To alleviate the need of action annotations, latent action learning is introduced to map each utterance to a latent representation. However, this approach is prone to over-dependence on the training data, and the generalization capability is thus restricted. To address this issue, we propose to learn natural language actions that represent utterances as a span of words. This explicit action representation promotes generalization via the compositional structure of language. It also enables an explainable generation process. Our proposed unsupervised approach learns a memory component to summarize system utterances into a short span of words. To further promote a compact action representation, we propose an auxiliary task that restores state annotations as the summarized dialogue context using the memory component. Our proposed approach outperforms latent action baselines on MultiWOZ, a benchmark multi-domain dataset.

pdf bib
PGL at TextGraphs 2020 Shared Task: Explanation Regeneration using Language and Graph Learning Methods
Weibin Li | Yuxiang Lu | Zhengjie Huang | Weiyue Su | Jiaxiang Liu | Shikun Feng | Yu Sun
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

This paper describes the system designed by the Baidu PGL Team which achieved the first place in the TextGraphs 2020 Shared Task. The task focuses on generating explanations for elementary science questions. Given a question and its corresponding correct answer, we are asked to select the facts that can explain why the answer is correct for the question and answering (QA) from a large knowledge base. To address this problem, we use a pre-trained language model to recall the top-K relevant explanations for each question. Then, we adopt a re-ranking approach based on a pre-trained language model to rank the candidate explanations. To further improve the rankings, we also develop an architecture consisting both powerful pre-trained transformers and GNNs to tackle the multi-hop inference problem. The official evaluation shows that, our system can outperform the second best system by 1.91 points.

2019

pdf bib
OleNet at SemEval-2019 Task 9: BERT based Multi-Perspective Models for Suggestion Mining
Jiaxiang Liu | Shuohuan Wang | Yu Sun
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system partici- pated in Task 9 of SemEval-2019: the task is focused on suggestion mining and it aims to classify given sentences into sug- gestion and non-suggestion classes in do- main specific and cross domain training setting respectively. We propose a multi- perspective architecture for learning rep- resentations by using different classical models including Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), Feed Forward Attention (FFA), etc. To leverage the semantics distributed in large amount of unsupervised data, we also have adopted the pre-trained Bidi- rectional Encoder Representations from Transformers (BERT) model as an en- coder to produce sentence and word rep- resentations. The proposed architecture is applied for both sub-tasks, and achieved f1-score of 0.7812 for subtask A, and 0.8579 for subtask B. We won the first and second place for the two tasks respec- tively in the final competition.

2018

pdf bib
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification
Xilun Chen | Yu Sun | Ben Athiwaratkun | Claire Cardie | Kilian Weinberger
Transactions of the Association for Computational Linguistics, Volume 6

In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN1) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and an adversarial language discriminator. Both branches take input from a shared feature extractor to learn hidden representations that are simultaneously indicative for the classification task and invariant across languages. Experiments on Chinese and Arabic sentiment classification demonstrate that ADAN significantly outperforms state-of-the-art systems.

2013

pdf bib
Generalization of Words for Chinese Dependency Parsing
Xianchao Wu | Jie Zhou | Yu Sun | Zhanyi Liu | Dianhai Yu | Hua Wu | Haifeng Wang
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)