Yang Fan


2021

mixSeq: A Simple Data Augmentation Method for Neural Machine Translation
Xueqing Wu | Yingce Xia | Jinhua Zhu | Lijun Wu | Shufang Xie | Yang Fan | Tao Qin
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Data augmentation, which refers to manipulating the inputs (e.g., adding random noise or masking specific parts) to enlarge the dataset, has been widely adopted in machine learning. Most data augmentation techniques operate on a single input, which limits the diversity of the training corpus. In this paper, we propose a simple yet effective data augmentation technique for neural machine translation, mixSeq, which operates on multiple inputs and their corresponding targets. Specifically, we randomly select two input sequences, concatenate them into a longer input, concatenate their corresponding target sequences into an enlarged target, and train models on the augmented dataset. Experiments on nine machine translation tasks demonstrate that this simple method improves the baselines by a non-trivial margin. Our method can also be combined with single-input data augmentation methods to obtain further improvements.
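The pairing-and-concatenation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sentences are assumed to be pre-tokenized lists of strings, the function name `mixseq_augment` and the optional `sep` token are hypothetical, and whether the augmented pairs are added to or replace the original corpus is left to the caller.

```python
import random
from typing import List, Optional, Tuple

# A (source tokens, target tokens) training pair.
Pair = Tuple[List[str], List[str]]


def mixseq_augment(corpus: List[Pair], num_samples: int,
                   rng: random.Random, sep: Optional[str] = None) -> List[Pair]:
    """Build new pairs by concatenating two randomly chosen sentence pairs.

    Hypothetical sketch: `sep` is an optional boundary token (an assumption,
    not taken from the abstract). Source and target are joined in the same
    order so the enlarged target stays aligned with the enlarged source.
    """
    augmented: List[Pair] = []
    glue = [sep] if sep is not None else []
    for _ in range(num_samples):
        # Draw two distinct pairs from the corpus.
        (src_a, tgt_a), (src_b, tgt_b) = rng.sample(corpus, 2)
        augmented.append((src_a + glue + src_b, tgt_a + glue + tgt_b))
    return augmented


if __name__ == "__main__":
    data = [(["guten", "Morgen"], ["good", "morning"]),
            (["danke"], ["thanks"])]
    for src, tgt in mixseq_augment(data, 2, random.Random(0), sep="<sep>"):
        print(src, "->", tgt)
```

Keeping the source and target concatenation order identical is the one property the technique depends on; everything else (sampling with or without replacement, the number of augmented pairs) is a tunable choice.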

Contextual Domain Classification with Temporal Representations
Tzu-Hsiang Lin | Yipeng Shi | Chentao Ye | Yang Fan | Weitong Ruan | Emre Barut | Wael Hamza | Chengwei Su
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

In commercial dialogue systems, the Spoken Language Understanding (SLU) component tends to cover numerous domains, so context is needed to help resolve ambiguities. Previous work that incorporates context into SLU has mostly focused on domains where context is limited to a few minutes. However, some domains have related context that can span hours or even days. In this paper, we propose temporal representations that combine wall-clock second difference and turn order offset information to utilize both recent and distant context in a novel large-scale setup. Experiments on the Contextual Domain Classification (CDC) task with various encoder architectures show that temporal representations combining both types of information outperform those using only one of the two. We further demonstrate that our contextual Transformer reduces classification errors by 13.04% relative to a non-contextual baseline. We also conduct empirical analyses to study recent versus distant context and opportunities to lower deployment costs.
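To make the two temporal signals concrete, the sketch below computes, for each previous turn, a wall-clock second difference and a turn order offset, then combines them by summing two embedding lookups. Everything beyond "combine second difference and turn offset" is an assumption for illustration only: the log2 bucketing, the bucket count, the randomly initialized tables, and the summation are hypothetical choices, not the paper's design.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_bucket(value: int, num_buckets: int) -> int:
    """Map a non-negative integer to a log2-spaced bucket id (hypothetical scheme)."""
    if value <= 0:
        return 0
    return min(int(math.log2(value)) + 1, num_buckets - 1)


def temporal_features(context_times: Sequence[float], current_time: float,
                      num_buckets: int = 24) -> List[Tuple[int, int]]:
    """For each previous turn, return (second-difference bucket, turn-order offset).

    `context_times` are timestamps in seconds, oldest first.
    An offset of 1 means the immediately preceding turn.
    """
    n = len(context_times)
    feats = []
    for idx, t in enumerate(context_times):
        seconds = int(current_time - t)
        offset = min(n - idx, num_buckets - 1)
        feats.append((log_bucket(seconds, num_buckets), offset))
    return feats


def make_table(rows: int, dim: int, seed: int) -> List[List[float]]:
    """Randomly initialized embedding table (stands in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


def temporal_embedding(sec_bucket: int, offset: int,
                       sec_table: List[List[float]],
                       turn_table: List[List[float]]) -> List[float]:
    """Combine both signals by summing their embeddings (an assumed combination)."""
    return [a + b for a, b in zip(sec_table[sec_bucket], turn_table[offset])]


if __name__ == "__main__":
    # Three earlier turns: one hour, ten minutes, and five seconds ago.
    feats = temporal_features([0.0, 3000.0, 3595.0], current_time=3600.0)
    print(feats)
```

Bucketing on a log scale lets a small table cover both seconds-old and days-old context, which is the recent-versus-distant trade-off the abstract discusses; the turn offset keeps ordering information even when timestamps are close together.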

2020

CN-HIT-IT.NLP at SemEval-2020 Task 4: Enhanced Language Representation with Multiple Knowledge Triples
Yice Zhang | Jiaxuan Lin | Yang Fan | Peng Jin | Yuanchao Liu | Bingquan Liu
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes our system that participated in SemEval-2020 Task 4: Commonsense Validation and Explanation. For this task, external knowledge, such as knowledge graphs, can clearly help the model understand commonsense in natural language statements. However, selecting the right triples for a statement remains an open problem, so reducing the interference of irrelevant triples with model performance is a key research focus. This paper adopts a modified K-BERT as the language encoder to enhance language representation through triples from knowledge graphs. Experiments show that our method outperforms models without external knowledge and is slightly better than the original K-BERT. We achieved an accuracy of 0.97 in subtask A, ranking 1/45, and an accuracy of 0.948 in subtask B, ranking 2/35.
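The knowledge-injection idea behind K-BERT can be sketched as below: triples are attached as branches after the entity they describe, each branch token gets a "soft" position continuing from its entity, and a visible matrix keeps branch tokens from attending to unrelated sentence tokens. This sketches the original K-BERT mechanism under simplifying assumptions (whole-word tokens, exact-match entity lookup, a toy dictionary standing in for a knowledge graph); the function name `inject_knowledge` is hypothetical, and the modifications this paper makes to K-BERT are not reflected here.

```python
from typing import Dict, List, Tuple

# (relation, object) attached to a subject entity.
Triple = Tuple[str, str]


def inject_knowledge(tokens: List[str], kg: Dict[str, List[Triple]]):
    """Return (expanded tokens, soft positions, boolean visible matrix)."""
    out_tokens: List[str] = []
    soft_pos: List[int] = []
    group: List[int] = []        # 0 = sentence trunk, k > 0 = k-th knowledge branch
    anchor: Dict[int, int] = {}  # branch id -> index of the entity token it hangs off
    next_branch = 1
    for pos, tok in enumerate(tokens):
        anchor_idx = len(out_tokens)
        out_tokens.append(tok)
        soft_pos.append(pos)
        group.append(0)
        for rel, obj in kg.get(tok, []):
            anchor[next_branch] = anchor_idx
            # Branch tokens continue the entity's position: pos + 1, pos + 2.
            for offset, branch_tok in enumerate((rel, obj), start=1):
                out_tokens.append(branch_tok)
                soft_pos.append(pos + offset)
                group.append(next_branch)
            next_branch += 1

    n = len(out_tokens)
    visible = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            gi, gj = group[i], group[j]
            if gi == gj:
                # Same trunk or same branch.
                visible[i][j] = True
            elif gi == 0 and anchor.get(gj) == i:
                # An entity sees the knowledge attached to it.
                visible[i][j] = True
            elif gj == 0 and anchor.get(gi) == j:
                # Knowledge tokens see their own entity.
                visible[i][j] = True
    return out_tokens, soft_pos, visible


if __name__ == "__main__":
    kg = {"Tim_Cook": [("CEO", "Apple")], "Beijing": [("capital", "China")]}
    toks, pos, vis = inject_knowledge(["Tim_Cook", "is", "visiting", "Beijing"], kg)
    print(list(zip(toks, pos)))
```

The visible matrix is what limits the "interference of irrelevant triples" mentioned in the abstract: an injected triple can only influence the rest of the sentence through its anchor entity.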

2019

Microsoft Research Asia’s Systems for WMT19
Yingce Xia | Xu Tan | Fei Tian | Fei Gao | Di He | Weicong Chen | Yang Fan | Linyuan Gong | Yichong Leng | Renqian Luo | Yiren Wang | Lijun Wu | Jinhua Zhu | Tao Qin | Tie-Yan Liu
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We, Microsoft Research Asia, submitted systems for 11 language directions in the WMT19 news translation task. We won first place in 8 of the 11 directions and second place in the other three. Our basic systems are built on the Transformer, back translation, and knowledge distillation. We integrate several of our recent techniques to enhance the baseline systems: multi-agent dual learning (MADL), masked sequence-to-sequence pre-training (MASS), neural architecture optimization (NAO), and soft contextual data augmentation (SCA).
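Of the techniques listed, soft contextual data augmentation is the easiest to show in a few lines: instead of swapping a word for a single sampled replacement, its embedding is replaced by the expected embedding under a language model's predicted distribution over the vocabulary. The sketch below is a generic illustration of that idea, not the submission's implementation; the LM logits are assumed to be supplied by some external language model, and the names `soft_word_embedding`, `augment_sentence`, and the replacement probability `gamma` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_word_embedding(lm_logits: Sequence[float],
                        embedding_table: List[List[float]]) -> List[float]:
    """Expected embedding sum_j p_j * e_j under the LM's distribution p."""
    probs = softmax(lm_logits)
    dim = len(embedding_table[0])
    return [sum(p * row[d] for p, row in zip(probs, embedding_table))
            for d in range(dim)]


def augment_sentence(token_ids: Sequence[int],
                     lm_logits_per_pos: Sequence[Sequence[float]],
                     embedding_table: List[List[float]],
                     gamma: float, rng: random.Random) -> List[List[float]]:
    """Replace each token's embedding by its soft version with probability gamma."""
    out = []
    for pos, tok in enumerate(token_ids):
        if rng.random() < gamma:
            out.append(soft_word_embedding(lm_logits_per_pos[pos], embedding_table))
        else:
            out.append(list(embedding_table[tok]))
    return out


if __name__ == "__main__":
    table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 3-word vocabulary, dim 2
    logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]   # hypothetical LM outputs per position
    print(augment_sentence([0, 1], logits, table, gamma=0.5, rng=random.Random(0)))
```

Because the soft embedding is a weighted average rather than a hard substitution, the augmented input stays differentiable and blends several plausible contextual replacements at once.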

2009

Refining Grammars for Parsing with Hierarchical Semantic Knowledge
Xiaojun Lin | Yang Fan | Meng Zhang | Xihong Wu | Huisheng Chi
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing