2022
Structure-Unified M-Tree Coding Solver for Math Word Problem
Bin Wang | Jiangzhou Ju | Yang Fan | Xinyu Dai | Shujian Huang | Jiajun Chen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
As one of the challenging NLP tasks, designing math word problem (MWP) solvers has attracted increasing research attention for the past few years. In previous work, models designed by taking into account the properties of the binary tree structure of mathematical expressions at the output side have achieved better performance. However, the expressions corresponding to an MWP are often diverse (e.g., n1 + n2 × n3 - n4, n3 × n2 - n4 + n1, etc.), and so are the corresponding binary trees, which creates difficulties in model learning due to the non-deterministic output space. In this paper, we propose the Structure-Unified M-Tree Coding Solver (SUMC-Solver), which applies a tree with any M branches (M-tree) to unify the output structures. To learn the M-tree, we use a mapping to convert the M-tree into the M-tree codes, where codes store the information of the paths from tree root to leaf nodes and the information of leaf nodes themselves, and then devise a Sequence-to-Code (seq2code) model to generate the codes. Experimental results on the widely used MAWPS and Math23K datasets have demonstrated that SUMC-Solver not only outperforms several state-of-the-art models under similar experimental settings but also performs much better under low-resource conditions.
2021
mixSeq: A Simple Data Augmentation Method for Neural Machine Translation
Xueqing Wu | Yingce Xia | Jinhua Zhu | Lijun Wu | Shufang Xie | Yang Fan | Tao Qin
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
Data augmentation, which refers to manipulating the inputs (e.g., adding random noise, masking specific parts) to enlarge the dataset, has been widely adopted in machine learning. Most data augmentation techniques operate on a single input, which limits the diversity of the training corpus. In this paper, we propose a simple yet effective data augmentation technique for neural machine translation, mixSeq, which operates on multiple inputs and their corresponding targets. Specifically, we randomly select two input sequences, concatenate them together as a longer input as well as their corresponding target sequences as an enlarged target, and train models on the augmented dataset. Experiments on nine machine translation tasks demonstrate that such a simple method boosts the baselines by a non-trivial margin. Our method can be further combined with single input based data augmentation methods to obtain further improvements.
Contextual Domain Classification with Temporal Representations
Tzu-Hsiang Lin | Yipeng Shi | Chentao Ye | Yang Fan | Weitong Ruan | Emre Barut | Wael Hamza | Chengwei Su
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers
In commercial dialogue systems, the Spoken Language Understanding (SLU) component tends to have numerous domains, so context is needed to help resolve ambiguities. Previous works that incorporate context for SLU have mostly focused on domains where context is limited to a few minutes. However, there are domains that have related context that could span up to hours and days. In this paper, we propose temporal representations that combine wall-clock second difference and turn order offset information to utilize both recent and distant context in a novel large-scale setup. Experiments on the Contextual Domain Classification (CDC) task with various encoder architectures show that temporal representations combining both types of information outperform those using only one of the two. We further demonstrate that our contextual Transformer is able to reduce 13.04% of classification errors compared to a non-contextual baseline. We also conduct empirical analyses to study recent versus distant context and opportunities to lower deployment costs.
2020
CN-HIT-IT.NLP at SemEval-2020 Task 4: Enhanced Language Representation with Multiple Knowledge Triples
Yice Zhang | Jiaxuan Lin | Yang Fan | Peng Jin | Yuanchao Liu | Bingquan Liu
Proceedings of the Fourteenth Workshop on Semantic Evaluation
This paper describes our system that participated in SemEval-2020 Task 4: Commonsense Validation and Explanation. For this task, it is obvious that external knowledge, such as knowledge graphs, can help the model understand commonsense in natural language statements. But how to select the right triples for statements remains unsolved, so how to reduce the interference of irrelevant triples on model performance is a research focus. This paper adopts a modified K-BERT as the language encoder, to enhance language representation through triples from knowledge graphs. Experiments show that our method is better than models without external knowledge, and is slightly better than the original K-BERT. We got an accuracy score of 0.97 in subtask A, ranking 1/45, and got an accuracy score of 0.948, ranking 2/35.
2019
Microsoft Research Asia’s Systems for WMT19
Yingce Xia | Xu Tan | Fei Tian | Fei Gao | Di He | Weicong Chen | Yang Fan | Linyuan Gong | Yichong Leng | Renqian Luo | Yiren Wang | Lijun Wu | Jinhua Zhu | Tao Qin | Tie-Yan Liu
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
We, Microsoft Research Asia, made submissions to 11 language directions in the WMT19 news translation tasks. We won first place for 8 of the 11 directions and second place for the other three. Our basic systems are built on Transformer, back translation and knowledge distillation. We integrate several of our recent techniques to enhance the baseline systems: multi-agent dual learning (MADL), masked sequence-to-sequence pre-training (MASS), neural architecture optimization (NAO), and soft contextual data augmentation (SCA).
2009
Refining Grammars for Parsing with Hierarchical Semantic Knowledge
Xiaojun Lin | Yang Fan | Meng Zhang | Xihong Wu | Huisheng Chi
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing