Yang Zhang


2022

pdf bib
Dual Attention Model for Citation Recommendation with Analyses on Explainability of Attention Mechanisms and Qualitative Experiments
Yang Zhang | Qiang Ma
Computational Linguistics, Volume 48, Issue 2 - June 2022

Based on an exponentially increasing number of academic articles, discovering and citing comprehensive and appropriate resources have become non-trivial tasks. Conventional citation recommendation methods suffer from severe information losses. For example, they do not consider the section header of the paper that the author is writing and for which they need to find a citation, the relatedness between the words in the local context (the text span that describes a citation), or the importance of each word from the local context. These shortcomings make such methods insufficient for recommending adequate citations to academic manuscripts. In this study, we propose a novel embedding-based neural network called dual attention model for citation recommendation (DACR) to recommend citations during manuscript preparation. Our method adapts the embedding of three semantic pieces of information: words in the local context, structural contexts,1 and the section on which the author is working. A neural network model is designed to maximize the similarity between the embedding of the three inputs (local context words, section headers, and structural contexts) and the target citation appearing in the context. The core of the neural network model comprises self-attention and additive attention; the former aims to capture the relatedness between the contextual words and structural context, and the latter aims to learn their importance. Recommendation experiments on real-world datasets demonstrate the effectiveness of the proposed approach. To seek explainability on DACR, particularly the two attention mechanisms, the learned weights from them are investigated to determine how the attention mechanisms interpret “relatedness” and “importance” through the learned weights. In addition, qualitative analyses were conducted to testify that DACR could find necessary citations that were not noticed by the authors in the past due to the limitations of the keyword-based searching.

pdf bib
DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings
Yung-Sung Chuang | Rumen Dangovski | Hongyin Luo | Yang Zhang | Shiyu Chang | Marin Soljacic | Shang-Wen Li | Scott Yih | Yoon Kim | James Glass
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence, where the edited sentence is obtained by stochastically masking out the original sentence and then sampling from a masked language model. We show that DiffSCE is an instance of equivariant contrastive learning, which generalizes contrastive learning and learns representations that are insensitive to certain types of augmentations and sensitive to other “harmful” types of augmentations. Our experiments show that DiffCSE achieves state-of-the-art results among unsupervised sentence representation learning methods, outperforming unsupervised SimCSE by 2.3 absolute points on semantic textual similarity tasks.

2021

pdf bib
Frustratingly Simple Few-Shot Slot Tagging
Jianqiang Ma | Zeyu Yan | Chang Li | Yang Zhang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
SQL Generation via Machine Reading Comprehension
Zeyu Yan | Jianqiang Ma | Yang Zhang | Jianping Shen
Proceedings of the 28th International Conference on Computational Linguistics

Text-to-SQL systems offers natural language interfaces to databases, which can automatically generates SQL queries given natural language questions. On the WikiSQL benchmark, state-of- the-art text-to-SQL systems typically take a slot-filling approach by building several specialized models for each type of slot. Despite being effective, such modularized systems are complex and also fall short in jointly learning for different slots. To solve these problems, this paper proposes a novel approach that formulates the task as a question answering problem, where different slots are predicted by a unified machine reading comprehension (MRC) model. For this purpose, we use a BERT-based MRC model, which can also benefit from intermediate training on other MRC datasets. The proposed method can achieve competitive results on WikiSQL, suggesting it being a promising direction for text-to-SQL.

pdf bib
Dual Attention Model for Citation Recommendation
Yang Zhang | Qiang Ma
Proceedings of the 28th International Conference on Computational Linguistics

Based on an exponentially increasing number of academic articles, discovering and citing comprehensive and appropriate resources has become a non-trivial task. Conventional citation recommender methods suffer from severe information loss. For example, they do not consider the section of the paper that the user is writing and for which they need to find a citation, the relatedness between the words in the local context (the text span that describes a citation), or the importance on each word from the local context. These shortcomings make such methods insufficient for recommending adequate citations to academic manuscripts. In this study, we propose a novel embedding-based neural network called “dual attention model for citation recommendation (DACR)” to recommend citations during manuscript preparation. Our method adapts embedding of three semantic information: words in the local context, structural contexts, and the section on which a user is working. A neural network model is designed to maximize the similarity between the embedding of the three input (local context words, section and structural contexts) and the target citation appearing in the context. The core of the neural network model is composed of self-attention and additive attention, where the former aims to capture the relatedness between the contextual words and structural context, and the latter aims to learn the importance of them. The experiments on real-world datasets demonstrate the effectiveness of the proposed approach.

pdf bib
FASTMATCH: Accelerating the Inference of BERT-based Text Matching
Shuai Pang | Jianqiang Ma | Zeyu Yan | Yang Zhang | Jianping Shen
Proceedings of the 28th International Conference on Computational Linguistics

Recently, pre-trained language models such as BERT have shown state-of-the-art accuracies in text matching. When being applied to IR (or QA), the BERT-based matching models need to online calculate the representations and interactions for all query-candidate pairs. The high inference cost has prohibited the deployments of BERT-based matching models in many practical applications. To address this issue, we propose a novel BERT-based text matching model, in which the representations and the interactions are decoupled. Then, the representations of the candidates can be calculated and stored offline, and directly retrieved during the online matching phase. To conduct the interactions and generate final matching scores, a lightweight attention network is designed. Experiments based on several large scale text matching datasets show that the proposed model, called FASTMATCH, can achieve up to 100X speed-up to BERT and RoBERTa at the online matching phase, while keeping more up to 98.7% of the performance.

pdf bib
BioMegatron: Larger Biomedical Domain Language Model
Hoo-Chang Shin | Yang Zhang | Evelina Bakhturina | Raul Puri | Mostofa Patwary | Mohammad Shoeybi | Raghav Mani
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

There has been an influx of biomedical domain-specific language models, showing language models pre-trained on biomedical text perform better on biomedical domain benchmarks than those trained on general domain text corpora such as Wikipedia and Books. Yet, most works do not study the factors affecting each domain language application deeply. Additionally, the study of model size on domain-specific models has been mostly missing. We empirically study and evaluate several factors that can affect performance on domain language applications, such as the sub-word vocabulary set, model size, pre-training corpus, and domain transfer. We show consistent improvements on benchmarks with our larger BioMegatron model trained on a larger domain corpus, contributing to our understanding of domain language model applications. We demonstrate noticeable improvements over the previous state-of-the-art (SOTA) on standard biomedical NLP benchmarks of question answering, named entity recognition, and relation extraction. Code and checkpoints to reproduce our experiments are available at [github.com/NVIDIA/NeMo].

pdf bib
Mention Extraction and Linking for SQL Query Generation
Jianqiang Ma | Zeyu Yan | Shuai Pang | Yang Zhang | Jianping Shen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

On the WikiSQL benchmark, state-of-the-art text-to-SQL systems typically take a slot- filling approach by building several dedicated models for each type of slots. Such modularized systems are not only complex but also of limited capacity for capturing inter-dependencies among SQL clauses. To solve these problems, this paper proposes a novel extraction-linking approach, where a unified extractor recognizes all types of slot mentions appearing in the question sentence before a linker maps the recognized columns to the table schema to generate executable SQL queries. Trained with automatically generated annotations, the proposed method achieves the first place on the WikiSQL benchmark.

2019

pdf bib
Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control
Mo Yu | Shiyu Chang | Yang Zhang | Tommi Jaakkola
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Selective rationalization has become a common mechanism to ensure that predictive models reveal how they use any available features. The selection may be soft or hard, and identifies a subset of input features relevant for prediction. The setup can be viewed as a co-operate game between the selector (aka rationale generator) and the predictor making use of only the selected features. The co-operative setting may, however, be compromised for two reasons. First, the generator typically has no direct access to the outcome it aims to justify, resulting in poor performance. Second, there’s typically no control exerted on the information left outside the selection. We revise the overall co-operative framework to address these challenges. We introduce an introspective model which explicitly predicts and incorporates the outcome into the selection process. Moreover, we explicitly control the rationale complement via an adversary so as not to leave any useful information out of the selection. We show that the two complementary mechanisms maintain both high predictive accuracy and lead to comprehensive rationales.

2018

pdf bib
Adaptive Learning of Local Semantic and Global Structure Representations for Text Classification
Jianyu Zhao | Zhiqiang Zhan | Qichuan Yang | Yang Zhang | Changjian Hu | Zhensheng Li | Liuxin Zhang | Zhiqiang He
Proceedings of the 27th International Conference on Computational Linguistics

Representation learning is a key issue for most Natural Language Processing (NLP) tasks. Most existing representation models either learn little structure information or just rely on pre-defined structures, leading to degradation of performance and generalization capability. This paper focuses on learning both local semantic and global structure representations for text classification. In detail, we propose a novel Sandwich Neural Network (SNN) to learn semantic and structure representations automatically without relying on parsers. More importantly, semantic and structure information contribute unequally to the text representation at corpus and instance level. To solve the fusion problem, we propose two strategies: Adaptive Learning Sandwich Neural Network (AL-SNN) and Self-Attention Sandwich Neural Network (SA-SNN). The former learns the weights at corpus level, and the latter further combines attention mechanism to assign the weights at instance level. Experimental results demonstrate that our approach achieves competitive performance on several text classification tasks, including sentiment analysis, question type classification and subjectivity classification. Specifically, the accuracies are MR (82.1%), SST-5 (50.4%), TREC (96%) and SUBJ (93.9%).

2017

pdf bib
Sentiment Lexicon Expansion Based on Neural PU Learning, Double Dictionary Lookup, and Polarity Association
Yasheng Wang | Yang Zhang | Bing Liu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Although many sentiment lexicons in different languages exist, most are not comprehensive. In a recent sentiment analysis application, we used a large Chinese sentiment lexicon and found that it missed a large number of sentiment words in social media. This prompted us to make a new attempt to study sentiment lexicon expansion. This paper first poses the problem as a PU learning problem, which is a new formulation. It then proposes a new PU learning method suitable for our problem using a neural network. The results are enhanced further with a new dictionary-based technique and a novel polarity classification technique. Experimental results show that the proposed approach outperforms baseline methods greatly.

2016

pdf bib
Integrating Encyclopedic Knowledge into Neural Language Models
Yang Zhang | Jan Niehues | Alexander Waibel
Proceedings of the 13th International Conference on Spoken Language Translation

Neural models have recently shown big improvements in the performance of phrase-based machine translation. Recurrent language models, in particular, have been a great success due to their ability to model arbitrary long context. In this work, we integrate global semantic information extracted from large encyclopedic sources into neural network language models. We integrate semantic word classes extracted from Wikipedia and sentence level topic information into a recurrent neural network-based language model. The new resulting models exhibit great potential in alleviating data sparsity problems with the additional knowledge provided. This approach of integrating global information is not restricted to language modeling but can also be easily applied to any model that profits from context or further data resources, e.g. neural machine translation. Using this model has improved rescoring quality of a state-of-the-art phrase-based translation system by 0.84 BLEU points. We performed experiments on two language pairs.

2015

pdf bib
Clustering Sentences with Density Peaks for Multi-document Summarization
Yang Zhang | Yunqing Xia | Yi Liu | Wenmin Wang
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

pdf bib
Why Press Backspace? Understanding User Input Behaviors in Chinese Pinyin Input Method
Yabin Zheng | Lixing Xie | Zhiyuan Liu | Maosong Sun | Yang Zhang | Liyun Ru
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2006

pdf bib
Exploring Distributional Similarity Based Models for Query Spelling Correction
Mu Li | Muhua Zhu | Yang Zhang | Ming Zhou
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Discriminative Reranking for Spelling Correction
Yang Zhang | Pilian He | Wei Xiang | Mu Li
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation