Yu Wang


2021

pdf bib
MedAI at SemEval-2021 Task 10: Negation-aware Pre-training for Source-free Negation Detection Domain Adaptation
Jinquan Sun | Qi Zhang | Yu Wang | Lei Zhang
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Due to the increasing concerns for data privacy, source-free unsupervised domain adaptation attracts more and more research attention, where only a trained source model is assumed to be available, while the labeled source data remain private. To get promising adaptation results, we need to find effective ways to transfer knowledge learned in source domain and leverage useful domain specific information from target domain at the same time. This paper describes our winning contribution to SemEval 2021 Task 10: Source-Free Domain Adaptation for Semantic Processing. Our key idea is to leverage the model trained on source domain data to generate pseudo labels for target domain samples. Besides, we propose Negation-aware Pre-training (NAP) to incorporate negation knowledge into model. Our method win the 1st place with F1-score of 0.822 on the official blind test set of Negation Detection Track.

pdf bib
Chase: A Large-Scale and Pragmatic Chinese Dataset for Cross-Database Context-Dependent Text-to-SQL
Jiaqi Guo | Ziliang Si | Yu Wang | Qian Liu | Ming Fan | Jian-Guang Lou | Zijiang Yang | Ting Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The cross-database context-dependent Text-to-SQL (XDTS) problem has attracted considerable attention in recent years due to its wide range of potential applications. However, we identify two biases in existing datasets for XDTS: (1) a high proportion of context-independent questions and (2) a high proportion of easy SQL queries. These biases conceal the major challenges in XDTS to some extent. In this work, we present Chase, a large-scale and pragmatic Chinese dataset for XDTS. It consists of 5,459 coherent question sequences (17,940 questions with their SQL queries annotated) over 280 databases, in which only 35% of questions are context-independent, and 28% of SQL queries are easy. We experiment on Chase with three state-of-the-art XDTS approaches. The best approach only achieves an exact match accuracy of 40% over all questions and 16% over all question sequences, indicating that Chase highlights the challenging problems of XDTS. We believe that XDTS can provide fertile soil for addressing the problems.

2020

pdf bib
Synchronous Double-channel Recurrent Network for Aspect-Opinion Pair Extraction
Shaowei Chen | Jie Liu | Yu Wang | Wenzheng Zhang | Ziming Chi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Opinion entity extraction is a fundamental task in fine-grained opinion mining. Related studies generally extract aspects and/or opinion expressions without recognizing the relations between them. However, the relations are crucial for downstream tasks, including sentiment classification, opinion summarization, etc. In this paper, we explore Aspect-Opinion Pair Extraction (AOPE) task, which aims at extracting aspects and opinion expressions in pairs. To deal with this task, we propose Synchronous Double-channel Recurrent Network (SDRN) mainly consisting of an opinion entity extraction unit, a relation detection unit, and a synchronization unit. The opinion entity extraction unit and the relation detection unit are developed as two channels to extract opinion entities and relations simultaneously. Furthermore, within the synchronization unit, we design Entity Synchronization Mechanism (ESM) and Relation Synchronization Mechanism (RSM) to enhance the mutual benefit on the above two channels. To verify the performance of SDRN, we manually build three datasets based on SemEval 2014 and 2015 benchmarks. Extensive experiments demonstrate that SDRN achieves state-of-the-art performances.

pdf bib
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu | Yu Wang | Jianshu Ji | Hao Cheng | Xueyun Zhu | Emmanuel Awa | Pengcheng He | Weizhu Chen | Hoifung Poon | Guihong Cao | Jianfeng Gao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models. Built upon PyTorch and Transformers, MT-DNN is designed to facilitate rapid customization for a broad spectrum of NLU tasks, using a variety of objectives (classification, regression, structured prediction) and text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm. To enable efficient production deployment, MT-DNN supports multi-task knowledge distillation, which can substantially compress a deep neural model without significant performance drop. We demonstrate the effectiveness of MT-DNN on a wide range of NLU applications across general and biomedical domains. The software and pre-trained models will be publicly available at https://github.com/namisan/mt-dnn.

pdf bib
基于规则的双重否定识别——以“不v1不v2”为例(Double Negative Recognition Based on Rules——Taking “不v1不v2” as an Example)
Yu Wang (王昱)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

“不v1不v2”是汉语中典型的双重否定结构形式之一,它包括“不+助动词+不+v2”(不得不去)、“不+是+不v2”(不是不好)、述宾结构“不v1...不v2”(不认为他不去)等多种双重否定结构,情况复杂。本文以“不v1不v2”为例,结合“元语否定”、“动词叙实性”、“否定焦点”等概念,对“不v1不v2”进行了全面的考察,制定了“不v1不v2”双重否定结构的识别策略。根据识别策略,设计了双重否定自动识别程序,并在此过程中补充了助动词表、非叙实动词表等词库。最终,对28033句语料进行了识别,识别正确率为97.87%,召回率约为93.10%。

pdf bib
HIT: Nested Named Entity Recognition via Head-Tail Pair and Token Interaction
Yu Wang | Yun Li | Hanghang Tong | Ziye Zhu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Named Entity Recognition (NER) is a fundamental task in natural language processing. In order to identify entities with nested structure, many sophisticated methods have been recently developed based on either the traditional sequence labeling approaches or directed hypergraph structures. Despite being successful, these methods often fall short in striking a good balance between the expression power for nested structure and the model complexity. To address this issue, we present a novel nested NER model named HIT. Our proposed HIT model leverages two key properties pertaining to the (nested) named entity, including (1) explicit boundary tokens and (2) tight internal connection between tokens within the boundary. Specifically, we design (1) Head-Tail Detector based on the multi-head self-attention mechanism and bi-affine classifier to detect boundary tokens, and (2) Token Interaction Tagger based on traditional sequence labeling approaches to characterize the internal token connection within the boundary. Experiments on three public NER datasets demonstrate that the proposed HIT achieves state-of-the-art performance.

pdf bib
How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
Shayne Longpre | Yu Wang | Chris DuBois
Findings of the Association for Computational Linguistics: EMNLP 2020

Task-agnostic forms of data augmentation have proven widely effective in computer vision, even on pretrained models. In NLP similar results are reported most commonly for low data regimes, non-pretrained models, or situationally for pretrained models. In this paper we ask how effective these techniques really are when applied to pretrained transformers. Using two popular varieties of task-agnostic data augmentation (not tailored to any particular task), Easy Data Augmentation (Wei andZou, 2019) and Back-Translation (Sennrichet al., 2015), we conduct a systematic examination of their effects across 5 classification tasks, 6 datasets, and 3 variants of modern pretrained transformers, including BERT, XLNet, and RoBERTa. We observe a negative result, finding that techniques which previously reported strong improvements for non-pretrained models fail to consistently improve performance for pretrained transformers, even when training data is limited. We hope this empirical analysis helps inform practitioners where data augmentation techniques may confer improvements.

2019

pdf bib
Single Training Dimension Selection for Word Embedding with PCA
Yu Wang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper, we present a fast and reliable method based on PCA to select the number of dimensions for word embeddings. First, we train one embedding with a generous upper bound (e.g. 1,000) of dimensions. Then we transform the embeddings using PCA and incrementally remove the lesser dimensions one at a time while recording the embeddings’ performance on language tasks. Lastly, we select the number of dimensions, balancing model size and accuracy. Experiments using various datasets and language tasks demonstrate that we are able to train about 10 times fewer sets of embeddings while retaining optimal performance. Researchers interested in training the best-performing embeddings for downstream tasks, such as sentiment analysis, question answering and hypernym extraction, as well as those interested in embedding compression should find the method helpful.

2018

pdf bib
A Neural Transition-based Model for Nested Mention Recognition
Bailin Wang | Wei Lu | Yu Wang | Hongxia Jin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

It is common that entity mentions can contain other mentions recursively. This paper introduces a scalable transition-based method to model the nested structure of mentions. We first map a sentence with nested mentions to a designated forest where each mention corresponds to a constituent of the forest. Our shift-reduce based system then learns to construct the forest structure in a bottom-up manner through an action sequence whose maximal length is guaranteed to be three times of the sentence length. Based on Stack-LSTM which is employed to efficiently and effectively represent the states of the system in a continuous space, our system is further incorporated with a character-based component to capture letter-level patterns. Our model gets the state-of-the-art performances in ACE datasets, showing its effectiveness in detecting nested mentions.

pdf bib
A New Concept of Deep Reinforcement Learning based Augmented General Tagging System
Yu Wang | Abhishek Patel | Hongxia Jin
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, a new deep reinforcement learning based augmented general tagging system is proposed. The new system contains two parts: a deep neural network (DNN) based sequence labeling model and a deep reinforcement learning (DRL) based augmented tagger. The augmented tagger helps improve system performance by modeling the data with minority tags. The new system is evaluated on SLU and NLU sequence labeling tasks using ATIS and CoNLL-2003 benchmark datasets, to demonstrate the new system’s outstanding performance on general tagging tasks. Evaluated by F1 scores, it shows that the new system outperforms the current state-of-the-art model on ATIS dataset by 1.9% and that on CoNLL-2003 dataset by 1.4%.

pdf bib
A Bi-Model Based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling
Yu Wang | Yilin Shen | Hongxia Jin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Intent detection and slot filling are two main tasks for building a spoken language understanding(SLU) system. Multiple deep learning based models have demonstrated good results on these tasks . The most effective algorithms are based on the structures of sequence to sequence models (or “encoder-decoder” models), and generate the intents and semantic tags either using separate models. Most of the previous studies, however, either treat the intent detection and slot filling as two separate parallel tasks, or use a sequence to sequence model to generate both semantic tags and intent. None of the approaches consider the cross-impact between the intent detection task and the slot filling task. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact to each other using two correlated bidirectional LSTMs (BLSTM). Our Bi-model structure with a decoder achieves state-of-art result on the benchmark ATIS data, with about 0.5% intent accuracy improvement and 0.9 % slot filling improvement.

2016

pdf bib
Off-topic Response Detection for Spontaneous Spoken English Assessment
Andrey Malinin | Rogier Van Dalen | Kate Knill | Yu Wang | Mark Gales
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2014

pdf bib
Towards Tracking Political Sentiment through Microblog Data
Yu Wang | Tom Clark | Jeffrey Staton | Eugene Agichtein
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

2010

pdf bib
Query Ambiguity Revisited: Clickthrough Measures for Distinguishing Informational and Ambiguous Queries
Yu Wang | Eugene Agichtein
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics