Jian Wang


2022

pdf bib
Domain-specific knowledge distillation yields smaller and better models for conversational commerce
Kristen Howell | Jian Wang | Akshay Hazare | Joseph Bradley | Chris Brew | Xi Chen | Matthew Dunn | Beth Hockey | Andrew Maurer | Dominic Widdows
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5)

We demonstrate that knowledge distillation can be used not only to reduce model size, but to simultaneously adapt a contextual language model to a specific domain. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of (Sahn et al., 2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model shows on average 2.3% improvement in F1 score, relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the model improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.

2020

pdf bib
Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems
Jian Wang | Junhao Liu | Wei Bi | Xiaojiang Liu | Kejing He | Ruifeng Xu | Min Yang
Proceedings of the 28th International Conference on Computational Linguistics

Existing end-to-end task-oriented dialog systems struggle to dynamically model long dialog context for interactions and effectively incorporate knowledge base (KB) information into dialog generation. To conquer these limitations, we propose a Dual Dynamic Memory Network (DDMN) for multi-turn dialog generation, which maintains two core components: dialog memory manager and KB memory manager. The dialog memory manager dynamically expands the dialog memory turn by turn and keeps track of dialog history with an updating mechanism, which encourages the model to filter irrelevant dialog history and memorize important newly coming information. The KB memory manager shares the structural KB triples throughout the whole conversation, and dynamically extracts KB information with a memory pointer at each turn. Experimental results on three benchmark datasets demonstrate that DDMN significantly outperforms the strong baselines in terms of both automatic evaluation and human evaluation. Our code is available at https://github.com/siat-nlp/DDMN.

pdf bib
TransS-Driven Joint Learning Architecture for Implicit Discourse Relation Recognition
Ruifang He | Jian Wang | Fengyu Guo | Yugui Han
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Implicit discourse relation recognition is a challenging task due to the lack of connectives as strong linguistic clues. Previous methods primarily encode two arguments separately or extract the specific interaction patterns for the task, which have not fully exploited the annotated relation signal. Therefore, we propose a novel TransS-driven joint learning architecture to address the issues. Specifically, based on the multi-level encoder, we 1) translate discourse relations in low-dimensional embedding space (called TransS), which could mine the latent geometric structure information of argument-relation instances; 2) further exploit the semantic features of arguments to assist discourse understanding; 3) jointly learn 1) and 2) to mutually reinforce each other to obtain the better argument representations, so as to improve the performance of the task. Extensive experimental results on the Penn Discourse TreeBank (PDTB) show that our model achieves competitive results against several state-of-the-art systems.

2018

pdf bib
WECA: A WordNet-Encoded Collocation-Attention Network for Homographic Pun Recognition
Yufeng Diao | Hongfei Lin | Di Wu | Liang Yang | Kan Xu | Zhihao Yang | Jian Wang | Shaowu Zhang | Bo Xu | Dongyu Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Homographic puns have a long history in human writing, widely used in written and spoken literature, which usually occur in a certain syntactic or stylistic structure. How to recognize homographic puns is an important research. However, homographic pun recognition does not solve very well in existing work. In this work, we first use WordNet to understand and expand word embedding for settling the polysemy of homographic puns, and then propose a WordNet-Encoded Collocation-Attention network model (WECA) which combined with the context weights for recognizing the puns. Our experiments on the SemEval2017 Task7 and Pun of the Day demonstrate that the proposed model is able to distinguish between homographic pun and non-homographic pun texts. We show the effectiveness of the model to present the capability of choosing qualitatively informative words. The results show that our model achieves the state-of-the-art performance on homographic puns recognition.

pdf bib
Think Visually: Question Answering through Virtual Imagery
Ankit Goyal | Jian Wang | Jia Deng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we study the problem of geometric reasoning (a form of visual reasoning) in the context of question-answering. We introduce Dynamic Spatial Memory Network (DSMN), a new deep network architecture that specializes in answering questions that admit latent visual representations, and learns to generate and reason over such representations. Further, we propose two synthetic benchmarks, FloorPlanQA and ShapeIntersection, to evaluate the geometric reasoning capability of QA systems. Experimental results validate the effectiveness of our proposed DSMN for visual thinking tasks.

2017

pdf bib
Alibaba at IJCNLP-2017 Task 2: A Boosted Deep System for Dimensional Sentiment Analysis of Chinese Phrases
Xin Zhou | Jian Wang | Xu Xie | Changlong Sun | Luo Si
Proceedings of the IJCNLP 2017, Shared Tasks

This paper introduces Team Alibaba’s systems participating IJCNLP 2017 shared task No. 2 Dimensional Sentiment Analysis for Chinese Phrases (DSAP). The systems mainly utilize a multi-layer neural networks, with multiple features input such as word embedding, part-of-speech-tagging (POST), word clustering, prefix type, character embedding, cross sentiment input, and AdaBoost method for model training. For word level task our best run achieved MAE 0.545 (ranked 2nd), PCC 0.892 (ranked 2nd) in valence prediction and MAE 0.857 (ranked 1st), PCC 0.678 (ranked 2nd) in arousal prediction. For average performance of word and phrase task we achieved MAE 0.5355 (ranked 3rd), PCC 0.8965 (ranked 3rd) in valence prediction and MAE 0.661 (ranked 3rd), PCC 0.766 (ranked 2nd) in arousal prediction. In the final our submitted system achieved 2nd in mean rank.

2016

pdf bib
DUTIR in BioNLP-ST 2016: Utilizing Convolutional Network and Distributed Representation to Extract Complicate Relations
Honglei Li | Jianhai Zhang | Jian Wang | Hongfei Lin | Zhihao Yang
Proceedings of the 4th BioNLP Shared Task Workshop

2015

pdf bib
Biography-Dependent Collaborative Entity Archiving for Slot Filling
Yu Hong | Xiaobin Wang | Yadong Chen | Jian Wang | Tongtao Zhang | Heng Ji
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing