Han Wang


2022

pdf bib
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen Bach | Victor Sanh | Zheng Xin Yong | Albert Webson | Colin Raffel | Nihal V. Nayak | Abheesht Sharma | Taewoon Kim | M Saiful Bari | Thibault Fevry | Zaid Alyafeai | Manan Dey | Andrea Santilli | Zhiqing Sun | Srulik Ben-david | Canwen Xu | Gunjan Chhablani | Han Wang | Jason Fries | Maged Al-shaibani | Shanya Sharma | Urmish Thakker | Khalid Almubarak | Xiangru Tang | Dragomir Radev | Mike Tian-jian Jiang | Alexander Rush
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges in this new setting with (1) a templating language for defining data-linked prompts, (2) an interface that lets users quickly iterate on prompt development by observing outputs of their prompts on many examples, and (3) a community-driven set of guidelines for contributing new prompts to a common pool. Over 2,000 prompts for roughly 170 datasets are already available in PromptSource. PromptSource is available at https://github.com/bigscience-workshop/promptsource.

2021

pdf bib
Personalized Entity Resolution with Dynamic Heterogeneous KnowledgeGraph Representations
Ying Lin | Han Wang | Jiangning Chen | Tong Wang | Yue Liu | Heng Ji | Yang Liu | Premkumar Natarajan
Proceedings of The 4th Workshop on e-Commerce and NLP

The growing popularity of Virtual Assistants poses new challenges for Entity Resolution, the task of linking mentions in text to their referent entities in a knowledge base. Specifically, in the shopping domain, customers tend to mention the entities implicitly (e.g., “organic milk”) rather than use the entity names explicitly, leading to a large number of candidate products. Meanwhile, for the same query, different customers may expect different results. For example, with “add milk to my cart”, a customer may refer to a certain product from his/her favorite brand, while some customers may want to re-order products they regularly purchase. Moreover, new customers may lack persistent shopping history, which requires us to enrich the connections between customers through products and their attributes. To address these issues, we propose a new framework that leverages personalized features to improve the accuracy of product ranking. We first build a cross-source heterogeneous knowledge graph from customer purchase history and product knowledge graph to jointly learn customer and product embeddings. After that, we incorporate product, customer, and history representations into a neural reranking model to predict which candidate is most likely to be purchased by a specific customer. Experiment results show that our model substantially improves the accuracy of the top ranked candidates by 24.6% compared to the state-of-the-art product search model.

pdf bib
Retrieval Enhanced Model for Commonsense Generation
Han Wang | Yang Liu | Chenguang Zhu | Linjun Shou | Ming Gong | Yichong Xu | Michael Zeng
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Decomposing Complex Questions Makes Multi-Hop QA Easier and More Interpretable
Ruiliu Fu | Han Wang | Xuejun Zhang | Jun Zhou | Yonghong Yan
Findings of the Association for Computational Linguistics: EMNLP 2021

Multi-hop QA requires the machine to answer complex questions through finding multiple clues and reasoning, and provide explanatory evidence to demonstrate the machine’s reasoning process. We propose Relation Extractor-Reader and Comparator (RERC), a three-stage framework based on complex question decomposition. The Relation Extractor decomposes the complex question, and then the Reader answers the sub-questions in turn, and finally the Comparator performs numerical comparison and summarizes all to get the final answer, where the entire process itself constitutes a complete reasoning evidence path. In the 2WikiMultiHopQA dataset, our RERC model has achieved the state-of-the-art performance, with a winning joint F1 score of 53.58 on the leaderboard. All indicators of our RERC are close to human performance, with only 1.95 behind the human level in F1 score of support fact. At the same time, the evidence path provided by our RERC framework has excellent readability and faithfulness.

pdf bib
Optimizing NLU Reranking Using Entity Resolution Signals in Multi-domain Dialog Systems
Tong Wang | Jiangning Chen | Mohsen Malmir | Shuyan Dong | Xin He | Han Wang | Chengwei Su | Yue Liu | Yang Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

In dialog systems, the Natural Language Understanding (NLU) component typically makes the interpretation decision (including domain, intent and slots) for an utterance before the mentioned entities are resolved. This may result in intent classification and slot tagging errors. In this work, we propose to leverage Entity Resolution (ER) features in NLU reranking and introduce a novel loss term based on ER signals to better learn model weights in the reranking framework. In addition, for a multi-domain dialog scenario, we propose a score distribution matching method to ensure scores generated by the NLU reranking models for different domains are properly calibrated. In offline experiments, we demonstrate our proposed approach significantly outperforms the baseline model on both single-domain and cross-domain evaluations.

pdf bib
Entity Resolution in Open-domain Conversations
Mingyue Shang | Tong Wang | Mihail Eric | Jiangning Chen | Jiyang Wang | Matthew Welch | Tiantong Deng | Akshay Grewal | Han Wang | Yue Liu | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

In recent years, incorporating external knowledge for response generation in open-domain conversation systems has attracted great interest. To improve the relevancy of retrieved knowledge, we propose a neural entity linking (NEL) approach. Different from formal documents, such as news, conversational utterances are informal and multi-turn, which makes it more challenging to disambiguate the entities. Therefore, we present a context-aware named entity recognition model (NER) and entity resolution (ER) model to utilize dialogue context information. We conduct NEL experiments on three open-domain conversation datasets and validate that incorporating context information improves the performance of NER and ER models. The end-to-end NEL approach outperforms the baseline by 62.8% relatively in F1 metric. Furthermore, we verify that using external knowledge based on NEL benefits the neural response generation model.

2020

pdf bib
Enhancing Generalization in Natural Language Inference by Syntax
Qi He | Han Wang | Yue Zhang
Findings of the Association for Computational Linguistics: EMNLP 2020

Pre-trained language models such as BERT have achieved the state-of-the-art performance on natural language inference (NLI). However, it has been shown that such models can be tricked by variations of surface patterns such as syntax. We investigate the use of dependency trees to enhance the generalization of BERT in the NLI task, leveraging on a graph convolutional network to represent a syntax-based matching graph with heterogeneous matching patterns. Experimental results show that, our syntax-based method largely enhance generalization of BERT on a test set where the sentence pair has high lexical overlap but diverse syntactic structures, and do not degrade performance on the standard test set. In other words, the proposed method makes BERT more robust on syntactic changes.

2015

pdf bib
Language and Domain Independent Entity Linking with Quantified Collective Validation
Han Wang | Jin Guang Zheng | Xiaogang Ma | Peter Fox | Heng Ji
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing