Shi Wang


2022

pdf bib
Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization
Ruipeng Jia | Xingxing Zhang | Yanan Cao | Zheng Lin | Shi Wang | Furu Wei
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In zero-shot multilingual extractive text summarization, a model is typically trained on English summarization dataset and then applied on summarization datasets of other languages. Given English gold summaries and documents, sentence-level labels for extractive summarization are usually generated using heuristics. However, these monolingual labels created on English datasets may not be optimal on datasets of other languages, for that there is the syntactic or semantic discrepancy between different languages. In this way, it is possible to translate the English dataset to other languages and obtain different sets of labels again using heuristics. To fully leverage the information of these different sets of labels, we propose NLSSum (Neural Label Search for Summarization), which jointly learns hierarchical weights for these different sets of labels together with our summarization model. We conduct multilingual zero-shot summarization experiments on MLSUM and WikiLingua datasets, and we achieve state-of-the-art results using both human and automatic evaluations across these two datasets.

2021

pdf bib
SOM-NCSCM : An Efficient Neural Chinese Sentence Compression Model Enhanced with Self-Organizing Map
Kangli Zi | Shi Wang | Yu Liu | Jicun Li | Yanan Cao | Cungen Cao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Sentence Compression (SC), which aims to shorten sentences while retaining important words that express the essential meanings, has been studied for many years in many languages, especially in English. However, improvements on Chinese SC task are still quite few due to several difficulties: scarce of parallel corpora, different segmentation granularity of Chinese sentences, and imperfect performance of syntactic analyses. Furthermore, entire neural Chinese SC models have been under-investigated so far. In this work, we construct an SC dataset of Chinese colloquial sentences from a real-life question answering system in the telecommunication domain, and then, we propose a neural Chinese SC model enhanced with a Self-Organizing Map (SOM-NCSCM), to gain a valuable insight from the data and improve the performance of the whole neural Chinese SC model in a valid manner. Experimental results show that our SOM-NCSCM can significantly benefit from the deep investigation of similarity among data, and achieve a promising F1 score of 89.655 and BLEU4 score of 70.116, which also provides a baseline for further research on Chinese SC task.

pdf bib
Enhancing Document Ranking with Task-adaptive Training and Segmented Token Recovery Mechanism
Xingwu Sun | Yanling Cui | Hongyin Tang | Fuzheng Zhang | Beihong Jin | Shi Wang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose a new ranking model DR-BERT, which improves the Document Retrieval (DR) task by a task-adaptive training process and a Segmented Token Recovery Mechanism (STRM). In the task-adaptive training, we first pre-train DR-BERT to be domain-adaptive and then make the two-phase fine-tuning. In the first-phase fine-tuning, the model learns query-document matching patterns regarding different query types in a pointwise way. Next, in the second-phase fine-tuning, the model learns document-level ranking features and ranks documents with regard to a given query in a listwise manner. Such pointwise plus listwise fine-tuning enables the model to minimize errors in the document ranking by incorporating ranking-specific supervisions. Meanwhile, the model derived from pointwise fine-tuning is also used to reduce noise in the training data of the listwise fine-tuning. On the other hand, we present STRM which can compute OOV word representation and contextualization more precisely in BERT-based models. As an effective strategy in DR-BERT, STRM improves the matching perfromance of OOV words between a query and a document. Notably, our DR-BERT model keeps in the top three on the MS MARCO leaderboard since May 20, 2020.

pdf bib
Deep Differential Amplifier for Extractive Summarization
Ruipeng Jia | Yanan Cao | Fang Fang | Yuchen Zhou | Zheng Fang | Yanbing Liu | Shi Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

For sentence-level extractive summarization, there is a disproportionate ratio of selected and unselected sentences, leading to flatting the summary features when maximizing the accuracy. The imbalanced classification of summarization is inherent, which can’t be addressed by common algorithms easily. In this paper, we conceptualize the single-document extractive summarization as a rebalance problem and present a deep differential amplifier framework. Specifically, we first calculate and amplify the semantic difference between each sentence and all other sentences, and then apply the residual unit as the second item of the differential amplifier to deepen the architecture. Finally, to compensate for the imbalance, the corresponding objective loss of minority class is boosted by a weighted cross-entropy. In contrast to previous approaches, this model pays more attention to the pivotal information of one sentence, instead of all the informative context modeling by recurrent or Transformer architecture. We demonstrate experimentally on two benchmark datasets that our summarizer performs competitively against state-of-the-art methods. Our source code will be available on Github.

2020

pdf bib
Neural Extractive Summarization with Hierarchical Attentive Heterogeneous Graph Network
Ruipeng Jia | Yanan Cao | Hengzhu Tang | Fang Fang | Cong Cao | Shi Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Sentence-level extractive text summarization is substantially a node classification task of network mining, adhering to the informative components and concise representations. There are lots of redundant phrases between extracted sentences, but it is difficult to model them exactly by the general supervised methods. Previous sentence encoders, especially BERT, specialize in modeling the relationship between source sentences. While, they have no ability to consider the overlaps of the target selected summary, and there are inherent dependencies among target labels of sentences. In this paper, we propose HAHSum (as shorthand for Hierarchical Attentive Heterogeneous Graph for Text Summarization), which well models different levels of information, including words and sentences, and spotlights redundancy dependencies between sentences. Our approach iteratively refines the sentence representations with redundancy-aware graph and delivers the label dependencies by message passing. Experiments on large scale benchmark corpus (CNN/DM, NYT, and NEWSROOM) demonstrate that HAHSum yields ground-breaking performance and outperforms previous extractive summarizers.

2018

pdf bib
Answer-focused and Position-aware Neural Question Generation
Xingwu Sun | Jing Liu | Yajuan Lyu | Wei He | Yanjun Ma | Shi Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we focus on the problem of question generation (QG). Recent neural network-based approaches employ the sequence-to-sequence model which takes an answer and its context as input and generates a relevant question as output. However, we observe two major issues with these approaches: (1) The generated interrogative words (or question words) do not match the answer type. (2) The model copies the context words that are far from and irrelevant to the answer, instead of the words that are close and relevant to the answer. To address these two issues, we propose an answer-focused and position-aware neural question generation model. (1) By answer-focused, we mean that we explicitly model question word generation by incorporating the answer embedding, which can help generate an interrogative word matching the answer type. (2) By position-aware, we mean that we model the relative distance between the context words and the answer. Hence the model can be aware of the position of the context words when copying them to generate a question. We conduct extensive experiments to examine the effectiveness of our model. The experimental results show that our model significantly improves the baseline and outperforms the state-of-the-art system.