Wei Li


2022

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
Wei Li | Can Gao | Guocheng Niu | Xinyan Xiao | Hao Liu | Jiachen Liu | Hua Wu | Haifeng Wang
Findings of the Association for Computational Linguistics: ACL 2022

Vision-Language Pre-training (VLP) has achieved impressive performance on various cross-modal downstream tasks. However, most existing methods can only learn from aligned image-caption data and rely heavily on expensive regional features, which greatly limits their scalability and performance. In this paper, we propose an end-to-end unified-modal pre-training framework, namely UNIMO-2, for joint learning on both aligned image-caption data and unaligned image-only and text-only corpora. We build a unified Transformer model to jointly learn visual representations, textual representations, and semantic alignment between images and texts. In particular, we propose to conduct grounded learning on both images and texts via a shared grounded space, which helps bridge unaligned images and texts and align the visual and textual semantic spaces on different types of corpora. The experiments show that our grounded learning method can improve textual and visual semantic alignment, thereby improving performance on various cross-modal tasks. Moreover, benefiting from effective joint modeling of different types of corpora, our model also achieves impressive performance on single-modal visual and textual tasks. Our code and models are public at the UNIMO project page https://unimo-ptm.github.io/.

Unsupervised Chinese Word Segmentation with BERT Oriented Probing and Transformation
Wei Li | Yuhan Song | Qi Su | Yanqiu Shao
Findings of the Association for Computational Linguistics: ACL 2022

Word segmentation is a fundamental step in understanding the Chinese language. Previous neural approaches for unsupervised Chinese Word Segmentation (CWS) only exploit shallow semantic information, which can miss important context. Large-scale pre-trained language models (PLMs) have achieved great success in many areas because of their ability to capture deep contextual semantic relations. In this paper, we propose to take advantage of the deep semantic information embedded in PLMs (e.g., BERT) in a self-training manner, which iteratively probes and transforms the semantic information in the PLM into explicit word segmentation ability. Extensive experimental results show that our proposed approach achieves state-of-the-art F1 scores on two CWS benchmark datasets.

Explore More Guidance: A Task-aware Instruction Network for Sign Language Translation Enhanced with Data Augmentation
Yong Cao | Wei Li | Xianzhi Li | Min Chen | Guangyong Chen | Long Hu | Zhengdao Li | Kai Hwang
Findings of the Association for Computational Linguistics: NAACL 2022

Sign language recognition and translation first uses a recognition module to generate glosses from sign language videos and then employs a translation module to translate glosses into spoken sentences. Most existing works focus on the recognition step, while paying less attention to sign language translation. In this work, we propose a task-aware instruction network, namely TIN-SLT, for sign language translation, by introducing an instruction module and a learning-based feature fusion strategy into a Transformer network. In this way, the pre-trained model’s language ability can be better explored and exploited to further boost translation performance. Moreover, by exploring the representation space of sign language glosses and the target spoken language, we propose a multi-level data augmentation scheme to adjust the data distribution of the training set. We conduct extensive experiments on two challenging benchmark datasets, PHOENIX-2014-T and ASLG-PC12, on which our method outperforms the former best solutions by 1.65 and 1.42 in terms of BLEU-4. Our code and trained networks will be available upon the publication of this work.

Meta-CQG: A Meta-Learning Framework for Complex Question Generation over Knowledge Bases
Kun Zhang | Yunqi Qiu | Yuanzhuo Wang | Long Bai | Wei Li | Xuhui Jiang | Huawei Shen | Xueqi Cheng
Proceedings of the 29th International Conference on Computational Linguistics

Complex question generation over knowledge bases (KB) aims to generate natural language questions involving multiple KB relations or functional constraints. Existing methods train one encoder-decoder-based model to fit all questions. However, such a one-size-fits-all strategy may not perform well, since complex questions exhibit an uneven distribution in many dimensions, such as question types, involved KB relations, and query structures, resulting in insufficient learning for long-tailed samples under different dimensions. To address this problem, we propose a meta-learning framework for complex question generation. The meta-trained generator can acquire universal and transferable meta-knowledge and quickly adapt to long-tailed samples using a few of the most related training samples. To retrieve similar samples for each input query, we design a self-supervised graph retriever to learn distributed representations for samples, and contrastive learning is leveraged to improve the learned representations. We conduct experiments on both WebQuestionsSP and ComplexWebQuestion, and results on long-tailed samples of different dimensions are significantly improved, which demonstrates the effectiveness of the proposed framework.

《二十四史》古代汉语语义依存图库构建(Construction of Semantic Dependency Graph Bank of Ancient Chinese in twenty four histories)
Tian Huang (黄恬) | Yanqiu Shao (邵艳秋) | Wei Li (李炜)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

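Semantic dependency graphs are a deep analysis method for handling semantics in NLP, capable of analyzing the semantic relations between words in a sentence. Targeting the characteristics of Ancient Chinese, and building on annotation guidelines we formulated for Ancient Chinese semantic dependency graphs, this paper takes the Twenty-Four Histories as its corpus source and completes the annotation of an Ancient Chinese semantic dependency graph bank of 3,000 sentences, with an annotation-consistency kappa value of 78.83%. By comparing it with a Modern Chinese semantic dependency graph bank, we compile statistics on the basic properties of the graph bank and analyze the semantic characteristics and regularities of Ancient Chinese. The statistics show that the semantic distribution of Ancient Chinese conforms to Zipf's law at the macro level, and that its descriptions of semantic events carry strong features of historical narrative and formal style: narration centers on biographies of historical figures, peripheral roles such as time and place are described in detail, the narrative language is calm and objective, and modifiers describing modality, mood, degree, and temporal state are scarce.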
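
The abstract reports inter-annotator agreement as a kappa value of 78.83%. Which kappa variant was used, and over which units (arcs, labels, or sentences) it was computed, is not stated; the sketch below assumes Cohen's kappa for two annotators over per-item labels, and the semantic-role label names in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators for the same four dependency arcs.
annotator_1 = ["Agt", "Pat", "Agt", "Loc"]
annotator_2 = ["Agt", "Pat", "Pat", "Loc"]
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # 0.6364
```

On this 0 to 1 scale, the reported 78.83% would correspond to a kappa of 0.7883.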

针对古代经典文献的引用查找问题的数据构建与匹配方法(Data Construction and Matching Method for the Task of Ancient Classics Reference Detection)
Wei Li (李炜) | Yanqiu Shao (邵艳秋) | Mengxi Bi (毕梦曦)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

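The intellectual systems of ancient Chinese thinkers were often built on creative interpretations of earlier classics, and identifying the citations contained in these interpretations is of great importance for research on intellectual history. However, for some voluminous texts, marking citations entirely by hand would cost a great deal of time and labor, so an automated method that helps experts find and mark citations is very valuable. Advances in natural language processing, represented by pre-trained language models, have improved computers' ability to process text and understand its semantics. Accordingly, this paper proposes several unsupervised baseline methods that use either expert knowledge or the semantic understanding ability of deep learning to automatically find citations of early classics in the works of ancient thinkers. To verify the effectiveness of the proposed methods and to promote the application of NLP in the digital humanities, we take as a case study the citations of early Confucian classics by the Two Chengs (Cheng Hao and Cheng Yi), highly influential Neo-Confucian scholars of the Song dynasty, and construct and release a corresponding citation detection dataset. Experimental results show that the proposed composite method, based on a pre-trained language model and a contrastive learning objective, can determine fairly accurately whether a citation relation exists. Sentence-level citation detection reaches a ROC-AUC of 87.83, and paragraph-level citation detection reaches a ROC-AUC of 91.02. Further analysis shows that our method not only helps find citation relations automatically but also effectively helps experts judge citations more efficiently. The method has broad application prospects in annotation and collation, text provenance tracing, detection of repeated passages, citation statistics and analysis, and the compilation of indexed document collections.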
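
The abstract reports ROC-AUC values of 87.83 (sentence level) and 91.02 (paragraph level), apparently on a 0 to 100 scale. As a reminder of what the metric measures, here is a minimal sketch that computes ROC-AUC as the probability that a randomly chosen positive pair (a true citation) is scored above a randomly chosen negative pair, counting ties as half a win. The labels and scores in the example are made up for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise ranking (Mann-Whitney formulation, ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive score with every negative score.
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = citation present, 0 = no citation; scores are hypothetical model outputs.
print(roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.6]))  # 0.75
```

The quadratic pairwise loop keeps the definition explicit; for large evaluation sets a sort-based computation gives the same value faster.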

基于强化学习的古今汉语句子对齐研究(Research on Sentence Alignment of Ancient and Modern Chinese based on Reinforcement Learning)
Kuai Yu (喻快) | Yanqiu Shao (邵艳秋) | Wei Li (李炜)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

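Supervised machine translation based on deep learning has achieved good results, but training requires a large amount of high-quality aligned corpora. For translation between ancient and modern Chinese, high-quality parallel corpora are scarce, whereas coarsely aligned document- and paragraph-level corpora are relatively easy to obtain, so corpus alignment is both valuable and necessary to study. In traditional research on sentence alignment of bilingual parallel corpora, conventional methods build a combined criterion from grammatical information in the bilingual texts, such as length, vocabulary, and co-occurring characters, to measure the similarity between two sentences. Although such methods achieve good results on one-to-one alignment, their ability to match sentence semantics is limited, and they perform poorly on some many-to-many alignment patterns. In this paper, we propose to use rapidly developing pre-trained language models, which have strong semantic representation ability, to take the semantic information of both languages into account. However, a pre-trained language model alone can only consider relatively local information, so we propose a reinforcement learning training objective based on a dynamic programming algorithm to integrate global paragraph information, and we train the model in an unsupervised manner. Experimental results show that the model trained with our method outperforms the previous best-performing baseline, with especially large gains on the many-to-many alignment patterns that traditional models struggle with.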
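
The dynamic-programming component mentioned in the abstract can be illustrated with a standard monotonic sentence aligner in the style of classic length-based methods. This is a minimal sketch, not the paper's method: the reinforcement learning objective is not shown, and `char_overlap` is a hypothetical stand-in for a learned similarity (it simply counts co-occurring characters and penalizes multi-sentence groupings).

```python
def align(src, tgt, score, beads=((1, 1), (1, 2), (2, 1), (2, 2))):
    """Monotonic DP alignment of two sentence lists using i-to-j 'beads'.

    Returns the best total score and a list of (source indices, target indices).
    """
    n, m = len(src), len(tgt)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order: every bead advances both sides, so predecessors are final.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cand = best[i][j] + score(src[i:ni], tgt[j:nj])
                if cand > best[ni][nj]:
                    best[ni][nj] = cand
                    back[ni][nj] = (i, j)
    if best[n][m] == neg_inf:
        raise ValueError("no bead sequence covers both texts")
    # Follow back-pointers from (n, m) to (0, 0) to recover the aligned groups.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    pairs.reverse()
    return best[n][m], pairs


def char_overlap(src_group, tgt_group):
    """Hypothetical score: shared characters, minus a penalty for multi-sentence beads."""
    shared = set("".join(src_group)) & set("".join(tgt_group))
    return len(shared) - 0.5 * (len(src_group) + len(tgt_group) - 2)


total, pairs = align(["ab", "cd", "ef"], ["ab", "cdef"], char_overlap)
print(total, pairs)  # 5.5 [([0], [0]), ([1, 2], [1])]
```

The 2-to-1 bead in the result is the kind of many-to-many pattern the abstract highlights; a semantic scorer would replace `char_overlap` without changing the DP.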

Complex Evolutional Pattern Learning for Temporal Knowledge Graph Reasoning
Zixuan Li | Saiping Guan | Xiaolong Jin | Weihua Peng | Yajuan Lyu | Yong Zhu | Long Bai | Wei Li | Jiafeng Guo | Xueqi Cheng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A Temporal Knowledge Graph (TKG) is a sequence of KGs corresponding to different timestamps. TKG reasoning aims to predict potential facts in the future given the historical KG sequences. One key to this task is to mine and understand evolutional patterns of facts from these sequences. The evolutional patterns are complex in two aspects: length-diversity and time-variability. Existing models for TKG reasoning focus on modeling fact sequences of a fixed length, which cannot discover complex evolutional patterns that vary in length. Furthermore, these models are all trained offline, so they cannot adapt well to subsequent changes in evolutional patterns. Thus, we propose a new model, called Complex Evolutional Network (CEN), which uses a length-aware Convolutional Neural Network (CNN) to handle evolutional patterns of different lengths via an easy-to-difficult curriculum learning strategy. Besides, we propose to learn the model under the online setting so that it can adapt to the changes of evolutional patterns over time. Extensive experiments demonstrate that CEN obtains substantial performance improvement under both the traditional offline and the proposed online settings.

2021

SgSum: Transforming Multi-document Summarization into Sub-graph Selection
Moye Chen | Wei Li | Jiachen Liu | Xinyan Xiao | Hua Wu | Haifeng Wang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Most existing extractive multi-document summarization (MDS) methods score each sentence individually and extract salient sentences one by one to compose a summary, which has two main drawbacks: (1) neglecting both the intra- and cross-document relations between sentences; (2) neglecting the coherence and conciseness of the whole summary. In this paper, we propose a novel MDS framework (SgSum) that formulates the MDS task as a sub-graph selection problem, in which source documents are regarded as a relation graph of sentences (e.g., a similarity graph or discourse graph) and the candidate summaries are its sub-graphs. Instead of selecting salient sentences, SgSum selects a salient sub-graph from the relation graph as the summary. Compared with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) it directly outputs an integrated summary in the form of a sub-graph, which is more informative and coherent. Extensive experiments on MultiNews and DUC datasets show that our proposed method brings substantial improvements over several strong baselines. Human evaluation results also demonstrate that our model can produce significantly more coherent and informative summaries compared with traditional MDS methods. Moreover, the proposed architecture has strong transfer ability from single- to multi-document input, which can reduce the resource bottleneck in MDS tasks.

Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang | Hyung Won Chung | Yi Tay | Liam Fedus | Thibault Fevry | Michael Matena | Karishma Malkan | Noah Fiedel | Noam Shazeer | Zhenzhong Lan | Yanqi Zhou | Wei Li | Nan Ding | Jake Marcus | Adam Roberts | Colin Raffel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance. Furthermore, most of the Transformer variants we found beneficial were either developed in the same codebase that we used or are relatively minor changes. We conjecture that performance improvements may strongly depend on implementation details and correspondingly make some recommendations for improving the generality of experimental results.

UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Wei Li | Can Gao | Guocheng Niu | Xinyan Xiao | Hao Liu | Jiachen Liu | Hua Wu | Haifeng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Existing pre-training methods focus either on single-modal tasks or on multi-modal tasks, and cannot effectively adapt to each other. They can only utilize single-modal data (i.e., text or image) or limited multi-modal data (i.e., image-text pairs). In this work, we propose a UNIfied-MOdal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large-scale free-text corpora and image collections are utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space, over a corpus of image-text pairs augmented with related images and texts. With the help of rich non-paired single-modal data, our model is able to learn more generalizable representations by allowing textual knowledge and visual knowledge to enhance each other in the unified semantic space. The experimental results show that UNIMO greatly improves the performance of several single-modal and multi-modal downstream tasks. Our code and pre-trained models are public at https://github.com/PaddlePaddle/Research/tree/master/NLP/UNIMO.

Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs
Zixuan Li | Xiaolong Jin | Saiping Guan | Wei Li | Jiafeng Guo | Yuanzhuo Wang | Xueqi Cheng
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Temporal Knowledge Graphs (TKGs) have been developed and used in many different areas. Reasoning on TKGs to predict potential facts (events) in the future poses great challenges to existing models. When facing a prediction task, human beings usually search their memories for useful historical information (i.e., clues) and then reason about the future meticulously. Inspired by this mechanism, we propose CluSTeR, which predicts future facts in two stages: Clue Searching and Temporal Reasoning. Specifically, at the clue searching stage, CluSTeR learns a beam search policy via reinforcement learning (RL) to induce multiple clues from historical facts. At the temporal reasoning stage, it adopts a graph-convolution-network-based sequence method to deduce answers from clues. Experiments on four datasets demonstrate the substantial advantages of CluSTeR compared with state-of-the-art methods. Moreover, the clues found by CluSTeR further provide interpretability for the results.

BASS: Boosting Abstractive Summarization with Unified Semantic Graph
Wenhao Wu | Wei Li | Xinyan Xiao | Jiachen Liu | Ziqiang Cao | Sujian Li | Hua Wu | Haifeng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Abstractive summarization of long documents or multiple documents remains challenging for the Seq2Seq architecture, as Seq2Seq is not good at analyzing long-distance relations in text. In this paper, we present BASS, a novel framework for Boosting Abstractive Summarization based on a unified Semantic graph, which aggregates co-referent phrases distributed across a long range of context and conveys rich relations between phrases. Further, a graph-based encoder-decoder model is proposed to improve both the document representation and the summary generation process by leveraging the graph structure. Specifically, several graph augmentation methods are designed to encode both the explicit and implicit relations in the text, while a graph-propagation attention mechanism is developed in the decoder to select salient content into the summary. Empirical results show that the proposed architecture brings substantial improvements for both long-document and multi-document summarization tasks.

UoB_UK at SemEval 2021 Task 2: Zero-Shot and Few-Shot Learning for Multi-lingual and Cross-lingual Word Sense Disambiguation.
Wei Li | Harish Tayyar Madabushi | Mark Lee
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes our submission to SemEval 2021 Task 2. We compare XLM-RoBERTa Base and Large in the few-shot and zero-shot settings and additionally test the effectiveness of using a k-nearest neighbors classifier in the few-shot setting instead of the more traditional multi-layered perceptron. Our experiments on both the multi-lingual and cross-lingual data show that XLM-RoBERTa Large, unlike the Base version, seems to be able to more effectively transfer learning in a few-shot setting and that the k-nearest neighbors classifier is indeed a more powerful classifier than a multi-layered perceptron when used in few-shot learning.
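
The abstract contrasts a k-nearest neighbors classifier with a multi-layered perceptron in the few-shot setting. A minimal sketch of the kNN side, assuming each example has already been encoded as a fixed-size vector (for instance by XLM-RoBERTa) and that neighbors are ranked by cosine similarity with a majority vote; the vectors, the "T"/"F" labels, and k=3 are illustrative assumptions, not the submission's configuration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(support, query, k=3):
    """Majority label among the k support examples most similar to the query."""
    ranked = sorted(support, key=lambda ex: cosine(ex[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical few-shot support set of (embedding, label) pairs.
support = [
    ([1.0, 0.0], "T"),
    ([0.9, 0.1], "T"),
    ([0.0, 1.0], "F"),
    ([0.1, 0.9], "F"),
]
print(knn_predict(support, [0.8, 0.2], k=3))  # T
```

Unlike an MLP head, this classifier has no trainable parameters, so adding a new labeled example only means appending it to `support`.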

2020

Leveraging Graph to Improve Abstractive Multi-Document Summarization
Wei Li | Xinyan Xiao | Jiachen Liu | Hua Wu | Haifeng Wang | Junping Du
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries. In this paper, we develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents, such as similarity graphs and discourse graphs, to more effectively process multiple input documents and produce abstractive summaries. Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents. Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries. Furthermore, pre-trained language models can be easily combined with our model, which further improves the summarization performance significantly. Empirical results on the WikiSum and MultiNews datasets show that the proposed architecture brings substantial improvements over several strong baselines.

基于统一模型的藏文新闻摘要(Abstractive Summarization of Tibetan News Based on Hybrid Model)
Xiaodong Yan (闫晓东) | Xiaoqing Xie (解晓庆) | Yu Zou (邹煜) | Wei Li (李维)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

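Seq2seq neural network models have achieved good results in Chinese and English text summarization, but summarization for low-resource languages, especially Tibetan, is still at an exploratory stage. In addition, there is currently no large-scale annotated corpus for summary extraction. This paper proposes a unified model for generating Tibetan news summaries. The TextRank algorithm is used to address the shortage of annotated Tibetan training data. Then, a two-layer bidirectional GRU neural network is used to extract sentences that represent the original news, reducing redundant information. Finally, an attention-based Seq2Seq model is used to generate abstractive summaries. Meanwhile, we incorporate a pointer network to handle out-of-vocabulary words. Experimental results show that the ROUGE-1 score improves by 2% over traditional models. Keywords: text summarization; Tibetan; TextRank; pointer network; Bi-GRU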
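
The abstract uses TextRank to compensate for the lack of annotated Tibetan training data. A minimal sketch of TextRank-style sentence ranking: build a graph whose edge weights are word overlaps between sentences, then run damped PageRank power iteration. Whitespace tokenization and the overlap normalization (shared words divided by the sum of the two word-set sizes, rather than the log-length normalization of the original TextRank paper) are simplifying assumptions; real Tibetan text would need a proper tokenizer.

```python
def textrank(sentences, damping=0.85, max_iter=100, tol=1e-8):
    """Score sentences with damped PageRank over a word-overlap similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    words = [set(s.lower().split()) for s in sentences]
    # Symmetric edge weights: shared words over the sum of the two word-set sizes.
    weight = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and words[i] and words[j]:
                weight[i][j] = len(words[i] & words[j]) / (len(words[i]) + len(words[j]))
    out_total = [sum(row) for row in weight]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional to weight[j][i].
            incoming = sum(
                weight[j][i] / out_total[j] * scores[j]
                for j in range(n)
                if out_total[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


sentences = [
    "the cat sat on the mat",
    "the cat ate fish",
    "a dog sat on a mat",
    "birds fly south",
]
scores = textrank(sentences)
print(max(range(len(sentences)), key=scores.__getitem__))  # 0
```

The first sentence overlaps with two others and ends up ranked highest; the isolated last sentence keeps only the teleport share `(1 - damping) / n`.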

2019

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Naveen Arivazhagan | Colin Cherry | Wolfgang Macherey | Chung-Cheng Chiu | Semih Yavuz | Ruoming Pang | Wei Li | Colin Raffel
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Simultaneous machine translation begins to translate each source sentence before the source speaker is finished speaking, with applications to live and streaming scenarios. Simultaneous systems must carefully schedule their reading of the source sentence to balance quality against latency. We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural machine translation (NMT) model that attends over all source tokens read thus far. We do so by introducing Monotonic Infinite Lookback (MILk) attention, which maintains both a hard, monotonic attention head to schedule the reading of the source sentence, and a soft attention head that extends from the monotonic head back to the beginning of the source. We show that MILk’s adaptive schedule allows it to arrive at latency-quality trade-offs that compare favorably with those of a recently proposed wait-k strategy for many latency values.
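
MILk learns an adaptive read/write schedule, whereas the wait-k baseline it is compared against follows a fixed one. A minimal sketch of that fixed wait-k schedule, assuming the usual definition in which target token t (1-indexed) is written only after min(k + t - 1, |x|) source tokens have been read; the function name and the READ/WRITE encoding are illustrative.

```python
def wait_k_actions(k, src_len, tgt_len):
    """READ/WRITE action sequence of a fixed wait-k policy."""
    if k < 1:
        raise ValueError("k must be at least 1")
    actions, read = [], 0
    for t in range(1, tgt_len + 1):
        # g(t): source tokens that must be read before writing target token t.
        needed = min(k + t - 1, src_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


print(" ".join(wait_k_actions(k=2, src_len=4, tgt_len=4)))
# READ READ WRITE READ WRITE READ WRITE WRITE
```

The schedule depends only on k and the sentence lengths, never on the content; an adaptive policy such as MILk's monotonic head instead decides each READ or WRITE from the model state.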

Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model
Wei Li | Jingjing Xu | Yancheng He | ShengLi Yan | Yunfang Wu | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatic article commenting is helpful in encouraging user engagement on online news platforms. However, news documents are usually too long for models under traditional encoder-decoder frameworks, which often results in general and irrelevant comments. In this paper, we propose to generate comments with a graph-to-sequence model that models the input news as a topic interaction graph. By organizing the article into a graph structure, our model can better understand the internal structure of the article and the connections between topics, which makes it better able to generate coherent and informative comments. We collect and release a large-scale news-comment corpus from a popular Chinese online news platform, Tencent Kuaibao. Extensive experimental results show that our model can generate much more coherent and informative comments compared with several strong baseline models.

2018

Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling
Wei Li | Xinyan Xiao | Yajuan Lyu | Yuanzhuo Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Information selection is the most important component in the document summarization task. In this paper, we propose to extend the basic neural encoding-decoding framework with an information selection layer to explicitly model and optimize the information selection process in abstractive document summarization. Specifically, our information selection layer consists of two parts: gated global information filtering and local sentence selection. Unnecessary information in the original document is first globally filtered, then salient sentences are selected locally while each summary sentence is generated sequentially. To optimize the information selection process directly, distantly supervised training guided by the gold summary is also introduced. Experimental results demonstrate that the explicit modeling and optimizing of the information selection process improves document summarization performance significantly, enabling our model to generate more informative and concise summaries and thus significantly outperform state-of-the-art neural abstractive methods.

Improving Neural Abstractive Document Summarization with Structural Regularization
Wei Li | Xinyan Xiao | Yajuan Lyu | Yuanzhuo Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent neural sequence-to-sequence models have shown significant progress on short text summarization. However, for document summarization, they fail to capture the long-term structure of both documents and multi-sentence summaries, resulting in information loss and repetitions. In this paper, we propose to leverage the structural information of both documents and multi-sentence summaries to improve document summarization performance. Specifically, we incorporate both structural-compression and structural-coverage regularization into the summarization process in order to capture the information compression and information coverage properties, which are the two most important structural properties of document summarization. Experimental results demonstrate that the structural regularization improves document summarization performance significantly, enabling our model to generate more informative and concise summaries and thus significantly outperform state-of-the-art neural abstractive methods.

Learning Universal Sentence Representations with Mean-Max Attention Autoencoder
Minghua Zhang | Yunfang Wu | Weikang Li | Wei Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In order to learn universal sentence representations, previous methods focus on complex recurrent neural networks or supervised learning. In this paper, we propose a mean-max attention autoencoder (mean-max AAE) within the encoder-decoder framework. Our autoencoder relies entirely on the multi-head self-attention mechanism to reconstruct the input sequence. In the encoder, we propose a mean-max strategy that applies both mean and max pooling operations over the hidden vectors to capture diverse information of the input. To enable this information to steer the reconstruction process dynamically, the decoder performs attention over the mean-max representation. By training our model on a large collection of unlabelled data, we obtain high-quality representations of sentences. Experimental results on a broad range of 10 transfer tasks demonstrate that our model outperforms the state-of-the-art unsupervised single methods, including the classical skip-thoughts and the advanced skip-thoughts+LN model. Furthermore, compared with the traditional recurrent neural network, our mean-max AAE greatly reduces the training time.
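
The mean-max strategy applies both mean and max pooling over the encoder's hidden vectors. A minimal sketch of that pooling step on plain lists; returning the two pooled vectors as a pair (rather than, say, concatenating them) is an assumption here, since the abstract only says the decoder attends over "the mean-max representation".

```python
def mean_max_pool(hidden):
    """Return (mean-pooled, max-pooled) vectors over a sequence of hidden states."""
    if not hidden:
        raise ValueError("need at least one hidden vector")
    dim = len(hidden[0])
    # Mean pooling: average each dimension over all time steps.
    mean = [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]
    # Max pooling: keep the largest activation per dimension.
    maxed = [max(h[d] for h in hidden) for d in range(dim)]
    return mean, maxed


# Two time steps of toy 2-dimensional hidden states.
print(mean_max_pool([[1.0, -2.0], [3.0, 0.0]]))  # ([2.0, -1.0], [3.0, 0.0])
```

Mean pooling summarizes every position evenly, while max pooling keeps the strongest feature activations, which is one way to read "capture diverse information of the input".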

Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network
Pengcheng Yang | Xu Sun | Wei Li | Shuming Ma
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

As more and more academic papers are being submitted to conferences and journals, evaluating all these papers by professionals is time-consuming and can cause inequality due to the personal factors of the reviewers. In this paper, in order to assist professionals in evaluating academic papers, we propose a novel task: automatic academic paper rating (AAPR), which automatically determines whether to accept academic papers. We build a new dataset for this task and propose a novel modularized hierarchical convolutional neural network to achieve automatic academic paper rating. Evaluation results show that the proposed model outperforms the baselines by a large margin. The dataset and code are available at https://github.com/lancopku/AAPR

Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation
Shuming Ma | Xu Sun | Wei Li | Sujian Li | Wenjie Li | Xuancheng Ren
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Most recent approaches use the sequence-to-sequence model for paraphrase generation. The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words. Therefore, the generated sentences are often grammatically correct but semantically improper. In this work, we introduce a novel model based on the encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our proposed model generates words by querying distributed word representations (i.e., neural word embeddings), aiming to capture the meaning of the corresponding words. Following previous work, we evaluate our model on two paraphrase-oriented tasks, namely text simplification and short text abstractive summarization. Experimental results show that our model outperforms the sequence-to-sequence baseline by 6.3 and 5.5 BLEU on two English text simplification datasets, and by 5.7 ROUGE-2 F1 on a Chinese summarization dataset. Moreover, our model achieves state-of-the-art performance on these three benchmark datasets.
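
Generating a word "by querying distributed word representations" can be pictured as scoring every vocabulary embedding against a query vector and normalizing the scores. A minimal sketch, assuming a plain dot-product score and a softmax; WEAN's actual scoring function may differ, and the toy embeddings and the function name `query_word_embeddings` are hypothetical.

```python
import math


def query_word_embeddings(query, embeddings):
    """Softmax over the vocabulary of dot-product scores between query and embeddings."""
    scores = {
        word: sum(q * e for q, e in zip(query, vec))
        for word, vec in embeddings.items()
    }
    top = max(scores.values())
    # Subtract the max score before exponentiating for numerical stability.
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: v / total for word, v in exps.items()}


# Toy 2-d embeddings; the query would presumably be derived from the decoder state.
embeddings = {"big": [1.0, 0.0], "large": [0.9, 0.1], "small": [-1.0, 0.0]}
probs = query_word_embeddings([2.0, 0.0], embeddings)
print(max(probs, key=probs.get))  # big
```

Because the output layer reuses the embeddings themselves, near-synonyms such as "big" and "large" receive similar probabilities, which is the intuition behind capturing word meaning rather than memorized patterns.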

SGM: Sequence Generation Model for Multi-label Classification
Pengcheng Yang | Xu Sun | Wei Li | Shuming Ma | Wei Wu | Houfeng Wang
Proceedings of the 27th International Conference on Computational Linguistics

Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification in that the labels tend to be correlated. Existing methods tend to ignore the correlations between labels. Besides, different parts of the text can contribute differently to predicting different labels, which is not considered by existing models. In this paper, we propose to view the multi-label classification task as a sequence generation problem, and apply a sequence generation model with a novel decoder structure to solve it. Extensive experimental results show that our proposed methods outperform previous work by a substantial margin. Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.

2017

Derivation of Document Vectors from Adaptation of LSTM Language Model
Wei Li | Brian Mak
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In many natural language processing (NLP) tasks, a document is commonly modeled as a bag of words using the term frequency-inverse document frequency (TF-IDF) vector. One major shortcoming of the frequency-based TF-IDF feature vector is that it ignores word order, which carries syntactic and semantic relationships among the words in a document. This paper proposes a novel distributed vector representation of a document, called DV-LSTM, which is derived from the result of adapting a long short-term memory recurrent neural network language model to the document. DV-LSTM is expected to capture some high-level sequential information in the document, which other current document representations fail to do. It was evaluated on document genre classification in the Brown Corpus and the BNC Baby Corpus. The results show that DV-LSTM significantly outperforms the TF-IDF vector and the paragraph vector (PV-DM) in most cases, and their combinations may further improve the classification performance.
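
The shortcoming attributed to TF-IDF, that it ignores word order, is easy to see concretely. A minimal sketch using one common TF-IDF variant (raw term count times log(N / document frequency)); many other weightings exist, and this is not necessarily the one used in the paper's baseline.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors: raw term count times log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[term] * math.log(n_docs / df[term]) for term in vocab])
    return vocab, vectors


vocab, vecs = tfidf_vectors(["dog bites man", "man bites dog", "cat sleeps"])
# Reordering the words does not change the vector: word order is discarded.
print(vecs[0] == vecs[1])  # True
```

"dog bites man" and "man bites dog" mean different things but map to the same vector, which is the kind of sequential information DV-LSTM is meant to retain.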

2016

Abstractive News Summarization based on Event Semantic Link Network
Wei Li | Lei He | Hai Zhuge
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper studies abstractive multi-document summarization of event-oriented news texts through event information extraction and abstract representation. Fine-grained event mentions and the semantic relations between them are extracted to build a unified and connected event semantic link network, an abstract representation of the source texts. A network reduction algorithm is proposed to summarize the most salient and coherent event information. New sentences with good linguistic quality are automatically generated and selected through sentence over-generation and greedy selection. Experimental results on the DUC 2006 and DUC 2007 datasets show that our system significantly outperforms the state-of-the-art extractive and abstractive baselines under both pyramid and ROUGE evaluation metrics.

Exploring Differential Topic Models for Comparative Summarization of Scientific Papers
Lei He | Wei Li | Hai Zhuge
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper investigates differential topic models (dTM) for summarizing the differences among document groups. Starting from a simple probabilistic generative model, we propose dTM-SAGE, which explicitly models the deviations of group-specific word distributions from a background word distribution to indicate how words are used differentially across document groups. This makes it more effective at capturing the unique characteristics needed to compare document groups. To generate dTM-based comparative summaries, we propose two sentence scoring methods for measuring the discriminative capacity of sentences. Experimental results on a scientific paper dataset show that our dTM-based comparative summarization methods significantly outperform the generic baselines and the state-of-the-art comparative summarization methods under ROUGE metrics.

pdf bib
Chinese Poetry Generation with Planning based Neural Network
Zhe Wang | Wei He | Hua Wu | Haiyang Wu | Wei Li | Haifeng Wang | Enhong Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generation method that first plans the sub-topics of the poem according to the user's writing intent, and then generates each line of the poem sequentially using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method ensures that the generated poem is coherent and semantically consistent with the user's intent. A comprehensive evaluation with human judgments demonstrates that our approach outperforms state-of-the-art poetry generation methods and that the quality of its poems is somewhat comparable to that of human poets.

pdf bib
Multi-level Gated Recurrent Neural Network for dialog act classification
Wei Li | Yunfang Wu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper we focus on the problem of dialog act (DA) labelling. This problem has recently attracted a lot of attention because it is an important sub-part of automatic question answering systems, which are currently in great demand. Traditional methods tend to treat this problem as a sequence labelling task and deal with it by applying classifiers with rich features. Most current neural network models still omit the sequential information in the conversation. Hence, we apply a novel multi-level gated recurrent neural network (GRNN) with non-textual information to predict the DA tag. Our model utilizes not only textual information but also non-textual and contextual information. Our model shows a significant improvement of over 6% over previous work on the Switchboard Dialog Act (SWDA) task.

2015

pdf bib
Abstractive Multi-document Summarization with Semantic Information Extraction
Wei Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improved beam search with constrained softmax for NMT
Xiaoguang Hu | Wei Li | Xiang Lan | Hua Wu | Haifeng Wang
Proceedings of Machine Translation Summit XV: Papers

2006

pdf bib
Mining Implicit Entities in Queries
Wei Li | Wenjie Li | Qin Lu
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Entities are pivotal in describing events and objects, and are also very important in document summarization. In general, only explicit entities that can be extracted by a named entity recognizer are used in real applications. However, implicit entities hidden behind phrases or words, e.g. the entity referred to by the phrase "cross border", have proved helpful in document summarization. In our experiment, we extract implicit entities from web resources.

2005

pdf bib
Automatic Image Annotation Using Maximum Entropy Model
Wei Li | Maosong Sun
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
A Preliminary Work on Classifying Time Granularities of Temporal Questions
Wei Li | Wenjie Li | Qin Lu | Kam-Fai Wong
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Word Independent Context Pair Classification Model for Word Sense Disambiguation
Cheng Niu | Wei Li | Rohini K. Srihari | Huifeng Li
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2004

pdf bib
Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction
Cheng Niu | Wei Li | Rohini K. Srihari
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Context clustering for Word Sense Disambiguation based on modeling pairwise context similarities
Cheng Niu | Wei Li | Rohini K. Srihari | Huifeng Li | Laurie Crist
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2003

pdf bib
InfoXtract location normalization: a hybrid approach to geographic references in information extraction
Huifeng Li | K. Rohini Srihari | Cheng Niu | Wei Li
Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References

pdf bib
Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons
Andrew McCallum | Wei Li
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

pdf bib
InfoXtract: A Customizable Intermediate Level Information Extraction Engine
Rohini K. Srihari | Wei Li | Cheng Niu | Thomas Cornell
Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS)

pdf bib
Question Answering on a Case Insensitive Corpus
Wei Li | Rohini Srihari | Cheng Niu | Xiaoge Li
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

pdf bib
A Bootstrapping Approach to Named Entity Classification Using Successive Learners
Cheng Niu | Wei Li | Jihong Ding | Rohini Srihari
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
An Expert Lexicon Approach to Identifying English Phrasal Verbs
Wei Li | Xiuhong Zhang | Cheng Niu | Yuankai Jiang | Rohini K. Srihari
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
Bootstrapping for Named Entity Tagging Using Concept-based Seeds
Cheng Niu | Wei Li | Jihong Ding | Rohini K. Srihari
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

pdf bib
Location Normalization for Information Extraction
Huifeng Li | Rohini K. Srihari | Cheng Niu | Wei Li
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
Extracting Exact Answers to Questions Based on Structural Links
Wei Li | Rohini K. Srihari | Xiaoge Li | M. Srikanth | Xiuhong Zhang | Cheng Niu
COLING-02: Multilingual Summarization and Question Answering

2000

pdf bib
A Question Answering System Supported by Information Extraction
Rohini Srihari | Wei Li
Sixth Applied Natural Language Processing Conference

1989

pdf bib
JFY-IV machine translation system
Zho Liu | Aiping Fu | Wei Li
Proceedings of Machine Translation Summit II
