Wei Li


2024

pdf bib
Interpretable Composition Attribution Enhancement for Visio-linguistic Compositional Understanding
Wei Li | Zhen Huang | Xinmei Tian | Le Lu | Houqiang Li | Xu Shen | Jieping Ye
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Contrastively trained vision-language models such as CLIP have achieved remarkable progress in vision and language representation learning. Despite the promising progress, their proficiency in compositional reasoning over attributes and relations (e.g., distinguishing between “the car is underneath the person” and “the person is underneath the car”) remains notably inadequate. We investigate the cause for this deficient behavior is the composition attribution issue, where the attribution scores (e.g., attention scores or GradCAM scores) for relations (e.g., underneath) or attributes (e.g., red) in the text are substantially lower than those for object terms. In this work, we show such issue is mitigated via a novel framework called CAE (Composition Attribution Enhancement). This generic framework incorporates various interpretable attribution methods to encourage the model to pay greater attention to composition words denoting relationships and attributes within the text. Detailed analysis shows that our approach enables the models to adjust and rectify the attribution of the texts. Extensive experiments across seven benchmarks reveal that our framework significantly enhances the ability to discern intricate details and construct more sophisticated interpretations of combined visual and linguistic elements.

pdf bib
InstructEval: Instruction-Tuned Text Evaluator from Human Preference
Wenhao Wu | Wei Li | Xinyan Xiao | Jiachen Liu | Sujian Li
Findings of the Association for Computational Linguistics: ACL 2024

This paper explores to construct a general text evaluator based on open-source Large Language Models (LLMs), a domain predominantly occupied by commercial counterparts such as GPT-4. Recognizing the limitations of open-source models like Llama in evaluative tasks, we introduce InstructEval, a general multi-aspect text evaluator developed through instruction tuning of open-source LLMs. To overcome the shortage of annotated resources for multi-aspect evaluations, InstructEval combines extensive open Human Preference Modeling (HPM) datasets with a small set of multi-aspect annotated data.This approach not only enhances effectiveness in overall evaluation tasks but also exhibits improved performance in multi-aspect evaluation tasks.As demonstrated by our extensive experiments, InstructEval achieves comparable or superior performance to commercial LLMs like ChatGPT or GPT-4 in terms of both overall and multi-aspect evaluation.

pdf bib
MM-ChatAlign: A Novel Multimodal Reasoning Framework based on Large Language Models for Entity Alignment
Xuhui Jiang | Yinghan Shen | Zhichao Shi | Chengjin Xu | Wei Li | Huang Zihe | Jian Guo | Yuanzhuo Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

Multimodal entity alignment (MMEA) integrates multi-source and cross-modal knowledge graphs, a crucial yet challenging task for data-centric applications.Traditional MMEA methods derive the visual embeddings of entities and combine them with other modal data for alignment by embedding similarity comparison.However, these methods are hampered by the limited comprehension of visual attributes and deficiencies in realizing and bridging the semantics of multimodal data. To address these challenges, we propose MM-ChatAlign, a novel framework that utilizes the visual reasoning abilities of MLLMs for MMEA.The framework features an embedding-based candidate collection module that adapts to various knowledge representation strategies, effectively filtering out irrelevant reasoning candidates. Additionally, a reasoning and rethinking module, powered by MLLMs, enhances alignment by efficiently utilizing multimodal information.Extensive experiments on four MMEA datasets demonstrate MM-ChatAlign’s superiority and underscore the significant potential of MLLMs in MMEA tasks.The source code is available at https://github.com/jxh4945777/MMEA/.

pdf bib
Enhancing Discourse Dependency Parsing with Sentence Dependency Parsing: A Unified Generative Method Based on Code Representation
Zizhuo Shen | Yanqiu Shao | Wei Li
Findings of the Association for Computational Linguistics: EMNLP 2024

Due to the high complexity of Discourse Dependency Parsing (DDP) tasks, their existing annotation resources are relatively scarce compared to other NLP tasks, and different DDP tasks also have significant differences in annotation schema. These issues have led to the dilemma of low resources for DDP tasks. Thanks to the powerful capabilities of Large Language Models (LLMs) in cross-task learning, we can use LLMs to model dependency parsing under different annotation schema in an unified manner, in order to alleviate the dilemma of low resources for DDP tasks. However, enabling LLMs to deeply comprehend dependency parsing tasks is a challenge that remains underexplored. Inspired by the application of code-based methods in complex tasks, we propose a code-based unified dependency parsing method. We treat the process of dependency parsing as a search process of dependency paths and use code to represent this search process. Furthermore, we use a curriculum-learning based instruction tuning strategy for joint training of multiple dependency parsing tasks. The experimental results show that our proposed code-based DDP system has achieved good performance on two Chinese DDP tasks (especially significant improvement on the DDP task with relatively less training data).

pdf bib
UINav: A Practical Approach to Train On-Device Automation Agents
Wei Li | Fu-Lin Hsu | William Bishop | Folawiyo Campbell-Ajala | Max Lin | Oriana Riva
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired. Prior automation systems do not produce generalizable models while AI-based automation agents work reliably only in simple, hand-crafted applications or incur high computation costs. We propose UINav, a demonstration-based approach to train automation agents that fit mobile devices, yet achieving high success rates with modest numbers of demonstrations. To reduce the demonstration overhead, UINav uses a referee model that provides users with immediate feedback on tasks where the agent fails, and automatically augments human demonstrations to increase diversity in training data. Our evaluation shows that with only 10 demonstrations can achieve 70% accuracy, and that with enough demonstrations it can surpass 90% accuracy.

pdf bib
Detection-Correction Structure via General Language Model for Grammatical Error Correction
Wei Li | Houfeng Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Grammatical error correction (GEC) is a task dedicated to rectifying texts with minimal edits, which can be decoupled into two components: detection and correction. However, previous works have predominantly focused on direct correction, with no prior efforts to integrate both into a single model. Moreover, the exploration of the detection-correction paradigm by large language models (LLMs) remains underdeveloped. This paper introduces an integrated detection-correction structure, named DeCoGLM, based on the General Language Model (GLM). The detection phase employs a fault-tolerant detection template, while the correction phase leverages autoregressive mask infilling for localized error correction. Through the strategic organization of input tokens and modification of attention masks, we facilitate multi-task learning within a single model. Our model demonstrates competitive performance against the state-of-the-art models on English and Chinese GEC datasets. Further experiments present the effectiveness of the detection-correction structure in LLMs, suggesting a promising direction for GEC.

pdf bib
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
Wei Li | Xue Xu | Jiachen Liu | Xinyan Xiao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Existing text-to-image diffusion models primarily generate images from text prompts. However, the inherent conciseness of textual descriptions poses challenges in faithfully synthesizing images with intricate details, such as specific entities or scenes. This paper presents UNIMO-G, a simple multimodal conditional diffusion framework that operates on multimodal prompts with interleaved textual and visual inputs, which demonstrates a unified ability for both text-driven and subject-driven image generation. UNIMO-G comprises two core components: a Multimodal Large Language Model (MLLM) for encoding multimodal prompts, and a conditional denoising diffusion network for generating images based on the encoded multimodal input. We leverage a two-stage training strategy to effectively train the framework: firstly pre-training on large-scale text-image pairs to develop conditional image generation capabilities, and then instruction tuning with multimodal prompts to achieve unified image generation proficiency. A well-designed data processing pipeline involving language grounding and image segmentation is employed to construct multi-modal prompts. UNIMO-G excels in both text-to-image generation and zero-shot subject-driven synthesis, and is notably effective in generating high-fidelity images from complex multimodal prompts involving multiple image entities.

pdf bib
Unlocking the Power of Large Language Models for Entity Alignment
Xuhui Jiang | Yinghan Shen | Zhichao Shi | Chengjin Xu | Wei Li | Zixuan Li | Jian Guo | Huawei Shen | Yuanzhuo Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Entity Alignment (EA) is vital for integrating diverse knowledge graph (KG) data, playing a crucial role in data-driven AI applications. Traditional EA methods primarily rely on comparing entity embeddings, but their effectiveness is constrained by the limited input KG data and the capabilities of the representation learning techniques. Against this backdrop, we introduce ChatEA, an innovative framework that incorporates large language models (LLMs) to improve EA. To address the constraints of limited input KG data, ChatEA introduces a KG-code translation module that translates KG structures into a format understandable by LLMs, thereby allowing LLMs to utilize their extensive background knowledge to improve EA accuracy. To overcome the over-reliance on entity embedding comparisons, ChatEA implements a two-stage EA strategy that capitalizes on LLMs’ capability for multi-step reasoning in a dialogue format, thereby enhancing accuracy while preserving efficiency. Our experimental results affirm ChatEA’s superior performance, highlighting LLMs’ potential in facilitating EA tasks.The source code is available at https://anonymous.4open.science/r/ChatEA/.

pdf bib
KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction
Zixuan Li | Yutao Zeng | Yuxin Zuo | Weicheng Ren | Wenxuan Liu | Miao Su | Yucan Guo | Yantao Liu | Lixiang Lixiang | Zhilei Hu | Long Bai | Wei Li | Yidan Liu | Pan Yang | Xiaolong Jin | Jiafeng Guo | Xueqi Cheng
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
Jiaxing Sun | Weiquan Huang | Jiang Wu | Chenya Gu | Wei Li | Songyang Zhang | Hang Yan | Conghui He
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce CHARM, the first benchmark for comprehensively and in-depth evaluating the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense. We evaluated 7 English and 12 Chinese-oriented LLMs on CHARM, employing 5 representative prompt strategies for improving LLMs’ reasoning ability, such as Chain-of-Thought. Our findings indicated that the LLM’s language orientation and the task’s domain influence the effectiveness of the prompt strategy, which enriches previous research findings. We built closely-interconnected reasoning and memorization tasks, and found that some LLMs struggle with memorizing Chinese commonsense, affecting their reasoning ability, while others show differences in reasoning despite similar memorization performance. We also evaluated the LLMs’ memorization-independent reasoning abilities and analyzed the typical errors. Our study precisely identified the LLMs’ strengths and weaknesses, providing the clear direction for optimization. It can also serve as a reference for studies in other fields. We will release CHARM at https://github.com/opendatalab/CHARM.

pdf bib
An Unsupervised Framework for Adaptive Context-aware Simplified-Traditional Chinese Character Conversion
Wei Li | Shutan Huang | Yanqiu Shao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Traditional Chinese character is an important carrier of Chinese culture, and is still actively used in many areas. Automatic conversion between traditional and simplified Chinese characters can help modern people understand traditional culture and facilitate communication among different regions. Previous conversion methods rely on rule-based mapping or shallow feature-based machine learning models, which struggle to convert simplified characters with different origins and constructing training data is costly. In this study, we propose an unsupervised adaptive context-aware conversion model that learns to convert between simplified and traditional Chinese characters under a denoising auto-encoder framework requiring no labeled data. Our model includes a Latent Generative Adversarial Encoder that transforms vectors to a latent space with generative adversarial network, which adds noise as an inevitable side effect, Based on which a Context-aware Semantic Reconstruction Decoder restores the original input while considering a broader range of context with a pretrained language model. Additionally, we propose to apply early exit mechanism during inference to reduce the computation complexity and improve the generalization ability. To test the effectiveness of our model, we construct a high quality test dataset with simplified-traditional Chinese character text pairs. Experiment results and extensive analysis demonstrate that our model outperforms strong unsupervised baselines and yields better conversion result for one-to-many cases.

pdf bib
Hypergraph-Based Session Modeling: A Multi-Collaborative Self-Supervised Approach for Enhanced Recommender Systems
Xiangping Zheng | Bo Wu | Alex X. Zhang | Wei Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Session-based recommendation (SBR) is a challenging task that involves predicting a user’s next item click based on their recent session history. Presently, many state-of-the-art methodologies employ graph neural networks to model item transitions. Notwithstanding their impressive performance, graph-based models encounter significant challenges when confronted with intricate session dependencies and data sparsity in real-world scenarios, ultimately constraining their capacity to enhance recommendation accuracy. In recognition of these challenges, we introduce an innovative methodology known as ‘Mssen,’ which stands for Multi-collaborative self-supervised learning in hypergraph neural networks. Mssen is meticulously crafted to adeptly discern user intent. Our approach initiates by representing session-based data as a hypergraph, adeptly capturing intricate, high-order relationships. Subsequently, we employ self-supervised learning on item-session hypergraphs to mitigate the challenges of data sparsity, all without necessitating manual fine-tuning, extensive search, or domain-specific expertise in augmentation selection. Comprehensive experimental analyses conducted across multiple datasets consistently underscore the superior performance of our approach when compared to existing methodologies.

pdf bib
Improving Robustness of GNN-based Anomaly Detection by Graph Adversarial Training
Xiangping Zheng | Bo Wu | Alex X. Zhang | Wei Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Graph neural networks (GNNs) play a fundamental role in anomaly detection, excelling at the identification of node anomalies by aggregating information from neighboring nodes. Nonetheless, they exhibit vulnerability to attacks, with even minor alterations in the graph structure or node attributes resulting in substantial performance degradation. To address this critical challenge, we introduce an innovative mechanism for graph adversarial training, meticulously designed to bolster GNN-based anomaly detection systems against potential poisoning attacks. This novel approach follows a two-step framework. (1) In the initial phase, we employ a Multiple-Objective Generative Adversarial Attack (MO-GAA), which focuses on generating feature modifications and inducing structural disruptions within the graph. Its primary objective is to mimic the adversarial behavior of potential attackers on the anomaly detection graph, with the explicit intention of confounding the anomaly detector. (2) In the subsequent stage, we introduce Purification-Based Adversarial Attack Defense (PB-AAD), a method specifically designed to rectify any contamination and restore the integrity of the graph. The central aim of PB-AAD is to counteract the destructive actions carried out by potential attackers. Our empirical findings, derived from extensive experiments conducted on four real-world anomaly detection datasets, serve to demonstrate how MO-GAA systematically disrupts the graph, compromising the effectiveness of GNN-based detectors, while PB-AAD effectively mitigates these adversarial actions, thereby enhancing the overall robustness of GNN-based anomaly detectors.

2023

pdf bib
WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning
Wenhao Wu | Wei Li | Xinyan Xiao | Jiachen Liu | Sujian Li | Yajuan Lyu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A crucial issue of current text generation models is that they often uncontrollably generate text that is factually inconsistent with inputs. Due to lack of annotated data, existing factual consistency metrics usually train evaluation models on synthetic texts or directly transfer from other related tasks, such as question answering (QA) and natural language inference (NLI).Bias in synthetic text or upstream tasks makes them perform poorly on text actually generated by language models, especially for general evaluation for various tasks. To alleviate this problem, we propose a weakly supervised framework named WeCheck that is directly trained on actual generated samples from language models with weakly annotated labels.WeCheck first utilizes a generative model to infer the factual labels of generated samples by aggregating weak labels from multiple resources.Next, we train a simple noise-aware classification model as the target metric using the inferred weakly supervised information.Comprehensive experiments on various tasks demonstrate the strong performance of WeCheck, achieving an average absolute improvement of 3.3% on the TRUE benchmark over 11B state-of-the-art methods using only 435M parameters.Furthermore, it is up to 30 times faster than previous evaluation methods, greatly improving the accuracy and efficiency of factual consistency evaluation.

pdf bib
PAED: Zero-Shot Persona Attribute Extraction in Dialogues
Luyao Zhu | Wei Li | Rui Mao | Vlad Pandelea | Erik Cambria
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Persona attribute extraction is critical for personalized human-computer interaction. Dialogue is an important medium that communicates and delivers persona information. Although there is a public dataset for triplet-based persona attribute extraction from conversations, its automatically generated labels present many issues, including unspecific relations and inconsistent annotations. We fix such issues by leveraging more reliable text-label matching criteria to generate high-quality data for persona attribute extraction. We also propose a contrastive learning- and generation-based model with a novel hard negative sampling strategy for generalized zero-shot persona attribute extraction. We benchmark our model with state-of-the-art baselines on our dataset and a public dataset, showing outstanding accuracy gains. Our sampling strategy also exceeds others by a large margin in persona attribute extraction.

pdf bib
Task-Aware Self-Supervised Framework for Dialogue Discourse Parsing
Wei Li | Luyao Zhu | Wei Shao | Zonglin Yang | Erik Cambria
Findings of the Association for Computational Linguistics: EMNLP 2023

Dialogue discourse parsing is a fundamental natural language processing task. It can benefit a series of conversation-related downstream tasks including dialogue summarization and emotion recognition in conversations. However, existing parsing approaches are constrained by predefined relation types, which can impede the adaptability of the parser for downstream tasks. To this end, we propose to introduce a task-aware paradigm to improve the versatility of the parser in this paper. Moreover, to alleviate error propagation and learning bias, we design a graph-based discourse parsing model termed DialogDP. Building upon the symmetrical property of matrix-embedded parsing graphs, we have developed an innovative self-supervised mechanism that leverages both bottom-up and top-down parsing strategies. This approach allows the parsing graphs to mutually regularize and enhance each other. Empirical studies on dialogue discourse parsing datasets and a downstream task demonstrate the effectiveness and flexibility of our framework.

pdf bib
Video-Helpful Multimodal Machine Translation
Yihang Li | Shuichiro Shimizu | Chenhui Chu | Sadao Kurohashi | Wei Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Existing multimodal machine translation (MMT) datasets consist of images and video captions or instructional video subtitles, which rarely contain linguistic ambiguity, making visual information ineffective in generating appropriate translations. Recent work has constructed an ambiguous subtitles dataset to alleviate this problem but is still limited to the problem that videos do not necessarily contribute to disambiguation. We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English parallel subtitle pairs, 520k Chinese-English parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous, and videos are guaranteed helpful for disambiguation. Furthermore, we propose SAFA, an MMT model based on the Selective Attention model with two novel methods: Frame attention loss and Ambiguity augmentation, aiming to use videos in EVA for disambiguation fully. Experiments on EVA show that visual information and the proposed methods can boost translation performance, and our model performs significantly better than existing MMT models.

pdf bib
CCL23-Eval 任务3系统报告:基于多任务pipeline策略的汉语框架语义解析(System Report for CCL23-Eval Task 3: Chinese Frame Semantic Parsing Based on Multi task Pipeline Strategy)
Shutan Huang (黄舒坦) | Qiuyan Shao (邵艳秋) | Wei Li (李炜)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“本论文为2023届CCL汉语框架语义解析评测任务提供了实现方法。针对汉语框架语义解析任务是多任务的特点,考虑到各子任务之间具有较强的时序性和关联性,方法采用了多任务pipeline策略的框架结构,主要由框架分类,论元识别,角色分类三个子模块组成,分别对应框架识别,论元范围识别,论元角色识别三个子任务。本文将框架识别和论元角色识别任务建模为文本分类任务,将论元范围识别任务建模为实体识别任务。考虑到各子任务之间具有较强的时序性和关联性,方法在每个模块均充分考虑了如何利用完成其他子任务时所抽取到的特征和信息。比如在进行角色分类时,利用了框架分类模块识别出的框架类别,以及论元识别模块识别出的论元范围。考虑到目标词及其上下文语境的重要性,本文使用预训练语言模型进行finetune。观察到模型的表现不稳定,训练时使用了对抗训练等策略提升模型性能。最终A榜分数值达到71.91,B榜分数值达到70.60,排名第2,验证了本文方法的有效性。”

pdf bib
CCL23-Eval 任务5总结报告:跨领域句子级别中文省略消解(Overview of CCL23-Eval Task 5: Sentence Level Multi-domain Chinese Ellipsis Resolution)
Wei Li (李炜) | Qiuyan Shao (邵艳秋) | Jialu Qi (祁佳璐)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“省略是一种会出现在包括中文在内的各种语言中的一种语言现象。虽然人类一般能够正确理解带有省略的文本,但是其对机器在句法、语义等方面的理解却会造成影响。因此自动恢复省略成分对文本自动分析理解具有重要意义。本任务提出一个面向应用的省略恢复任务,旨在恢复在句子句法结构中占据有效位置同时在句子中扮演语义成分的被省略内容。本任务将省略恢复任务划分成两个子任务:省略位置探测和省略内容生成,并分别描述在两个子任务中取得较好结果的基线方法。此外,为了推进对大语言模型的研究,本文还尝试使用场景学习的方法使用ChatGPT来完成本任务,并进行了相关分析。”

pdf bib
Multiloop Incremental Bootstrapping for Low-Resource Machine Translation
Wuying Liu | Wei Li | Lin Wang
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

Due to the scarcity of high-quality bilingual sentence pairs, some deep-learning-based machine translation algorithms cannot achieve better performance in low-resource machine translation. On this basis, we are committed to integrating the ideas of machine learning algorithm improvement and data augmentation, propose a novel multiloop incremental bootstrapping framework, and design the corresponding semi-supervised learning algorithm. This framework is a meta-frame independent of specific machine translation algorithms. This algorithm makes full use of bilingual seed data of appropriate scale and super-large-scale monolingual data to expand bilingual sentence pair data incrementally, and trains machine translation models step by step to improve the translation quality. The experimental results of neural machine translation on multiple language pairs prove that our proposed framework can make use of continuous monolingual data to raise itself. Its effectiveness is not only reflected in the easy implementation of state-of-the-art low-resource machine translation, but also in the practical option to quickly establish precise domain machine translation systems.

2022

pdf bib
Complex Evolutional Pattern Learning for Temporal Knowledge Graph Reasoning
Zixuan Li | Saiping Guan | Xiaolong Jin | Weihua Peng | Yajuan Lyu | Yong Zhu | Long Bai | Wei Li | Jiafeng Guo | Xueqi Cheng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A Temporal Knowledge Graph (TKG) is a sequence of KGs corresponding to different timestamps. TKG reasoning aims to predict potential facts in the future given the historical KG sequences. One key of this task is to mine and understand evolutional patterns of facts from these sequences. The evolutional patterns are complex in two aspects, length-diversity and time-variability. Existing models for TKG reasoning focus on modeling fact sequences of a fixed length, which cannot discover complex evolutional patterns that vary in length. Furthermore, these models are all trained offline, which cannot well adapt to the changes of evolutional patterns from then on. Thus, we propose a new model, called Complex Evolutional Network (CEN), which uses a length-aware Convolutional Neural Network (CNN) to handle evolutional patterns of different lengths via an easy-to-difficult curriculum learning strategy. Besides, we propose to learn the model under the online setting so that it can adapt to the changes of evolutional patterns over time. Extensive experiments demonstrate that CEN obtains substantial performance improvement under both the traditional offline and the proposed online settings.

pdf bib
Precisely the Point: Adversarial Augmentations for Faithful and Informative Text Generation
Wenhao Wu | Wei Li | Jiachen Liu | Xinyan Xiao | Sujian Li | Yajuan Lyu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Though model robustness has been extensively studied in language understanding, the robustness of Seq2Seq generation remains understudied.In this paper, we conduct the first quantitative analysis on the robustness of pre-trained Seq2Seq models. We find that even current SOTA pre-trained Seq2Seq model (BART) is still vulnerable, which leads to significant degeneration in faithfulness and informativeness for text generation tasks.This motivated us to further propose a novel adversarial augmentation framework, namely AdvSeq, for generally improving faithfulness and informativeness of Seq2Seq models via enhancing their robustness. AdvSeq automatically constructs two types of adversarial augmentations during training, including implicit adversarial samples by perturbing word representations and explicit adversarial samples by word swapping, both of which effectively improve Seq2Seq robustness.Extensive experiments on three popular text generation tasks demonstrate that AdvSeq significantly improves both the faithfulness and informativeness of Seq2Seq generation under both automatic and human evaluation settings.

pdf bib
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
Wei Li | Can Gao | Guocheng Niu | Xinyan Xiao | Hao Liu | Jiachen Liu | Hua Wu | Haifeng Wang
Findings of the Association for Computational Linguistics: ACL 2022

Vision-Language Pre-training (VLP) has achieved impressive performance on various cross-modal downstream tasks. However, most existing methods can only learn from aligned image-caption data and rely heavily on expensive regional features, which greatly limits their scalability and performance. In this paper, we propose an end-to-end unified-modal pre-training framework, namely UNIMO-2, for joint learning on both aligned image-caption data and unaligned image-only and text-only corpus. We build a unified Transformer model to jointly learn visual representations, textual representations and semantic alignment between images and texts. In particular, we propose to conduct grounded learning on both images and texts via a sharing grounded space, which helps bridge unaligned images and texts, and align the visual and textual semantic spaces on different types of corpora. The experiments show that our grounded learning method can improve textual and visual semantic alignment for improving performance on various cross-modal tasks. Moreover, benefiting from effective joint modeling of different types of corpora, our model also achieves impressive performance on single-modal visual and textual tasks. Our code and models are public at the UNIMO project page https://unimo-ptm.github.io/.

pdf bib
Unsupervised Chinese Word Segmentation with BERT Oriented Probing and Transformation
Wei Li | Yuhan Song | Qi Su | Yanqiu Shao
Findings of the Association for Computational Linguistics: ACL 2022

Word Segmentation is a fundamental step for understanding Chinese language. Previous neural approaches for unsupervised Chinese Word Segmentation (CWS) only exploits shallow semantic information, which can miss important context. Large scale Pre-trained language models (PLM) have achieved great success in many areas because of its ability to capture the deep contextual semantic relation. In this paper, we propose to take advantage of the deep semantic information embedded in PLM (e.g., BERT) with a self-training manner, which iteratively probes and transforms the semantic information in PLM into explicit word segmentation ability. Extensive experiment results show that our proposed approach achieves state-of-the-art F1 score on two CWS benchmark datasets.

pdf bib
Explore More Guidance: A Task-aware Instruction Network for Sign Language Translation Enhanced with Data Augmentation
Yong Cao | Wei Li | Xianzhi Li | Min Chen | Guangyong Chen | Long Hu | Zhengdao Li | Kai Hwang
Findings of the Association for Computational Linguistics: NAACL 2022

Sign language recognition and translation first uses a recognition module to generate glosses from sign language videos and then employs a translation module to translate glosses into spoken sentences. Most existing works focus on the recognition step, while paying less attention to sign language translation. In this work, we propose a task-aware instruction network, namely TIN-SLT, for sign language translation, by introducing the isntruction module and the learning-based feature fuse strategy into a Transformer network. In this way, the pre-trained model’s language ability can be well explored and utilized to further boost the translation performance. Moreover, by exploring the representation space of sign language glosses and target spoken language, we propose a multi-level data augmentation scheme to adjust the data distribution of the training set. We conduct extensive experiments on two challenging benchmark datasets, PHOENIX-2014-T and ASLG-PC12, on which our method outperforms former best solutions by 1.65 and 1.42 in terms of BLEU-4. Our code and trained networks will be available upon the publication of this work.

pdf bib
FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness
Wenhao Wu | Wei Li | Jiachen Liu | Xinyan Xiao | Ziqiang Cao | Sujian Li | Hua Wu
Findings of the Association for Computational Linguistics: EMNLP 2022

Despite being able to generate fluent and grammatical text, current Seq2Seq summarization models still suffering from the unfaithful generation problem.In this paper, we study the faithfulness of existing systems from a new perspective of factual robustness which is the ability to correctly generate factual information over adversarial unfaithful information.We first measure a model’sfactual robustness by its success rate to defend against adversarial attacks when generating factual information.The factual robustness analysis on a wide range of current systems shows its good consistency with human judgments on faithfulness.Inspired by these findings, we propose to improve the faithfulness of a model by enhancing its factual robustness.Specifically, we propose a novel training strategy, namely FRSUM, which teaches the model to defend against both explicit adversarial samples and implicit factual adversarial perturbations.Extensive automatic and human evaluation results show that FRSUM consistently improves the faithfulness of various Seq2Seq models, such as T5, BART.

pdf bib
HiSMatch: Historical Structure Matching based Temporal Knowledge Graph Reasoning
Zixuan Li | Zhongni Hou | Saiping Guan | Xiaolong Jin | Weihua Peng | Long Bai | Yajuan Lyu | Wei Li | Jiafeng Guo | Xueqi Cheng
Findings of the Association for Computational Linguistics: EMNLP 2022

A Temporal Knowledge Graph (TKG) is a sequence of KGs with respective timestamps, which adopts quadruples in the form of (subject, relation, object, timestamp) to describe dynamic facts. TKG reasoning has facilitated many real-world applications via answering such queries as (query entity, query relation, ?, future timestamp) about future. This is actually a matching task between a query and candidate entities based on their historical structures, which reflect behavioral trends of the entities at different timestamps. In addition, recent KGs provide background knowledge of all the entities, which is also helpful for the matching. Thus, in this paper, we propose the Historical Structure Matching (HiSMatch) model. It applies two structure encoders to capture the semantic information contained in the historical structures of the query and candidate entities. Besides, it adopts another encoder to integrate the background knowledge into the model. TKG reasoning experiments on six benchmark datasets demonstrate the significant improvement of the proposed HiSMatch model, with up to 5.6% performance improvement in MRR, compared to the state-of-the-art baselines.

pdf bib
No Stock is an Island: Learning Internal and Relational Attributes of Stocks with Contrastive Learning
Shicheng Li | Wei Li | Zhiyuan Zhang | Ruihan Bao | Keiko Harimoto | Xu Sun
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Previous work has demonstrated the viability of applying deep learning techniques in the financial area. Recently, the task of stock embedding learning has been drawing attention from the research community, which aims to represent the characteristics of stocks with distributed vectors that can be used in various financial analysis scenarios. Existing approaches for learning stock embeddings either require expert knowledge, or mainly focus on the textual part of information corresponding to individual temporal movements. In this paper, we propose to model stock properties as the combination of internal attributes and relational attributes, which takes into consideration both the time-invariant properties of individual stocks and their movement patterns in relation to the market. To learn the two types of attributes from financial news and transaction data, we design several training objectives based on contrastive learning to extract and separate the long-term and temporary information in the data that are able to counter the inherent randomness of the stock market. Experiments and further analyses on portfolio optimization reveal the effectiveness of our method in extracting comprehensive stock information from various data sources.

pdf bib
《二十四史》古代汉语语义依存图库构建(Construction of Semantic Dependency Graph Bank of Ancient Chinese in twenty four histories)
Tian Huang (黄恬) | Yanqiu Shao (邵艳秋) | Wei Li (李炜)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“语义依存图是NLP处理语义的深层分析方法,能够对句子中词与词之间的语义进行分析。该文针对古代汉语特点,在制定古代汉语语义依存图标注规范的基础上,以《二十四史》为语料来源,完成标注了规模为3000句的古代汉语语义依存图库,标注一致性的kappa值为78.83%。通过与现代汉语语义依存图库的对比,对依存图库基本情况进行统计,分析古代汉语的语义特色和规律。统计显示,古代汉语语义分布宏观上符合齐普夫定律,在语义事件描述上具有强烈的历史性叙事和正式文体特征,如以人物纪传为中心,时间、地点等周边角色描述细致,叙事语言冷静客观,缺少描述情态、语气、程度、时间状态等的修饰词语等。 "

pdf bib
针对古代经典文献的引用查找问题的数据构建与匹配方法(Data Construction and Matching Method for the Task of Ancient Classics Reference Detection)
Wei Li (李炜) | Yanqiu Shao (邵艳秋) | Mengxi Bi (毕梦曦)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“中国古代思想家的思想建构往往建立在对更早期经典的创造性诠释中,将这些诠释中包含的引用查找出来对思想史研究意义重大。但一些体量较大的文献如果完全依靠手工标记引用将耗费大量时间与人力成本,因此找到一种自动化的方法辅助专家进行引用标记查找非常重要。以预训练语言模型为代表的自然语言处理技术的发展提升了计算机对于文本处理和语义理解的能力。据此,本文提出多种利用专家知识或深度学习语义理解能力的无监督基线方法来自动查找古代思想家著作中对早期经典的引用。为了验证本文提出的方法的效果并推动自然语言处理技术在数字人文领域的应用,本文以宋代具有重大影响力的理学家二程(程颢、程颐)对早期儒家经典的引用为例进行研究,并构建和发布相应的引用查找数据集1。实验结果表明本文提出的基于预训练语言模型和对比学习目标的复合方法可以较为准确地判断是否存在引用关系。基于短句的引用探测ROC-AUC值达到了87.83,基于段落的引用探测ROC-AUC值达到了91.02。进一步的分析表明本文的方法不仅有利于自动化找到引用关系,更能够有效帮助专家提高引用查找判断效率。本方法在注释整理、文本溯源、重出文献查找、引用统计分析、索引文献集制作等方面具有广阔的应用前景。”

pdf bib
基于强化学习的古今汉语句子对齐研究(Research on Sentence Alignment of Ancient and Modern Chinese based on Reinforcement Learning)
Kuai Yu (喻快) | Yanqiu Shao (邵艳秋) | Wei Li (李炜)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“基于深度学习的有监督机器翻译取得了良好的效果,但训练过程中需要大量质量较高的对齐语料。对于中文古今翻译场景,高质量的平行语料并不多,而粗对齐的篇章、段语料比较容易获得,因此语料对齐很有研究价值和研究必要。在传统双语平行语料的句子对齐研究中,传统方法根据双语文本中的长度、词汇、共现文字等语法信息,建立一个综合评判标准来衡量两个句对之间相似度。此类方法虽然在单句对齐上取得了较好的效果,但是对于句子语义匹配的能力有限,并且在一些多对多的对齐模式上的性能表现不佳。在本文中我们提出尝试利用现在发展迅速且具有强大语义表示能力的预训练语言模型来考虑双语的语义信息,但是单独使用预训练语言模型只能考虑相对局部的信息,因此我们提出采用基于动态规划算法的强化学习训练目标来整合段落全局信息,并且进行无监督训练。实验结果证明我们提出的方法训练得到的模型性能优于此前获得最好表现的基线模型,尤其相较于传统模型难以处理的多对多对齐模式下,性能提升较大。”

pdf bib
Meta-CQG: A Meta-Learning Framework for Complex Question Generation over Knowledge Bases
Kun Zhang | Yunqi Qiu | Yuanzhuo Wang | Long Bai | Wei Li | Xuhui Jiang | Huawei Shen | Xueqi Cheng
Proceedings of the 29th International Conference on Computational Linguistics

Complex question generation over knowledge bases (KB) aims to generate natural language questions involving multiple KB relations or functional constraints. Existing methods train one encoder-decoder-based model to fit all questions. However, such a one-size-fits-all strategy may not perform well since complex questions exhibit an uneven distribution in many dimensions, such as question types, involved KB relations, and query structures, resulting in insufficient learning for long-tailed samples under different dimensions. To address this problem, we propose a meta-learning framework for complex question generation. The meta-trained generator can acquire universal and transferable meta-knowledge and quickly adapt to long-tailed samples through a few most related training samples. To retrieve similar samples for each input query, we design a self-supervised graph retriever to learn distributed representations for samples, and contrastive learning is leveraged to improve the learned representations. We conduct experiments on both WebQuestionsSP and ComplexWebQuestion, and results on long-tailed samples of different dimensions have been significantly improved, which demonstrates the effectiveness of the proposed framework.

2021

pdf bib
UoB_UK at SemEval 2021 Task 2: Zero-Shot and Few-Shot Learning for Multi-lingual and Cross-lingual Word Sense Disambiguation.
Wei Li | Harish Tayyar Madabushi | Mark Lee
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes our submission to SemEval 2021 Task 2. We compare XLM-RoBERTa Base and Large in the few-shot and zero-shot settings and additionally test the effectiveness of using a k-nearest neighbors classifier in the few-shot setting instead of the more traditional multi-layered perceptron. Our experiments on both the multi-lingual and cross-lingual data show that XLM-RoBERTa Large, unlike the Base version, seems to be able to more effectively transfer learning in a few-shot setting and that the k-nearest neighbors classifier is indeed a more powerful classifier than a multi-layered perceptron when used in few-shot learning.

pdf bib
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Wei Li | Can Gao | Guocheng Niu | Xinyan Xiao | Hao Liu | Jiachen Liu | Hua Wu | Haifeng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Existed pre-training methods either focus on single-modal tasks or multi-modal tasks, and cannot effectively adapt to each other. They can only utilize single-modal data (i.e., text or image) or limited multi-modal data (i.e., image-text pairs). In this work, we propose a UNIfied-MOdal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large scale of free text corpus and image collections are utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space, over a corpus of image-text pairs augmented with related images and texts. With the help of rich non-paired single-modal data, our model is able to learn more generalizable representations, by allowing textual knowledge and visual knowledge to enhance each other in the unified semantic space. The experimental results show that UNIMO greatly improves the performance of several single-modal and multi-modal downstream tasks. Our code and pre-trained models are public at https://github.com/PaddlePaddle/Research/tree/master/NLP/UNIMO.

pdf bib
Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs
Zixuan Li | Xiaolong Jin | Saiping Guan | Wei Li | Jiafeng Guo | Yuanzhuo Wang | Xueqi Cheng
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Temporal Knowledge Graphs (TKGs) have been developed and used in many different areas. Reasoning on TKGs that predicts potential facts (events) in the future brings great challenges to existing models. When facing a prediction task, human beings usually search useful historical information (i.e., clues) in their memories and then reason for future meticulously. Inspired by this mechanism, we propose CluSTeR to predict future facts in a two-stage manner, Clue Searching and Temporal Reasoning, accordingly. Specifically, at the clue searching stage, CluSTeR learns a beam search policy via reinforcement learning (RL) to induce multiple clues from historical facts. At the temporal reasoning stage, it adopts a graph convolution network based sequence method to deduce answers from clues. Experiments on four datasets demonstrate the substantial advantages of CluSTeR compared with the state-of-the-art methods. Moreover, the clues found by CluSTeR further provide interpretability for the results.

pdf bib
BASS: Boosting Abstractive Summarization with Unified Semantic Graph
Wenhao Wu | Wei Li | Xinyan Xiao | Jiachen Liu | Ziqiang Cao | Sujian Li | Hua Wu | Haifeng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Abstractive summarization for long-document or multi-document remains challenging for the Seq2Seq architecture, as Seq2Seq is not good at analyzing long-distance relations in text. In this paper, we present BASS, a novel framework for Boosting Abstractive Summarization based on a unified Semantic graph, which aggregates co-referent phrases distributing across a long range of context and conveys rich relations between phrases. Further, a graph-based encoder-decoder model is proposed to improve both the document representation and summary generation process by leveraging the graph structure. Specifically, several graph augmentation methods are designed to encode both the explicit and implicit relations in the text while the graph-propagation attention mechanism is developed in the decoder to select salient content into the summary. Empirical results show that the proposed architecture brings substantial improvements for both long-document and multi-document summarization tasks.

pdf bib
SgSum:Transforming Multi-document Summarization into Sub-graph Selection
Moye Chen | Wei Li | Jiachen Liu | Xinyan Xiao | Hua Wu | Haifeng Wang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Most of existing extractive multi-document summarization (MDS) methods score each sentence individually and extract salient sentences one by one to compose a summary, which have two main drawbacks: (1) neglecting both the intra and cross-document relations between sentences; (2) neglecting the coherence and conciseness of the whole summary. In this paper, we propose a novel MDS framework (SgSum) to formulate the MDS task as a sub-graph selection problem, in which source documents are regarded as a relation graph of sentences (e.g., similarity graph or discourse graph) and the candidate summaries are its sub-graphs. Instead of selecting salient sentences, SgSum selects a salient sub-graph from the relation graph as the summary. Comparing with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) directly outputs an integrate summary in the form of sub-graph which is more informative and coherent. Extensive experiments on MultiNews and DUC datasets show that our proposed method brings substantial improvements over several strong baselines. Human evaluation results also demonstrate that our model can produce significantly more coherent and informative summaries compared with traditional MDS methods. Moreover, the proposed architecture has strong transfer ability from single to multi-document input, which can reduce the resource bottleneck in MDS tasks.

pdf bib
Do Transformer Modifications Transfer Across Implementations and Applications?
Sharan Narang | Hyung Won Chung | Yi Tay | Liam Fedus | Thibault Fevry | Michael Matena | Karishma Malkan | Noah Fiedel | Noam Shazeer | Zhenzhong Lan | Yanqi Zhou | Wei Li | Nan Ding | Jake Marcus | Adam Roberts | Colin Raffel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance. Furthermore, most of the Transformer variants we found beneficial were either developed in the same codebase that we used or are relatively minor changes. We conjecture that performance improvements may strongly depend on implementation details and correspondingly make some recommendations for improving the generality of experimental results.

2020

pdf bib
Leveraging Graph to Improve Abstractive Multi-Document Summarization
Wei Li | Xinyan Xiao | Jiachen Liu | Hua Wu | Haifeng Wang | Junping Du
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries. In this paper, we develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents such as similarity graph and discourse graph, to more effectively process multiple input documents and produce abstractive summaries. Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents. Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries. Furthermore, pre-trained language models can be easily combined with our model, which further improve the summarization performance significantly. Empirical results on the WikiSum and MultiNews dataset show that the proposed architecture brings substantial improvements over several strong baselines.

pdf bib
基于统一模型的藏文新闻摘要(Abstractive Summarization of Tibetan News Based on Hybrid Model)
Xiaodong Yan (闫晓东) | Xiaoqing Xie (解晓庆) | Yu Zou (邹煜) | Wei Li (李维)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Seq2seq神经网络模型在中英文文本摘要的研究中取得了良好的效果,但在低资源语言的文本摘要研究还处于探索阶段,尤其是在藏语中。此外,目前还没有大规模的标注语料库进行摘要提取。本文提出了一种生成藏文新闻摘要的统一模型。利用TextRank算法解决了藏语标注训练数据不足的问题。然后,采用两层双GRU神经网络提取代表原始新闻的句子,减少冗余信息。最后,使用基于注意力机制的Seq2Seq来生成理解式摘要。同时,我们加入了指针网络来处理未登录词的问题。实验结果表明,ROUGE-1评分比传统模型提高了2%。 关键词:文本摘要;藏文;TextRank; 指针网络;Bi-GRU

2019

pdf bib
Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Naveen Arivazhagan | Colin Cherry | Wolfgang Macherey | Chung-Cheng Chiu | Semih Yavuz | Ruoming Pang | Wei Li | Colin Raffel
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Simultaneous machine translation begins to translate each source sentence before the source speaker is finished speaking, with applications to live and streaming scenarios. Simultaneous systems must carefully schedule their reading of the source sentence to balance quality against latency. We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural machine translation (NMT) model that attends over all source tokens read thus far. We do so by introducing Monotonic Infinite Lookback (MILk) attention, which maintains both a hard, monotonic attention head to schedule the reading of the source sentence, and a soft attention head that extends from the monotonic head back to the beginning of the source. We show that MILk’s adaptive schedule allows it to arrive at latency-quality trade-offs that are favorable to those of a recently proposed wait-k strategy for many latency values.

pdf bib
Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model
Wei Li | Jingjing Xu | Yancheng He | ShengLi Yan | Yunfang Wu | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatic article commenting is helpful in encouraging user engagement on online news platforms. However, the news documents are usually too long for models under traditional encoder-decoder frameworks, which often results in general and irrelevant comments. In this paper, we propose to generate comments with a graph-to-sequence model that models the input news as a topic interaction graph. By organizing the article into graph structure, our model can better understand the internal structure of the article and the connection between topics, which makes it better able to generate coherent and informative comments. We collect and release a large scale news-comment corpus from a popular Chinese online news platform Tencent Kuaibao. Extensive experiment results show that our model can generate much more coherent and informative comments compared with several strong baseline models.

2018

pdf bib
Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation
Shuming Ma | Xu Sun | Wei Li | Sujian Li | Wenjie Li | Xuancheng Ren
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Most recent approaches use the sequence-to-sequence model for paraphrase generation. The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words. Therefore, the generated sentences are often grammatically correct but semantically improper. In this work, we introduce a novel model based on the encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our proposed model generates the words by querying distributed word representations (i.e. neural word embeddings), hoping to capturing the meaning of the according words. Following previous work, we evaluate our model on two paraphrase-oriented tasks, namely text simplification and short text abstractive summarization. Experimental results show that our model outperforms the sequence-to-sequence baseline by the BLEU score of 6.3 and 5.5 on two English text simplification datasets, and the ROUGE-2 F1 score of 5.7 on a Chinese summarization dataset. Moreover, our model achieves state-of-the-art performances on these three benchmark datasets.

pdf bib
Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network
Pengcheng Yang | Xu Sun | Wei Li | Shuming Ma
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

As more and more academic papers are being submitted to conferences and journals, evaluating all these papers by professionals is time-consuming and can cause inequality due to the personal factors of the reviewers. In this paper, in order to assist professionals in evaluating academic papers, we propose a novel task: automatic academic paper rating (AAPR), which automatically determine whether to accept academic papers. We build a new dataset for this task and propose a novel modularized hierarchical convolutional neural network to achieve automatic academic paper rating. Evaluation results show that the proposed model outperforms the baselines by a large margin. The dataset and code are available at https://github.com/lancopku/AAPR

pdf bib
SGM: Sequence Generation Model for Multi-label Classification
Pengcheng Yang | Xu Sun | Wei Li | Shuming Ma | Wei Wu | Houfeng Wang
Proceedings of the 27th International Conference on Computational Linguistics

Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification in that the labels tend to be correlated. Existing methods tend to ignore the correlations between labels. Besides, different parts of the text can contribute differently for predicting different labels, which is not considered by existing models. In this paper, we propose to view the multi-label classification task as a sequence generation problem, and apply a sequence generation model with a novel decoder structure to solve it. Extensive experimental results show that our proposed methods outperform previous work by a substantial margin. Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.

pdf bib
Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling
Wei Li | Xinyan Xiao | Yajuan Lyu | Yuanzhuo Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Information selection is the most important component in document summarization task. In this paper, we propose to extend the basic neural encoding-decoding framework with an information selection layer to explicitly model and optimize the information selection process in abstractive document summarization. Specifically, our information selection layer consists of two parts: gated global information filtering and local sentence selection. Unnecessary information in the original document is first globally filtered, then salient sentences are selected locally while generating each summary sentence sequentially. To optimize the information selection process directly, distantly-supervised training guided by the golden summary is also imported. Experimental results demonstrate that the explicit modeling and optimizing of the information selection process improves document summarization performance significantly, which enables our model to generate more informative and concise summaries, and thus significantly outperform state-of-the-art neural abstractive methods.

pdf bib
Improving Neural Abstractive Document Summarization with Structural Regularization
Wei Li | Xinyan Xiao | Yajuan Lyu | Yuanzhuo Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent neural sequence-to-sequence models have shown significant progress on short text summarization. However, for document summarization, they fail to capture the long-term structure of both documents and multi-sentence summaries, resulting in information loss and repetitions. In this paper, we propose to leverage the structural information of both documents and multi-sentence summaries to improve the document summarization performance. Specifically, we import both structural-compression and structural-coverage regularization into the summarization process in order to capture the information compression and information coverage properties, which are the two most important structural properties of document summarization. Experimental results demonstrate that the structural regularization improves the document summarization performance significantly, which enables our model to generate more informative and concise summaries, and thus significantly outperforms state-of-the-art neural abstractive methods.

pdf bib
Learning Universal Sentence Representations with Mean-Max Attention Autoencoder
Minghua Zhang | Yunfang Wu | Weikang Li | Wei Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In order to learn universal sentence representations, previous methods focus on complex recurrent neural networks or supervised learning. In this paper, we propose a mean-max attention autoencoder (mean-max AAE) within the encoder-decoder framework. Our autoencoder rely entirely on the MultiHead self-attention mechanism to reconstruct the input sequence. In the encoding we propose a mean-max strategy that applies both mean and max pooling operations over the hidden vectors to capture diverse information of the input. To enable the information to steer the reconstruction process dynamically, the decoder performs attention over the mean-max representation. By training our model on a large collection of unlabelled data, we obtain high-quality representations of sentences. Experimental results on a broad range of 10 transfer tasks demonstrate that our model outperforms the state-of-the-art unsupervised single methods, including the classical skip-thoughts and the advanced skip-thoughts+LN model. Furthermore, compared with the traditional recurrent neural network, our mean-max AAE greatly reduce the training time.

2017

pdf bib
Derivation of Document Vectors from Adaptation of LSTM Language Model
Wei Li | Brian Mak
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In many natural language processing (NLP) tasks, a document is commonly modeled as a bag of words using the term frequency-inverse document frequency (TF-IDF) vector. One major shortcoming of the frequency-based TF-IDF feature vector is that it ignores word orders that carry syntactic and semantic relationships among the words in a document. This paper proposes a novel distributed vector representation of a document, which will be labeled as DV-LSTM, and is derived from the result of adapting a long short-term memory recurrent neural network language model by the document. DV-LSTM is expected to capture some high-level sequential information in the document, which other current document representations fail to do. It was evaluated in document genre classification in the Brown Corpus and the BNC Baby Corpus. The results show that DV-LSTM significantly outperforms TF-IDF vector and paragraph vector (PV-DM) in most cases, and their combinations may further improve the classification performance.

2016

pdf bib
Abstractive News Summarization based on Event Semantic Link Network
Wei Li | Lei He | Hai Zhuge
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper studies the abstractive multi-document summarization for event-oriented news texts through event information extraction and abstract representation. Fine-grained event mentions and semantic relations between them are extracted to build a unified and connected event semantic link network, an abstract representation of source texts. A network reduction algorithm is proposed to summarize the most salient and coherent event information. New sentences with good linguistic quality are automatically generated and selected through sentences over-generation and greedy-selection processes. Experimental results on DUC 2006 and DUC 2007 datasets show that our system significantly outperforms the state-of-the-art extractive and abstractive baselines under both pyramid and ROUGE evaluation metrics.

pdf bib
Exploring Differential Topic Models for Comparative Summarization of Scientific Papers
Lei He | Wei Li | Hai Zhuge
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper investigates differential topic models (dTM) for summarizing the differences among document groups. Starting from a simple probabilistic generative model, we propose dTM-SAGE that explicitly models the deviations on group-specific word distributions to indicate how words are used differen-tially across different document groups from a background word distribution. It is more effective to capture unique characteristics for comparing document groups. To generate dTM-based comparative summaries, we propose two sentence scoring methods for measuring the sentence discriminative capacity. Experimental results on scientific papers dataset show that our dTM-based comparative summari-zation methods significantly outperform the generic baselines and the state-of-the-art comparative summarization methods under ROUGE metrics.

pdf bib
Chinese Poetry Generation with Planning based Neural Network
Zhe Wang | Wei He | Hua Wu | Haiyang Wu | Wei Li | Haifeng Wang | Enhong Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generating method which first plans the sub-topics of the poem according to the user’s writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method can ensure that the generated poem is coherent and semantically consistent with the user’s intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms the state-of-the-art poetry generating methods and the poem quality is somehow comparable to human poets.

pdf bib
Multi-level Gated Recurrent Neural Network for dialog act classification
Wei Li | Yunfang Wu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper we focus on the problem of dialog act (DA) labelling. This problem has recently attracted a lot of attention as it is an important sub-part of an automatic question answering system, which is currently in great demand. Traditional methods tend to see this problem as a sequence labelling task and deals with it by applying classifiers with rich features. Most of the current neural network models still omit the sequential information in the conversation. Henceforth, we apply a novel multi-level gated recurrent neural network (GRNN) with non-textual information to predict the DA tag. Our model not only utilizes textual information, but also makes use of non-textual and contextual information. In comparison, our model has shown significant improvement over previous works on Switchboard Dialog Act (SWDA) task by over 6%.

2015

pdf bib
Improved beam search with constrained softmax for NMT
Xiaoguang Hu | Wei Li | Xiang Lan | Hua Wu | Haifeng Wang
Proceedings of Machine Translation Summit XV: Papers

pdf bib
Abstractive Multi-document Summarization with Semantic Information Extraction
Wei Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2006

pdf bib
Mining Implicit Entities in Queries
Wei Li | Wenjie Li | Qin Lu
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Entities are pivotal in describing events and objects, and also very important in Document Summarization. In general only explicit entities which can be extracted by a Named Entity Recognizer are used in real applications. However, implicit entities hidden behind the phrases or words, e.g. entity referred by the phrase “cross border”, are proved to be helpful in Document Summarization. In our experiment, we extract the implicit entities from the web resources.

2005

pdf bib
Automatic Image Annotation Using Maximum Entropy Model
Wei Li | Maosong Sun
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
A Preliminary Work on Classifying Time Granularities of Temporal Questions
Wei Li | Wenjie Li | Qin Lu | Kam-Fai Wong
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Word Independent Context Pair Classification Model for Word Sense Disambiguation
Cheng Niu | Wei Li | Rohini K. Srihari | Huifeng Li
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2004

pdf bib
Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction
Cheng Niu | Wei Li | Rohini K. Srihari
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Context clustering for Word Sense Disambiguation based on modeling pairwise context similarities
Cheng Niu | Wei Li | Rohini K. Srihari | Huifeng Li | Laurie Crist
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2003

pdf bib
Bootstrapping for Named Entity Tagging Using Concept-based Seeds
Cheng Niu | Wei Li | Jihong Ding | Rohini K. Srihari
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

pdf bib
A Bootstrapping Approach to Named Entity Classification Using Successive Learners
Cheng Niu | Wei Li | Jihong Ding | Rohini Srihari
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
An Expert Lexicon Approach to Identifying English Phrasal Verbs
Wei Li | Xiuhong Zhang | Cheng Niu | Yuankai Jiang | Rohini K. Srihari
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
InfoXtract location normalization: a hybrid approach to geographic references in information extraction
Huifeng Li | K. Rohini Srihari | Cheng Niu | Wei Li
Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References

pdf bib
Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons
Andrew McCallum | Wei Li
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

pdf bib
InfoXtract: A Customizable Intermediate Level Information Extraction Engine
Rohini K. Srihari | Wei Li | Cheng Niu | Thomas Cornell
Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS)

pdf bib
Question Answering on a Case Insensitive Corpus
Wei Li | Rohini Srihari | Cheng Niu | Xiaoge Li
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

2002

pdf bib
Extracting Exact Answers to Questions Based on Structural Links
Wei Li | Rohini K. Srihari | Xiaoge Li | M. Srikanth | Xiuhong Zhang | Cheng Niu
COLING-02: Multilingual Summarization and Question Answering

pdf bib
Location Normalization for Information Extraction
Huifeng Li | Rohini K. Srihari | Cheng Niu | Wei Li
COLING 2002: The 19th International Conference on Computational Linguistics

2000

pdf bib
A Question Answering System Supported by Information Extraction
Rohini Srihari | Wei Li
Sixth Applied Natural Language Processing Conference

1989

pdf bib
JFY-IV machine translation system
Zho Liu | Aiping Fu | Wei Li
Proceedings of Machine Translation Summit II

Search
Co-authors