Wei Lu


2021

Better Feature Integration for Named Entity Recognition
Lu Xu | Zhanming Jie | Wei Lu | Lidong Bing
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

It has been shown that named entity recognition (NER) could benefit from incorporating the long-distance structured information captured by dependency trees. We believe this is because the two types of features, the contextual information captured by the linear sequences and the structured information captured by the dependency trees, may complement each other. However, existing approaches largely focused on stacking the LSTM and graph neural networks such as graph convolutional networks (GCNs) for building improved NER models, where the exact interaction mechanism between the two types of features is not very clear, and the performance gain does not appear to be significant. In this work, we propose a simple and robust solution to incorporate both types of features with our Synergized-LSTM (Syn-LSTM), which clearly captures how the two types of features interact. We conduct extensive experiments on several standard datasets across four languages. The results demonstrate that the proposed model achieves better performance than previous approaches while requiring fewer parameters. Our further analysis demonstrates that our model can capture longer dependencies compared with strong baselines.

Entity-Aware Abstractive Multi-Document Summarization
Hao Zhou | Weidong Ren | Gongshen Liu | Bo Su | Wei Lu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

WN-Salience: A Corpus of News Articles with Entity Salience Annotations
Chuan Wu | Evangelos Kanoulas | Maarten de Rijke | Wei Lu
Proceedings of the 12th Language Resources and Evaluation Conference

Entities can be found in various text genres, ranging from tweets and web pages to user queries submitted to web search engines. Existing research either considers all entities in the text equally important, or heuristics are used to measure their salience. We believe that a key reason for the relatively limited work on entity salience is the lack of appropriate datasets. To support research on entity salience, we present a new dataset, the WikiNews Salience dataset (WN-Salience), which can be used to benchmark tasks such as entity salience detection and salient entity linking. WN-Salience is built on top of Wikinews, a Wikimedia project whose mission is to present reliable news articles. Entities in Wikinews articles are identified by the authors of the articles and are linked to Wikinews categories when they are salient or to Wikipedia pages otherwise. The dataset is built automatically, and consists of approximately 7,000 news articles, and 90,000 in-text entity annotations. We compare the WN-Salience dataset against existing datasets on the task and analyze their differences. Furthermore, we conduct experiments on entity salience detection; the results demonstrate that WN-Salience is a challenging testbed that is complementary to existing ones.

Reasoning with Latent Structure Refinement for Document-Level Relation Extraction
Guoshun Nan | Zhijiang Guo | Ivan Sekulic | Wei Lu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Document-level relation extraction requires integrating information within and across multiple sentences of a document and capturing complex interactions between inter-sentence entities. However, effective aggregation of relevant information in the document remains a challenging research question. Existing approaches construct static document-level graphs based on syntactic trees, co-references or heuristics from the unstructured text to model the dependencies. Unlike previous methods that may not be able to capture rich non-local interactions for inference, we propose a novel model that empowers the relational reasoning across sentences by automatically inducing the latent document-level graph. We further develop a refinement strategy, which enables the model to incrementally aggregate relevant information for multi-hop reasoning. Specifically, our model achieves an F1 score of 59.05 on a large-scale document-level dataset (DocRED), significantly improving over the previous results, and also yields new state-of-the-art results on the CDR and GDA datasets. Furthermore, extensive analyses show that the model is able to discover more accurate inter-sentence relations.

Understanding Attention for Text Classification
Xiaobing Sun | Wei Lu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Attention has been proven successful in many natural language processing (NLP) tasks. Recently, many researchers started to investigate the interpretability of attention on NLP tasks. Many existing approaches focused on examining whether the local attention weights could reflect the importance of input representations. In this work, we present a study on understanding the internal mechanism of attention by looking into the gradient update process, checking its behavior when approaching a local minimum during training. We propose to analyze for each word token the following two quantities: its polarity score and its attention score, where the latter is a global assessment on the token’s significance. We discuss conditions under which the attention mechanism may become more (or less) interpretable, and show how the interplay between the two quantities can contribute towards model performance.

ENT-DESC: Entity Description Generation by Exploring Knowledge Graph
Liying Cheng | Dekun Wu | Lidong Bing | Yan Zhang | Zhanming Jie | Wei Lu | Luo Si
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Previous works on knowledge-to-text generation take as input a few RDF triples or key-value pairs conveying the knowledge of some entities to generate a natural language description. Existing datasets, such as WIKIBIO, WebNLG, and E2E, basically have a good alignment between an input triple/pair set and its output text. However, in practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge. In this paper, we introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text. Our dataset involves retrieving abundant knowledge of various types of main entities from a large knowledge graph (KG), which makes the current graph-to-sequence models severely suffer from the problems of information loss and parameter explosion while generating the descriptions. We address these challenges by proposing a multi-graph structure that is able to represent the original graph information more comprehensively. Furthermore, we also incorporate aggregation methods that learn to extract the rich graph information. Extensive experiments demonstrate the effectiveness of our model architecture.

Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders
Jue Wang | Wei Lu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Named entity recognition and relation extraction are two important fundamental problems. Joint learning algorithms have been proposed to solve both tasks simultaneously, and many of them cast the joint task as a table-filling problem. However, they typically focused on learning a single encoder (usually learning representation in the form of a table) to capture information required for both tasks within the same space. We argue that it can be beneficial to design two distinct encoders to capture these two different types of information in the learning process. In this work, we propose the novel table-sequence encoders, where two different encoders, a table encoder and a sequence encoder, are designed to help each other in the representation learning process. Our experiments confirm the advantages of having two encoders over one encoder. On several standard datasets, our model shows significant improvements over existing approaches.
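
As a rough illustration of the table-filling formulation mentioned in this abstract (not of the table-sequence encoders themselves), the following Python sketch builds the label table for one sentence. The function name `fill_table`, the tuple layouts, and the label names are assumptions made for this example, and the diagonal is simplified to plain entity types rather than a BIO-style scheme.

```python
def fill_table(tokens, entities, relations):
    """Build an n x n label table for one sentence (illustrative only).

    entities:  list of (start, end, type) with inclusive token indices.
    relations: list of (head_entity_idx, tail_entity_idx, relation_type).
    Diagonal cells carry entity types; off-diagonal cells carry relation
    types for every (head token, tail token) pair. "O" means no label.
    """
    n = len(tokens)
    table = [["O"] * n for _ in range(n)]

    # Entity labels go on the diagonal (simplified: no B-/I- prefixes).
    for start, end, etype in entities:
        for i in range(start, end + 1):
            table[i][i] = etype

    # Relation labels go in the cells linking head tokens to tail tokens.
    for head_idx, tail_idx, rtype in relations:
        h_start, h_end, _ = entities[head_idx]
        t_start, t_end, _ = entities[tail_idx]
        for i in range(h_start, h_end + 1):
            for j in range(t_start, t_end + 1):
                table[i][j] = rtype
    return table


tokens = ["David", "Perkins", "lives", "in", "California"]
entities = [(0, 1, "PER"), (4, 4, "LOC")]   # hypothetical annotations
relations = [(0, 1, "LIVE_IN")]             # PER -> LOC
table = fill_table(tokens, entities, relations)
```

In this toy table the relation is stored only in the head-to-tail direction, so `table[0][4]` holds the relation label while `table[4][0]` stays empty; other table-filling variants treat direction differently.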

Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation
Yan Zhang | Zhijiang Guo | Zhiyang Teng | Wei Lu | Shay B. Cohen | Zuozhu Liu | Lidong Bing
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

AMR-to-text generation is used to transduce Abstract Meaning Representation structures (AMR) into text. A key challenge in this task is to efficiently learn effective graph representations. Previously, Graph Convolution Networks (GCNs) were used to encode input AMRs; however, vanilla GCNs are not able to capture non-local information, and they follow a local (first-order) information aggregation scheme. To account for these issues, larger and deeper GCN models are required to capture more complex interactions. In this paper, we introduce a dynamic fusion mechanism, proposing Lightweight Dynamic Graph Convolutional Networks (LDGCNs) that capture richer non-local interactions by synthesizing higher order information from the input graphs. We further develop two novel parameter saving strategies based on the group graph convolutions and weight tied convolutions to reduce memory usage and model complexity. With the help of these strategies, we are able to train a model with fewer parameters while maintaining the model capacity. Experiments demonstrate that LDGCNs outperform state-of-the-art models on two benchmark datasets for AMR-to-text generation with significantly fewer parameters.
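
To make the "local (first-order) aggregation" point concrete, here is a minimal Python sketch of a weight-free aggregation step on a three-node path graph. It is not the LDGCN model: the function `aggregate_once`, the mean aggregation with self-loops, and the toy graph are assumptions chosen only to show that one step reaches direct neighbours while two stacked steps reach two-hop neighbours.

```python
def aggregate_once(adj, feats):
    """One first-order step: each node averages itself and its direct
    neighbours. Learned weights and nonlinearities are omitted on purpose."""
    n = len(adj)
    dim = len(feats[0])
    out = []
    for i in range(n):
        # Neighbourhood = direct neighbours plus a self-loop.
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        out.append([sum(feats[j][d] for j in nbrs) / len(nbrs)
                    for d in range(dim)])
    return out


# Path graph 0 - 1 - 2; one-hot features make information flow visible.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

h1 = aggregate_once(adj, feats)  # node 0 sees nodes 0 and 1 only
h2 = aggregate_once(adj, h1)     # node 0 now also receives node 2's signal
```

After one step the third component of `h1[0]` is still zero, because node 2 is two hops from node 0; it becomes positive only in `h2`. This is why deeper stacks, or mechanisms that fuse higher-order information, are needed for non-local interactions.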

Position-Aware Tagging for Aspect Sentiment Triplet Extraction
Lu Xu | Hao Li | Wei Lu | Lidong Bing
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting the triplets of target entities, their associated sentiment, and opinion spans explaining the reason for the sentiment. Existing research efforts mostly solve this problem using pipeline approaches, which break the triplet extraction process into several stages. Our observation is that the three elements within a triplet are highly related to each other, and this motivates us to build a joint model to extract such triplets using a sequence tagging approach. However, how to effectively design a tagging approach to extract the triplets that can capture the rich interactions among the elements is a challenging research question. In this work, we propose the first end-to-end model with a novel position-aware tagging scheme that is capable of jointly extracting the triplets. Our experimental results on several existing datasets show that jointly capturing elements in the triplet using our approach leads to improved performance over the existing approaches. We also conducted extensive experiments to investigate the model effectiveness and robustness.

Aspect Sentiment Classification with Aspect-Specific Opinion Spans
Lu Xu | Lidong Bing | Wei Lu | Fei Huang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Aspect based sentiment analysis, predicting sentiment polarity of given aspects, has drawn extensive attention. Previous attention-based models emphasize using aspect semantics to help extract opinion features for classification. However, these works are either not able to capture opinion spans as a whole, or not able to capture variable-length opinion spans. In this paper, we present a neat and effective structured attention model by aggregating multiple linear-chain CRFs. Such a design allows the model to extract aspect-specific opinion spans and then evaluate sentiment polarity by exploiting the extracted opinion features. The experimental results on four datasets demonstrate the effectiveness of the proposed model, and our analysis demonstrates that our model can capture aspect-specific opinion spans.

Pre-training for Abstractive Document Summarization by Reinstating Source Text
Yanyan Zou | Xingxing Zhang | Wei Lu | Furu Wei | Ming Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Abstractive document summarization is usually modeled as a sequence-to-sequence (SEQ2SEQ) learning problem. Unfortunately, training large SEQ2SEQ based summarization models on limited supervised summarization data is challenging. This paper presents three sequence-to-sequence pre-training (in shorthand, STEP) objectives which allow us to pre-train a SEQ2SEQ based abstractive summarization model on unlabeled text. The main idea is that, given an input text artificially constructed from a document, a model is pre-trained to reinstate the original document. These objectives include sentence reordering, next sentence generation and masked document generation, which have close relations with the abstractive document summarization task. Experiments on two benchmark summarization datasets (i.e., CNN/DailyMail and New York Times) show that all three objectives can improve performance upon baselines. Compared to models pre-trained on large-scale data (larger than 160GB), our method, with only 19GB text for pre-training, achieves comparable results, which demonstrates its effectiveness.
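
The sentence reordering objective described in this abstract can be pictured with a small data-construction sketch in Python. This only shows how one training pair might be built, under the assumption that the document is already split into sentences; the function name `make_reordering_example` and the dictionary keys are hypothetical, and no model or training loop is included.

```python
import random


def make_reordering_example(sentences, seed=0):
    """Return a (source, target) pair for a sentence-reordering objective.

    source: the document's sentences in shuffled order (artificial input).
    target: the sentences in their original order (what the model should
            reinstate).
    """
    rng = random.Random(seed)      # local RNG keeps the example reproducible
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return {"source": shuffled, "target": list(sentences)}


doc = ["The storm hit at dawn.",
       "Roads were closed by noon.",
       "Power returned the next day."]
example = make_reordering_example(doc, seed=1)
```

Because the target is just the unmodified document, pairs like this can be produced from unlabeled text, which is the property the abstract relies on.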

Re-examining the Role of Schema Linking in Text-to-SQL
Wenqiang Lei | Weixin Wang | Zhixin Ma | Tian Gan | Wei Lu | Min-Yen Kan | Tat-Seng Chua
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In existing sophisticated text-to-SQL models, schema linking is often considered as a simple, minor component, belying its importance. By providing a schema linking corpus based on the Spider text-to-SQL dataset, we systematically study the role of schema linking. We also build a simple BERT-based baseline, called Schema-Linking SQL (SLSQL) to perform a data-driven study. We find when schema linking is done well, SLSQL demonstrates good performance on Spider despite its structural simplicity. Many remaining errors are attributable to corpus noise. This suggests schema linking is the crux for the current text-to-SQL task. Our analytic studies provide insights on the characteristics of schema linking for future developments of text-to-SQL tasks.

APE: Argument Pair Extraction from Peer Review and Rebuttal via Multi-task Learning
Liying Cheng | Lidong Bing | Qian Yu | Wei Lu | Luo Si
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Peer review and rebuttal, with rich interactions and argumentative discussions in between, are naturally a good resource to mine arguments. However, few works study both of them simultaneously. In this paper, we introduce a new argument pair extraction (APE) task on peer review and rebuttal in order to study the contents, the structure and the connections between them. We prepare a challenging dataset that contains 4,764 fully annotated review-rebuttal passage pairs from an open review platform to facilitate the study of this task. To automatically detect argumentative propositions and extract argument pairs from this corpus, we cast it as the combination of a sequence labeling task and a text relation classification task. Thus, we propose a multitask learning framework based on hierarchical LSTM networks. Extensive experiments and analysis demonstrate the effectiveness of our multi-task framework, and also show the challenges of the new task as well as motivate future research directions.

SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning
Chenrui Guo | Haoran Cui | Li Zhang | Jiamin Wang | Wei Lu | Jian Wu
Proceedings of the 8th International Workshop on Mining Scientific Publications

We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers from the ACL Anthology. The model with 19 features achieves F1=85.6%. SCC supports PDF, XML, and JSON files out of the box, provided that they conform to certain schemas. The API supports single document processing and batch processing in parallel. It takes about 12–45 seconds on average, depending on the format, to process a document on a dedicated server with 6 multithreaded cores. Using SCC, we extracted 11.8 million citation context sentences from ~33.3k PMC papers in the CORD-19 dataset, released on June 13, 2020. We will provide continuous supplementary data contribution to the CORD-19 and other datasets. The source code is released at https://gitee.com/irlab/SmartCiteCon.

2019

Dependency-Guided LSTM-CRF for Named Entity Recognition
Zhanming Jie | Wei Lu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Dependency tree structures capture long-distance and syntactic relationships between words in a sentence. The syntactic relations (e.g., nominal subject, object) can potentially infer the existence of certain named entities. In addition, the performance of a named entity recognizer could benefit from the long-distance dependencies between the words in dependency trees. In this work, we propose a simple yet effective dependency-guided LSTM-CRF model to encode the complete dependency trees and capture the above properties for the task of named entity recognition (NER). The data statistics show strong correlations between the entity types and dependency relations. We conduct extensive experiments on several standard datasets and demonstrate the effectiveness of the proposed model in improving NER and achieving state-of-the-art performance. Our analysis reveals that the significant improvements mainly result from the dependency relations and long-distance interactions provided by dependency trees.

Aligning Cross-Lingual Entities with Multi-Aspect Information
Hsiu-Wei Yang | Yanyan Zou | Peng Shi | Wei Lu | Jimmy Lin | Xu Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Multilingual knowledge graphs (KGs), such as YAGO and DBpedia, represent entities in different languages. The task of cross-lingual entity alignment is to match entities in a source language with their counterparts in target languages. In this work, we investigate embedding-based approaches to encode entities from multilingual KGs into the same vector space, where equivalent entities are close to each other. Specifically, we apply graph convolutional networks (GCNs) to combine multi-aspect information of entities, including topological connections, relations, and attributes of entities, to learn entity embeddings. To exploit the literal descriptions of entities expressed in different languages, we propose two uses of a pretrained multilingual BERT model to bridge cross-lingual gaps. We further propose two strategies to integrate GCN-based and BERT-based modules to boost performance. Extensive experiments on two benchmark datasets demonstrate that our method significantly outperforms existing systems.
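
The idea that "equivalent entities are close to each other" in a shared vector space can be sketched as nearest-neighbour matching. The Python below assumes entity embeddings have already been learned by some means; the vectors, the entity names, and the helpers `cosine` and `align` are invented for illustration and do not reproduce the GCN- or BERT-based modules of the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(source_emb, target_emb):
    """Map each source entity to its most similar target entity."""
    return {
        s_name: max(target_emb, key=lambda t_name: cosine(s_vec, target_emb[t_name]))
        for s_name, s_vec in source_emb.items()
    }


# Toy embeddings in a shared 2-d space (made-up values).
source_emb = {"en:Paris": [0.9, 0.1], "en:Berlin": [0.1, 0.9]}
target_emb = {"fr:Paris": [1.0, 0.0], "fr:Berlin": [0.0, 1.0]}
matches = align(source_emb, target_emb)
```

Greedy nearest-neighbour matching lets several source entities pick the same target; whether that is acceptable depends on the evaluation setup, so treat this as the simplest possible decoding step.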

Text2Math: End-to-end Parsing Text into Math Expressions
Yanyan Zou | Wei Lu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose Text2Math, a model for semantically parsing text into math expressions. The model can be used to solve different math related problems including arithmetic word problems and equation parsing problems. Unlike previous approaches, we tackle the problem from an end-to-end structured prediction perspective where our algorithm aims to predict the complete math expression at once as a tree structure, where minimal manual efforts are involved in the process. Empirical results on benchmark datasets demonstrate the efficacy of our approach.

Learning Explicit and Implicit Structures for Targeted Sentiment Analysis
Hao Li | Wei Lu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Targeted sentiment analysis is the task of jointly predicting target entities and their associated sentiment information. Existing research efforts mostly regard this joint task as a sequence labeling problem, building models that can capture explicit structures in the output space. However, the importance of capturing implicit global structural information that resides in the input space is largely unexplored. In this work, we argue that both types of information (implicit and explicit structural information) are crucial for building a successful targeted sentiment analysis model. Our experimental results show that properly capturing both types of information leads to better performance than competitive existing approaches. We also conduct extensive experiments to investigate our model’s effectiveness and robustness.

Combining Spans into Entities: A Neural Two-Stage Approach for Recognizing Discontiguous Entities
Bailin Wang | Wei Lu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In medical documents, it is possible that an entity of interest not only contains a discontiguous sequence of words but also overlaps with another entity. Entities of such structures are intrinsically hard to recognize due to the large space of possible entity combinations. In this work, we propose a neural two-stage approach to recognizing discontiguous and overlapping entities by decomposing this problem into two subtasks: 1) it first detects all the overlapping spans that either form entities on their own or present as segments of discontiguous entities, based on the representation of segmental hypergraph, 2) next it learns to combine these segments into discontiguous entities with a classifier, which filters out other incorrect combinations of segments. Two neural components are designed for these subtasks respectively and they are learned jointly using a shared encoder for text. Our model achieves the state-of-the-art performance in a standard dataset, even in the absence of external features that previous methods used.

UER: An Open-Source Toolkit for Pre-training Models
Zhe Zhao | Hui Chen | Jinbin Zhang | Xin Zhao | Tao Liu | Wei Lu | Xi Chen | Haotang Deng | Qi Ju | Xiaoyong Du
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

Existing works, including ELMO and BERT, have revealed the importance of pre-training for NLP tasks. Since no single pre-training model works best in all cases, it is necessary to develop a framework that is able to deploy various pre-training models efficiently. For this purpose, we propose an assemble-on-demand pre-training toolkit, namely Universal Encoder Representations (UER). UER is loosely coupled, and encapsulated with rich modules. By assembling modules on demand, users can either reproduce a state-of-the-art pre-training model or develop a pre-training model that remains unexplored. With UER, we have built a model zoo, which contains pre-trained models based on different corpora, encoders, and targets (objectives). With proper pre-trained models, we could achieve new state-of-the-art results on a range of downstream datasets.

Better Modeling of Incomplete Annotations for Named Entity Recognition
Zhanming Jie | Pengjun Xie | Wei Lu | Ruixue Ding | Linlin Li
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Supervised approaches to named entity recognition (NER) are largely developed based on the assumption that the training data is fully annotated with named entity information. However, in practice, annotated data can often be imperfect; one typical issue is that the training data may contain incomplete annotations. We highlight several pitfalls associated with learning under such a setup in the context of NER and identify limitations associated with existing approaches, proposing a novel yet easy-to-implement approach for recognizing named entities with incomplete data annotations. We demonstrate the effectiveness of our approach through extensive experiments.

Joint Detection and Location of English Puns
Yanyan Zou | Wei Lu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

A pun is a form of wordplay for an intended humorous or rhetorical effect, where a word suggests two or more meanings by exploiting polysemy (homographic pun) or phonological similarity to another word (heterographic pun). This paper presents an approach that addresses pun detection and pun location jointly from a sequence labeling perspective. We employ a new tagging scheme such that the model is capable of performing such a joint task, where useful structural information can be properly captured. We show that our proposed model is effective in handling both homographic and heterographic puns. Empirical results on the benchmark datasets demonstrate that our approach can achieve new state-of-the-art results.

Neural Chinese Address Parsing
Hao Li | Wei Lu | Pengjun Xie | Linlin Li
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

This paper introduces a new task – Chinese address parsing – the task of mapping Chinese addresses into semantically meaningful chunks. While it is possible to model this problem using a conventional sequence labelling approach, our observation is that there exist complex dependencies between labels that cannot be readily captured by a simple linear-chain structure. We investigate neural structured prediction models with latent variables to capture such rich structural information within Chinese addresses. We create and publicly release a new dataset consisting of 15K Chinese addresses, and conduct extensive experiments on the dataset to investigate the model effectiveness and robustness. We release our code and data at http://statnlp.org/research/sp.

Attention Guided Graph Convolutional Networks for Relation Extraction
Zhijiang Guo | Yan Zhang | Wei Lu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Dependency trees convey rich structural information that is proven useful for extracting relations among entities in text. However, how to effectively make use of relevant information while ignoring irrelevant information from the dependency trees remains a challenging research question. Existing approaches employing rule based hard-pruning strategies for selecting relevant partial dependency structures may not always yield optimal results. In this work, we propose Attention Guided Graph Convolutional Networks (AGGCNs), a novel model which directly takes full dependency trees as inputs. Our model can be understood as a soft-pruning approach that automatically learns how to selectively attend to the relevant sub-structures useful for the relation extraction task. Extensive results on various tasks including cross-sentence n-ary relation extraction and large-scale sentence-level relation extraction show that our model is able to better leverage the structural information of the full dependency trees, giving significantly better results than previous approaches.

A Neural Multi-digraph Model for Chinese NER with Gazetteers
Ruixue Ding | Pengjun Xie | Xiaoyan Zhang | Wei Lu | Linlin Li | Luo Si
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Gazetteers were shown to be useful resources for named entity recognition (NER). Many existing approaches to incorporating gazetteers into machine learning based NER systems rely on manually defined selection strategies or handcrafted templates, which may not always lead to optimal effectiveness, especially when multiple gazetteers are involved. This is especially the case for the task of Chinese NER, where the words are not naturally tokenized, leading to additional ambiguities. To automatically learn how to incorporate multiple gazetteers into an NER system, we propose a novel approach based on graph neural networks with a multi-digraph structure that captures the information that the gazetteers offer. Experiments on various datasets show that our model is effective in incorporating rich gazetteer information while resolving ambiguities, outperforming previous approaches.

Twitter Homophily: Network Based Prediction of User’s Occupation
Jiaqi Pan | Rishabh Bhardwaj | Wei Lu | Hai Leong Chieu | Xinghao Pan | Ni Yi Puay
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we investigate the importance of social network information compared to content information in the prediction of a Twitter user’s occupational class. We show that the content information of a user’s tweets, the profile descriptions of a user’s follower/following community, and the user’s social network provide useful information for classifying a user’s occupational group. In our study, we extend an existing data set for this problem, and we achieve significantly better performance by using social network homophily that has not been fully exploited in previous work. In our analysis, we found that by using the graph convolutional network to exploit social homophily, we can achieve competitive performance on this data set with just a small fraction of the training data.

Quantity Tagger: A Latent-Variable Sequence Labeling Approach to Solving Addition-Subtraction Word Problems
Yanyan Zou | Wei Lu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

An arithmetic word problem typically includes a textual description containing several constant quantities. The key to solving the problem is to reveal the underlying mathematical relations (such as addition and subtraction) among quantities, and then generate equations to find solutions. This work presents a novel approach, Quantity Tagger, that automatically discovers such hidden relations by tagging each quantity with a sign corresponding to one type of mathematical operation. For each quantity, we assume there exists a latent, variable-sized quantity span surrounding the quantity token in the text, which conveys information useful for determining its sign. Empirical results show that our method achieves 5 and 8 points of accuracy gains on two datasets respectively, compared to prior approaches.
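
Once every quantity carries a sign tag, turning the tags into an answer is simple arithmetic. The Python sketch below assumes a three-way tag set (`+`, `-`, and `0` for quantities that do not take part in the equation) and assumes the unknown is the signed sum; the tag names, the function `solve_from_tags`, and the example problem are illustrative assumptions, and the tagging model itself is not shown.

```python
# Assumed tag set: "+" adds the quantity, "-" subtracts it, "0" ignores it.
SIGN = {"+": 1, "-": -1, "0": 0}


def solve_from_tags(quantities, tags):
    """Combine quantities according to their sign tags."""
    if len(quantities) != len(tags):
        raise ValueError("each quantity needs exactly one tag")
    return sum(SIGN[tag] * q for q, tag in zip(quantities, tags))


# "Tom had 5 apples and bought 3 more at a shop 2 km away.
#  How many apples does he have now?"
quantities = [5, 3, 2]
tags = ["+", "+", "0"]   # hypothetical tagger output; 2 km is irrelevant
answer = solve_from_tags(quantities, tags)   # 5 + 3 + 0 = 8
```

The `0` tag is what lets a tagger skip distractor quantities, such as the distance in the toy problem.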

Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning
Zhijiang Guo | Yan Zhang | Zhiyang Teng | Wei Lu
Transactions of the Association for Computational Linguistics, Volume 7

We focus on graph-to-sequence learning, which can be framed as transducing graph structures to sequences for text generation. To capture structural information associated with graphs, we investigate the problem of encoding graphs using graph convolutional networks (GCNs). Unlike various existing approaches where shallow architectures were used for capturing local structural information only, we introduce a dense connection strategy, proposing a novel Densely Connected Graph Convolutional Network (DCGCN). Such a deep architecture is able to integrate both local and non-local features to learn a better structural representation of a graph. Our model outperforms the state-of-the-art neural models significantly on AMR-to-text generation and syntax-based neural machine translation.

2018

pdf bib
Learning with Structured Representations for Negation Scope Extraction
Hao Li | Wei Lu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We report an empirical study on the task of negation scope extraction given the negation cue. Our key observation is that certain useful information, such as features related to the negation cue, long-distance dependencies, and some latent structural information, can be exploited for such a task. We design approaches based on conditional random fields (CRF), semi-Markov CRF, as well as latent-variable CRF models to capture such information. Extensive experiments on several standard datasets demonstrate that our approaches are able to achieve better results than existing approaches reported in the literature.

pdf bib
Learning Cross-lingual Distributed Logical Representations for Semantic Parsing
Yanyan Zou | Wei Lu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

With the development of several multilingual datasets used for semantic parsing, recent research efforts have looked into the problem of learning semantic parsers in a multilingual setup. However, how to improve the performance of a monolingual semantic parser for a specific language by leveraging data annotated in different languages remains a research question that is under-explored. In this work, we present a study to show how learning distributed representations of the logical forms from data annotated in different languages can be used for improving the performance of a monolingual semantic parser. We extend two existing monolingual semantic parsers to incorporate such cross-lingual distributed logical representations as features. Experiments show that our proposed approach is able to yield improved semantic parsing results on the standard multilingual GeoQuery dataset.

pdf bib
Neural Segmental Hypergraphs for Overlapping Mention Recognition
Bailin Wang | Wei Lu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this work, we propose a novel segmental hypergraph representation to model overlapping entity mentions that are prevalent in many practical datasets. We show that our model built on top of such a new representation is able to capture features and interactions that cannot be captured by previous models while maintaining a low time complexity for inference. We also present a theoretical analysis to formally assess how our representation is better than alternative representations reported in the literature in terms of representational power. Coupled with neural networks for feature learning, our model achieves state-of-the-art performance on three benchmark datasets annotated with overlapping mentions.

pdf bib
A Neural Transition-based Model for Nested Mention Recognition
Bailin Wang | Wei Lu | Yu Wang | Hongxia Jin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Entity mentions commonly contain other mentions recursively. This paper introduces a scalable transition-based method to model the nested structure of mentions. We first map a sentence with nested mentions to a designated forest where each mention corresponds to a constituent of the forest. Our shift-reduce based system then learns to construct the forest structure in a bottom-up manner through an action sequence whose maximal length is guaranteed to be three times the sentence length. Built on a Stack-LSTM, which efficiently and effectively represents the states of the system in a continuous space, our system further incorporates a character-based component to capture letter-level patterns. Our model achieves state-of-the-art performance on the ACE datasets, showing its effectiveness in detecting nested mentions.

pdf bib
Better Transition-Based AMR Parsing with a Refined Search Space
Zhijiang Guo | Wei Lu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper introduces a simple yet effective transition-based system for Abstract Meaning Representation (AMR) parsing. We argue that a well-defined search space involved in a transition system is crucial for building an effective parser. We propose to conduct the search in a refined search space based on a new compact AMR graph and an improved oracle. Our end-to-end parser achieves the state-of-the-art performance on various datasets with minimal additional information.

pdf bib
Neural Adaptation Layers for Cross-domain Named Entity Recognition
Bill Yuchen Lin | Wei Lu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent research efforts have shown that neural architectures can be effective in conventional information extraction tasks such as named entity recognition, yielding state-of-the-art results on standard newswire datasets. However, despite significant resources required for training such models, the performance of a model trained on one domain typically degrades dramatically when applied to a different domain, yet extracting entities from new emerging domains such as social media can be of significant interest. In this paper, we empirically investigate effective methods for conveniently adapting an existing, well-trained neural NER model for a new domain. Unlike existing approaches, we propose lightweight yet effective methods for performing domain adaptation for neural models. Specifically, we introduce adaptation layers on top of existing neural architectures, where no re-training using the source domain data is required. We conduct extensive empirical studies and show that our approach significantly outperforms state-of-the-art methods.

pdf bib
Dependency-based Hybrid Trees for Semantic Parsing
Zhanming Jie | Wei Lu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a novel dependency-based hybrid tree model for semantic parsing, which converts natural language utterances into machine interpretable meaning representations. Unlike previous state-of-the-art models, the semantic information is interpreted as the latent dependency between the natural language words in our joint representation. Such dependency information can capture the interactions between the semantics and natural language words. We integrate a neural component into our model and propose an efficient dynamic-programming algorithm to perform tractable inference. Through extensive experiments on the standard multilingual GeoQuery dataset with eight languages, we demonstrate that our proposed approach is able to achieve state-of-the-art performance across several languages. Analysis also justifies the effectiveness of using our new dependency-based representation.

pdf bib
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Eduardo Blanco | Wei Lu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

pdf bib
SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using Natural Language Processing (SecureNLP)
Peter Phandi | Amila Silva | Wei Lu
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the SemEval 2018 shared task on semantic extraction from cybersecurity reports, which is introduced for the first time as a shared task on SemEval. This task comprises four SubTasks done incrementally to predict the characteristics of a specific malware using cybersecurity reports. To the best of our knowledge, we introduce the world’s largest publicly available dataset of annotated malware reports in this task. This task received in total 18 submissions from 9 participating teams.

2017

pdf bib
MalwareTextDB: A Database for Annotated Malware Articles
Swee Kiat Lim | Aldrian Obaja Muis | Wei Lu | Chen Hui Ong
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cybersecurity risks and malware threats are becoming increasingly dangerous and common. Despite the severity of the problem, there have been few NLP efforts focused on tackling cybersecurity. In this paper, we discuss the construction of a new database for annotated malware texts. An annotation framework is introduced based on the MAEC vocabulary for defining malware characteristics, along with a database consisting of 39 annotated APT reports with a total of 6,819 sentences. We also use the database to construct models that can potentially help cybersecurity researchers in their data collection and analytics efforts.

pdf bib
Topical Coherence in LDA-based Models through Induced Segmentation
Hesam Amoualian | Wei Lu | Eric Gaussier | Georgios Balikas | Massih R. Amini | Marianne Clausel
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula, binding the topics associated to the words of a segment. In addition, this model relies on both document and segment specific topic distributions so as to capture fine grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six different publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information, which captures the coherence between the generated topics, and the Micro F1 measure for text classification.

pdf bib
Neural Architectures for Multilingual Semantic Parsing
Raymond Hendy Susanto | Wei Lu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper, we address semantic parsing in a multilingual context. We train one multilingual model that is capable of parsing natural language sentences from multiple different languages into their corresponding formal semantic representations. We extend an existing sequence-to-tree model to a multi-task learning framework which shares the decoder for generating semantic representations. We report evaluation results on the multilingual GeoQuery corpus and introduce a new multilingual version of the ATIS corpus.

pdf bib
Labeling Gaps Between Words: Recognizing Overlapping Mentions with Mention Separators
Aldrian Obaja Muis | Wei Lu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose a new model that is capable of recognizing overlapping mentions. We introduce a novel notion of mention separators that can be effectively used to capture how mentions overlap with one another. On top of a novel multigraph representation that we introduce, we show that efficient and exact inference can still be performed. We present some theoretical analysis on the differences between our model and a recently proposed model for recognizing overlapping mentions, and discuss the possible implications of the differences. Through extensive empirical analysis on standard datasets, we demonstrate the effectiveness of our approach.

pdf bib
A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings
Wei Yang | Wei Lu | Vincent Zheng
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Learning word embeddings has received a significant amount of attention recently. Often, word embeddings are learned in an unsupervised manner from a large collection of text. The genre of the text typically plays an important role in the effectiveness of the resulting embeddings. How to effectively train word embedding models using data from different domains remains a problem that is less explored. In this paper, we present a simple yet effective method for learning word embeddings based on text from different domains. We demonstrate the effectiveness of our approach through extensive experiments on various down-stream NLP tasks.

pdf bib
A Unified Framework for Structured Prediction: From Theory to Practice
Wei Lu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Structured prediction is one of the most important topics in various fields, including machine learning, computer vision, natural language processing (NLP) and bioinformatics. In this tutorial, we present a novel framework that unifies various structured prediction models.
The hidden Markov model (HMM) and the probabilistic context-free grammars (PCFGs) are two classic generative models used for predicting outputs with linear-chain and tree structures, respectively. As HMM’s discriminative counterpart, the linear-chain conditional random fields (CRFs) (Lafferty et al., 2001) model was later proposed. Such a model was shown to yield good performance on standard NLP tasks such as information extraction. Several extensions to such a model were then proposed afterward, including the semi-Markov CRFs (Sarawagi and Cohen, 2004), tree CRFs (Cohn and Blunsom, 2005), as well as discriminative parsing models and their latent variable variants (Petrov and Klein, 2007). On the other hand, utilizing a slightly different loss function, one could arrive at the structured support vector machines (Tsochantaridis et al., 2004) and its latent variable variant (Yu and Joachims, 2009) as well. Furthermore, new models that integrate neural networks and graphical models, such as neural CRFs (Do et al., 2010), were also proposed.
In this tutorial, we will be discussing how such a wide spectrum of existing structured prediction models can all be implemented under a unified framework (available here) that involves some basic building blocks. Based on such a framework, we show how some seemingly complicated structured prediction models such as a semantic parsing model (Lu et al., 2008; Lu, 2014) can be implemented conveniently and quickly. Furthermore, we also show that the framework can be used to solve certain structured prediction problems that otherwise cannot be easily handled by conventional structured prediction models.
Specifically, we show how to use such a framework to construct models that are capable of predicting non-conventional structures, such as overlapping structures (Lu and Roth, 2015; Muis and Lu, 2016a). We will also discuss how to make use of the framework to build other related models such as topic models and highlight its potential applications in some recent popular tasks (e.g., AMR parsing (Flanigan et al., 2014)).
The framework has been extensively used by our research group for developing various structured prediction models, including models for information extraction (Lu and Roth, 2015; Muis and Lu, 2016a; Jie et al., 2017), noun phrase chunking (Muis and Lu, 2016b), semantic parsing (Lu, 2015; Susanto and Lu, 2017), and sentiment analysis (Li and Lu, 2017). It is our hope that this tutorial will be helpful for many natural language processing researchers who are interested in designing their own structured prediction models rapidly. We also hope this tutorial allows researchers to strengthen their understanding of the connections between various structured prediction models, and that the open release of the framework will bring value to the NLP research community and enhance its overall productivity.
The material associated with this tutorial will be available at the tutorial web site: https://web.archive.org/web/20180427113151/http://statnlp.org/tutorials/.

2016

pdf bib
Learning to Recognize Discontiguous Entities
Aldrian Obaja Muis | Wei Lu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
A General Regularization Framework for Domain Adaptation
Wei Lu | Hai Leong Chieu | Jonathan Löfgren
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Learning to Capitalize with Character-Level Recurrent Neural Networks: An Empirical Study
Raymond Hendy Susanto | Hai Leong Chieu | Wei Lu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Weak Semi-Markov CRFs for Noun Phrase Chunking in Informal Text
Aldrian Obaja Muis | Wei Lu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
Joint Mention Extraction and Classification with Mention Hypergraphs
Wei Lu | Dan Roth
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improving Semantic Parsing with Enriched Synchronous Context-Free Grammar
Junhui Li | Muhua Zhu | Wei Lu | Guodong Zhou
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Book Review: Language Processing with Perl and Prolog: Theories, Implementation, and Application by Pierre M. Nugues
Wei Lu
Computational Linguistics, Volume 41, Issue 4 - December 2015

pdf bib
Constrained Semantic Forests for Improved Discriminative Semantic Parsing
Wei Lu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib
Multilingual Semantic Parsing : Parsing Multiple Languages into Semantic Representations
Zhanming Jie | Wei Lu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Semantic Parsing with Relaxed Hybrid Trees
Wei Lu
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

pdf bib
Joint Inference for Event Timeline Construction
Quang Do | Wei Lu | Dan Roth
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Automatic Event Extraction with Structured Preference Modeling
Wei Lu | Dan Roth
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
Wei Lu | Hwee Tou Ng
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
Better Punctuation Prediction with Dynamic Conditional Random Fields
Wei Lu | Hwee Tou Ng
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
Natural Language Generation with Tree Conditional Random Fields
Wei Lu | Hwee Tou Ng | Wee Sun Lee
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
The NUS statistical machine translation system for IWSLT 2009
Preslav Nakov | Chang Liu | Wei Lu | Hwee Tou Ng
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

We describe the system developed by the team of the National University of Singapore for the Chinese-English BTEC task of the IWSLT 2009 evaluation campaign. We adopted a state-of-the-art phrase-based statistical machine translation approach and focused on experiments with different Chinese word segmentation standards. In our official submission, we trained a separate system for each segmenter and we combined the outputs in a subsequent re-ranking step. Given the small size of the training data, we further re-trained the system on the development data after tuning. The evaluation results show that both strategies yield sizeable and consistent improvements in translation quality.

2008

pdf bib
A Generative Model for Parsing Natural Language to Meaning Representations
Wei Lu | Hwee Tou Ng | Wee Sun Lee | Luke S. Zettlemoyer
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing