Linfeng Song


A Multi-Level Optimization Framework for End-to-End Text Augmentation
Sai Ashish Somayajula | Linfeng Song | Pengtao Xie
Transactions of the Association for Computational Linguistics, Volume 10

Text augmentation is an effective technique for alleviating overfitting in NLP tasks. In existing methods, text augmentation and downstream tasks are mostly performed separately. As a result, the augmented texts may not be optimal to train the downstream model. To address this problem, we propose a three-level optimization framework to perform text augmentation and the downstream task end-to-end. The augmentation model is trained in a way tailored to the downstream task. Our framework consists of three learning stages. A text summarization model is trained to perform data augmentation at the first stage. Each summarization example is associated with a weight to account for its domain difference with the text classification data. At the second stage, we use the model trained at the first stage to perform text augmentation and train a text classification model on the augmented texts. At the third stage, we evaluate the text classification model trained at the second stage and update weights of summarization examples by minimizing the validation loss. These three stages are performed end-to-end. We evaluate our method on several text classification datasets where the results demonstrate the effectiveness of our method. Code is available at

Variational Graph Autoencoding as Cheap Supervision for AMR Coreference Resolution
Irene Li | Linfeng Song | Kun Xu | Dong Yu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Coreference resolution over semantic graphs like AMRs aims to group the graph nodes that represent the same entity. This is a crucial step for making document-level formal semantic representations. With annotated data on AMR coreference resolution, deep learning approaches have recently shown great potential for this task, yet they are usually data-hungry and annotations are costly. We propose a general pretraining method using a variational graph autoencoder (VGAE) for AMR coreference resolution, which can leverage any general AMR corpus and even automatically parsed AMR data. Experiments on benchmarks show that the pretraining approach achieves performance gains of up to 6% absolute F1 points. Moreover, our model significantly improves on the previous state-of-the-art model by up to 11% F1.


Video-aided Unsupervised Grammar Induction
Songyang Zhang | Linfeng Song | Lifeng Jin | Kun Xu | Dong Yu | Jiebo Luo
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video. Existing methods of multi-modal grammar induction focus on grammar induction from text-image pairs, with promising results showing that the information from static images is useful in induction. However, videos provide even richer information, including not only static objects but also actions and state changes useful for inducing verb phrases. In this paper, we explore rich features (e.g. action, object, scene, audio, face, OCR and speech) from videos, taking the recent Compound PCFG model as the baseline. We further propose a Multi-Modal Compound PCFG model (MMC-PCFG) to effectively aggregate these rich features from different modalities. Our proposed MMC-PCFG is trained end-to-end and outperforms each individual modality and previous state-of-the-art systems on three benchmarks, i.e. DiDeMo, YouCook2 and MSRVTT, confirming the effectiveness of leveraging video information for unsupervised grammar induction.

RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging
Jie Hao | Linfeng Song | Liwei Wang | Kun Xu | Zhaopeng Tu | Dong Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context. Until now, the existing models for this task have suffered from a robustness issue, i.e., performance drops dramatically when testing on a different dataset. We address this robustness issue by proposing a novel sequence-tagging-based model so that the search space is significantly reduced, yet the core of this task is still well covered. As is common for tagging models applied to text generation, the model’s outputs may lack fluency. To alleviate this issue, we inject the loss signal from BLEU or GPT-2 under a REINFORCE framework. Experiments show huge improvements of our model over the current state-of-the-art systems when transferring to another dataset.

Instance-adaptive training with noise-robust losses against noisy labels
Lifeng Jin | Linfeng Song | Kun Xu | Dong Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In order to alleviate the huge demand for annotated datasets for different tasks, many recent natural language processing datasets have adopted automated pipelines for fast-tracking usable data. However, model training with such datasets poses a challenge because popular optimization objectives are not robust to label noise induced in the annotation generation process. Several noise-robust losses have been proposed and evaluated on tasks in computer vision, but they generally use a single dataset-wise hyperparameter to control the strength of noise resistance. This work proposes novel instance-adaptive training frameworks to change single dataset-wise hyperparameters of noise resistance in such losses to be instance-wise. Such instance-wise noise resistance hyperparameters are predicted by special instance-level label quality predictors, which are trained along with the main classification models. Experiments on noisy and corrupted NLP datasets show that the proposed instance-adaptive training frameworks help increase the noise-robustness provided by such losses, promoting the use of the frameworks and associated losses in NLP models trained with noisy data.
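
The step from a dataset-wise to an instance-wise noise-resistance hyperparameter can be illustrated with a minimal sketch. It assumes the generalized cross-entropy loss, L_q(p) = (1 - p^q) / q, as one example of a noise-robust loss with a single hyperparameter q; the abstract does not name a specific loss, and the function names below are hypothetical. In the paper the per-instance values would come from a trained label quality predictor; here they are simply passed in.

```python
import math

def generalized_ce(p_true, q):
    """Generalized cross-entropy for one example (assumed loss, see lead-in).

    p_true: model probability assigned to the (possibly noisy) label.
    q: noise-resistance strength in (0, 1]; q -> 0 approaches -log(p_true),
       q = 1 gives the more noise-tolerant 1 - p_true.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    return (1.0 - p_true ** q) / q

def instance_adaptive_loss(probs_true, qs):
    """Mean loss where every example carries its own q (hypothetical helper)."""
    losses = [generalized_ce(p, q) for p, q in zip(probs_true, qs)]
    return sum(losses) / len(losses)

# Dataset-wise setting: one q shared by all examples.
shared = instance_adaptive_loss([0.9, 0.2], [0.7, 0.7])
# Instance-wise setting: the suspicious second example gets a larger q.
adaptive = instance_adaptive_loss([0.9, 0.2], [0.3, 1.0])
```

Raising q for an example that is likely mislabeled caps its contribution to the loss, which is the intuition behind predicting q per instance rather than tuning it once per dataset.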

JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs
Pei Ke | Haozhe Ji | Yu Ran | Xin Cui | Liwei Wang | Linfeng Song | Xiaoyan Zhu | Minlie Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

End-to-End AMR Coreference Resolution
Qiankun Fu | Linfeng Song | Wenyu Du | Yue Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Although parsing to Abstract Meaning Representation (AMR) has become very popular and AMR has been shown effective on many sentence-level downstream tasks, little work has studied how to generate AMRs that can represent multi-sentence information. We introduce the first end-to-end AMR coreference resolution model in order to build multi-sentence AMRs. Compared with the previous pipeline and rule-based approaches, our model alleviates error propagation and is more robust in both in-domain and out-of-domain situations. Besides, the document-level AMRs obtained by our model can significantly improve over the AMRs generated by a rule-based method (Liu et al., 2015) on text summarization.

Semantic Representation for Dialogue Modeling
Xuefeng Bai | Yulong Chen | Linfeng Song | Yue Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Although neural models have achieved competitive results in dialogue systems, they have shown limited ability in representing core semantics, such as ignoring important entities. To this end, we exploit Abstract Meaning Representation (AMR) to help dialogue modeling. Compared with the textual input, AMR explicitly provides core semantic knowledge and reduces data sparsity. We develop an algorithm to construct dialogue-level AMR graphs from sentence-level AMRs and explore two ways to incorporate AMRs into dialogue systems. Experimental results on both dialogue understanding and response generation tasks show the superiority of our model. To our knowledge, we are the first to leverage a formal semantic representation into neural dialogue modeling.

Domain-Adaptive Pretraining Methods for Dialogue Understanding
Han Wu | Kun Xu | Linfeng Song | Lifeng Jin | Haisong Zhang | Linqi Song
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Language models like BERT and SpanBERT pretrained on open-domain data have obtained impressive gains on various NLP tasks. In this paper, we probe the effectiveness of domain-adaptive pretraining objectives on downstream tasks. In particular, three objectives, including a novel objective focusing on modeling predicate-argument relations, are evaluated on two challenging dialogue understanding tasks. Experimental results demonstrate that domain-adaptive pretraining with proper objectives can significantly improve the performance of a strong baseline on these tasks, achieving the new state-of-the-art performances.

TexSmart: A System for Enhanced Natural Language Understanding
Lemao Liu | Haisong Zhang | Haiyun Jiang | Yangming Li | Enbo Zhao | Kun Xu | Linfeng Song | Suncong Zheng | Botong Zhou | Dick Zhu | Xiao Feng | Tao Chen | Tao Yang | Dong Yu | Feng Zhang | ZhanHui Kang | Shuming Shi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

This paper introduces TexSmart, a text understanding system that supports fine-grained named entity recognition (NER) and enhanced semantic analysis functionalities. Compared to most previous publicly available text understanding systems and tools, TexSmart holds some unique features. First, the NER function of TexSmart supports over 1,000 entity types, while most other public tools typically support several to (at most) dozens of entity types. Second, TexSmart introduces new semantic analysis functions like semantic expansion and deep semantic representation, that are absent in most previous systems. Third, a spectrum of algorithms (from very fast algorithms to those that are relatively slow but more accurate) are implemented for one function in TexSmart, to fulfill the requirements of different academic and industrial applications. The adoption of unsupervised or weakly-supervised algorithms is especially emphasized, with the goal of easily updating our models to include fresh data with less human annotation efforts.


ZPR2: Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERT
Linfeng Song | Kun Xu | Yue Zhang | Jianshu Chen | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Zero pronoun recovery and resolution aim at recovering the dropped pronoun and pointing out its anaphoric mentions, respectively. We propose to better explore their interaction by solving both tasks together, while previous work treats them separately. For zero pronoun resolution, we study this task in a more realistic setting, where no parse trees or only automatic trees are available, while most previous work assumes gold trees. Experiments on two benchmarks show that joint modeling significantly outperforms our baseline that already beats the previous state of the art.

Structural Information Preserving for Graph-to-Text Generation
Linfeng Song | Ante Wang | Jinsong Su | Yue Zhang | Kun Xu | Yubin Ge | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs. As a crucial defect, the current state-of-the-art models may mess up or even drop the core structural information of input graphs when generating outputs. We propose to tackle this problem by leveraging richer training signals that can guide our model for preserving input information. In particular, we introduce two types of autoencoding losses, each individually focusing on different aspects (a.k.a. views) of input graphs. The losses are then back-propagated to better calibrate our model via multi-task training. Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.

Online Back-Parsing for AMR-to-Text Generation
Xuefeng Bai | Linfeng Song | Yue Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

AMR-to-text generation aims to recover a text containing the same meaning as an input AMR graph. Current research develops increasingly powerful graph encoders to better represent AMR graphs, with decoders based on standard language modeling being used to generate outputs. We propose a decoder that back-predicts projected AMR graphs on the target sentence during text generation. As a result, our outputs can better preserve the input meaning than standard decoders. Experiments on two AMR benchmarks show the superiority of our model over the previous state-of-the-art system based on graph Transformer.

Semantic Role Labeling Guided Multi-turn Dialogue ReWriter
Kun Xu | Haochen Tan | Linfeng Song | Han Wu | Haisong Zhang | Linqi Song | Dong Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

For multi-turn dialogue rewriting, the capacity to effectively model the linguistic knowledge in the dialogue context and get rid of noise is essential to improving performance. Existing attentive models attend to all words without prior focus, which results in inaccurate concentration on some dispensable words. In this paper, we propose to use semantic role labeling (SRL), which highlights the core semantic information of who did what to whom, to provide additional guidance for the rewriter model. Experiments show that this information significantly improves a RoBERTa-based model that already outperforms previous state-of-the-art systems.

Rich Syntactic and Semantic Information Helps Unsupervised Text Style Transfer
Hongyu Gong | Linfeng Song | Suma Bhat
Proceedings of the 13th International Conference on Natural Language Generation

Text style transfer aims to change an input sentence to an output sentence by changing its text style while preserving the content. Previous efforts on unsupervised text style transfer only use the surface features of words and sentences. As a result, the transferred sentences may either have inaccurate or missing information compared to the inputs. We address this issue by explicitly enriching the inputs via syntactic and semantic structures, from which richer features are then extracted to better capture the original information. Experiments on two text-style-transfer tasks show that our approach improves the content preservation of a strong unsupervised baseline model thereby demonstrating improved transfer performance.


Leveraging Dependency Forest for Neural Medical Relation Extraction
Linfeng Song | Yue Zhang | Daniel Gildea | Mo Yu | Zhiguo Wang | Jinsong Su
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Medical relation extraction discovers relations between entity mentions in text, such as research articles. For this task, dependency syntax has been recognized as a crucial source of features. Yet in the medical domain, 1-best parse trees suffer from relatively low accuracies, diminishing their usefulness. We investigate a method to alleviate this problem by utilizing dependency forests. Forests contain more than one possible decision and therefore have higher recall but more noise compared with 1-best outputs. A graph neural network is used to represent the forests, automatically distinguishing the useful syntactic information from parsing noise. Results on two benchmarks show that our method outperforms the standard tree-based methods, giving the state-of-the-art results in the literature.

Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis
Jialong Tang | Ziyao Lu | Jinsong Su | Yubin Ge | Linfeng Song | Le Sun | Jiebo Luo
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In aspect-level sentiment classification (ASC), it is prevalent to equip dominant neural models with attention mechanisms, for the sake of acquiring the importance of each context word on the given aspect. However, such a mechanism tends to excessively focus on a few frequent words with sentiment polarities, while ignoring infrequent ones. In this paper, we propose a progressive self-supervised attention learning approach for neural ASC models, which automatically mines useful attention supervision information from a training corpus to refine attention mechanisms. Specifically, we iteratively conduct sentiment predictions on all training instances. Particularly, at each iteration, the context word with the maximum attention weight is extracted as the one with active/misleading influence on the correct/incorrect prediction of every instance, and then the word itself is masked for subsequent iterations. Finally, we augment the conventional training objective with a regularization term, which enables ASC models to continue equally focusing on the extracted active context words while decreasing weights of those misleading ones. Experimental results on multiple datasets show that our proposed approach yields better attention mechanisms, leading to substantial improvements over the two state-of-the-art neural ASC models. Source code and trained models are available at
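
The iterative mining step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' released code: `predict` is a hypothetical callable standing in for a trained ASC model, returning a predicted label and one attention weight per token, and the toy lexicon scorer exists only to make the example runnable.

```python
def mine_attention_supervision(tokens, gold_label, predict, num_iters=2, mask="<mask>"):
    """Collect 'active' and 'misleading' context words for one training instance.

    At each iteration the unmasked word with the largest attention weight is
    extracted: as active if the prediction was correct, as misleading if not.
    The word is then masked so later iterations look at other words.
    """
    tokens = list(tokens)
    active, misleading = [], []
    for _ in range(num_iters):
        label, weights = predict(tokens)
        candidates = [i for i, tok in enumerate(tokens) if tok != mask]
        if not candidates:
            break
        top = max(candidates, key=lambda i: weights[i])
        (active if label == gold_label else misleading).append(tokens[top])
        tokens[top] = mask
    return active, misleading

# Toy stand-in for a trained model (hypothetical lexicon-based scorer).
LEXICON = {"great": 2.0, "awful": -1.0}

def toy_predict(tokens):
    scores = [LEXICON.get(tok, 0.0) for tok in tokens]
    label = "positive" if sum(scores) > 0 else "negative"
    weights = [abs(s) for s in scores]  # attention proxy: lexicon strength
    return label, weights

# Aspect "service" has gold label "negative"; "great" first misleads the model.
active, misleading = mine_attention_supervision(
    ["food", "great", "but", "service", "awful"], "negative", toy_predict
)
```

The two returned lists correspond to the supervision that the regularization term would then use: keep attention on the active words and reduce it on the misleading ones.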

SemBleu: A Robust Metric for AMR Parsing Evaluation
Linfeng Song | Daniel Gildea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Evaluating AMR parsing accuracy involves comparing pairs of AMR graphs. The major evaluation metric, SMATCH (Cai and Knight, 2013), searches for one-to-one mappings between the nodes of two AMRs with a greedy hill-climbing algorithm, which leads to search errors. We propose SEMBLEU, a robust metric that extends BLEU (Papineni et al., 2002) to AMRs. It does not suffer from search errors and considers non-local correspondences in addition to local ones. SEMBLEU is fully content-driven and punishes situations where a system’s output does not preserve most information from the input. Preliminary experiments on both sentence and corpus levels show that SEMBLEU has slightly higher consistency with human judgments than SMATCH. Our code is available at freesunshine0316/sembleu.
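
The idea of extending BLEU to graphs can be sketched by treating paths through the AMR as n-grams and scoring them with clipped precision, which needs no node-mapping search. The sketch below is an illustration under stated assumptions, not the released SEMBLEU code: the graph encoding (a dict of node labels plus a list of labeled edges), the function names, and the use of node-plus-edge counts in the brevity penalty are all choices made for this example.

```python
import math
from collections import Counter

def graph_ngrams(nodes, edges, max_n=3):
    """Count label paths of up to max_n nodes, starting from every node.

    nodes: dict mapping node id -> concept label
    edges: list of (source id, relation, target id)
    """
    outgoing = {}
    for src, rel, dst in edges:
        outgoing.setdefault(src, []).append((rel, dst))
    counts = {n: Counter() for n in range(1, max_n + 1)}

    def walk(node, path, order):
        counts[order][path] += 1
        if order < max_n:
            for rel, dst in outgoing.get(node, []):
                walk(dst, path + (rel, nodes[dst]), order + 1)

    for node_id, label in nodes.items():
        walk(node_id, (label,), 1)
    return counts

def clipped_precision(cand, ref):
    total = sum(cand.values())
    if total == 0:
        return 0.0
    return sum(min(c, ref[gram]) for gram, c in cand.items()) / total

def graph_bleu(cand_graph, ref_graph, max_n=3):
    cand = graph_ngrams(*cand_graph, max_n=max_n)
    ref = graph_ngrams(*ref_graph, max_n=max_n)
    precisions = [clipped_precision(cand[n], ref[n]) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty over graph size (nodes + edges), an assumption of this sketch.
    cand_size = len(cand_graph[0]) + len(cand_graph[1])
    ref_size = len(ref_graph[0]) + len(ref_graph[1])
    bp = min(1.0, math.exp(1.0 - ref_size / cand_size))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# "The boy wants to go": want-01 with a reentrant boy node.
edges = [("w", "ARG0", "b"), ("w", "ARG1", "g"), ("g", "ARG0", "b")]
reference = ({"w": "want-01", "b": "boy", "g": "go-01"}, edges)
wrong_entity = ({"w": "want-01", "b": "girl", "g": "go-01"}, edges)

same_score = graph_bleu(reference, reference)
diff_score = graph_bleu(wrong_entity, reference)
```

Because the longer paths all pass through the changed node, a single wrong concept also removes the higher-order matches, which reflects the non-local correspondences mentioned in the abstract. A practical implementation would add smoothing so that one empty n-gram order does not zero the whole score.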

Multi-Granular Text Encoding for Self-Explaining Categorization
Zhiguo Wang | Yue Zhang | Mo Yu | Wei Zhang | Lin Pan | Linfeng Song | Kun Xu | Yousef El-Kurdi
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Self-explaining text categorization requires a classifier to make a prediction along with supporting evidence. A popular type of evidence is sub-sequences extracted from the input text which are sufficient for the classifier to make the prediction. In this work, we define multi-granular ngrams as basic units for explanation, and organize all ngrams into a hierarchical structure, so that shorter ngrams can be reused while computing longer ngrams. We leverage the tree-structured LSTM to learn a context-independent representation for each unit via parameter sharing. Experiments on medical disease classification show that our model is more accurate, efficient and compact than the BiLSTM and CNN baselines. More importantly, our model can extract intuitive multi-granular evidence to support its predictions.

Semantic Neural Machine Translation Using AMR
Linfeng Song | Daniel Gildea | Yue Zhang | Zhiguo Wang | Jinsong Su
Transactions of the Association for Computational Linguistics, Volume 7

It is intuitive that semantic representations can be useful for machine translation, mainly because they can help in enforcing meaning preservation and handling data sparsity (many sentences correspond to one meaning) of machine translation models. On the other hand, little work has been done on leveraging semantics for neural machine translation (NMT). In this work, we study the usefulness of AMR (abstract meaning representation) on NMT. Experiments on a standard English-to-German dataset show that incorporating AMR as additional knowledge can significantly improve a strong attention-based sequence-to-sequence neural translation model.


Neural Transition-based Syntactic Linearization
Linfeng Song | Yue Zhang | Daniel Gildea
Proceedings of the 11th International Conference on Natural Language Generation

The task of linearization is to find a grammatical order given a set of words. Traditional models use statistical methods. Syntactic linearization systems, which generate a sentence along with its syntactic tree, have shown state-of-the-art performance. Recent work shows that a multilayer LSTM language model outperforms competitive statistical syntactic linearization systems without using syntax. In this paper, we study neural syntactic linearization, building a transition-based syntactic linearizer leveraging a feed forward neural network, observing significantly better results compared to LSTM language models on this task.

N-ary Relation Extraction using Graph-State LSTM
Linfeng Song | Yue Zhang | Zhiguo Wang | Daniel Gildea
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Cross-sentence n-ary relation extraction detects relations among n entities across multiple sentences. Typical methods formulate an input as a document graph, integrating various intra-sentential and inter-sentential dependencies. The current state-of-the-art method splits the input graph into two DAGs, adopting a DAG-structured LSTM for each. Though being able to model rich linguistic knowledge by leveraging graph edges, important information can be lost in the splitting procedure. We propose a graph-state LSTM model, which uses a parallel state to model each word, recurrently enriching state values via message passing. Compared with DAG LSTMs, our graph LSTM keeps the original graph structure, and speeds up computation by allowing more parallelization. On a standard benchmark, our model shows the best result in the literature.

Sentence-State LSTM for Text Representation
Yue Zhang | Qi Liu | Linfeng Song
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Bi-directional LSTMs are a powerful tool for text representation. On the other hand, they have been shown to suffer various limitations due to their sequential nature. We investigate an alternative LSTM structure for encoding text, which consists of a parallel state for each word. Recurrent steps are used to perform local and global information exchange between words simultaneously, rather than incremental reading of a sequence of words. Results on various classification and sequence labelling benchmarks show that the proposed model has strong representation power, giving highly competitive performances compared to stacked BiLSTM models with similar parameter numbers.

A Graph-to-Sequence Model for AMR-to-Text Generation
Linfeng Song | Yue Zhang | Zhiguo Wang | Daniel Gildea
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The problem of AMR-to-text generation is to recover a text representing the same meaning as an input AMR graph. The current state-of-the-art method uses a sequence-to-sequence model, leveraging LSTM for encoding a linearized AMR structure. Although it is able to model non-local semantic information, a sequence LSTM can lose information from the AMR graph structure, and thus faces challenges with large graphs, which result in long sequences. We introduce a neural graph-to-sequence model, using a novel LSTM structure for directly encoding graph-level semantics. On a standard benchmark, our model shows superior results to existing methods in the literature.

Sequence-to-sequence Models for Cache Transition Systems
Xiaochang Peng | Linfeng Song | Daniel Gildea | Giorgio Satta
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we present a sequence-to-sequence based approach for mapping natural language sentences to AMR semantic graphs. We transform the sequence to graph mapping problem to a word sequence to transition action sequence problem using a special transition system called a cache transition system. To address the sparsity issue of neural AMR parsing, we feed feature embeddings from the transition state to provide relevant local information for each decoder state. We present a monotonic hard attention model for the transition framework to handle the strictly left-to-right alignment between each transition state and the current buffer input focus. We evaluate our neural transition model on the AMR parsing task, and our parser outperforms other sequence-to-sequence approaches and achieves competitive results in comparison with the best-performing models.

Leveraging Context Information for Natural Question Generation
Linfeng Song | Zhiguo Wang | Wael Hamza | Yue Zhang | Daniel Gildea
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

The task of natural question generation is to generate a corresponding question given the input passage (fact) and answer. It is useful for enlarging the training set of QA systems. Previous work has adopted sequence-to-sequence models that take a passage with an additional bit to indicate answer position as input. However, they do not explicitly model the information between answer and other context within the passage. We propose a model that matches the answer with the passage before generating the question. Experiments show that our model outperforms the existing state of the art using rich features.


AMR-to-text Generation with Synchronous Node Replacement Grammar
Linfeng Song | Xiaochang Peng | Yue Zhang | Zhiguo Wang | Daniel Gildea
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper addresses the task of AMR-to-text generation by leveraging synchronous node replacement grammar. During training, graph-to-string rules are learned using a heuristic extraction algorithm. At test time, a graph transducer is applied to collapse input AMRs and generate output sentences. Evaluated on a standard benchmark, our method gives the state-of-the-art result.


Sense Embedding Learning for Word Sense Induction
Linfeng Song | Zhiguo Wang | Haitao Mi | Daniel Gildea
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

AMR-to-text generation as a Traveling Salesman Problem
Linfeng Song | Yue Zhang | Xiaochang Peng | Zhiguo Wang | Daniel Gildea
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing


A Synchronous Hyperedge Replacement Grammar based approach for AMR parsing
Xiaochang Peng | Linfeng Song | Daniel Gildea
Proceedings of the Nineteenth Conference on Computational Natural Language Learning


Syntactic SMT Using a Discriminative Text Generation Model
Yue Zhang | Kai Song | Linfeng Song | Jingbo Zhu | Qun Liu
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Translation with Source Constituency and Dependency Trees
Fandong Meng | Jun Xie | Linfeng Song | Yajuan Lü | Qun Liu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


Bagging-based System Combination for Domain Adaption
Linfeng Song | Haitao Mi | Yajuan Lü | Qun Liu
Proceedings of Machine Translation Summit XIII: Papers

ETS: An Error Tolerable System for Coreference Resolution
Hao Xiong | Linfeng Song | Fandong Meng | Yang Liu | Qun Liu | Yajuan Lv
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task