Ping Li


2024

Message Passing on Semantic-Anchor-Graphs for Fine-grained Emotion Representation Learning and Classification
Pinyi Zhang | Jingyang Chen | Junchen Shen | Zijie Zhai | Ping Li | Jie Zhang | Kai Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Emotion classification has wide applications in education, robotics, virtual reality, etc. However, identifying subtle differences between fine-grained emotion categories remains challenging. Current methods typically aggregate numerous token embeddings of a sentence into a single vector, which, while being an efficient compressor, may not fully capture complex semantic and temporal distributions. To solve this problem, we propose SEmantic ANchor Graph Neural Networks (SEAN-GNN) for fine-grained emotion classification. It learns a group of representative, multi-faceted semantic anchors in the token embedding space: using these anchors as a global reference, any sentence can be projected onto them to form a “semantic-anchor graph”, with node attributes and edge weights quantifying the semantic and temporal information, respectively. The graph structure is well aligned across sentences and, importantly, allows for generating comprehensive emotion representations with respect to K different anchors. Message passing on this graph can further integrate and refine the learned features. Empirically, SEAN-GNN generates meaningful semantic anchors and discriminative graph patterns for different emotions, with promising classification results on 6 popular benchmark datasets against state-of-the-art methods.
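
As a rough illustration of the anchor-graph construction (the aggregation and edge definitions here are assumptions, not the paper's exact formulation), a sentence's token embeddings can be softly assigned to K learnable anchors, with anchor-wise aggregates as node attributes and transitions between adjacent tokens' assignments as edge weights:

```python
# A minimal sketch of semantic-anchor graph construction (hypothetical details).
import torch
import torch.nn.functional as F

def build_anchor_graph(tokens, anchors):
    """tokens: (T, d) token embeddings; anchors: (K, d) learnable anchors."""
    # Soft-assign each token to the K anchors by embedding similarity.
    assign = F.softmax(tokens @ anchors.T, dim=-1)   # (T, K)
    # Node attributes: anchor-wise aggregation of token semantics.
    nodes = assign.T @ tokens                        # (K, d)
    # Edge weights: temporal transitions between anchors of adjacent tokens.
    edges = assign[:-1].T @ assign[1:]               # (K, K)
    return nodes, edges

anchors = torch.nn.Parameter(torch.randn(8, 128))    # K = 8 anchors
nodes, edges = build_anchor_graph(torch.randn(20, 128), anchors)
```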

ATQ: Activation Transformation for Weight-Activation Quantization of Large Language Models
Yundong Gai | Ping Li
Findings of the Association for Computational Linguistics: EMNLP 2024

Many quantization methods are emerging to address the problem that the huge demand on computational and storage costs hinders the deployment of large language models (LLMs). However, their accuracy still cannot satisfy the broader academic and industrial community. In this work, we propose ATQ, an INT8 weight-activation quantization method for LLMs that achieves almost lossless accuracy. We employ a mathematically equivalent transformation and a triangle inequality to constrain the weight-activation quantization error to the sum of a weight quantization error and an activation quantization error. For the weight part, transformed weights are quantized along the in-feature dimension and the quantization error is compensated by optimizing the following in-features. For the activation part, transformed activations lie in a normal range and can be quantized easily. Comparison experiments demonstrate that ATQ achieves almost lossless accuracy on the OPT and LLaMA families in W8A8 quantization settings: the perplexity increase is within 1 and the accuracy degradation is within 0.5 percent even in the worst case.
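
A minimal sketch of the general recipe, assuming a SmoothQuant-style per-channel scaling as the mathematically equivalent transformation (the paper's actual transformation and error-compensation step are more involved):

```python
# Fold a per-in-feature scale s into the weights so activations fall in an
# easy-to-quantize range; y = X @ W is preserved exactly before quantization.
import numpy as np

def quantize_int8(x, axis=None):
    scale = np.abs(x).max(axis=axis, keepdims=True) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def atq_like_transform(W, X):
    """W: (in, out) weights; X: (batch, in) activations."""
    s = np.abs(X).max(axis=0) + 1e-8                        # activation ranges
    Wq, w_scale = quantize_int8(W * s[:, None], axis=0)     # scaled weights
    Xq, x_scale = quantize_int8(X / s)                      # tamed activations
    # Dequantized INT8 matmul approximating the original X @ W.
    return (Xq.astype(np.float32) * x_scale) @ (Wq.astype(np.float32) * w_scale)
```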

2023

A Semi-Autoregressive Graph Generative Model for Dependency Graph Parsing
Ye Ma | Mingming Sun | Ping Li
Findings of the Association for Computational Linguistics: ACL 2023

Recent years have witnessed impressive progress in neural dependency parsing. According to how they factorize the joint probability of a graph, existing parsers can be roughly divided into autoregressive and non-autoregressive families. The former factorizes the graph into multiple sequentially dependent components and builds it up component by component; the latter assumes these components to be independent, so that they can be output in one shot. However, when treating a directed edge as an explicit dependency relationship, we discover that a dependency graph contains a mixture of independent and interdependent components, meaning that both of the aforementioned models fail to precisely capture the explicit dependencies among nodes and edges. Based on this property, we design a Semi-Autoregressive Dependency Parser that generates dependency graphs by adding node groups and edge groups autoregressively while emitting all elements within a group in parallel. The model strikes a trade-off between non-autoregression and autoregression, which respectively suffer from the lack of target inter-dependencies and the uncertainty of graph generation orders. Experiments show the proposed parser outperforms strong baselines on Enhanced Universal Dependencies of multiple languages, achieving a 4% average improvement in graph-level accuracy. The performance of model variants further shows the importance of specific components.
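
A conceptual sketch of the semi-autoregressive loop, with `decode_group` as a hypothetical stand-in for the parser's node/edge group decoder:

```python
# Groups are produced one after another (autoregressive across groups), while
# all elements inside a group are emitted in parallel.
def generate(decode_group, max_groups):
    graph = []                        # node/edge groups generated so far
    for _ in range(max_groups):
        group = decode_group(graph)   # all elements of this group in parallel
        if not group:                 # empty group signals termination
            break
        graph.append(group)
    return graph
```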

Denoising Enhanced Distantly Supervised Ultrafine Entity Typing
Yue Zhang | Hongliang Fei | Ping Li
Findings of the Association for Computational Linguistics: ACL 2023

Recently, the task of distantly supervised (DS) ultra-fine entity typing has received significant attention. However, DS data is noisy and often suffers from missing or wrong labels, resulting in low precision and low recall. This paper proposes a novel ultra-fine entity typing model with denoising capability. Specifically, we build a noise model to estimate the unknown labeling noise distribution over input contexts and noisy type labels. With the noise model, more trustworthy labels can be recovered by subtracting the estimated noise from the input. Furthermore, we propose an entity typing model, which adopts a bi-encoder architecture and is trained on the denoised data. Finally, the noise model and the entity typing model are trained iteratively to enhance each other. We conduct extensive experiments on the Ultra-Fine entity typing dataset as well as the OntoNotes dataset and demonstrate that our approach significantly outperforms other baseline methods.
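
A toy sketch of the denoising step, with module shapes and names as assumptions: the noise model conditions on the context and the noisy labels, and trustworthy labels are recovered by subtraction:

```python
import torch
import torch.nn as nn

class NoiseModel(nn.Module):
    def __init__(self, ctx_dim, num_types):
        super().__init__()
        self.net = nn.Linear(ctx_dim + num_types, num_types)

    def forward(self, ctx, noisy_labels):
        # Estimate additive label noise conditioned on context + noisy labels.
        return self.net(torch.cat([ctx, noisy_labels], dim=-1))

noise_model = NoiseModel(ctx_dim=768, num_types=10331)
ctx, noisy = torch.randn(4, 768), torch.randint(0, 2, (4, 10331)).float()
denoised = (noisy - noise_model(ctx, noisy)).clamp(0, 1)   # recovered labels
```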

Dual-Channel Span for Aspect Sentiment Triplet Extraction
Pan Li | Ping Li | Kai Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Aspect Sentiment Triplet Extraction (ASTE) is one of the compound tasks of fine-grained aspect-based sentiment analysis (ABSA), aiming at extracting triplets of aspect terms, corresponding opinion terms, and the associated sentiment orientation. Recent efforts exploiting span-level semantic interaction have shown superior performance on the ASTE task. However, most existing span-based approaches suffer from enumerating all possible spans, which can introduce too much noise into sentiment triplet extraction. To ease this burden, we propose a dual-channel span generation method to coherently constrain the search space of span candidates. Specifically, we leverage the syntactic relations among aspect/opinion terms and their part-of-speech characteristics to generate span candidates, which reduces span enumeration by nearly half. Moreover, feature representations are learned from the syntactic and part-of-speech correlations among terms, which enriches span representations with fruitful linguistic information. Extensive experiments on two versions of public datasets demonstrate both the effectiveness of our design and its superiority on the ASTE/ATE/OTE tasks.
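
For illustration, a simple part-of-speech filter already shows how span enumeration can be constrained (the tag sets and length cap are assumptions, not the paper's exact dual-channel rules):

```python
# Prune spans whose tokens carry tags unlikely for aspect/opinion terms,
# instead of enumerating all O(n^2) spans.
ASPECT_TAGS = {"NN", "NNS", "NNP"}
OPINION_TAGS = {"JJ", "JJR", "JJS", "RB"}

def span_candidates(pos_tags, allowed, max_len=3):
    n = len(pos_tags)
    return [(i, j) for i in range(n) for j in range(i, min(i + max_len, n))
            if all(t in allowed for t in pos_tags[i:j + 1])]

tags = ["DT", "NN", "NN", "VBZ", "RB", "JJ"]   # "the battery life is really great"
print(span_candidates(tags, ASPECT_TAGS))       # [(1, 1), (1, 2), (2, 2)]
```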

2022

Continual Learning for Natural Language Generations with Transformer Calibration
Peng Yang | Dingcheng Li | Ping Li
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

Conventional natural language processing (NLP) generation models are trained offline with a given dataset for a particular task, which is referred to as isolated learning. Research on sequence-to-sequence language generation aims instead to build continual learning models that constantly learn from sequentially encountered tasks. However, continual learning often suffers from catastrophic forgetting, a persistent challenge for lifelong learning. In this paper, we present a novel NLP transformer model that mitigates catastrophic forgetting in online continual learning from a new perspective, i.e., attention calibration. We model the attention in the transformer as a calibrated unit in a general formulation, where attention calibration helps balance the stability and plasticity of continual learning algorithms by influencing both their forward inference path and their backward optimization path. Our experiments on paraphrase generation and dialogue response generation demonstrate that this work outperforms state-of-the-art models by a considerable margin and effectively mitigates forgetting.
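
One way to picture a calibrated attention unit (a minimal sketch; the paper's calibration formulation may differ) is a learnable mask that rescales attention weights, affecting both the forward pass and, through its gradients, the backward optimization path:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CalibratedAttention(nn.Module):
    def __init__(self, dim, seq_len):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.calib = nn.Parameter(torch.ones(seq_len, seq_len))  # calibration mask

    def forward(self, x):                          # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        attn = attn * torch.sigmoid(self.calib)    # calibrate the weights
        attn = attn / attn.sum(dim=-1, keepdim=True)  # renormalize
        return attn @ v
```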

Learning to Selectively Learn for Weakly Supervised Paraphrase Generation with Model-based Reinforcement Learning
Haiyan Yin | Dingcheng Li | Ping Li
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Paraphrase generation is an important language generation task that attempts to interpret user intents and systematically generate new phrases with meanings identical to the given ones. However, the effectiveness of paraphrase generation is constrained by access to gold-labeled data pairs, which are restricted in both quantity and quality. In this paper, we propose a new weakly supervised paraphrase generation approach that extends the success of recent work leveraging reinforcement learning for effective model training with data selection. While data selection is well suited to target tasks with noisy data, developing a reinforced selective learning regime faces several unresolved challenges. We discuss these challenges and present a new model that partially overcomes them with a model-based planning feature and a reward normalization feature. Extensive evaluation on four weakly supervised paraphrase generation tasks shows that our method significantly improves state-of-the-art performance on the evaluation datasets.
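
A minimal sketch of the reward-normalization component, assuming running-mean/variance statistics (the names and exact scheme are assumptions):

```python
# Raw rewards (e.g., validation gains of the paraphrase model) are normalized
# so the selector's policy-gradient updates stay on a stable scale.
class RewardNormalizer:
    def __init__(self, momentum=0.99, eps=1e-8):
        self.mean, self.var = 0.0, 1.0
        self.momentum, self.eps = momentum, eps

    def __call__(self, reward):
        self.mean = self.momentum * self.mean + (1 - self.momentum) * reward
        self.var = self.momentum * self.var + (1 - self.momentum) * (reward - self.mean) ** 2
        return (reward - self.mean) / (self.var ** 0.5 + self.eps)
```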

OIE@OIA: an Adaptable and Efficient Open Information Extraction Framework
Xin Wang | Minlong Peng | Mingming Sun | Ping Li
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Different Open Information Extraction (OIE) tasks require different types of information, so the OIE field requires strong adaptability of OIE algorithms to meet different task requirements. This paper discusses the adaptability problem in existing OIE systems and designs a new adaptable and efficient OIE system, OIE@OIA, as a solution. OIE@OIA follows the methodology of Open Information eXpression (OIX): parsing a sentence to an Open Information Annotation (OIA) graph and then adapting the OIA graph to different OIE tasks with simple rules. As the core of our OIE@OIA system, we implement an end-to-end OIA generator by annotating a dataset (which we make openly available) and designing an efficient learning algorithm for the complex OIA graph. We easily adapt the OIE@OIA system to accomplish three popular OIE tasks. The experiments show that OIE@OIA achieves new state-of-the-art performance on these tasks, demonstrating its great adaptability. Furthermore, compared to other end-to-end OIE baselines that need millions of samples for training, OIE@OIA needs far fewer training samples (12K), showing a significant advantage in terms of efficiency.

PromptGen: Automatically Generate Prompts using Generative Models
Yue Zhang | Hongliang Fei | Dingcheng Li | Ping Li
Findings of the Association for Computational Linguistics: NAACL 2022

Recently, prompt learning has received significant attention, where downstream tasks are reformulated as a mask-filling task with the help of a textual prompt. The key point of prompt learning is finding the most appropriate prompt. This paper proposes a novel model, PromptGen, which can automatically generate prompts conditioned on the input sentence. PromptGen is the first work to consider dynamic prompt generation for knowledge probing based on a pre-trained generative model. To mitigate label information leaking from the pre-trained generative model, given a generated prompt, we replace the query input with “None” and require that this perturbed, context-free prompt cannot trigger the correct label. We evaluate our model on the LAMA knowledge probing benchmark and show that PromptGen significantly outperforms other baselines.
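
A sketch of the leakage check, with `plm_logits` as a hypothetical scorer over the mask position:

```python
# The subject in the generated prompt is replaced with "None", and the model is
# penalized if this context-free prompt alone still predicts the gold object.
def leakage_penalty(prompt, subject, gold_object, plm_logits):
    context_free = prompt.replace(subject, "None")
    probs = plm_logits(context_free)      # distribution over mask fillers
    return probs[gold_object]             # to be minimized during training
```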

Multi-Hop Open-Domain Question Answering over Structured and Unstructured Knowledge
Yue Feng | Zhen Han | Mingming Sun | Ping Li
Findings of the Association for Computational Linguistics: NAACL 2022

Open-domain question answering systems need to answer questions of interest using both structured and unstructured information. However, existing approaches either select only one source to generate the answer or conduct reasoning only on structured information. In this paper, we propose a Document-Entity Heterogeneous Graph Network, referred to as DEHG, to effectively integrate different sources of information and conduct reasoning over heterogeneous information. DEHG employs a graph constructor to integrate structured and unstructured information, a context encoder to represent nodes and the question, a heterogeneous information reasoning layer to conduct multi-hop reasoning over both information sources, and an answer decoder to generate answers for the question. Experimental results on the HybridQA dataset show that DEHG outperforms the state-of-the-art methods.

Cross-Lingual Cross-Modal Consolidation for Effective Multilingual Video Corpus Moment Retrieval
Jiaheng Liu | Tan Yu | Hanyu Peng | Mingming Sun | Ping Li
Findings of the Association for Computational Linguistics: NAACL 2022

Existing multilingual video corpus moment retrieval (mVCMR) methods are mainly based on a two-stream structure: the visual stream uses the visual content of the video to estimate the query-visual similarity, and the subtitle stream exploits the query-subtitle similarity. The final query-video similarity ensembles the similarities from the two streams. In this work, we propose a simple and effective strategy termed Cross-lingual Cross-modal Consolidation (C3) to improve mVCMR accuracy. We adopt the ensemble similarity as the teacher to guide the training of each stream, leading to a more powerful ensemble similarity. Meanwhile, we use the teacher for one language to guide the student for another language, exploiting the complementary knowledge across languages. Extensive experiments on the mTVR dataset demonstrate the effectiveness of our C3 method.
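
A minimal sketch of the consolidation loss, assuming an averaged ensemble teacher and temperature-scaled KL distillation (both assumptions):

```python
import torch.nn.functional as F

def consolidation_loss(vis_sim, sub_sim, tau=1.0):
    """vis_sim, sub_sim: (batch, num_moments) similarity scores per stream."""
    ens = (vis_sim + sub_sim) / 2                        # ensemble similarity
    teacher = F.softmax(ens.detach() / tau, dim=-1)      # stop-gradient teacher
    loss = 0.0
    for student in (vis_sim, sub_sim):                   # each stream as student
        loss = loss + F.kl_div(F.log_softmax(student / tau, dim=-1),
                               teacher, reduction="batchmean")
    return loss
```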

2021

Cross-lingual Cross-modal Pretraining for Multimodal Retrieval
Hongliang Fei | Tan Yu | Ping Li
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent pretrained vision-language models have achieved impressive performance on cross-modal retrieval tasks in English. Their success, however, heavily depends on the availability of many annotated image-caption datasets for pretraining, whose texts are not necessarily in English. Although we can utilize machine translation (MT) tools to translate non-English text to English, the performance still largely relies on MT quality and may suffer from high latency in real-world applications. This paper proposes a new approach to learning cross-lingual cross-modal representations for matching images with their relevant captions in multiple languages. We seamlessly combine cross-lingual and cross-modal pretraining objectives in a unified framework to learn image and text in a joint embedding space from available English image-caption data together with monolingual and parallel corpora. We show that our approach achieves state-of-the-art performance on retrieval tasks on two multimodal multilingual image-caption benchmarks: Multi30k with German captions and MSCOCO with Japanese captions.

A Network Science Approach to Bilingual Code-switching
Qihui Xu | Magdalena Markowska | Martin Chodorow | Ping Li
Proceedings of the Society for Computation in Linguistics 2021

A Deep Decomposable Model for Disentangling Syntax and Semantics in Sentence Representation
Dingcheng Li | Hongliang Fei | Shaogang Ren | Ping Li
Findings of the Association for Computational Linguistics: EMNLP 2021

Recently, disentanglement based on generative adversarial networks or variational autoencoders has significantly advanced the performance of diverse applications in the CV and NLP domains. Nevertheless, those models still operate at a coarse level when disentangling closely related properties, such as syntax and semantics in human languages. This paper introduces a deep decomposable model based on the VAE that disentangles syntax and semantics using total-correlation penalties on KL divergences. Notably, we decompose the KL divergence term of the original VAE so that the generated latent variables can be separated in a more clear-cut and interpretable way. Experiments on benchmark datasets show that our proposed model can significantly improve the disentanglement quality between syntactic and semantic representations on both semantic similarity tasks and syntactic similarity tasks.
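
For context, the standard total-correlation decomposition of the aggregated VAE KL term (as in β-TCVAE) reads as follows; the paper's block-wise variant over syntactic and semantic latents may differ in detail:

```latex
\mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big)\right]
  = \underbrace{I_q(x;z)}_{\text{index-code MI}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\textstyle\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\textstyle\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

Penalizing the total-correlation term encourages the latent blocks to be statistically independent, which is what makes a clean syntax/semantics split possible.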

Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval
Haoliang Liu | Tan Yu | Ping Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

By exploiting cross-modal attention, cross-BERT methods have achieved state-of-the-art accuracy in cross-modal retrieval. Nevertheless, the heavy text-image interactions in the cross-BERT model are prohibitively slow for large-scale retrieval. Late-interaction methods trade off retrieval accuracy for efficiency by exploiting cross-modal interaction only in the late stage, attaining a satisfactory retrieval speed. In this work, we propose an inflating-and-shrinking approach to further boost the efficiency and accuracy of late-interaction methods. The inflating operation plugs several codes into the encoder input to exploit text-image interactions more thoroughly for higher retrieval accuracy. The shrinking operation then gradually reduces the text-image interactions through knowledge distillation for higher efficiency. Through an inflating operation followed by a shrinking operation, both the efficiency and the accuracy of a late-interaction model are boosted. Systematic experiments on public benchmarks demonstrate the effectiveness of our inflating-and-shrinking approach.
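
A minimal sketch of the inflating step, where a few learnable code embeddings are prepended to the encoder input (the number and placement of the codes are assumptions):

```python
import torch
import torch.nn as nn

class InflatedEncoder(nn.Module):
    def __init__(self, encoder, dim, num_codes=4):
        super().__init__()
        self.encoder = encoder
        self.codes = nn.Parameter(torch.randn(num_codes, dim))  # learnable codes

    def forward(self, token_emb):                 # token_emb: (batch, seq, dim)
        codes = self.codes.expand(token_emb.size(0), -1, -1)
        return self.encoder(torch.cat([codes, token_emb], dim=1))
```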

2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning
Hongliang Fei | Ping Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent neural network models have achieved impressive performance on sentiment classification in English as well as other languages. Their success heavily depends on the availability of a large amount of labeled data or parallel corpora. In this paper, we investigate an extreme scenario of cross-lingual sentiment classification, in which the low-resource language has neither labels nor a parallel corpus. We propose an unsupervised cross-lingual sentiment classification model named multi-view encoder-classifier (MVEC) that leverages an unsupervised machine translation (UMT) system and a language discriminator. Unlike previous language-model (LM) based fine-tuning approaches that adjust parameters solely based on the classification error on training data, we employ the encoder-decoder framework of a UMT as a regularization component on the shared network parameters. In particular, the cross-lingual encoder of our model learns a shared representation that is effective both for reconstructing input sentences of the two languages and for generating more representative views of the input for classification. Extensive experiments on five language pairs verify that our model significantly outperforms other models on 8 of 11 sentiment classification tasks.
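
Schematically, and with loss weights as assumptions, the shared encoder is trained against a combination of the classification loss, the UMT reconstruction regularizer, and an adversarial language-discriminator term:

```python
# Objective minimized w.r.t. the shared encoder and classifier; the
# discriminator itself is trained separately to identify the input language,
# so the encoder *maximizes* its loss (hence the minus sign).
def mvec_objective(cls_loss, umt_recon_loss, disc_loss,
                   lam_rec=1.0, lam_adv=0.1):
    return cls_loss + lam_rec * umt_recon_loss - lam_adv * disc_loss
```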

Learning Interpretable Relationships between Entities, Relations and Concepts via Bayesian Structure Learning on Open Domain Facts
Jingyuan Zhang | Mingming Sun | Yue Feng | Ping Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Concept graphs are created as universal taxonomies for text understanding over open-domain knowledge. The nodes in concept graphs include both entities and concepts; the edges point from entities to concepts, indicating that an entity is an instance of a concept. In this paper, we propose the task of learning interpretable relationships from open-domain facts to enrich and refine concept graphs. Bayesian network structures are learned from open-domain facts as interpretable relationships between the relations of facts and the concepts of entities. We conduct extensive experiments on public English and Chinese datasets. Compared to state-of-the-art methods, the learned network structures help improve the identification of concepts for entities based on the relations of entities on both datasets.

A Predicate-Function-Argument Annotation of Natural Language for Open-Domain Information eXpression
Mingming Sun | Wenyue Hua | Zoey Liu | Xin Wang | Kangjie Zheng | Ping Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing OIE (Open Information Extraction) algorithms are developed independently of each other, so there is much redundant work, and their featured strategies are neither reusable nor adaptable to new tasks. This paper proposes a new pipeline for building OIE systems, in which an Open-domain Information eXpression (OIX) task provides a platform for all OIE strategies. OIX is an OIE-friendly expression of a sentence without information loss. The generation procedure of OIX contains the work shared by OIE algorithms, so that OIE strategies can be developed on the OIX platform as inference operations focusing on more critical problems. On the common platform of OIX, OIE strategies become reusable, and people can select a set of strategies to assemble an algorithm for a specific task, which may significantly increase adaptability. This paper focuses on the OIX task and proposes a solution, Open Information Annotation (OIA). OIA is a predicate-function-argument annotation for sentences. We label a dataset of sentence-OIA pairs and propose a dependency-based rule system to generate OIA annotations from sentences. The evaluation results reveal that learning OIA from a sentence is challenging owing to the complexity of natural-language sentences, and that the task is worthy of more attention from the research community.

2019

End-to-end Deep Reinforcement Learning Based Coreference Resolution
Hongliang Fei | Xu Li | Dingcheng Li | Ping Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent neural network models have significantly advanced the task of coreference resolution. However, current neural coreference models are usually trained with heuristic loss functions that are computed over a sequence of local decisions. In this paper, we introduce an end-to-end reinforcement learning based coreference resolution model to directly optimize coreference evaluation metrics. Specifically, we modify the state-of-the-art higher-order mention ranking approach in Lee et al. (2018) to a reinforced policy gradient model by incorporating the reward associated with a sequence of coreference linking actions. Furthermore, we introduce maximum entropy regularization for adequate exploration to prevent the model from prematurely converging to a bad local optimum. Our proposed model achieves new state-of-the-art performance on the English OntoNotes v5.0 benchmark.
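
A minimal sketch of the reinforced objective, with tensor shapes and names as assumptions: a REINFORCE term over the sequence of linking actions plus a maximum-entropy bonus for exploration:

```python
import torch

def reinforced_loss(log_probs, rewards, entropy, beta=0.01):
    """log_probs: (T,) log pi(a_t); rewards: (T,) per-action rewards (e.g.,
    derived from coreference metrics); entropy: (T,) policy entropies."""
    # Return at step t is the sum of rewards from t onward.
    returns = torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])
    pg = -(log_probs * returns.detach()).sum()   # policy-gradient term
    return pg - beta * entropy.sum()             # entropy regularization
```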

Integration of Knowledge Graph Embedding Into Topic Modeling with Hierarchical Dirichlet Process
Dingcheng Li | Siamak Zamani | Jingyuan Zhang | Ping Li
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Leveraging domain knowledge is an effective strategy for enhancing the quality of inferred low-dimensional representations of documents by topic models. In this paper, we develop topic modeling with knowledge graph embedding (TMKGE), a Bayesian nonparametric model to employ knowledge graph (KG) embedding in the context of topic modeling, for extracting more coherent topics. Specifically, we build a hierarchical Dirichlet process (HDP) based model to flexibly borrow information from KG to improve the interpretability of topics. An efficient online variational inference method based on a stick-breaking construction of HDP is developed for TMKGE, making TMKGE suitable for large document corpora and KGs. Experiments on three public datasets illustrate the superior performance of TMKGE in terms of topic coherence and document classification accuracy, compared to state-of-the-art topic modeling methods.
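
For intuition, the stick-breaking construction underlying the HDP builds topic weights by repeatedly breaking off Beta-distributed fractions of the remaining stick (a generative sketch only, not the variational inference code):

```python
import numpy as np

def stick_breaking(alpha, k):
    v = np.random.beta(1.0, alpha, size=k)              # break fractions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining                                # topic weights (sum < 1)

weights = stick_breaking(alpha=1.0, k=10)
```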

Reinforced Product Metadata Selection for Helpfulness Assessment of Customer Reviews
Miao Fan | Chao Feng | Mingming Sun | Ping Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

To automatically assess the helpfulness of a customer review online, conventional approaches generally acquire various linguistic and neural embedding features solely from the textual content of the review itself. We find, however, that a helpful review is largely concerned with the metadata (such as the name, the brand, the category, etc.) of its target product. This leaves us with the challenge of choosing the right key-value product metadata to appraise the helpfulness of free-text reviews more precisely. To address this problem, we propose a novel framework composed of two mutually beneficial modules. Given a product, a selector (agent) learns from both the keys in the product metadata and one of its reviews to take an action that selects the correct value, and a successive predictor (network) makes the free-text review attend to this value to obtain better neural representations for helpfulness assessment. The predictor is directly optimized by SGD with the helpfulness prediction loss, and the selector is updated via policy gradient, rewarded with the performance of the predictor. We use two real-world datasets, from Amazon.com and Yelp.com respectively, to compare our framework with other mainstream methods under two application scenarios: helpfulness identification and regression of customer reviews. Extensive results demonstrate that our framework achieves state-of-the-art performance with substantial improvements.
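
The training loop can be pictured roughly as follows, with every component name a hypothetical stand-in:

```python
# Predictor: SGD on the helpfulness loss; selector: policy gradient rewarded
# with the predictor's performance on the selected metadata value.
def train_step(selector, predictor, review, metadata, label, optimizer):
    key, value, log_prob = selector.sample(metadata, review)  # pick one value
    loss = predictor.loss(review, value, label)               # attend to value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                          # update predictor
    reward = -loss.item()                                     # lower loss, higher reward
    selector.reinforce(log_prob, reward)                      # policy gradient
```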

On Efficient Retrieval of Top Similarity Vectors
Shulong Tan | Zhixin Zhou | Zhaozhuo Xu | Ping Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Retrieval of relevant vectors produced by representation learning critically influences the efficiency of natural language processing (NLP) tasks. In this paper, we demonstrate an efficient method for searching vectors via a typical non-metric matching function: the inner product. Our method, which constructs an approximate Inner Product Delaunay Graph (IPDG) for top-1 Maximum Inner Product Search (MIPS), transforms retrieving the most suitable latent vectors into a graph search problem with great efficiency benefits. Experiments on data representations learned for different machine learning tasks verify the superior effectiveness and efficiency of the proposed IPDG.
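
A minimal sketch of greedy top-1 MIPS search on a proximity graph (the IPDG edge construction itself is not shown):

```python
# Starting from an entry node, repeatedly move to the neighbor with the
# largest inner product with the query until no neighbor improves.
import numpy as np

def greedy_mips(graph, vectors, query, entry=0):
    """graph: {node: [neighbor ids]}; vectors: (N, d); query: (d,)."""
    cur, cur_score = entry, vectors[entry] @ query
    while True:
        nbrs = graph[cur]
        scores = vectors[nbrs] @ query
        best = int(np.argmax(scores))
        if scores[best] <= cur_score:
            return cur, cur_score        # local maximum of the inner product
        cur, cur_score = nbrs[best], scores[best]
```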

2018

Logician and Orator: Learning from the Duality between Language and Knowledge in Open Domain
Mingming Sun | Xu Li | Ping Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose the task of Open-Domain Information Narration (OIN) as the reverse of Open Information Extraction (OIE), to implement the dual structure between language and knowledge in the open domain. We then develop an agent, called Orator, to accomplish the OIN task, and assemble Orator and the recently proposed OIE agent, Logician, into a dual system that exploits this duality with a reinforcement learning paradigm. Experimental results reveal that the dual structure between the OIE and OIN tasks helps build better OIE and OIN agents alike.

2007

A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations
Ping Li | Kenneth W. Church
Computational Linguistics, Volume 33, Number 3, September 2007

2005

Using Sketches to Estimate Associations
Ping Li | Kenneth W. Church
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing