Chenguang Wang


2024

pdf bib
Measuring Social Norms of Large Language Models
Ye Yuan | Kexin Tang | Jianhao Shen | Ming Zhang | Chenguang Wang
Findings of the Association for Computational Linguistics: NAACL 2024

We present a new challenge to examine whether large language models understand social norms. In contrast to existing datasets, our dataset requires a fundamental understanding of social norms to solve. Our dataset features the largest set of social norm skills, consisting of 402 skills and 12,383 questions covering a wide set of social norms ranging from opinions and arguments to culture and laws. We design our dataset according to the K-12 curriculum. This enables the direct comparison of the social understanding of large language models to humans, more specifically, elementary students. While prior work generates nearly random accuracy on our benchmark, recent large language models such as GPT3.5-Turbo and LLaMA2-Chat are able to improve the performance significantly, only slightly below human performance. We then propose a multi-agent framework based on large language models to improve the models’ ability to understand social norms. This method further improves large language models to be on par with humans. Given the increasing adoption of large language models in real-world applications, our finding is particularly important and presents a unique direction for future improvements.

pdf bib
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
Eric Pasewark | Kyle Montgomery | Kefei Duan | Dawn Song | Chenguang Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a new method for large language models to solve compositional tasks. Although they have shown strong performance on traditional language understanding tasks, large language models struggle to solve compositional tasks, where the solution depends on solving smaller instances of the same problem. We propose a natural approach to solve compositional tasks recursively. Our method, Re-Tuning, tunes models to break down a problem into subproblems, solve those subproblems, and combine the results. We show that our method significantly improves model performance on three representative compositional tasks: integer addition, dynamic programming, and parity. Compared to state-of-the-art methods that keep intermediate steps towards solving the problems, Re-Tuning achieves significantly higher accuracy and is more GPU memory efficient.

2022

pdf bib
IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models
Chenguang Wang | Xiao Liu | Dawn Song
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We introduce a new open information extraction (OIE) benchmark for pre-trained language models (LM). Recent studies have demonstrated that pre-trained LMs, such as BERT and GPT, may store linguistic and relational knowledge. In particular, LMs are able to answer “fill-in-the-blank” questions when given a pre-defined relation category. Instead of focusing on pre-defined relations, we create an OIE benchmark aiming to fully examine the open relational information present in the pre-trained LMs. We accomplish this by turning pre-trained LMs into zero-shot OIE systems. Surprisingly, pre-trained LMs are able to obtain competitive performance on both standard OIE datasets (CaRB and Re-OIE2016) and two new large-scale factual OIE datasets (TAC KBP-OIE and Wikidata-OIE) that we establish via distant supervision. For instance, the zero-shot pre-trained LMs outperform the F1 score of the state-of-the-art supervised OIE methods on our factual OIE datasets without needing to use any training sets.

pdf bib
DeepStruct: Pretraining of Language Models for Structure Prediction
Chenguang Wang | Xiao Liu | Zui Chen | Haoyun Hong | Jie Tang | Dawn Song
Findings of the Association for Computational Linguistics: ACL 2022

We introduce a method for improving the structural understanding abilities of language models. Unlike previous approaches that finetune the models with task-specific augmentation, we pretrain language models to generate structures from the text on a collection of task-agnostic corpora. Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks. We study the performance of this approach on 28 datasets, spanning 10 structure prediction tasks including open information extraction, joint entity and relation extraction, named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, factual probe, intent detection, and dialogue state tracking. We further enhance the pretraining with the task-specific training sets. We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets that we evaluate. Our code and datasets will be made publicly available.

pdf bib
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models
Zhiyuan Zhang | Lingjuan Lyu | Xingjun Ma | Chenguang Wang | Xu Sun
Findings of the Association for Computational Linguistics: EMNLP 2022

Deep Neural Networks (DNNs) are known to be vulnerable to backdoor attacks. In Natural Language Processing (NLP), DNNs are often backdoored during the fine-tuning process of a large-scale Pre-trained Language Model (PLM) with poisoned samples. Although the clean weights of PLMs are readily available, existing methods have ignored this information in defending NLP models against backdoor attacks. In this work, we take the first step to exploit the pre-trained (unfine-tuned) weights to mitigate backdoors in fine-tuned language models. Specifically, we leverage the clean pre-trained weights via two complementary techniques: (1) a two-step Fine-mixing technique, which first mixes the backdoored weights (fine-tuned on poisoned data) with the pre-trained weights, then fine-tunes the mixed weights on a small subset of clean data; (2) an Embedding Purification (E-PUR) technique, which mitigates potential backdoors existing in the word embeddings. We compare Fine-mixing with typical backdoor mitigation methods on three single-sentence sentiment classification tasks and two sentence-pair classification tasks and show that it outperforms the baselines by a considerable margin in all scenarios. We also show that our E-PUR method can benefit existing mitigation methods. Our work establishes a simple but strong baseline defense for secure fine-tuned NLP models against backdoor attacks.

pdf bib
Benchmarking Language Models for Code Syntax Understanding
Da Shen | Xinyun Chen | Chenguang Wang | Koushik Sen | Dawn Song
Findings of the Association for Computational Linguistics: EMNLP 2022

Pre-trained language models have demonstrated impressive performance in both natural language processing and program understanding, which represent the input as a token sequence without explicitly modeling its structure. Some prior works show that pre-trained language models can capture the syntactic rules of natural languages without finetuning on syntax understanding tasks. However, there is limited understanding of how well pre-trained models understand the code structure so far. In this work, we perform the first thorough benchmarking of the state-of-the-art pre-trained models for identifying the syntactic structures of programs. Specifically, we introduce CodeSyntax, a large-scale dataset of programs annotated with the syntactic relationships in their corresponding abstract syntax trees. Our key observation is that pre-training on massive code data does not result in decent code syntax understanding. In fact, these pre-trained programming language models fail to match the performance of naive baselines based on positional offsets and keywords. We also present a natural language benchmark to highlight the differences between natural languages and programming languages in terms of understanding corresponding syntactic structures. Our findings point out key limitations of existing pre-training methods and suggest the importance of modeling syntactic structures for the programming language.

pdf bib
PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion
Jianhao Shen | Chenguang Wang | Ye Yuan | Jiawei Han | Heng Ji | Koushik Sen | Ming Zhang | Dawn Song
Findings of the Association for Computational Linguistics: EMNLP 2022

This paper presents a parameter-lite transfer learning approach of pretrained language models (LM) for knowledge graph (KG) completion. Instead of finetuning, which modifies all LM parameters, we only tune a few new parameters while keeping the original LM parameters fixed. We establish this via reformulating KG completion as a “fill-in-the-blank” task, and introducing a parameter-lite encoder on top of the original LMs. We show that, by tuning far fewer parameters than finetuning, LMs transfer non-trivially to most tasks and reach competitiveness with prior state-of-the-art approaches. For instance, we outperform the fully finetuning approaches on a KG completion benchmark by tuning only 1% of the parameters.

pdf bib
Joint Language Semantic and Structure Embedding for Knowledge Graph Completion
Jianhao Shen | Chenguang Wang | Linyuan Gong | Dawn Song
Proceedings of the 29th International Conference on Computational Linguistics

The task of completing knowledge triplets has broad downstream applications. Both structural and semantic information plays an important role in knowledge graph completion. Unlike previous approaches that rely on either the structures or semantics of the knowledge graphs, we propose to jointly embed the semantics in the natural language description of the knowledge triplets with their structure information. Our method embeds knowledge graphs for the completion task via fine-tuning pre-trained language models with respect to a probabilistic structured loss, where the forward pass of the language models captures semantics and the loss reconstructs structures. Our extensive experiments on a variety of knowledge graph benchmarks have demonstrated the state-of-the-art performance of our method. We also show that our method can significantly improve the performance in a low-resource regime, thanks to the better use of semantics. The code and datasets are available at https://github.com/pkusjh/LASS.

2021

pdf bib
基于风格化嵌入的中文文本风格迁移(Chinese text style transfer based on stylized embedding)
Chenguang Wang (王晨光) | Hongfei Lin (林鸿飞) | Liang Yang (杨亮)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

对话风格能够反映对话者的属性,例如情感、性别和教育背景等。在对话系统中,通过理解用户的对话风格,能够更好地对用户进行建模。同样的,面对不同背景的用户,对话机器人也应该使用不同的语言风格与之交流。语言表达风格是文本的内在属性,然而现有的大多数文本风格迁移研究,集中在英文领域,在中文领域则研究较少。本文构建了三个可用于中文文本风格迁移研究的数据集,并将多种已有的文本风格迁移方法应用于该数据集。同时,本文提出了基于DeepStyle算法与Transformer的风格迁移模型,通过预训练可以获得不同风格的隐层向量表示。并基于Transformer构建生成端模型,在解码阶段,通过重建源文本的方式,保留生成文本的内容信息,并且引入对立风格的嵌入表示,使得模型能够生成不同风格的文本。实验结果表明,本文提出的模型在构建的中文数据集上均优于现有模型。

pdf bib
Zero-Shot Information Extraction as a Unified Text-to-Triple Translation
Chenguang Wang | Xiao Liu | Zui Chen | Haoyun Hong | Jie Tang | Dawn Song
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We cast a suite of information extraction tasks into a text-to-triple translation framework. Instead of solving each task relying on task-specific datasets and models, we formalize the task as a translation between task-specific input text and output triples. By taking the task-specific input, we enable a task-agnostic translation by leveraging the latent knowledge that a pre-trained language model has about the task. We further demonstrate that a simple pre-training task of predicting which relational information corresponds to which input text is an effective way to produce task-specific outputs. This enables the zero-shot transfer of our framework to downstream tasks. We study the zero-shot performance of this framework on open information extraction (OIE2016, NYT, WEB, PENN), relation classification (FewRel and TACRED), and factual probe (Google-RE and T-REx). The model transfers non-trivially to most tasks and is often competitive with a fully supervised method without the need for any task-specific training. For instance, we significantly outperform the F1 score of the supervised open information extraction without needing to use its training set.

2020

pdf bib
PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction
Yichun Yin | Chenguang Wang | Ming Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Dependency context-based word embedding jointly learns the representations of word and dependency context, and has been proved effective in aspect term extraction. In this paper, we design the positional dependency-based word embedding (PoD) which considers both dependency context and positional context for aspect term extraction. Specifically, the positional context is modeled via relative position encoding. Besides, we enhance the dependency context by integrating more lexical information (e.g., POS tags) along dependency paths. Experiments on SemEval 2014/2015/2016 datasets show that our approach outperforms other embedding methods in aspect term extraction.

2017

pdf bib
CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles
Chenguang Wang | Alan Akbik | Laura Chiticariu | Yunyao Li | Fei Xia | Anbang Xu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Crowdsourcing has proven to be an effective method for generating labeled data for a range of NLP tasks. However, multiple recent attempts of using crowdsourcing to generate gold-labeled training data for semantic role labeling (SRL) reported only modest results, indicating that SRL is perhaps too difficult a task to be effectively crowdsourced. In this paper, we postulate that while producing SRL annotation does require expert involvement in general, a large subset of SRL labeling tasks is in fact appropriate for the crowd. We present a novel workflow in which we employ a classifier to identify difficult annotation tasks and route each task either to experts or crowd workers according to their difficulties. Our experimental evaluation shows that the proposed approach reduces the workload for experts by over two-thirds, and thus significantly reduces the cost of producing SRL annotation at little loss in quality.

2013

pdf bib
Paraphrasing Adaptation for Web Search Ranking
Chenguang Wang | Nan Duan | Ming Zhou | Ming Zhang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

pdf bib
ENGtube: an Integrated Subtitle Environment for ESL
Chi-Ho Li | Shujie Liu | Chenguang Wang | Ming Zhou
Proceedings of Machine Translation Summit XIII: System Presentations