Sheng Zhang


2022

pdf bib
Locally Aggregated Feature Attribution on Natural Language Model Understanding
Sheng Zhang | Jin Wang | Haitao Jiang | Rui Song
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

With the growing popularity of deep-learning models, model understanding becomes more important. Much effort has been devoted to demystify deep neural networks for better explainability. Some feature attribution methods have shown promising results in computer vision, especially the gradient-based methods where effectively smoothing the gradients with reference data is the key to a robust and faithful result. However, direct application of these gradient-based methods to NLP tasks is not trivial due to the fact that the input consists of discrete tokens and the “reference” tokens are not explicitly defined. In this work, we propose Locally Aggregated Feature Attribution (LAFA), a novel gradient-based feature attribution method for NLP models. Instead of relying on obscure reference tokens, it smooths gradients by aggregating similar reference texts derived from language model embeddings. For evaluation purpose, we also design experiments on different NLP tasks including Entity Recognition and Sentiment Analysis on public datasets and key words detection on constructed Amazon catalogue dataset. The superior performance of the proposed method is demonstrated through experiments.

pdf bib
FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing
Ilias Chalkidis | Tommaso Pasini | Sheng Zhang | Letizia Tomada | Sebastian Schwemer | Anders Søgaard
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a benchmark suite of four datasets for evaluating the fairness of pre-trained language models and the techniques used to fine-tune them for downstream tasks. Our benchmarks cover four jurisdictions (European Council, USA, Switzerland, and China), five languages (English, German, French, Italian and Chinese) and fairness across five attributes (gender, age, region, language, and legal area). In our experiments, we evaluate pre-trained language models using several group-robust fine-tuning techniques and show that performance group disparities are vibrant in many cases, while none of these techniques guarantee fairness, nor consistently mitigate group disparities. Furthermore, we provide a quantitative and qualitative analysis of our results, highlighting open challenges in the development of robustness methods in legal NLP.

pdf bib
Zero-Shot Dependency Parsing with Worst-Case Aware Automated Curriculum Learning
Miryam de Lhoneux | Sheng Zhang | Anders Søgaard
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Large multilingual pretrained language models such as mBERT and XLM-RoBERTa have been found to be surprisingly effective for cross-lingual transfer of syntactic parsing models Wu and Dredze (2019), but only between related languages. However, source and training languages are rarely related, when parsing truly low-resource languages. To close this gap, we adopt a method from multi-task learning, which relies on automated curriculum learning, to dynamically optimize for parsing performance on outlier languages. We show that this approach is significantly better than uniform and size-proportional sampling in the zero-shot setting.

2021

pdf bib
Joint Universal Syntactic and Semantic Parsing
Elias Stengel-Eskin | Kenton Murray | Sheng Zhang | Aaron Steven White | Benjamin Van Durme
Transactions of the Association for Computational Linguistics, Volume 9

While numerous attempts have been made to jointly parse syntax and semantics, high performance in one domain typically comes at the price of performance in the other. This trade-off contradicts the large body of research focusing on the rich interactions at the syntax–semantics interface. We explore multiple model architectures that allow us to exploit the rich syntactic and semantic annotations contained in the Universal Decompositional Semantics (UDS) dataset, jointly parsing Universal Dependencies and UDS to obtain state-of-the-art results in both formalisms. We analyze the behavior of a joint model of syntax and semantics, finding patterns supported by linguistic theory at the syntax–semantics interface. We then investigate to what degree joint modeling generalizes to a multilingual setting, where we find similar trends across 8 languages.

pdf bib
Sociolectal Analysis of Pretrained Language Models
Sheng Zhang | Xin Zhang | Weiming Zhang | Anders Søgaard
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Using data from English cloze tests, in which subjects also self-reported their gender, age, education, and race, we examine performance differences of pretrained language models across demographic groups, defined by these (protected) attributes. We demonstrate wide performance gaps across demographic groups and show that pretrained language models systematically disfavor young non-white male speakers; i.e., not only do pretrained language models learn social biases (stereotypical associations) – pretrained language models also learn sociolectal biases, learning to speak more like some than like others. We show, however, that, with the exception of BERT models, larger pretrained language models reduce some the performance gaps between majority and minority groups.

pdf bib
Modular Self-Supervision for Document-Level Relation Extraction
Sheng Zhang | Cliff Wong | Naoto Usuyama | Sarthak Jain | Tristan Naumann | Hoifung Poon
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Extracting relations across large text spans has been relatively underexplored in NLP, but it is particularly important for high-value domains such as biomedicine, where obtaining high recall of the latest findings is crucial for practical applications. Compared to conventional information extraction confined to short text spans, document-level relation extraction faces additional challenges in both inference and learning. Given longer text spans, state-of-the-art neural architectures are less effective and task-specific self-supervision such as distant supervision becomes very noisy. In this paper, we propose decomposing document-level relation extraction into relation detection and argument resolution, taking inspiration from Davidsonian semantics. This enables us to incorporate explicit discourse modeling and leverage modular self-supervision for each sub-problem, which is less noise-prone and can be further refined end-to-end via variational EM. We conduct a thorough evaluation in biomedical machine reading for precision oncology, where cross-paragraph relation mentions are prevalent. Our method outperforms prior state of the art, such as multi-scale learning and graph neural networks, by over 20 absolute F1 points. The gain is particularly pronounced among the most challenging relation instances whose arguments never co-occur in a paragraph.

2020

pdf bib
Knowledge-guided Open Attribute Value Extraction with Reinforcement Learning
Ye Liu | Sheng Zhang | Rui Song | Suo Feng | Yanghua Xiao
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Open attribute value extraction for emerging entities is an important but challenging task. A lot of previous works formulate the problem as a question-answering (QA) task. While the collections of articles from web corpus provide updated information about the emerging entities, the retrieved texts can be noisy, irrelevant, thus leading to inaccurate answers. Effectively filtering out noisy articles as well as bad answers is the key to improve extraction accuracy. Knowledge graph (KG), which contains rich, well organized information about entities, provides a good resource to address the challenge. In this work, we propose a knowledge-guided reinforcement learning (RL) framework for open attribute value extraction. Informed by relevant knowledge in KG, we trained a deep Q-network to sequentially compare extracted answers to improve extraction accuracy. The proposed framework is applicable to different information extraction system. Our experimental results show that our method outperforms the baselines by 16.5 - 27.8%.

pdf bib
Universal Decompositional Semantic Parsing
Elias Stengel-Eskin | Aaron Steven White | Sheng Zhang | Benjamin Van Durme
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce a transductive model for parsing into Universal Decompositional Semantics (UDS) representations, which jointly learns to map natural language utterances into UDS graph structures and annotate the graph with decompositional semantic attribute scores. We also introduce a strong pipeline model for parsing into the UDS graph structure, and show that our transductive parser performs comparably while additionally performing attribute prediction. By analyzing the attribute prediction errors, we find the model captures natural relationships between attribute groups.

pdf bib
The Universal Decompositional Semantics Dataset and Decomp Toolkit
Aaron Steven White | Elias Stengel-Eskin | Siddharth Vashishtha | Venkata Subrahmanyan Govindarajan | Dee Ann Reisinger | Tim Vieira | Keisuke Sakaguchi | Sheng Zhang | Francis Ferraro | Rachel Rudinger | Kyle Rawlins | Benjamin Van Durme
Proceedings of the 12th Language Resources and Evaluation Conference

We present the Universal Decompositional Semantics (UDS) dataset (v1.0), which is bundled with the Decomp toolkit (v0.1). UDS1.0 unifies five high-quality, decompositional semantics-aligned annotation sets within a single semantic graph specification—with graph structures defined by the predicative patterns produced by the PredPatt tool and real-valued node and edge attributes constructed using sophisticated normalization procedures. The Decomp toolkit provides a suite of Python 3 tools for querying UDS graphs using SPARQL. Both UDS1.0 and Decomp0.1 are publicly available at http://decomp.io.

2019

pdf bib
Unsupervised Deep Structured Semantic Models for Commonsense Reasoning
Shuohang Wang | Sheng Zhang | Yelong Shen | Xiaodong Liu | Jingjing Liu | Jianfeng Gao | Jing Jiang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Commonsense reasoning is fundamental to natural language understanding. While traditional methods rely heavily on human-crafted features and knowledge bases, we explore learning commonsense knowledge from a large amount of raw text via unsupervised learning. We propose two neural network models based on the Deep Structured Semantic Models (DSSM) framework to tackle two classic commonsense reasoning tasks, Winograd Schema challenges (WSC) and Pronoun Disambiguation (PDP). Evaluation shows that the proposed models effectively capture contextual information in the sentence and co-reference information between pronouns and nouns, and achieve significant improvement over previous state-of-the-art approaches.

pdf bib
Broad-Coverage Semantic Parsing as Transduction
Sheng Zhang | Xutai Ma | Kevin Duh | Benjamin Van Durme
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We unify different broad-coverage semantic parsing tasks into a transduction parsing paradigm, and propose an attention-based neural transducer that incrementally builds meaning representation via a sequence of semantic relations. By leveraging multiple attention mechanisms, the neural transducer can be effectively trained without relying on a pre-trained aligner. Experiments separately conducted on three broad-coverage semantic parsing tasks – AMR, SDP and UCCA – demonstrate that our attention-based neural transducer improves the state of the art on both AMR and UCCA, and is competitive with the state of the art on SDP.

pdf bib
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing
Simon Ostermann | Sheng Zhang | Michael Roth | Peter Clark
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

pdf bib
Commonsense Inference in Natural Language Processing (COIN) - Shared Task Report
Simon Ostermann | Sheng Zhang | Michael Roth | Peter Clark
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

This paper reports on the results of the shared tasks of the COIN workshop at EMNLP-IJCNLP 2019. The tasks consisted of two machine comprehension evaluations, each of which tested a system’s ability to answer questions/queries about a text. Both evaluations were designed such that systems need to exploit commonsense knowledge, for example, in the form of inferences over information that is available in the common ground but not necessarily mentioned in the text. A total of five participating teams submitted systems for the shared tasks, with the best submitted system achieving 90.6% accuracy and 83.7% F1-score on task 1 and task 2, respectively.

pdf bib
Deep Generalized Canonical Correlation Analysis
Adrian Benton | Huda Khayrallah | Biman Gujral | Dee Ann Reisinger | Sheng Zhang | Raman Arora
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

We present Deep Generalized Canonical Correlation Analysis (DGCCA) – a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two view representation learning (Deep CCA, (Andrew et al., 2013)) and linear many-view representation learning (Generalized CCA (Horst, 1961)) exist, DGCCA combines the flexibility of nonlinear (deep) representation learning with the statistical power of incorporating information from many sources, or views. We present the DGCCA formulation as well as an efficient stochastic optimization algorithm for solving it. We learn and evaluate DGCCA representations for three downstream tasks: phonetic transcription from acoustic & articulatory measurements, recommending hashtags and recommending friends on a dataset of Twitter users.

pdf bib
AMR Parsing as Sequence-to-Graph Transduction
Sheng Zhang | Xutai Ma | Kevin Duh | Benjamin Van Durme
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose an attention-based model that treats AMR parsing as sequence-to-graph transduction. Unlike most AMR parsers that rely on pre-trained aligners, external semantic resources, or data augmentation, our proposed parser is aligner-free, and it can be effectively trained with limited amounts of labeled AMR data. Our experimental results outperform all previously reported SMATCH scores, on both AMR 2.0 (76.3% on LDC2017T10) and AMR 1.0 (70.2% on LDC2014T12).

2018

pdf bib
Halo: Learning Semantics-Aware Representations for Cross-Lingual Information Extraction
Hongyuan Mei | Sheng Zhang | Kevin Duh | Benjamin Van Durme
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Cross-lingual information extraction (CLIE) is an important and challenging task, especially in low resource scenarios. To tackle this challenge, we propose a training method, called Halo, which enforces the local region of each hidden state of a neural model to only generate target tokens with the same semantic structure tag. This simple but powerful technique enables a neural model to learn semantics-aware representations that are robust to noise, without introducing any extra parameter, thus yielding better generalization in both high and low resource settings.

pdf bib
Fine-grained Entity Typing through Increased Discourse Context and Adaptive Classification Thresholds
Sheng Zhang | Kevin Duh | Benjamin Van Durme
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Fine-grained entity typing is the task of assigning fine-grained semantic types to entity mentions. We propose a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context – both document and sentence level information – than prior work. We find that additional context improves performance, with further improvements gained by utilizing adaptive classification thresholds. Experiments show that our approach without reliance on hand-crafted features achieves the state-of-the-art results on three benchmark datasets.

pdf bib
CRF-LSTM Text Mining Method Unveiling the Pharmacological Mechanism of Off-target Side Effect of Anti-Multiple Myeloma Drugs
Kaiyin Zhou | Sheng Zhang | Xiangyu Meng | Qi Luo | Yuxing Wang | Ke Ding | Yukun Feng | Mo Chen | Kevin Cohen | Jingbo Xia
Proceedings of the BioNLP 2018 workshop

Sequence labeling of biomedical entities, e.g., side effects or phenotypes, was a long-term task in BioNLP and MedNLP communities. Thanks to effects made among these communities, adverse reaction NER has developed dramatically in recent years. As an illuminative application, to achieve knowledge discovery via the combination of the text mining result and bioinformatics idea shed lights on the pharmacological mechanism research.

pdf bib
Neural-Davidsonian Semantic Proto-role Labeling
Rachel Rudinger | Adam Teichert | Ryan Culkin | Sheng Zhang | Benjamin Van Durme
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present a model for semantic proto-role labeling (SPRL) using an adapted bidirectional LSTM encoding strategy that we call NeuralDavidsonian: predicate-argument structure is represented as pairs of hidden states corresponding to predicate and argument head tokens of the input sequence. We demonstrate: (1) state-of-the-art results in SPRL, and (2) that our network naturally shares parameters between attributes, allowing for learning new attribute types with limited added supervision.

pdf bib
Cross-lingual Decompositional Semantic Parsing
Sheng Zhang | Xutai Ma | Rachel Rudinger | Kevin Duh | Benjamin Van Durme
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We introduce the task of cross-lingual decompositional semantic parsing: mapping content provided in a source language into a decompositional semantic analysis based on a target language. We present: (1) a form of decompositional semantic analysis designed to allow systems to target varying levels of structural complexity (shallow to deep analysis), (2) an evaluation metric to measure the similarity between system output and reference semantic analysis, (3) an end-to-end model with a novel annotating mechanism that supports intra-sentential coreference, and (4) an evaluation dataset on which our model outperforms strong baselines by at least 1.75 F1 score.

2017

pdf bib
Ordinal Common-sense Inference
Sheng Zhang | Rachel Rudinger | Kevin Duh | Benjamin Van Durme
Transactions of the Association for Computational Linguistics, Volume 5

Humans have the capacity to draw common-sense inferences from natural language: various things that are likely but not certain to hold based on established discourse, and are rarely stated explicitly. We propose an evaluation of automated common-sense inference based on an extension of recognizing textual entailment: predicting ordinal human responses on the subjective likelihood of an inference holding in a given context. We describe a framework for extracting common-sense knowledge from corpora, which is then used to construct a dataset for this ordinal entailment task. We train a neural sequence-to-sequence model on this dataset, which we use to score and generate possible inferences. Further, we annotate subsets of previously established datasets via our ordinal annotation protocol in order to then analyze the distinctions between these and what we have constructed.

pdf bib
MT/IE: Cross-lingual Open Information Extraction with Neural Sequence-to-Sequence Models
Sheng Zhang | Kevin Duh | Benjamin Van Durme
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Cross-lingual information extraction is the task of distilling facts from foreign language (e.g. Chinese text) into representations in another language that is preferred by the user (e.g. English tuples). Conventional pipeline solutions decompose the task as machine translation followed by information extraction (or vice versa). We propose a joint solution with a neural sequence model, and show that it outperforms the pipeline in a cross-lingual open information extraction setting by 1-4 BLEU and 0.5-0.8 F1.

pdf bib
FuRongWang at SemEval-2017 Task 3: Deep Neural Networks for Selecting Relevant Answers in Community Question Answering
Sheng Zhang | Jiajun Cheng | Hui Wang | Xin Zhang | Pei Li | Zhaoyun Ding
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describes deep neural networks frameworks in this paper to address the community question answering (cQA) ranking task (SemEval-2017 task 3). Convolutional neural networks and bi-directional long-short term memory networks are applied in our methods to extract semantic information from questions and answers (comments). In addition, in order to take the full advantage of question-comment semantic relevance, we deploy interaction layer and augmented features before calculating the similarity. The results show that our methods have the great effectiveness for both subtask A and subtask C.

pdf bib
Selective Decoding for Cross-lingual Open Information Extraction
Sheng Zhang | Kevin Duh | Benjamin Van Durme
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Cross-lingual open information extraction is the task of distilling facts from the source language into representations in the target language. We propose a novel encoder-decoder model for this problem. It employs a novel selective decoding mechanism, which explicitly models the sequence labeling process as well as the sequence generation process on the decoder side. Compared to a standard encoder-decoder model, selective decoding significantly increases the performance on a Chinese-English cross-lingual open IE dataset by 3.87-4.49 BLEU and 1.91-5.92 F1. We also extend our approach to low-resource scenarios, and gain promising improvement.

pdf bib
An Evaluation of PredPatt and Open IE via Stage 1 Semantic Role Labeling
Sheng Zhang | Rachel Rudinger | Benjamin Van Durme
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

2016

pdf bib
Universal Decompositional Semantics on Universal Dependencies
Aaron Steven White | Drew Reisinger | Keisuke Sakaguchi | Tim Vieira | Sheng Zhang | Rachel Rudinger | Kyle Rawlins | Benjamin Van Durme
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

pdf bib
Semantic Interpretation of Superlative Expressions via Structured Knowledge Bases
Sheng Zhang | Yansong Feng | Songfang Huang | Kun Xu | Zhe Han | Dongyan Zhao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)