Yu Zhang


SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao | Rui Wang | Long Zhou | Chengyi Wang | Shuo Ren | Yu Wu | Shujie Liu | Tom Ko | Qing Li | Yu Zhang | Zhihua Wei | Yao Qian | Jinyu Li | Furu Wei
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder. Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, hoping to improve the modeling capability for both speech and text. To align the textual and speech information into this unified semantic space, we propose a cross-modal vector quantization approach that randomly mixes up speech/text states with latent units as the interface between encoder and decoder. Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification.

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds
Yu Zhang | Yu Meng | Xuan Wang | Sheng Wang | Jiawei Han
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Discovering latent topics from text corpora has been studied for decades. Many existing topic models adopt a fully unsupervised setting, and their discovered topics may not cater to users’ particular interests due to their inability to leverage user guidance. Although there exist seed-guided topic discovery approaches that leverage user-provided seeds to discover topic-representative terms, they are less concerned with two factors: (1) the existence of out-of-vocabulary seeds and (2) the power of pre-trained language models (PLMs). In this paper, we generalize the task of seed-guided topic discovery to allow out-of-vocabulary seeds. We propose a novel framework, named SeeTopic, wherein the general knowledge of PLMs and the local semantics learned from the input corpus can mutually benefit each other. Experiments on three real datasets from different domains demonstrate the effectiveness of SeeTopic in terms of topic coherence, accuracy, and diversity.

JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering
Yueqing Sun | Qi Shi | Le Qi | Yu Zhang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Existing KG-augmented models for commonsense question answering primarily focus on designing elaborate Graph Neural Networks (GNNs) to model knowledge graphs (KGs). However, they ignore (i) effectively fusing and reasoning over the question context representations and the KG representations, and (ii) automatically selecting relevant nodes from the noisy KGs during reasoning. In this paper, we propose a novel model, JointLK, which addresses the above limitations through the joint reasoning of LM and GNN and a dynamic KG pruning mechanism. Specifically, JointLK performs joint reasoning between LM and GNN through a novel dense bidirectional attention module, in which each question token attends to KG nodes and each KG node attends to question tokens, and the two modal representations fuse and update mutually by multi-step interactions. Then, the dynamic pruning module uses the attention weights generated by joint reasoning to prune irrelevant KG nodes recursively. We evaluate JointLK on the CommonsenseQA and OpenBookQA datasets, and demonstrate its improvements over the existing LM and LM+KG models, as well as its capability to perform interpretable reasoning.

DuReadervis: A Chinese Dataset for Open-domain Document Visual Question Answering
Le Qi | Shangwen Lv | Hongyu Li | Jing Liu | Yu Zhang | Qiaoqiao She | Hua Wu | Haifeng Wang | Ting Liu
Findings of the Association for Computational Linguistics: ACL 2022

Open-domain question answering has been used in a wide range of applications, such as web search and enterprise search, which usually takes clean texts extracted from various formats of documents (e.g., web pages, PDFs, or Word documents) as the information source. However, designing different text extraction approaches is time-consuming and not scalable. In order to reduce human cost and improve the scalability of QA systems, we propose and study an Open-domain Document Visual Question Answering (Open-domain DocVQA) task, which requires answering questions based on a collection of document images directly instead of only document texts, utilizing layouts and visual features additionally. Towards this end, we introduce the first Chinese Open-domain DocVQA dataset called DuReadervis, containing about 15K question-answering pairs and 158K document images from the Baidu search engine. There are three main challenges in DuReadervis: (1) long document understanding, (2) noisy texts, and (3) multi-span answer extraction. The extensive experiments demonstrate that the dataset is challenging. Additionally, we propose a simple approach that incorporates the layout and visual features, and the experimental results show the effectiveness of the proposed approach. The dataset and code will be publicly available at https://github.com/baidu/DuReader/tree/master/DuReader-vis.

All Information is Valuable: Question Matching over Full Information Transmission Network
Le Qi | Yu Zhang | Qingyu Yin | Guidong Zheng | Wen Junjie | Jinlong Li | Ting Liu
Findings of the Association for Computational Linguistics: NAACL 2022

Question matching is the task of identifying whether two questions have the same intent. To better reason about the relationship between questions, existing studies adopt multiple interaction modules and perform multi-round reasoning via deep neural networks. In this process, there are two kinds of critical information that are commonly employed: the representation information of original questions and the interactive information between pairs of questions. However, previous studies tend to transmit only one kind of information, while failing to utilize both kinds of information simultaneously. To address this problem, in this paper, we propose a Full Information Transmission Network (FITN) that can transmit both representation and interactive information simultaneously. More specifically, we employ a novel memory-based attention for keeping and transmitting the interactive information through a global interaction matrix. Besides, we apply an original-average mixed connection method to effectively transmit the representation information between different reasoning rounds, which helps to preserve the original representation features of questions along with the historical hidden features. Experiments on two standard benchmarks demonstrate that our approach outperforms strong baseline models.


Logic-level Evidence Retrieval and Graph-based Verification Network for Table-based Fact Verification
Qi Shi | Yu Zhang | Qingyu Yin | Ting Liu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Table-based fact verification task aims to verify whether the given statement is supported by the given semi-structured table. Symbolic reasoning with logical operations plays a crucial role in this task. Existing methods leverage programs that contain rich logical information to enhance the verification process. However, due to the lack of fully supervised signals in the program generation process, spurious programs can be derived and employed, which leads to the inability of the model to catch helpful logical operations. To address the aforementioned problems, in this work, we formulate the table-based fact verification task as an evidence retrieval and reasoning framework, proposing the Logic-level Evidence Retrieval and Graph-based Verification network (LERGV). Specifically, we first retrieve logic-level program-like evidence from the given table and statement as supplementary evidence for the table. After that, we construct a logic-level graph to capture the logical relations between entities and functions in the retrieved evidence, and design a graph-based verification network to perform logic-level graph-based reasoning based on the constructed graph to classify the final entailment relation. Experimental results on the large-scale benchmark TABFACT show the effectiveness of the proposed approach.

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training
Yu Meng | Yunyi Zhang | Jiaxin Huang | Xuan Wang | Yu Zhang | Heng Ji | Jiawei Han
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base. The biggest challenge of distantly-supervised NER is that the distant supervision may induce incomplete and noisy labels, rendering the straightforward application of supervised learning ineffective. In this paper, we propose (1) a noise-robust learning scheme comprised of a new loss function and a noisy label removal step, for training NER models on distantly-labeled data, and (2) a self-training method that uses contextualized augmentations created by pre-trained language models to improve the generalization ability of the NER model. On three benchmark datasets, our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.

A Coarse-to-Fine Labeling Framework for Joint Word Segmentation, POS Tagging, and Constituent Parsing
Yang Hou | Houquan Zhou | Zhenghua Li | Yu Zhang | Min Zhang | Zhefeng Wang | Baoxing Huai | Nicholas Jing Yuan
Proceedings of the 25th Conference on Computational Natural Language Learning

The most straightforward approach to joint word segmentation (WS), part-of-speech (POS) tagging, and constituent parsing is converting a word-level tree into a char-level tree, which, however, leads to two severe challenges. First, a larger label set (e.g., ≥ 600) and longer inputs both increase computational costs. Second, it is difficult to rule out illegal trees containing conflicting production rules, which is important for reliable model evaluation. If a POS tag (like VV) is above a phrase tag (like VP) in the output tree, it becomes quite complex to decide word boundaries. To deal with both challenges, this work proposes a two-stage coarse-to-fine labeling framework for joint WS-POS-PAR. In the coarse labeling stage, the joint model outputs a bracketed tree, in which each node corresponds to one of four labels (i.e., phrase, subphrase, word, subword). The tree is guaranteed to be legal via constrained CKY decoding. In the fine labeling stage, the model expands each coarse label into a final label (such as VP, VP*, VV, VV*). Experiments on Chinese Penn Treebank 5.1 and 7.0 show that our joint model consistently outperforms the pipeline approach on both settings of w/o and w/ BERT, and achieves new state-of-the-art performance.


Efficient Second-Order TreeCRF for Neural Dependency Parsing
Yu Zhang | Zhenghua Li | Min Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In the deep learning (DL) era, parsing models are extremely simplified with little loss in performance, thanks to the remarkable capability of multi-layer BiLSTMs in context representation. As the most popular graph-based dependency parser due to its high efficiency and performance, the biaffine parser directly scores single dependencies under the arc-factorization assumption, and adopts a very simple local token-wise cross-entropy training loss. This paper for the first time presents a second-order TreeCRF extension to the biaffine parser. For a long time, the complexity and inefficiency of the inside-outside algorithm have hindered the popularity of TreeCRF. To address this issue, we propose an effective way to batchify the inside and Viterbi algorithms for direct large matrix operation on GPUs, and to avoid the complex outside algorithm via efficient back-propagation. Experiments and analysis on 27 datasets from 13 languages clearly show that techniques developed before the DL era, such as structural learning (global TreeCRF loss) and high-order modeling, are still useful, and can further boost parsing performance over the state-of-the-art biaffine parser, especially for partially annotated training data. We release our code at https://github.com/yzhangcs/crfpar.

Learn to Combine Linguistic and Symbolic Information for Table-based Fact Verification
Qi Shi | Yu Zhang | Qingyu Yin | Ting Liu
Proceedings of the 28th International Conference on Computational Linguistics

Table-based fact verification is expected to perform both linguistic reasoning and symbolic reasoning. Existing methods pay little attention to combining linguistic information and symbolic information. In this work, we propose HeterTFV, a graph-based reasoning approach that learns to combine linguistic information and symbolic information effectively. We first construct a program graph to encode programs, a kind of LISP-like logical form, to learn the semantic compositionality of the programs. Then we construct a heterogeneous graph to incorporate both linguistic information and symbolic information by introducing program nodes into the heterogeneous graph. Finally, we propose a graph-based reasoning approach to reason over the multiple types of nodes to make an effective combination of both types of information. Experimental results on a large-scale benchmark dataset TABFACT illustrate the effect of our approach.

Learn to Cross-lingual Transfer with Meta Graph Learning Across Heterogeneous Languages
Zheng Li | Mukul Kumar | William Headden | Bing Yin | Ying Wei | Yu Zhang | Qiang Yang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The recent emergence of multilingual pre-trained language models (mPLMs) has enabled breakthroughs on various downstream cross-lingual transfer (CLT) tasks. However, mPLM-based methods usually involve two problems: (1) simple fine-tuning may not adapt general-purpose multilingual representations to be task-aware on low-resource languages; (2) they ignore how cross-lingual adaptation happens for downstream tasks. To address the issues, we propose a meta graph learning (MGL) method. Unlike prior works that transfer from scratch, MGL can learn to cross-lingual transfer by extracting meta-knowledge from historical CLT experiences (tasks), making mPLM insensitive to low-resource languages. Besides, for each CLT task, MGL formulates its transfer process as information propagation over a dynamic graph, where the geometric structure can automatically capture intrinsic language relationships to explicitly guide cross-lingual transfer. Empirically, extensive experiments on both public and real-world datasets demonstrate the effectiveness of the MGL method.

A Large Scale Speech Sentiment Corpus
Eric Chen | Zhiyun Lu | Hao Xu | Liangliang Cao | Yu Zhang | James Fan
Proceedings of the 12th Language Resources and Evaluation Conference

We present a multimodal corpus for sentiment analysis based on the existing Switchboard-1 Telephone Speech Corpus released by the Linguistic Data Consortium. This corpus extends the Switchboard-1 Telephone Speech Corpus by adding sentiment labels from 3 different human annotators for every transcript segment. Each sentiment label can be one of three options: positive, negative, and neutral. Annotators were recruited using Google Cloud’s data labeling service and the labeling task was conducted over the internet. The corpus contains a total of 49,500 labeled speech segments covering 140 hours of audio. To the best of our knowledge, this is the largest multimodal corpus for sentiment analysis that includes both speech and text features.


Transferable End-to-End Aspect-based Sentiment Analysis with Selective Adversarial Learning
Zheng Li | Xin Li | Ying Wei | Lidong Bing | Yu Zhang | Qiang Yang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Joint extraction of aspects and sentiments can be effectively formulated as a sequence labeling problem. However, such formulation hinders the effectiveness of supervised methods due to the lack of annotated sequence data in many domains. To address this issue, we firstly explore an unsupervised domain adaptation setting for this task. Prior work can only use common syntactic relations between aspect and opinion words to bridge the domain gaps, which highly relies on external linguistic resources. To resolve it, we propose a novel Selective Adversarial Learning (SAL) method to align the inferred correlation vectors that automatically capture their latent relations. The SAL method can dynamically learn an alignment weight for each word such that more important words can possess higher alignment weights to achieve fine-grained (word-level) adaptation. Empirically, extensive experiments demonstrate the effectiveness of the proposed SAL method.

HLT@SUDA at SemEval-2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing
Wei Jiang | Zhenghua Li | Yu Zhang | Min Zhang
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes a simple UCCA semantic graph parsing approach. The key idea is to convert a UCCA semantic graph into a constituent tree, in which extra labels are deliberately designed to mark remote edges and discontinuous nodes for future recovery. In this way, we can make use of existing syntactic parsing techniques. Based on the data statistics, we recover discontinuous nodes directly according to the output labels of the constituent parser and use a biaffine classification model to recover the more complex remote edges. The classification model and the constituent parser are simultaneously trained under the multi-task learning framework. We use the multilingual BERT as extra features in the open tracks. Our system ranks first in the six English/German closed/open tracks among seven participating systems. For the seventh cross-lingual track, where there is little training data for French, we propose a language embedding approach to utilize English and German training data, and our result ranks second.


Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks
Zhichun Wang | Qingsong Lv | Xiaohan Lan | Yu Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Multilingual knowledge graphs (KGs) such as DBpedia and YAGO contain structured knowledge of entities in several distinct languages, and they are useful resources for cross-lingual AI and NLP applications. Cross-lingual KG alignment is the task of matching entities with their counterparts in different languages, which is an important way to enrich the cross-lingual links in multilingual KGs. In this paper, we propose a novel approach for cross-lingual KG alignment via graph convolutional networks (GCNs). Given a set of pre-aligned entities, our approach trains GCNs to embed entities of each language into a unified vector space. Entity alignments are discovered based on the distances between entities in the embedding space. Embeddings can be learned from both the structural and attribute information of entities, and the results of structure embedding and attribute embedding are combined to get accurate alignments. In the experiments on aligning real multilingual KGs, our approach gets the best performance compared with other embedding-based KG alignment approaches.

Simple Recurrent Units for Highly Parallelizable Recurrence
Tao Lei | Yu Zhang | Sida I. Wang | Hui Dai | Yoav Artzi
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves 5-9x speed-up over cuDNN-optimized LSTM on classification and question answering datasets, and delivers stronger results than LSTM and convolutional models. We also obtain an average of 0.7 BLEU improvement over the Transformer model (Vaswani et al., 2017) on translation by incorporating SRU into the architecture.

Deep Reinforcement Learning for Chinese Zero Pronoun Resolution
Qingyu Yin | Yu Zhang | Wei-Nan Zhang | Ting Liu | William Yang Wang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent neural network models for Chinese zero pronoun resolution gain great performance by capturing semantic information for zero pronouns and candidate antecedents, but tend to be short-sighted, operating solely by making local decisions. They typically predict coreference links between the zero pronoun and one single candidate antecedent at a time while ignoring their influence on future decisions. Ideally, modeling useful information of preceding potential antecedents is crucial for classifying later zero pronoun-candidate antecedent pairs, a need which leads traditional models of zero pronoun resolution to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to deal with the task. With the help of the reinforcement learning agent, our system learns the policy of selecting antecedents in a sequential manner, where useful information provided by earlier predicted antecedents could be utilized for making later coreference decisions. Experimental results on OntoNotes 5.0 show that our approach substantially outperforms the state-of-the-art methods under three experimental settings.

Zero Pronoun Resolution with Attention-based Neural Network
Qingyu Yin | Yu Zhang | Weinan Zhang | Ting Liu | William Yang Wang
Proceedings of the 27th International Conference on Computational Linguistics

Recent neural network methods for zero pronoun resolution explore multiple models for generating representation vectors for zero pronouns and their candidate antecedents. Typically, contextual information is utilized to encode the zero pronouns since they are simply gaps that contain no actual content. To better utilize contexts of the zero pronouns, we here introduce the self-attention mechanism for encoding zero pronouns. With the help of the multiple hops of attention, our model is able to focus on some informative parts of the associated texts and therefore produces an efficient way of encoding the zero pronouns. In addition, an attention-based recurrent neural network is proposed for encoding candidate antecedents by their contents. Experiment results are encouraging: our proposed attention-based model gains the best performance on the Chinese portion of the OntoNotes corpus, substantially surpassing existing Chinese zero pronoun resolution baseline systems.


SCIR-QA at SemEval-2017 Task 3: CNN Model Based on Similar and Dissimilar Information between Keywords for Question Similarity
Le Qi | Yu Zhang | Ting Liu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe a method of calculating the similarity of questions in community QA (cQA). Questions in cQA are usually very long and contain a lot of information that is useless for calculating the similarity of questions. Therefore, we implement a CNN model based on similar and dissimilar information between the questions' keywords. We extract the keywords of the questions, model the similar and dissimilar information between the keywords, and use the CNN model to calculate the similarity.

Benben: A Chinese Intelligent Conversational Robot
Wei-Nan Zhang | Ting Liu | Bing Qin | Yu Zhang | Wanxiang Che | Yanyan Zhao | Xiao Ding
Proceedings of ACL 2017, System Demonstrations

Chinese Zero Pronoun Resolution with Deep Memory Network
Qingyu Yin | Yu Zhang | Weinan Zhang | Ting Liu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Existing approaches for Chinese zero pronoun resolution typically utilize only syntactical and lexical features while ignoring semantic information. The fundamental reason is that zero pronouns have no descriptive information, which brings difficulty in explicitly capturing their semantic similarities with antecedents. Meanwhile, representing zero pronouns is challenging since they are merely gaps that convey no actual content. In this paper, we address this issue by building a deep memory network that is capable of encoding zero pronouns into vector representations with information obtained from their contexts and potential antecedents. Consequently, our resolver takes advantage of semantic information by using these continuous distributed representations. Experiments on the OntoNotes 5.0 dataset show that the proposed memory network could substantially outperform the state-of-the-art systems in various experimental settings.


SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering
Mitra Mohtarami | Yonatan Belinkov | Wei-Ning Hsu | Yu Zhang | Tao Lei | Kfir Bar | Scott Cyphers | Jim Glass
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Neural Attention for Learning to Rank Questions in Community Question Answering
Salvatore Romeo | Giovanni Da San Martino | Alberto Barrón-Cedeño | Alessandro Moschitti | Yonatan Belinkov | Wei-Ning Hsu | Yu Zhang | Mitra Mohtarami | James Glass
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In real-world data, e.g., from Web forums, text is often contaminated with redundant or irrelevant content, which leads to introducing noise in machine learning algorithms. In this paper, we apply Long Short-Term Memory networks with an attention mechanism, which can select important parts of text for the task of similar question retrieval from community Question Answering (cQA) forums. In particular, we use the attention weights for both selecting entire sentences and their subparts, i.e., word/chunk, from shallow syntactic trees. More interestingly, we apply tree kernels to the filtered text representations, thus exploiting the implicit features of the subtree space for learning question reranking. Our results show that the attention-based pruning allows for achieving the top position in the cQA challenge of SemEval 2016, with a relatively large gap from the other participants while greatly decreasing running time.


Joint Learning of Phonetic Units and Word Pronunciations for ASR
Chia-ying Lee | Yu Zhang | James Glass
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


The Use of Dependency Relation Graph to Enhance the Term Weighting in Question Retrieval
Weinan Zhang | Zhaoyan Ming | Yu Zhang | Liqiang Nie | Ting Liu | Tat-Seng Chua
Proceedings of COLING 2012


Bridging Topic Modeling and Personalized Search
Wei Song | Yu Zhang | Ting Liu | Sheng Li
Coling 2010: Posters


HIT: Web based Scoring Method for English Lexical Substitution
Shiqi Zhao | Lin Zhao | Yu Zhang | Ting Liu | Sheng Li
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)


Automated Generalization of Phrasal Paraphrases from the Web
Weigang Li | Ting Liu | Yu Zhang | Sheng Li | Wei He
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)