Yu Wang


2024

pdf bib
Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood
Yang Xu | Yu Wang | Hao An | Zhichen Liu | Yongyuan Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, it is becoming increasingly difficult as language model’s capabilities of generating human-like texts keep evolving. This study provides a new perspective by using the relative likelihood values instead of absolute ones, and extracting useful features from the spectrum-view of likelihood for the human-model text detection task. We propose a detection procedure with two classification methods, supervised and heuristic-based, respectively, which results in competitive performances with previous zero-shot detection methods and a new state-of-the-art on short-text detection. Our method can also reveal subtle differences between human and model languages, which find theoretical roots in psycholinguistics studies.

pdf bib
RA2FD: Distilling Faithfulness into Efficient Dialogue Systems
Zhiyuan Zhu | Yusheng Liao | Chenxin Xu | Yunfeng Guan | Yanfeng Wang | Yu Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Generating faithful and fast responses is crucial in the knowledge-grounded dialogue. Retrieval Augmented Generation (RAG) strategies are effective but are inference inefficient, while previous Retrieval Free Generations (RFG) are more efficient but sacrifice faithfulness. To solve this faithfulness-efficiency trade-off dilemma, we propose a novel retrieval-free model training scheme named Retrieval Augmented to Retrieval Free Distillation (RA2FD) to build a retrieval-free model that achieves higher faithfulness than the previous RFG method while maintaining inference efficiency. The core idea of RA2FD is to use a teacher-student framework to distill the faithfulness capacity of a teacher, which is an oracle RAG model that generates multiple knowledge-infused responses. The student retrieval-free model learns how to generate faithful responses from these teacher labels through sequence-level distillation and contrastive learning. Experiment results show that RA2FD let the faithfulness performance of an RFG model surpass the previous SOTA RFG baseline on three knowledge-grounded dialogue datasets by an average of 33% and even matching an RAG model’s performance while significantly improving inference efficiency. Our code is available at https://github.com/zzysjtuiwct/RA2FD.

pdf bib
Revisiting the Phenomenon of Syntactic Complexity Convergence on German Dialogue Data
Yu Wang | Hendrik Buschmeier
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)

pdf bib
How Much Does Nonverbal Communication Conform to Entropy Rate Constancy?: A Case Study on Listener Gaze in Interaction
Yu Wang | Yang Xu | Gabriel Skantze | Hendrik Buschmeier
Findings of the Association for Computational Linguistics: ACL 2024

According to the Entropy Rate Constancy (ERC) principle, the information density of a text is approximately constant over its length. Whether this principle also applies to nonverbal communication signals is still under investigation. We perform empirical analyses of video-recorded dialogue data and investigate whether listener gaze, as an important nonverbal communication signal, adheres to the ERC principle. Results show (1) that the ERC principle holds for listener gaze; and (2) that the two linguistic factors syntactic complexity and turn transition potential are weakly correlated with local entropy of listener gaze.

pdf bib
SDA: Semantic Discrepancy Alignment for Text-conditioned Image Retrieval
Yuchen Yang | Yu Wang | Yanfeng Wang
Findings of the Association for Computational Linguistics: ACL 2024

In the realm of text-conditioned image retrieval, models utilize a query composed of a reference image and modification text to retrieve corresponding images. Despite its significance, this task is fraught with challenges, including small-scale datasets due to labeling costs and the complexity of attributes in modification texts. These challenges often result in models learning a generalized representation of the query, thereby missing the semantic correlations of image and text attributes.In this paper, we introduce a general boosting framework designed to address these issues by employing semantic discrepancy alignment. Our framework first leverages the ChatGPT to augment text data by modifying the original modification text’s attributes. The augmented text is then combined with the original reference image to create an augmented composed query. Then we generate corresponding images using GPT-4 for the augmented composed query.We realize the cross-modal semantic discrepancy alignment by formulating distance consistency and neighbor consistency between the image and text domains. Through this novel approach, attribute in the text domain can be more effectively transferred to the image domain, enhancing retrieval performance. Extensive experiments on three prominent datasets validate the effectiveness of our approach, with state-of-the-art results on a majority of evaluation metrics compared to various baseline methods.

pdf bib
DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics
YiQiu Guo | Yuchen Yang | Ya Zhang | Yu Wang | Yanfeng Wang
Findings of the Association for Computational Linguistics: ACL 2024

Structured data offers an efficient means of organizing information. Exsisting text-serialization based methods for processing structured data using large language models (LLMs) are not designed to explicitly capture the heterogeneity of structured data. Such methods are suboptimal for LLMs to process structured data, and may lead to large input token size and poor robustness to input perturbation. In this paper, we propose a novel framework called DictLLM, which is an efficient and effective framework for the modeling of medical lab report to deal with the report-assisted diagnosis generation task. DictLLM introduce 1) group positional encoding to maintain the permutation invariance, 2) hierarchical attention bias to capture the inductive bias of structured data, and 3) a optimal transport alignment layer to align the embeddings generated by the dict encoder with the LLM, producing a list of fixed-length virtual tokens. We conduct experiments with multiple LLM models on a large-scale real-world medical lab report dataset for automatic diagnosis generation. The results show that our proposed framework outperforms the baseline methods and few-shot GPT-4 in terms of both Rouge-L and Knowledge F1 score. We also conduct multiple experiments and analyze the scalability and robustness of our proposed framework, demonstrating the superiority of our method in modeling the heterogeneous structure of medical dictionaries data.

pdf bib
CF-TCIR: A Compositor-Free Framework for Hierarchical Text-Conditioned Image Retrieval
Yuchen Yang | Yu Wang | Yanfeng Wang
Findings of the Association for Computational Linguistics: ACL 2024

In text-conditioned image retrieval (TCIR), the combination of a reference image and modification text forms a query tuple, aiming to locate the most congruent target image within a dataset. The advantages of rich image semantic information and text flexibility are combined in this manner for more accurate retrieval. While traditional techniques often employ attention-driven compositors to craft a unified image-text representation, our paper introduces a compositor-free framework, CF-TCIR, which eschews the standard compositor. Compositor-based methods are designed to learn a joint representation of images and text, but they struggle to directly capture the correlations between attributes across the image and text modalities. Instead, we reformulate the retrieval process as a cross-modal interaction between a synthesized image feature and its corresponding text descriptor. This novel methodology offers advantages in terms of computational efficiency, scalability, and superior performance. To optimize the retrieval performance, we advocate a tiered retrieval mechanism, blending both coarse-grain and fine-grain paradigms. Moreover, to enrich the contextual relationship within the query tuple, we integrate a generative cross-modal alignment technique, ensuring synchronization of sequential attributes between image and text data.

pdf bib
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation
Yusheng Liao | Shuyang Jiang | Zhe Chen | Yu Wang | Yanfeng Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

Large language models (LLMs) have shown substantial progress in natural language understanding and generation, proving valuable especially in the medical field. Despite advancements, challenges persist due to the complexity and diversity inherent in medical tasks, which can be categorized as knowledge-intensive tasks and alignment-required tasks. Previous approaches either ignore the latter task or focus on a minority of tasks and hence lose generalization. To address these drawbacks, we propose a progressive fine-tuning pipeline. This pipeline employs a and a to encode diverse knowledge in the first stage and filter out detrimental information. In the second stage, we drop the to avoid the interference of suboptimal representation and leverage an additional alignment module optimized towards an orthogonal direction to the knowledge space to mitigate knowledge forgetting. Based on this two-stage paradigm, we proposed a Medical LLM through decoupling Clinical Alignment and Knowledge Aggregation (), which is designed to achieve promising performance on over 20 medical tasks, as well as results on specific medical alignment tasks. Various model sizes of (1.8B, 7B, 14B) all demonstrate significant improvements over existing models with similar model sizes. Our code and datasets are available at https://github.com/BlueZeros/MedCare.

pdf bib
HSDreport: Heart Sound Diagnosis with Echocardiography Reports
Zihan Zhao | Pingjie Wang | Liudan Zhao | Yuchen Yang | Ya Zhang | Kun Sun | Xin Sun | Xin Zhou | Yu Wang | Yanfeng Wang
Findings of the Association for Computational Linguistics: EMNLP 2024

Heart sound auscultation holds significant importance in the diagnosis of congenital heart disease. However, existing methods for Heart Sound Diagnosis (HSD) tasks are predominantly limited to a few fixed categories, framing the HSD task as a rigid classification problem that does not fully align with medical practice and offers only limited information to physicians. Besides, such methods do not utilize echocardiography reports, the gold standard in the diagnosis of related diseases. To tackle this challenge, we introduce HSDreport, a new benchmark for HSD, which mandates the direct utilization of heart sounds obtained from auscultation to predict echocardiography reports. This benchmark aims to merge the convenience of auscultation with the comprehensive nature of echocardiography reports. First, we collect a new dataset for this benchmark, comprising 2,275 heart sound samples along with their corresponding reports. Subsequently, we develop a knowledge-aware query-based transformer to handle this task. The intent is to leverage the capabilities of medically pre-trained models and the internal knowledge of large language models (LLMs) to address the task’s inherent complexity and variability, thereby enhancing the robustness and scientific validity of the method. Furthermore, our experimental results indicate that our method significantly outperforms traditional HSD approaches and existing multimodal LLMs in detecting key abnormalities in heart sounds.

pdf bib
Automatic Quote Attribution in Chinese Literary Works
Xingxing Yang | Yu Wang
Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10)

Quote attribution in fiction refers to the extraction of dialogues and speaker identification of dialogues, which can be divided into 2 steps: quotation annotation and speaker annotation. We use a pipeline for quote attribution, which involves classification, extractive QA, multi-choice QA, and coreference resolution. We also had an evaluation of our model performance by predicting explicit and implicit speakers using a combination of different models.

pdf bib
M3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Zhe Chen | Heyang Liu | Wenyi Yu | Guangzhi Sun | Hongcheng Liu | Ji Wu | Chao Zhang | Yu Wang | Yanfeng Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Publishing open-source academic video recordings is an emergent and prevalent approach to sharing knowledge online. Such videos carry rich multimodal information including speech, the facial and body movements of the speakers, as well as the texts and pictures in the slides and possibly even the papers. Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations. In this paper, we propose a novel multimodal, multigenre, and multipurpose audio-visual academic lecture dataset (M3AV), which has almost 367 hours of videos from five sources covering computer science, mathematics, and medical and biology topics. With high-quality human annotations of the slide text and spoken words, in particular high-valued name entities, the dataset can be used for multiple audio-visual recognition and understanding tasks. Evaluations performed on contextual speech recognition, speech synthesis, and slide and script generation tasks demonstrate that the diversity of M3AV makes it a challenging dataset.

pdf bib
MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
Yuhao Wang | Yusheng Liao | Heyang Liu | Hongcheng Liu | Yanfeng Wang | Yu Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding. However, these models also suffer from hallucinations, which limit their reliability as AI systems. We believe that these hallucinations are partially due to the models’ struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception. Despite its importance, this aspect of MLLMs has been overlooked in prior studies. In this paper, we aim to define and evaluate the self-awareness of MLLMs in perception. To do this, we first introduce the knowledge quadrant in perception, which helps define what MLLMs know and do not know about images. Using this framework, we propose a novel benchmark, the Self-Awareness in Perception for MLLMs (MM-SAP), specifically designed to assess this capability. We apply MM-SAP to a variety of popular MLLMs, offering a comprehensive analysis of their self-awareness and providing detailed insights. The experiment results reveal that current MLLMs possess limited self-awareness capabilities, pointing to a crucial area for future advancement in the development of trustworthy MLLMs. Code and data are available at https://github.com/YHWmz/MM-SAP.

pdf bib
CE-VDG: Counterfactual Entropy-based Bias Reduction for Video-grounded Dialogue Generation
Hongcheng Liu | Pingjie Wang | Zhiyuan Zhu | Yanfeng Wang | Yu Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The Video-Grounded Dialogue generation (VDG) is a challenging task requiring a comprehensive understanding of the multi-modal information to produce a pertinent response. However, VDG models may rely on dataset bias as a shortcut and fail to learn the multi-modal knowledge from both video and audio. Counterfactual reasoning is an effective method that can estimate and eliminate bias on some special aspects of classification tasks. However, conventional counterfactual reasoning cannot be applied to VDG tasks directly due to the BPE algorithm. In this paper, we reformulate the counterfactual reasoning from the information entropy perspective and extend it from the classification task to the generative task, which can effectively reduce the question-related bias in the auto-regressive generation task. We design CE-VDG to demonstrate the effectiveness in bias elimination of the reformulated counterfactual reasoning by using the proposed counterfactual entropy as an external loss. Extensive experiment results on two popular VDG datasets show the superiority of CE-VDG over the existing baseline method, demonstrating the effective debiasing capability in our model considering counterfactual entropy.

pdf bib
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
Heyang Liu | Yanfeng Wang | Yu Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

End-to-end (E2E) approach is gradually replacing hybrid models for automatic speech recognition (ASR) tasks. However, the optimization of E2E models lacks an intuitive method for handling decoding shifts, especially in scenarios with a large number of domain-specific rare words that hold specific important meanings. Furthermore, the absence of knowledge-intensive speech datasets in academia has been a significant limiting factor, and the commonly used speech corpora exhibit significant disparities with realistic conversation. To address these challenges, we present Medical Interview (MED-IT), a multi-turn consultation speech dataset that contains a substantial number of knowledge-intensive named entities. We also explore methods to enhance the recognition performance of rare words for E2E models. We propose a novel approach, post-decoder biasing, which constructs a transform probability matrix based on the distribution of training transcriptions. This guides the model to prioritize recognizing words in the biasing list. In our experiments, for subsets of rare words appearing in the training speech between 10 and 20 times, and between 1 and 5 times, the proposed method achieves a relative improvement of 9.3% and 5.1%, respectively.

pdf bib
Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models
Pingjie Wang | Hongcheng Liu | Yanfeng Wang | Yu Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Structured pruning is an effective technique for compressing pre-trained language models (PLMs), reducing model size and improving inference speed for efficient deployment. However, most of existing pruning algorithms require retraining, leading to additional computational overhead. While some retraining-free approaches have been proposed for classification tasks, they still require a fully fine-tuned model for the task, and may cause catastrophic performance degradation on generative tasks. To address these challenges, we propose P-pruning (pre-pruning), an innovative task-specific compression framework. P-pruning prunes redundant modules of PLMs before fine-tuning, reducing the costs associated with fine-tuning. We also introduce a pruning algorithm for this framework, which includes two techniques: (1) module clustering, which clusters the outputs of all heads and neurons based on the task input; and (2) centroid selection, which identifies the most salient element in each cluster and prunes the others. We apply our method to BERT and GPT-2 and evaluate its effectiveness on GLUE, SQuAD, WikiText-2, WikiText-103, and PTB datasets. Experimental results demonstrate that our approach achieves higher performance in both classification and generative tasks, while also reducing the time required for fine-tuning.

2023

pdf bib
Does Listener Gaze in Face-to-Face Interaction Follow the Entropy Rate Constancy Principle: An Empirical Study
Yu Wang | Hendrik Buschmeier
Findings of the Association for Computational Linguistics: EMNLP 2023

It is generally assumed that language (written and spoken) follows the entropy rate constancy (ERC) principle, which states that the information density of a text is constant over time. Recently, this has also been found for nonverbal gestures used in monologue, but it is still unclear whether the ERC principle also applies to listeners’ nonverbal signals. We focus on listeners’ gaze behaviour extracted from video-recorded conversations and trained a transformer-based neural sequence model to process the gaze data of the dialogues and compute its information density. We also compute the information density of the corresponding speech using a pre-trained language model. Our results show (1) that listeners’ gaze behaviour in dialogues roughly follows the ERC principle, as well as (2) a congruence between information density of speech and listeners’ gaze behaviour.

pdf bib
Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis
Haoyu Zhang | Yu Wang | Guanghao Yin | Kejun Liu | Yuanyuan Liu | Tianshu Yu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Though Multimodal Sentiment Analysis (MSA) proves effective by utilizing rich information from multiple sources (*e.g.,* language, video, and audio), the potential sentiment-irrelevant and conflicting information across modalities may hinder the performance from being further improved. To alleviate this, we present Adaptive Language-guided Multimodal Transformer (ALMT), which incorporates an Adaptive Hyper-modality Learning (AHL) module to learn an irrelevance/conflict-suppressing representation from visual and audio features under the guidance of language features at different scales. With the obtained hyper-modality representation, the model can obtain a complementary and joint representation through multimodal fusion for effective MSA. In practice, ALMT achieves state-of-the-art performance on several popular datasets (*e.g.,* MOSI, MOSEI and CH-SIMS) and an abundance of ablation demonstrates the validity and necessity of our irrelevance/conflict suppression mechanism.

pdf bib
MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
Zexue He | Yu Wang | An Yan | Yao Liu | Eric Chang | Amilcare Gentili | Julian McAuley | Chun-Nan Hsu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Curated datasets for healthcare are often limited due to the need of human annotations from experts. In this paper, we present MedEval, a multi-level, multi-task, and multi-domain medical benchmark to facilitate the development of language models for healthcare. MedEval is comprehensive and consists of data from several healthcare systems and spans 35 human body regions from 8 examination modalities. With 22,779 collected sentences and 21,228 reports, we provide expert annotations at multiple levels, offering a granular potential usage of the data and supporting a wide range of tasks. Moreover, we systematically evaluated 10 generic and domain-specific language models under zero-shot and finetuning settings, from domain-adapted baselines in healthcare to general-purposed state-of-the-art large language models (e.g., ChatGPT). Our evaluations reveal varying effectiveness of the two categories of language models across different tasks, from which we notice the importance of instruction tuning for few-shot usage of large language models. Our investigation paves the way toward benchmarking language models for healthcare and provides valuable insights into the strengths and limitations of adopting large language models in medical domains, informing their practical applications and future advancements.

pdf bib
We Need to Talk About Reproducibility in NLP Model Comparison
Yan Xue | Xuefei Cao | Xingli Yang | Yu Wang | Ruibo Wang | Jihong Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

NLPers frequently face reproducibility crisis in a comparison of various models of a real-world NLP task. Many studies have empirically showed that the standard splits tend to produce low reproducible and unreliable conclusions, and they attempted to improve the splits by using more random repetitions. However, the improvement on the reproducibility in a comparison of NLP models is limited attributed to a lack of investigation on the relationship between the reproducibility and the estimator induced by a splitting strategy. In this paper, we formulate the reproducibility in a model comparison into a probabilistic function with regard to a conclusion. Furthermore, we theoretically illustrate that the reproducibility is qualitatively dominated by the signal-to-noise ratio (SNR) of a model performance estimator obtained on a corpus splitting strategy. Specifically, a higher value of the SNR of an estimator probably indicates a better reproducibility. On the basis of the theoretical motivations, we develop a novel mixture estimator of the performance of an NLP model with a regularized corpus splitting strategy based on a blocked 3× 2 cross-validation. We conduct numerical experiments on multiple NLP tasks to show that the proposed estimator achieves a high SNR, and it substantially increases the reproducibility. Therefore, we recommend the NLP practitioners to use the proposed method to compare NLP models instead of the methods based on the widely-used standard splits and the random splits with multiple repetitions.

pdf bib
Self-Improvement of Non-autoregressive Model via Sequence-Level Distillation
Yusheng Liao | Shuyang Jiang | Yiqi Li | Yu Wang | Yanfeng Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Although Non-autoregressive Transformer (NAT) models have achieved great success in terms of fast inference speed, this speedup comes with a performance drop due to the inherent multi-modality problem of the NAT model. Previous works commonly alleviate this problem by replacing the target side of the raw data with distilled data generated by Autoregressive Transformer (AT) models. However, the multi-modality problem in the distilled data is still significant and thus limits further improvement of the NAT models. In this paper, we propose a method called Sequence-Level Self-Distillation (SLSD), which aims to generate distilled data by the NAT model itself, eliminating the need for additional teacher networks. Furthermore, SLSD can adapt to different NAT models without precise adjustments since the self-distilled data is generated from the same types of NAT models. We conduct extensive experiments on WMT14 ENDE and WMT16 ENRO and choose four classic NAT models as the backbones to validate the generality and effectiveness of SLSD. The results show that our approach can consistently improve all models on both raw data and distilled data without sacrificing the inference speed.

pdf bib
Existence Justifies Reason: A Data Analysis on Chinese Classifiers Based on Eye Tracking and Transformers
Yu Wang | Emmanuele Chersoni | Chu-Ren Huang
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

pdf bib
Towards Optimizing Pre-trained Language Model Ensemble Learning for Task-oriented Dialogue System
Zhiyuan Zhu | Yusheng Liao | Zhe Chen | Yu Wang | Yunfeng Guan
Proceedings of The Eleventh Dialog System Technology Challenge

Task-oriented dialogue systems that employ external knowledge to generate informative responses have become an important field of research. This paper outlines our contribution to Track 5 of the Eleventh Dialog System Technology Challenge (DSTC11), which focuses on constructing high-performing, subjective knowledge-enriched task-oriented dialogue systems. Specifically, we investigate the complementarity of various language models to tackle the diverse knowledge selection task that involves multiple external sources. Based on this investigation, we propose pre- and post-generation model ensemble approaches to mitigate potential biases inherent in using a single model for the knowledge selection task. Finally, we utilize the consensus decoding approach to combine fine-tuned ensemble models and improve the performance of the generation system. Our system ranked 1st in human evaluation, even outperforming human annotation.

2022

pdf bib
A New Concept of Knowledge based Question Answering (KBQA) System for Multi-hop Reasoning
Yu Wang | V.srinivasan@samsung.com V.srinivasan@samsung.com | Hongxia Jin
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Knowledge based question answering (KBQA) is a complex task for natural language understanding. Many KBQA approaches have been proposed in recent years, and most of them are trained based on labeled reasoning path. This hinders the system’s performance as many correct reasoning paths are not labeled as ground truth, and thus they cannot be learned. In this paper, we introduce a new concept of KBQA system which can leverage multiple reasoning paths’ information and only requires labeled answer as supervision. We name it as Mutliple Reasoning Paths KBQA System (MRP-QA). We conduct experiments on several benchmark datasets containing both single-hop simple questions as well as muti-hop complex questions, including WebQuestionSP (WQSP), ComplexWebQuestion-1.1 (CWQ), and PathQuestion-Large (PQL), and demonstrate strong performance.

pdf bib
Efficient Federated Learning on Knowledge Graphs via Privacy-preserving Relation Embedding Aggregation
Kai Zhang | Yu Wang | Hongyi Wang | Lifu Huang | Carl Yang | Xun Chen | Lichao Sun
Findings of the Association for Computational Linguistics: EMNLP 2022

Federated learning (FL) can be essential in knowledge representation, reasoning, and data mining applications over multi-source knowledge graphs (KGs). A recent study FedE first proposes an FL framework that shares entity embeddings of KGs across all clients. However, entity embedding sharing from FedE would incur a severe privacy leakage. Specifically, the known entity embedding can be used to infer whether a specific relation between two entities exists in a private client. In this paper, we introduce a novel attack method that aims to recover the original data based on the embedding information, which is further used to evaluate the vulnerabilities of FedE. Furthermore, we propose a Federated learning paradigm with privacy-preserving Relation embedding aggregation (FedR) to tackle the privacy issue in FedE. Besides, relation embedding sharing can significantly reduce the communication cost due to its smaller size of queries. We conduct extensive experiments to evaluate FedR with five different KG embedding models and three datasets. Compared to FedE, FedR achieves similar utility and significant improvements regarding privacy-preserving effect and communication efficiency on the link prediction task.

pdf bib
Controlling Bias Exposure for Fair Interpretable Predictions
Zexue He | Yu Wang | Julian McAuley | Bodhisattwa Prasad Majumder
Findings of the Association for Computational Linguistics: EMNLP 2022

Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (like gender or race). However, when sensitive information is semantically entangled with the task information of the input, e.g., gender information is predictive for a profession, a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches perform this trade-off by eliminating bias information from the latent space, lacking control over how much bias is necessarily required to be removed. We argue that a favorable debiasing method should use sensitive information ‘fairly’, rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019; Bogen et al., 2020). In this work, we provide a novel debiasing algorithm by adjustingthe predictive model’s belief to (1) ignore the sensitive information if it is not useful for the task; (2) use sensitive information minimally as necessary for the prediction (while also incurring a penalty). Experimental results on two text classification tasks (influenced by gender) and an open-ended generation task (influenced by race) indicate that our model achieves a desirable trade-off between debiasing and task performance along with producing debiased rationales as evidence.

pdf bib
双重否定结构自动识别研究(The Research on Automatic Recognition of the Double Negation Structure)
Yu Wang (王昱) | Yulin Yuan (袁毓林)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“双重否定结构是一种“通过两次否定表示肯定意义”的特殊结构,其存在会对自然语言处理中的语义判断与情感分类产生重要影响。本文以“eg eg P== extgreater P”为标准,对现代汉语中所有的“否定词+否定词”结构进行了遍历研究,将双重否定结构按照格式分为了3大类,25小类,常用双重否定结构或构式132个。结合动词的叙实性、否定焦点、语义否定与语用否定等相关理论,本文归纳了双重否定结构的三大成立条件,并据此设计实现了基于规则的双重否定结构自动识别程序。程序实验的精确率为98.85%,召回率为98.90%,F1值为98.85%。同时,程序还从96281句语料中获得了8640句精确率约为99%的含有双重否定结构的句子,为后续基于统计的深度学习模型提供了语料支持的可能。”

2021

pdf bib
MedAI at SemEval-2021 Task 10: Negation-aware Pre-training for Source-free Negation Detection Domain Adaptation
Jinquan Sun | Qi Zhang | Yu Wang | Lei Zhang
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Due to the increasing concerns for data privacy, source-free unsupervised domain adaptation attracts more and more research attention, where only a trained source model is assumed to be available, while the labeled source data remain private. To get promising adaptation results, we need to find effective ways to transfer knowledge learned in source domain and leverage useful domain specific information from target domain at the same time. This paper describes our winning contribution to SemEval 2021 Task 10: Source-Free Domain Adaptation for Semantic Processing. Our key idea is to leverage the model trained on source domain data to generate pseudo labels for target domain samples. Besides, we propose Negation-aware Pre-training (NAP) to incorporate negation knowledge into model. Our method win the 1st place with F1-score of 0.822 on the official blind test set of Negation Detection Track.

pdf bib
Chase: A Large-Scale and Pragmatic Chinese Dataset for Cross-Database Context-Dependent Text-to-SQL
Jiaqi Guo | Ziliang Si | Yu Wang | Qian Liu | Ming Fan | Jian-Guang Lou | Zijiang Yang | Ting Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The cross-database context-dependent Text-to-SQL (XDTS) problem has attracted considerable attention in recent years due to its wide range of potential applications. However, we identify two biases in existing datasets for XDTS: (1) a high proportion of context-independent questions and (2) a high proportion of easy SQL queries. These biases conceal the major challenges in XDTS to some extent. In this work, we present Chase, a large-scale and pragmatic Chinese dataset for XDTS. It consists of 5,459 coherent question sequences (17,940 questions with their SQL queries annotated) over 280 databases, in which only 35% of questions are context-independent, and 28% of SQL queries are easy. We experiment on Chase with three state-of-the-art XDTS approaches. The best approach only achieves an exact match accuracy of 40% over all questions and 16% over all question sequences, indicating that Chase highlights the challenging problems of XDTS. We believe that XDTS can provide fertile soil for addressing the problems.

2020

pdf bib
Synchronous Double-channel Recurrent Network for Aspect-Opinion Pair Extraction
Shaowei Chen | Jie Liu | Yu Wang | Wenzheng Zhang | Ziming Chi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Opinion entity extraction is a fundamental task in fine-grained opinion mining. Related studies generally extract aspects and/or opinion expressions without recognizing the relations between them. However, the relations are crucial for downstream tasks, including sentiment classification, opinion summarization, etc. In this paper, we explore Aspect-Opinion Pair Extraction (AOPE) task, which aims at extracting aspects and opinion expressions in pairs. To deal with this task, we propose Synchronous Double-channel Recurrent Network (SDRN) mainly consisting of an opinion entity extraction unit, a relation detection unit, and a synchronization unit. The opinion entity extraction unit and the relation detection unit are developed as two channels to extract opinion entities and relations simultaneously. Furthermore, within the synchronization unit, we design Entity Synchronization Mechanism (ESM) and Relation Synchronization Mechanism (RSM) to enhance the mutual benefit on the above two channels. To verify the performance of SDRN, we manually build three datasets based on SemEval 2014 and 2015 benchmarks. Extensive experiments demonstrate that SDRN achieves state-of-the-art performances.

pdf bib
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu | Yu Wang | Jianshu Ji | Hao Cheng | Xueyun Zhu | Emmanuel Awa | Pengcheng He | Weizhu Chen | Hoifung Poon | Guihong Cao | Jianfeng Gao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models. Built upon PyTorch and Transformers, MT-DNN is designed to facilitate rapid customization for a broad spectrum of NLU tasks, using a variety of objectives (classification, regression, structured prediction) and text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm. To enable efficient production deployment, MT-DNN supports multi-task knowledge distillation, which can substantially compress a deep neural model without significant performance drop. We demonstrate the effectiveness of MT-DNN on a wide range of NLU applications across general and biomedical domains. The software and pre-trained models will be publicly available at https://github.com/namisan/mt-dnn.

pdf bib
How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
Shayne Longpre | Yu Wang | Chris DuBois
Findings of the Association for Computational Linguistics: EMNLP 2020

Task-agnostic forms of data augmentation have proven widely effective in computer vision, even on pretrained models. In NLP similar results are reported most commonly for low data regimes, non-pretrained models, or situationally for pretrained models. In this paper we ask how effective these techniques really are when applied to pretrained transformers. Using two popular varieties of task-agnostic data augmentation (not tailored to any particular task), Easy Data Augmentation (Wei andZou, 2019) and Back-Translation (Sennrichet al., 2015), we conduct a systematic examination of their effects across 5 classification tasks, 6 datasets, and 3 variants of modern pretrained transformers, including BERT, XLNet, and RoBERTa. We observe a negative result, finding that techniques which previously reported strong improvements for non-pretrained models fail to consistently improve performance for pretrained transformers, even when training data is limited. We hope this empirical analysis helps inform practitioners where data augmentation techniques may confer improvements.

pdf bib
基于规则的双重否定识别——以“不v1不v2”为例(Double Negative Recognition Based on Rules——Taking “不v1不v2” as an Example)
Yu Wang (王昱)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

“不v1不v2”是汉语中典型的双重否定结构形式之一,它包括“不+助动词+不+v2”(不得不去)、“不+是+不v2”(不是不好)、述宾结构“不v1...不v2”(不认为他不去)等多种双重否定结构,情况复杂。本文以“不v1不v2”为例,结合“元语否定”、“动词叙实性”、“否定焦点”等概念,对“不v1不v2”进行了全面的考察,制定了“不v1不v2”双重否定结构的识别策略。根据识别策略,设计了双重否定自动识别程序,并在此过程中补充了助动词表、非叙实动词表等词库。最终,对28033句语料进行了识别,识别正确率为97.87%,召回率约为93.10%。

pdf bib
HIT: Nested Named Entity Recognition via Head-Tail Pair and Token Interaction
Yu Wang | Yun Li | Hanghang Tong | Ziye Zhu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Named Entity Recognition (NER) is a fundamental task in natural language processing. In order to identify entities with nested structure, many sophisticated methods have been recently developed based on either the traditional sequence labeling approaches or directed hypergraph structures. Despite being successful, these methods often fall short in striking a good balance between the expression power for nested structure and the model complexity. To address this issue, we present a novel nested NER model named HIT. Our proposed HIT model leverages two key properties pertaining to the (nested) named entity, including (1) explicit boundary tokens and (2) tight internal connection between tokens within the boundary. Specifically, we design (1) Head-Tail Detector based on the multi-head self-attention mechanism and bi-affine classifier to detect boundary tokens, and (2) Token Interaction Tagger based on traditional sequence labeling approaches to characterize the internal token connection within the boundary. Experiments on three public NER datasets demonstrate that the proposed HIT achieves state-of-the-art performance.

2019

pdf bib
Single Training Dimension Selection for Word Embedding with PCA
Yu Wang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper, we present a fast and reliable method based on PCA to select the number of dimensions for word embeddings. First, we train one embedding with a generous upper bound (e.g. 1,000) of dimensions. Then we transform the embeddings using PCA and incrementally remove the lesser dimensions one at a time while recording the embeddings’ performance on language tasks. Lastly, we select the number of dimensions, balancing model size and accuracy. Experiments using various datasets and language tasks demonstrate that we are able to train about 10 times fewer sets of embeddings while retaining optimal performance. Researchers interested in training the best-performing embeddings for downstream tasks, such as sentiment analysis, question answering and hypernym extraction, as well as those interested in embedding compression should find the method helpful.

2018

pdf bib
A Bi-Model Based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling
Yu Wang | Yilin Shen | Hongxia Jin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Intent detection and slot filling are two main tasks for building a spoken language understanding(SLU) system. Multiple deep learning based models have demonstrated good results on these tasks . The most effective algorithms are based on the structures of sequence to sequence models (or “encoder-decoder” models), and generate the intents and semantic tags either using separate models. Most of the previous studies, however, either treat the intent detection and slot filling as two separate parallel tasks, or use a sequence to sequence model to generate both semantic tags and intent. None of the approaches consider the cross-impact between the intent detection task and the slot filling task. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact to each other using two correlated bidirectional LSTMs (BLSTM). Our Bi-model structure with a decoder achieves state-of-art result on the benchmark ATIS data, with about 0.5% intent accuracy improvement and 0.9 % slot filling improvement.

pdf bib
A New Concept of Deep Reinforcement Learning based Augmented General Tagging System
Yu Wang | Abhishek Patel | Hongxia Jin
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, a new deep reinforcement learning based augmented general tagging system is proposed. The new system contains two parts: a deep neural network (DNN) based sequence labeling model and a deep reinforcement learning (DRL) based augmented tagger. The augmented tagger helps improve system performance by modeling the data with minority tags. The new system is evaluated on SLU and NLU sequence labeling tasks using ATIS and CoNLL-2003 benchmark datasets, to demonstrate the new system’s outstanding performance on general tagging tasks. Evaluated by F1 scores, it shows that the new system outperforms the current state-of-the-art model on ATIS dataset by 1.9% and that on CoNLL-2003 dataset by 1.4%.

pdf bib
A Neural Transition-based Model for Nested Mention Recognition
Bailin Wang | Wei Lu | Yu Wang | Hongxia Jin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

It is common that entity mentions can contain other mentions recursively. This paper introduces a scalable transition-based method to model the nested structure of mentions. We first map a sentence with nested mentions to a designated forest where each mention corresponds to a constituent of the forest. Our shift-reduce based system then learns to construct the forest structure in a bottom-up manner through an action sequence whose maximal length is guaranteed to be three times of the sentence length. Based on Stack-LSTM which is employed to efficiently and effectively represent the states of the system in a continuous space, our system is further incorporated with a character-based component to capture letter-level patterns. Our model gets the state-of-the-art performances in ACE datasets, showing its effectiveness in detecting nested mentions.

2016

pdf bib
Off-topic Response Detection for Spontaneous Spoken English Assessment
Andrey Malinin | Rogier Van Dalen | Kate Knill | Yu Wang | Mark Gales
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2014

pdf bib
Towards Tracking Political Sentiment through Microblog Data
Yu Wang | Tom Clark | Jeffrey Staton | Eugene Agichtein
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

2010

pdf bib
Query Ambiguity Revisited: Clickthrough Measures for Distinguishing Informational and Ambiguous Queries
Yu Wang | Eugene Agichtein
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics