Xiang Wan


2022

pdf bib
A Simple yet Effective Relation Information Guided Approach for Few-Shot Relation Extraction
Yang Liu | Jinpeng Hu | Xiang Wan | Tsung-Hui Chang
Findings of the Association for Computational Linguistics: ACL 2022

Few-Shot Relation Extraction aims at predicting the relation for a pair of entities in a sentence by training with a few labelled examples in each relation. Some recent works have introduced relation information (i.e., relation labels or descriptions) to assist model learning based on Prototype Network. However, most of them constrain the prototypes of each relation class implicitly with relation information, generally through designing complex network structures, like generating hybrid features, combining with contrastive learning or attention networks. We argue that relation information can be introduced more explicitly and effectively into the model. Thus, this paper proposes a direct addition approach to introduce relation information. Specifically, for each relation class, the relation representation is first generated by concatenating two views of relations (i.e., [CLS] token embedding and the mean value of embeddings of all tokens) and then directly added to the original prototype for both train and prediction. Experimental results on the benchmark dataset FewRel 1.0 show significant improvements and achieve comparable results to the state-of-the-art, which demonstrates the effectiveness of our proposed approach. Besides, further analyses verify that the direct addition is a much more effective way to integrate the relation representations and the original prototypes.

pdf bib
Learn from Relation Information: Towards Prototype Representation Rectification for Few-Shot Relation Extraction
Yang Liu | Jinpeng Hu | Xiang Wan | Tsung-Hui Chang
Findings of the Association for Computational Linguistics: NAACL 2022

Few-shot Relation Extraction refers to fast adaptation to novel relation classes with few samples through training on the known relation classes. Most existing methods focus on implicitly introducing relation information (i.e., relation label or relation description) to constrain the prototype representation learning, such as contrastive learning, graphs, and specifically designed attentions, which may bring useless and even harmful parameters. Besides, these approaches are limited in handing outlier samples far away from the class center due to the weakly implicit constraint. In this paper, we propose an effective and parameter-less Prototype Rectification Method (PRM) to promote few-shot relation extraction, where we utilize a prototype rectification module to rectify original prototypes explicitly by the relation information. Specifically, PRM is composed of two gate mechanisms. One gate decides how much of the original prototype remains, and another one updates the remained prototype with relation information. In doing so, better and stabler global relation information can be captured for guiding prototype representations, and thus PRM can robustly deal with outliers. Moreover, we also extend PRM to both none-of-the-above (NOTA) and domain adaptation scenarios. Experimental results on FewRel 1.0 and 2.0 datasets demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance.

pdf bib
A Label-Aware Autoregressive Framework for Cross-Domain NER
Jinpeng Hu | He Zhao | Dan Guo | Xiang Wan | Tsung-Hui Chang
Findings of the Association for Computational Linguistics: NAACL 2022

Cross-domain named entity recognition (NER) aims to borrow the entity information from the source domain to help the entity recognition in the target domain with limited labeled data. Despite the promising performance of existing approaches, most of them focus on reducing the discrepancy of token representation between source and target domains, while the transfer of the valuable label information is often not explicitly considered or even ignored. Therefore, we propose a novel autoregressive framework to advance cross-domain NER by first enhancing the relationship between labels and tokens and then further improving the transferability of label information. Specifically, we associate each label with an embedding vector, and for each token, we utilize a bidirectional LSTM (Bi-LSTM) to encode the labels of its previous tokens for modeling internal context information and label dependence. Afterward, we propose a Bi-Attention module that merges the token representation from a pre-trained model and the label features from the Bi-LSTM as the label-aware information, which is concatenated to the token representation to facilitate cross-domain NER. In doing so, label information contained in the embedding vectors can be effectively transferred to the target domain, and Bi-LSTM can further model the label relationship among different domains by pre-train and then fine-tune setting. Experimental results on several datasets confirm the effectiveness of our model, where our model achieves significant improvements over the state of the arts.

pdf bib
Hero-Gang Neural Model For Named Entity Recognition
Jinpeng Hu | Yaling Shen | Yang Liu | Xiang Wan | Tsung-Hui Chang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Named entity recognition (NER) is a fundamental and important task in NLP, aiming at identifying named entities (NEs) from free text. Recently, since the multi-head attention mechanism applied in the Transformer model can effectively capture longer contextual information, Transformer-based models have become the mainstream methods and have achieved significant performance in this task. Unfortunately, although these models can capture effective global context information, they are still limited in the local feature and position information extraction, which is critical in NER. In this paper, to address this limitation, we propose a novel Hero-Gang Neural structure (HGN), including the Hero and Gang module, to leverage both global and local information to promote NER. Specifically, the Hero module is composed of a Transformer-based encoder to maintain the advantage of the self-attention mechanism, and the Gang module utilizes a multi-window recurrent module to extract local features and position information under the guidance of the Hero module. Afterward, the proposed multi-window attention effectively combines global information and multiple local features for predicting entity labels. Experimental results on several benchmark datasets demonstrate the effectiveness of our proposed model.

pdf bib
Graph Enhanced Contrastive Learning for Radiology Findings Summarization
Jinpeng Hu | Zhuo Li | Zhihong Chen | Zhen Li | Xiang Wan | Tsung-Hui Chang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The impression section of a radiology report summarizes the most prominent observation from the findings section and is the most important section for radiologists to communicate to physicians. Summarizing findings is time-consuming and can be prone to error for inexperienced radiologists, and thus automatic impression generation has attracted substantial attention. With the encoder-decoder framework, most previous studies explore incorporating extra knowledge (e.g., static pre-defined clinical ontologies or extra background information). Yet, they encode such knowledge by a separate encoder to treat it as an extra input to their models, which is limited in leveraging their relations with the original findings. To address the limitation, we propose a unified framework for exploiting both extra knowledge and the original findings in an integrated way so that the critical information (i.e., key words and their relations) can be extracted in an appropriate way to facilitate impression generation. In detail, for each input findings, it is encoded by a text encoder and a graph is constructed through its entities and dependency tree. Then, a graph encoder (e.g., graph neural networks (GNNs)) is adopted to model relation information in the constructed graph. Finally, to emphasize the key words in the findings, contrastive learning is introduced to map positive samples (constructed by masking non-key words) closer and push apart negative ones (constructed by masking key words). The experimental results on two datasets, OpenI and MIMIC-CXR, confirm the effectiveness of our proposed method, where the state-of-the-art results are achieved.

2021

pdf bib
Relation Extraction with Type-aware Map Memories of Word Dependencies
Guimin Chen | Yuanhe Tian | Yan Song | Xiang Wan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Word Graph Guided Summarization for Radiology Findings
Jinpeng Hu | Jianling Li | Zhihong Chen | Yaling Shen | Yan Song | Xiang Wan | Tsung-Hui Chang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Dependency-driven Relation Extraction with Attentive Graph Convolutional Networks
Yuanhe Tian | Guimin Chen | Yan Song | Xiang Wan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Syntactic information, especially dependency trees, has been widely used by existing studies to improve relation extraction with better semantic guidance for analyzing the context information associated with the given entities. However, most existing studies suffer from the noise in the dependency trees, especially when they are automatically generated, so that intensively leveraging dependency information may introduce confusions to relation classification and necessary pruning is of great importance in this task. In this paper, we propose a dependency-driven approach for relation extraction with attentive graph convolutional networks (A-GCN). In this approach, an attention mechanism upon graph convolutional networks is applied to different contextual words in the dependency tree obtained from an off-the-shelf dependency parser, to distinguish the importance of different word dependencies. Consider that dependency types among words also contain important contextual guidance, which is potentially helpful for relation extraction, we also include the type information in A-GCN modeling. Experimental results on two English benchmark datasets demonstrate the effectiveness of our A-GCN, which outperforms previous studies and achieves state-of-the-art performance on both datasets.

pdf bib
Cross-modal Memory Networks for Radiology Report Generation
Zhihong Chen | Yaling Shen | Yan Song | Xiang Wan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Medical imaging plays a significant role in clinical practice of medical diagnosis, where the text reports of the images are essential in understanding them and facilitating later treatments. By generating the reports automatically, it is beneficial to help lighten the burden of radiologists and significantly promote clinical automation, which already attracts much attention in applying artificial intelligence to medical domain. Previous studies mainly follow the encoder-decoder paradigm and focus on the aspect of text generation, with few studies considering the importance of cross-modal mappings and explicitly exploit such mappings to facilitate radiology report generation. In this paper, we propose a cross-modal memory networks (CMN) to enhance the encoder-decoder framework for radiology report generation, where a shared memory is designed to record the alignment between images and texts so as to facilitate the interaction and generation across modalities. Experimental results illustrate the effectiveness of our proposed model, where state-of-the-art performance is achieved on two widely used benchmark datasets, i.e., IU X-Ray and MIMIC-CXR. Further analyses also prove that our model is able to better align information from radiology images and texts so as to help generating more accurate reports in terms of clinical indicators.

pdf bib
Field Embedding: A Unified Grain-Based Framework for Word Representation
Junjie Luo | Xi Chen | Jichao Sun | Yuejia Xiang | Ningyu Zhang | Xiang Wan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Word representations empowered with additional linguistic information have been widely studied and proved to outperform traditional embeddings. Current methods mainly focus on learning embeddings for words while embeddings of linguistic information (referred to as grain embeddings) are discarded after the learning. This work proposes a framework field embedding to jointly learn both word and grain embeddings by incorporating morphological, phonetic, and syntactical linguistic fields. The framework leverages an innovative fine-grained pipeline that integrates multiple linguistic fields and produces high-quality grain sequences for learning supreme word representations. A novel algorithm is also designed to learn embeddings for words and grains by capturing information that is contained within each field and that is shared across them. Experimental results of lexical tasks and downstream natural language processing tasks illustrate that our framework can learn better word embeddings and grain embeddings. Qualitative evaluations show grain embeddings effectively capture the semantic information.

pdf bib
Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts
Yang Liu | Yuanhe Tian | Tsung-Hui Chang | Song Wu | Xiang Wan | Yan Song
Proceedings of the 20th Workshop on Biomedical Language Processing

Chinese word segmentation (CWS) and medical concept recognition are two fundamental tasks to process Chinese electronic medical records (EMRs) and play important roles in downstream tasks for understanding Chinese EMRs. One challenge to these tasks is the lack of medical domain datasets with high-quality annotations, especially medical-related tags that reveal the characteristics of Chinese EMRs. In this paper, we collected a Chinese EMR corpus, namely, ACEMR, with human annotations for Chinese word segmentation and EMR-related tags. On the ACEMR corpus, we run well-known models (i.e., BiLSTM, BERT, and ZEN) and existing state-of-the-art systems (e.g., WMSeg and TwASP) for CWS and medical concept recognition. Experimental results demonstrate the necessity of building a dedicated medical dataset and show that models that leverage extra resources achieve the best performance for both tasks, which provides certain guidance for future studies on model selection in the medical domain.

2020

pdf bib
Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information
Yuyang Nie | Yuanhe Tian | Yan Song | Xiang Ao | Xiang Wan
Findings of the Association for Computational Linguistics: EMNLP 2020

Named entity recognition (NER) is highly sensitive to sentential syntactic and semantic properties where entities may be extracted according to how they are used and placed in the running text. To model such properties, one could rely on existing resources to providing helpful knowledge to the NER task; some existing studies proved the effectiveness of doing so, and yet are limited in appropriately leveraging the knowledge such as distinguishing the important ones for particular context. In this paper, we improve NER by leveraging different types of syntactic information through attentive ensemble, which functionalizes by the proposed key-value memory networks, syntax attention, and the gate mechanism for encoding, weighting and aggregating such syntactic information, respectively. Experimental results on six English and Chinese benchmark datasets suggest the effectiveness of the proposed model and show that it outperforms previous studies on all experiment datasets.

pdf bib
Named Entity Recognition for Social Media Texts with Semantic Augmentation
Yuyang Nie | Yuanhe Tian | Xiang Wan | Yan Song | Bo Dai
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts, especially user-generated social media content. Semantic augmentation is a potential way to alleviate this problem. Given that rich semantic information is implicitly preserved in pre-trained word embeddings, they are potential ideal resources for semantic augmentation. In this paper, we propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account. In particular, we obtain the augmented semantic information from a large-scale corpus, and propose an attentive semantic augmentation module and a gate module to encode and aggregate such information, respectively. Extensive experiments are performed on three benchmark datasets collected from English and Chinese social media platforms, where the results demonstrate the superiority of our approach to previous studies across all three datasets.

pdf bib
Generating Radiology Reports via Memory-driven Transformer
Zhihong Chen | Yan Song | Tsung-Hui Chang | Xiang Wan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Medical imaging is frequently used in clinical practice and trials for diagnosis and treatment. Writing imaging reports is time-consuming and can be error-prone for inexperienced radiologists. Therefore, automatically generating radiology reports is highly desired to lighten the workload of radiologists and accordingly promote clinical automation, which is an essential task to apply artificial intelligence to the medical domain. In this paper, we propose to generate radiology reports with memory-driven Transformer, where a relational memory is designed to record key information of the generation process and a memory-driven conditional layer normalization is applied to incorporating the memory into the decoder of Transformer. Experimental results on two prevailing radiology report datasets, IU X-Ray and MIMIC-CXR, show that our proposed approach outperforms previous models with respect to both language generation metrics and clinical evaluations. Particularly, this is the first work reporting the generation results on MIMIC-CXR to the best of our knowledge. Further analyses also demonstrate that our approach is able to generate long reports with necessary medical terms as well as meaningful image-text attention mappings.