Yan Song


2021

pdf bib
Enhancing Aspect-level Sentiment Analysis with Word Dependencies
Yuanhe Tian | Guimin Chen | Yan Song
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Aspect-level sentiment analysis (ASA) has received much attention in recent years. Most existing approaches tried to leverage syntactic information, such as the dependency parsing results of the input text, to improve sentiment analysis on different aspects. Although these approaches achieved satisfying results, their main focus is to leverage the dependency arcs among words where the dependency type information is omitted; and they model different dependencies equally where the noisy dependency results may hurt model performance. In this paper, we propose an approach to enhance aspect-level sentiment analysis with word dependencies, where the type information is modeled by key-value memory networks and different dependency results are selectively leveraged. Experimental results on five benchmark datasets demonstrate the effectiveness of our approach, where it outperforms baseline models on all datasets and achieves state-of-the-art performance on three of them.

pdf bib
Aspect-based Sentiment Analysis with Type-aware Graph Convolutional Networks and Layer Ensemble
Yuanhe Tian | Guimin Chen | Yan Song
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

It is popular that neural graph-based models are applied in existing aspect-based sentiment analysis (ABSA) studies for utilizing word relations through dependency parses to facilitate the task with better semantic guidance for analyzing context and aspect words. However, most of these studies only leverage dependency relations without considering their dependency types, and are limited in lacking efficient mechanisms to distinguish the important relations as well as learn from different layers of graph based models. To address such limitations, in this paper, we propose an approach to explicitly utilize dependency types for ABSA with type-aware graph convolutional networks (T-GCN), where attention is used in T-GCN to distinguish different edges (relations) in the graph and attentive layer ensemble is proposed to comprehensively learn from different layers of T-GCN. The validity and effectiveness of our approach are demonstrated in the experimental results, where state-of-the-art performance is achieved on six English benchmark datasets. Further experiments are conducted to analyze the contributions of each component in our approach and illustrate how different layers in T-GCN help ABSA with quantitative and qualitative analysis.

pdf bib
Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation
Shizhe Diao | Ruijia Xu | Hongjin Su | Yilei Jiang | Yan Song | Tong Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Large pre-trained models such as BERT are known to improve different downstream NLP tasks, even when such a model is trained on a generic domain. Moreover, recent studies have shown that when large domain-specific corpora are available, continued pre-training on domain-specific data can further improve the performance of in-domain tasks. However, this practice requires significant domain-specific data and computational resources which may not always be available. In this paper, we aim to adapt a generic pretrained model with a relatively small amount of domain-specific data. We demonstrate that by explicitly incorporating multi-granularity information of unseen and domain-specific words via the adaptation of (word based) n-grams, the performance of a generic pretrained model can be greatly improved. Specifically, we introduce a Transformer-based Domain-aware N-gram Adaptor, T-DNA, to effectively learn and incorporate the semantic representation of different combinations of words in the new domain. Experimental results illustrate the effectiveness of T-DNA on eight low-resource downstream tasks from four domains. We show that T-DNA is able to achieve significant improvements compared to existing methods on most tasks using limited data with lower computational costs. Moreover, further analyses demonstrate the importance and effectiveness of both unseen words and the information of different granularities. Our code is available at https://github.com/shizhediao/T-DNA.

pdf bib
Dependency-driven Relation Extraction with Attentive Graph Convolutional Networks
Yuanhe Tian | Guimin Chen | Yan Song | Xiang Wan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Syntactic information, especially dependency trees, has been widely used by existing studies to improve relation extraction with better semantic guidance for analyzing the context information associated with the given entities. However, most existing studies suffer from the noise in the dependency trees, especially when they are automatically generated, so that intensively leveraging dependency information may introduce confusions to relation classification and necessary pruning is of great importance in this task. In this paper, we propose a dependency-driven approach for relation extraction with attentive graph convolutional networks (A-GCN). In this approach, an attention mechanism upon graph convolutional networks is applied to different contextual words in the dependency tree obtained from an off-the-shelf dependency parser, to distinguish the importance of different word dependencies. Consider that dependency types among words also contain important contextual guidance, which is potentially helpful for relation extraction, we also include the type information in A-GCN modeling. Experimental results on two English benchmark datasets demonstrate the effectiveness of our A-GCN, which outperforms previous studies and achieves state-of-the-art performance on both datasets.

pdf bib
Cross-modal Memory Networks for Radiology Report Generation
Zhihong Chen | Yaling Shen | Yan Song | Xiang Wan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Medical imaging plays a significant role in clinical practice of medical diagnosis, where the text reports of the images are essential in understanding them and facilitating later treatments. By generating the reports automatically, it is beneficial to help lighten the burden of radiologists and significantly promote clinical automation, which already attracts much attention in applying artificial intelligence to medical domain. Previous studies mainly follow the encoder-decoder paradigm and focus on the aspect of text generation, with few studies considering the importance of cross-modal mappings and explicitly exploit such mappings to facilitate radiology report generation. In this paper, we propose a cross-modal memory networks (CMN) to enhance the encoder-decoder framework for radiology report generation, where a shared memory is designed to record the alignment between images and texts so as to facilitate the interaction and generation across modalities. Experimental results illustrate the effectiveness of our proposed model, where state-of-the-art performance is achieved on two widely used benchmark datasets, i.e., IU X-Ray and MIMIC-CXR. Further analyses also prove that our model is able to better align information from radiology images and texts so as to help generating more accurate reports in terms of clinical indicators.

pdf bib
Improving Arabic Diacritization with Regularized Decoding and Adversarial Training
Han Qin | Guimin Chen | Yuanhe Tian | Yan Song
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Arabic diacritization is a fundamental task for Arabic language processing. Previous studies have demonstrated that automatically generated knowledge can be helpful to this task. However, these studies regard the auto-generated knowledge instances as gold references, which limits their effectiveness since such knowledge is not always accurate and inferior instances can lead to incorrect predictions. In this paper, we propose to use regularized decoding and adversarial training to appropriately learn from such noisy knowledge for diacritization. Experimental results on two benchmark datasets show that, even with quite flawed auto-generated knowledge, our model can still learn adequate diacritics and outperform all previous studies, on both datasets.

pdf bib
Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts
Yang Liu | Yuanhe Tian | Tsung-Hui Chang | Song Wu | Xiang Wan | Yan Song
Proceedings of the 20th Workshop on Biomedical Language Processing

Chinese word segmentation (CWS) and medical concept recognition are two fundamental tasks to process Chinese electronic medical records (EMRs) and play important roles in downstream tasks for understanding Chinese EMRs. One challenge to these tasks is the lack of medical domain datasets with high-quality annotations, especially medical-related tags that reveal the characteristics of Chinese EMRs. In this paper, we collected a Chinese EMR corpus, namely, ACEMR, with human annotations for Chinese word segmentation and EMR-related tags. On the ACEMR corpus, we run well-known models (i.e., BiLSTM, BERT, and ZEN) and existing state-of-the-art systems (e.g., WMSeg and TwASP) for CWS and medical concept recognition. Experimental results demonstrate the necessity of building a dedicated medical dataset and show that models that leverage extra resources achieve the best performance for both tasks, which provides certain guidance for future studies on model selection in the medical domain.

pdf bib
RevCore: Review-Augmented Conversational Recommendation
Yu Lu | Junwei Bao | Yan Song | Zichen Ma | Shuguang Cui | Youzheng Wu | Xiaodong He
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Relation Extraction with Type-aware Map Memories of Word Dependencies
Guimin Chen | Yuanhe Tian | Yan Song | Xiang Wan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Federated Chinese Word Segmentation with Global Character Associations
Yuanhe Tian | Guimin Chen | Han Qin | Yan Song
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
TILGAN: Transformer-based Implicit Latent GAN for Diverse and Coherent Text Generation
Shizhe Diao | Xinwei Shen | Kashun Shum | Yan Song | Tong Zhang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Word Graph Guided Summarization for Radiology Findings
Jinpeng Hu | Jianling Li | Zhihong Chen | Yaling Shen | Yan Song | Xiang Wan | Tsung-Hui Chang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
Joint Aspect Extraction and Sentiment Analysis with Directional Graph Convolutional Networks
Guimin Chen | Yuanhe Tian | Yan Song
Proceedings of the 28th International Conference on Computational Linguistics

End-to-end aspect-based sentiment analysis (EASA) consists of two sub-tasks: the first extracts the aspect terms in a sentence and the second predicts the sentiment polarities for such terms. For EASA, compared to pipeline and multi-task approaches, joint aspect extraction and sentiment analysis provides a one-step solution to predict both aspect terms and their sentiment polarities through a single decoding process, which avoid the mismatches in between the results of aspect terms and sentiment polarities, as well as error propagation. Previous studies, especially recent ones, for this task focus on using powerful encoders (e.g., Bi-LSTM and BERT) to model contextual information from the input, with limited efforts paid to using advanced neural architectures (such as attentions and graph convolutional networks) or leveraging extra knowledge (such as syntactic information). To extend such efforts, in this paper, we propose directional graph convolutional networks (D-GCN) to jointly perform aspect extraction and sentiment analysis with encoding syntactic information, where dependency among words are integrated in our model to enhance its ability of representing input sentences and help EASA accordingly. Experimental results on three benchmark datasets demonstrate the effectiveness of our approach, where D-GCN achieves state-of-the-art performance on all datasets.

pdf bib
Summarizing Medical Conversations via Identifying Important Utterances
Yan Song | Yuanhe Tian | Nan Wang | Fei Xia
Proceedings of the 28th International Conference on Computational Linguistics

Summarization is an important natural language processing (NLP) task in identifying key information from text. For conversations, the summarization systems need to extract salient contents from spontaneous utterances by multiple speakers. In a special task-oriented scenario, namely medical conversations between patients and doctors, the symptoms, diagnoses, and treatments could be highly important because the nature of such conversation is to find a medical solution to the problem proposed by the patients. Especially consider that current online medical platforms provide millions of public available conversations between real patients and doctors, where the patients propose their medical problems and the registered doctors offer diagnosis and treatment, a conversation in most cases could be too long and the key information is hard to be located. Therefore, summarizations to the patients’ problems and the doctors’ treatments in the conversations can be highly useful, in terms of helping other patients with similar problems have a precise reference for potential medical solutions. In this paper, we focus on medical conversation summarization, using a dataset of medical conversations and corresponding summaries which were crawled from a well-known online healthcare service provider in China. We propose a hierarchical encoder-tagger model (HET) to generate summaries by identifying important utterances (with respect to problem proposing and solving) in the conversations. For the particular dataset used in this study, we show that high-quality summaries can be generated by extracting two types of utterances, namely, problem statements and treatment recommendations. Experimental results demonstrate that HET outperforms strong baselines and models from previous studies, and adding conversation-related features can further improve system performance.

pdf bib
Meet Changes with Constancy: Learning Invariance in Multi-Source Translation
Jianfeng Liu | Ling Luo | Xiang Ao | Yan Song | Haoran Xu | Jian Ye
Proceedings of the 28th International Conference on Computational Linguistics

Multi-source neural machine translation aims to translate from parallel sources of information (e.g. languages, images, etc.) to a single target language, which has shown better performance than most one-to-one systems. Despite the remarkable success of existing models, they usually neglect the fact that multiple source inputs may have inconsistencies. Such differences might bring noise to the task and limit the performance of existing multi-source NMT approaches due to their indiscriminate usage of input sources for target word predictions. In this paper, we attempt to leverage the potential complementary information among distinct sources and alleviate the occasional conflicts of them. To accomplish that, we propose a source invariance network to learn the invariant information of parallel sources. Such network can be easily integrated with multi-encoder based multi-source NMT methods (e.g. multi-encoder RNN and transformer) to enhance the translation results. Extensive experiments on two multi-source translation tasks demonstrate that the proposed approach not only achieves clear gains in translation quality but also captures implicit invariance between different sources.

pdf bib
Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams
Yuanhe Tian | Yan Song | Fei Xia
Proceedings of the 28th International Conference on Computational Linguistics

Chinese word segmentation (CWS) and part-of-speech (POS) tagging are two fundamental tasks for Chinese language processing. Previous studies have demonstrated that jointly performing them can be an effective one-step solution to both tasks and this joint task can benefit from a good modeling of contextual features such as n-grams. However, their work on modeling such contextual features is limited to concatenating the features or their embeddings directly with the input embeddings without distinguishing whether the contextual features are important for the joint task in the specific context. Therefore, their models for the joint task could be misled by unimportant contextual information. In this paper, we propose a character-based neural model for the joint task enhanced by multi-channel attention of n-grams. In the attention module, n-gram features are categorized into different groups according to several criteria, and n-grams in each group are weighted and distinguished according to their importance for the joint task in the specific context. To categorize n-grams, we try two criteria in this study, i.e., n-gram frequency and length, so that n-grams having different capabilities of carrying contextual information are discriminatively learned by our proposed attention module. Experimental results on five benchmark datasets for CWS and POS tagging demonstrate that our approach outperforms strong baseline models and achieves state-of-the-art performance on all five datasets.

pdf bib
Improving Constituency Parsing with Span Attention
Yuanhe Tian | Yan Song | Fei Xia | Tong Zhang
Findings of the Association for Computational Linguistics: EMNLP 2020

Constituency parsing is a fundamental and important task for natural language understanding, where a good representation of contextual information can help this task. N-grams, which is a conventional type of feature for contextual information, have been demonstrated to be useful in many tasks, and thus could also be beneficial for constituency parsing if they are appropriately modeled. In this paper, we propose span attention for neural chart-based constituency parsing to leverage n-gram information. Considering that current chart-based parsers with Transformer-based encoder represent spans by subtraction of the hidden states at the span boundaries, which may cause information loss especially for long spans, we incorporate n-grams into span representations by weighting them according to their contributions to the parsing process. Moreover, we propose categorical span attention to further enhance the model by weighting n-grams within different length categories, and thus benefit long-sentence parsing. Experimental results on three widely used benchmark datasets demonstrate the effectiveness of our approach in parsing Arabic, Chinese, and English, where state-of-the-art performance is obtained by our approach on all of them.

pdf bib
Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information
Yuyang Nie | Yuanhe Tian | Yan Song | Xiang Ao | Xiang Wan
Findings of the Association for Computational Linguistics: EMNLP 2020

Named entity recognition (NER) is highly sensitive to sentential syntactic and semantic properties where entities may be extracted according to how they are used and placed in the running text. To model such properties, one could rely on existing resources to providing helpful knowledge to the NER task; some existing studies proved the effectiveness of doing so, and yet are limited in appropriately leveraging the knowledge such as distinguishing the important ones for particular context. In this paper, we improve NER by leveraging different types of syntactic information through attentive ensemble, which functionalizes by the proposed key-value memory networks, syntax attention, and the gate mechanism for encoding, weighting and aggregating such syntactic information, respectively. Experimental results on six English and Chinese benchmark datasets suggest the effectiveness of the proposed model and show that it outperforms previous studies on all experiment datasets.

pdf bib
ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations
Shizhe Diao | Jiaxin Bai | Yan Song | Tong Zhang | Yonggang Wang
Findings of the Association for Computational Linguistics: EMNLP 2020

The pre-training of text encoders normally processes text as a sequence of tokens corresponding to small text units, such as word pieces in English and characters in Chinese. It omits information carried by larger text granularity, and thus the encoders cannot easily adapt to certain combinations of characters. This leads to a loss of important semantic information, which is especially problematic for Chinese because the language does not have explicit word boundaries. In this paper, we propose ZEN, a BERT-based Chinese text encoder enhanced by n-gram representations, where different combinations of characters are considered during training, thus potential word or phrase boundaries are explicitly pre-trained and fine-tuned with the character encoder (BERT). Therefore ZEN incorporates the comprehensive information of both the character sequence and words or phrases it contains. Experimental results illustrated the effectiveness of ZEN on a series of Chinese NLP tasks, where state-of-the-art results is achieved on most tasks with requiring less resource than other published encoders. It is also shown that reasonable performance is obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data. The code and pre-trained models of ZEN are available at https://github.com/sinovation/ZEN.

pdf bib
Conditional Augmentation for Aspect Term Extraction via Masked Sequence-to-Sequence Generation
Kun Li | Chengbo Chen | Xiaojun Quan | Qing Ling | Yan Song
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Aspect term extraction aims to extract aspect terms from review texts as opinion targets for sentiment analysis. One of the big challenges with this task is the lack of sufficient annotated data. While data augmentation is potentially an effective technique to address the above issue, it is uncontrollable as it may change aspect words and aspect labels unexpectedly. In this paper, we formulate the data augmentation as a conditional generation task: generating a new sentence while preserving the original opinion targets and labels. We propose a masked sequence-to-sequence method for conditional augmentation of aspect term extraction. Unlike existing augmentation approaches, ours is controllable and allows to generate more diversified sentences. Experimental results confirm that our method alleviates the data scarcity problem significantly. It also effectively boosts the performances of several current models for aspect term extraction.

pdf bib
Improving Chinese Word Segmentation with Wordhood Memory Networks
Yuanhe Tian | Yan Song | Fei Xia | Tong Zhang | Yonggang Wang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Contextual features always play an important role in Chinese word segmentation (CWS). Wordhood information, being one of the contextual features, is proved to be useful in many conventional character-based segmenters. However, this feature receives less attention in recent neural models and it is also challenging to design a framework that can properly integrate wordhood information from different wordhood measures to existing neural frameworks. In this paper, we therefore propose a neural framework, WMSeg, which uses memory networks to incorporate wordhood information with several popular encoder-decoder combinations for CWS. Experimental results on five benchmark datasets indicate the memory mechanism successfully models wordhood information for neural segmenters and helps WMSeg achieve state-of-the-art performance on all those datasets. Further experiments and analyses also demonstrate the robustness of our proposed framework with respect to different wordhood measures and the efficiency of wordhood information in cross-domain experiments.

pdf bib
Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge
Yuanhe Tian | Yan Song | Xiang Ao | Fei Xia | Xiaojun Quan | Tong Zhang | Yonggang Wang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Chinese word segmentation (CWS) and part-of-speech (POS) tagging are important fundamental tasks for Chinese language processing, where joint learning of them is an effective one-step solution for both tasks. Previous studies for joint CWS and POS tagging mainly follow the character-based tagging paradigm with introducing contextual information such as n-gram features or sentential representations from recurrent neural models. However, for many cases, the joint tagging needs not only modeling from context features but also knowledge attached to them (e.g., syntactic relations among words); limited efforts have been made by existing research to meet such needs. In this paper, we propose a neural model named TwASP for joint CWS and POS tagging following the character-based sequence labeling paradigm, where a two-way attention mechanism is used to incorporate both context feature and their corresponding syntactic knowledge for each input character. Particularly, we use existing language processing toolkits to obtain the auto-analyzed syntactic knowledge for the context, and the proposed attention module can learn and benefit from them although their quality may not be perfect. Our experiments illustrate the effectiveness of the two-way attentions for joint CWS and POS tagging, where state-of-the-art performance is achieved on five benchmark datasets.

pdf bib
Named Entity Recognition for Social Media Texts with Semantic Augmentation
Yuyang Nie | Yuanhe Tian | Xiang Wan | Yan Song | Bo Dai
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts, especially user-generated social media content. Semantic augmentation is a potential way to alleviate this problem. Given that rich semantic information is implicitly preserved in pre-trained word embeddings, they are potential ideal resources for semantic augmentation. In this paper, we propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account. In particular, we obtain the augmented semantic information from a large-scale corpus, and propose an attentive semantic augmentation module and a gate module to encode and aggregate such information, respectively. Extensive experiments are performed on three benchmark datasets collected from English and Chinese social media platforms, where the results demonstrate the superiority of our approach to previous studies across all three datasets.

pdf bib
Generating Radiology Reports via Memory-driven Transformer
Zhihong Chen | Yan Song | Tsung-Hui Chang | Xiang Wan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Medical imaging is frequently used in clinical practice and trials for diagnosis and treatment. Writing imaging reports is time-consuming and can be error-prone for inexperienced radiologists. Therefore, automatically generating radiology reports is highly desired to lighten the workload of radiologists and accordingly promote clinical automation, which is an essential task to apply artificial intelligence to the medical domain. In this paper, we propose to generate radiology reports with memory-driven Transformer, where a relational memory is designed to record key information of the generation process and a memory-driven conditional layer normalization is applied to incorporating the memory into the decoder of Transformer. Experimental results on two prevailing radiology report datasets, IU X-Ray and MIMIC-CXR, show that our proposed approach outperforms previous models with respect to both language generation metrics and clinical evaluations. Particularly, this is the first work reporting the generation results on MIMIC-CXR to the best of our knowledge. Further analyses also demonstrate that our approach is able to generate long reports with necessary medical terms as well as meaningful image-text attention mappings.

pdf bib
Supertagging Combinatory Categorial Grammar with Attentive Graph Convolutional Networks
Yuanhe Tian | Yan Song | Fei Xia
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Supertagging is conventionally regarded as an important task for combinatory categorial grammar (CCG) parsing, where effective modeling of contextual information is highly important to this task. However, existing studies have made limited efforts to leverage contextual features except for applying powerful encoders (e.g., bi-LSTM). In this paper, we propose attentive graph convolutional networks to enhance neural CCG supertagging through a novel solution of leveraging contextual information. Specifically, we build the graph from chunks (n-grams) extracted from a lexicon and apply attention over the graph, so that different word pairs from the contexts within and across chunks are weighted in the model and facilitate the supertagging accordingly. The experiments performed on the CCGbank demonstrate that our approach outperforms all previous studies in terms of both supertagging and parsing. Further analyses illustrate the effectiveness of each component in our approach to discriminatively learn from word pairs to enhance CCG supertagging.

pdf bib
Studying Challenges in Medical Conversation with Structured Annotation
Nan Wang | Yan Song | Fei Xia
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations

Medical conversation is a central part of medical care. Yet, the current state and quality of medical conversation is far from perfect. Therefore, a substantial amount of research has been done to obtain a better understanding of medical conversation and to address its practical challenges and dilemmas. In line with this stream of research, we have developed a multi-layer structure annotation scheme to analyze medical conversation, and are using the scheme to construct a corpus of naturally occurring medical conversation in Chinese pediatric primary care setting. Some of the preliminary findings are reported regarding 1) how a medical conversation starts, 2) where communication problems tend to occur, and 3) how physicians close a conversation. Challenges and opportunities for research on medical conversation with NLP techniques will be discussed.

2019

pdf bib
Reading Like HER: Human Reading Inspired Extractive Summarization
Ling Luo | Xiang Ao | Yan Song | Feiyang Pan | Min Yang | Qing He
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this work, we re-examine the problem of extractive text summarization for long documents. We observe that the process of extracting summarization of human can be divided into two stages: 1) a rough reading stage to look for sketched information, and 2) a subsequent careful reading stage to select key sentences to form the summary. By simulating such a two-stage process, we propose a novel approach for extractive summarization. We formulate the problem as a contextual-bandit problem and solve it with policy gradient. We adopt a convolutional neural network to encode gist of paragraphs for rough reading, and a decision making policy with an adapted termination mechanism for careful reading. Experiments on the CNN and DailyMail datasets show that our proposed method can provide high-quality summaries with varied length, and significantly outperform the state-of-the-art extractive methods in terms of ROUGE metrics.

pdf bib
What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues
Xintong Yu | Hongming Zhang | Yangqiu Song | Yan Song | Changshui Zhang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Grounding a pronoun to a visual object it refers to requires complex reasoning from various information sources, especially in conversational scenarios. For example, when people in a conversation talk about something all speakers can see, they often directly use pronouns (e.g., it) to refer to it without previous introduction. This fact brings a huge challenge for modern natural language understanding systems, particularly conventional context-based pronoun coreference models. To tackle this challenge, in this paper, we formally define the task of visual-aware pronoun coreference resolution (PCR) and introduce VisPro, a large-scale dialogue PCR dataset, to investigate whether and how the visual information can help resolve pronouns in dialogues. We then propose a novel visual-aware PCR model, VisCoref, for this task and conduct comprehensive experiments and case studies on our dataset. Results demonstrate the importance of the visual information in this PCR case and show the effectiveness of the proposed model.

pdf bib
Multiplex Word Embeddings for Selectional Preference Acquisition
Hongming Zhang | Jiaxin Bai | Yan Song | Kun Xu | Changlong Yu | Yangqiu Song | Wilfred Ng | Dong Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Conventional word embeddings represent words with fixed vectors, which are usually trained based on co-occurrence patterns among words. In doing so, however, the power of such representations is limited, where the same word might be functionalized separately under different syntactic relations. To address this limitation, one solution is to incorporate relational dependencies of different words into their embeddings. Therefore, in this paper, we propose a multiplex word embedding model, which can be easily extended according to various relations among words. As a result, each word has a center embedding to represent its overall semantics, and several relational embeddings to represent its relational dependencies. Compared to existing models, our model can effectively distinguish words with respect to different relations without introducing unnecessary sparseness. Moreover, to accommodate various relations, we use a small dimension for relational embeddings and our model is able to keep their effectiveness. Experiments on selectional preference acquisition and word similarity demonstrate the effectiveness of the proposed model, and a further study of scalability also proves that our embeddings only need 1/20 of the original embedding size to achieve better performance.

pdf bib
Incorporating Context and External Knowledge for Pronoun Coreference Resolution
Hongming Zhang | Yan Song | Yangqiu Song
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Linking pronominal expressions to the correct references requires, in many cases, better analysis of the contextual information and external knowledge. In this paper, we propose a two-layer model for pronoun coreference resolution that leverages both context and external knowledge, where a knowledge attention mechanism is designed to ensure the model leveraging the appropriate source of external knowledge based on different context. Experimental results demonstrate the validity and effectiveness of our model, where it outperforms state-of-the-art models by a large margin.

pdf bib
ChiMed: A Chinese Medical Corpus for Question Answering
Yuanhe Tian | Weicheng Ma | Fei Xia | Yan Song
Proceedings of the 18th BioNLP Workshop and Shared Task

Question answering (QA) is a challenging task in natural language processing (NLP), especially when it is applied to specific domains. While models trained in the general domain can be adapted to a new target domain, their performance often degrades significantly due to domain mismatch. Alternatively, one can require a large amount of domain-specific QA data, but such data are rare, especially for the medical domain. In this study, we first collect a large-scale Chinese medical QA corpus called ChiMed; second we annotate a small fraction of the corpus to check the quality of the answers; third, we extract two datasets from the corpus and use them for the relevancy prediction task and the adoption prediction task. Several benchmark models are applied to the datasets, producing good results for both tasks.

pdf bib
WTMED at MEDIQA 2019: A Hybrid Approach to Biomedical Natural Language Inference
Zhaofeng Wu | Yan Song | Sicong Huang | Yuanhe Tian | Fei Xia
Proceedings of the 18th BioNLP Workshop and Shared Task

Natural language inference (NLI) is challenging, especially when it is applied to technical domains such as biomedical settings. In this paper, we propose a hybrid approach to biomedical NLI where different types of information are exploited for this task. Our base model includes a pre-trained text encoder as the core component, and a syntax encoder and a feature encoder to capture syntactic and domain-specific information. Then we combine the output of different base models to form more powerful ensemble models. Finally, we design two conflict resolution strategies when the test data contain multiple (premise, hypothesis) pairs with the same premise. We train our models on the MedNLI dataset, yielding the best performance on the test set of the MEDIQA 2019 Task 1.

pdf bib
Bridging the Gap: Improve Part-of-speech Tagging for Chinese Social Media Texts with Foreign Words
Dingmin Wang | Meng Fang | Yan Song | Juntao Li
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

pdf bib
Knowledge-aware Pronoun Coreference Resolution
Hongming Zhang | Yan Song | Yangqiu Song | Dong Yu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Resolving pronoun coreference requires knowledge support, especially for particular domains (e.g., medicine). In this paper, we explore how to leverage different types of knowledge to better resolve pronoun coreference with a neural model. To ensure the generalization ability of our model, we directly incorporate knowledge in the format of triplets, which is the most common format of modern knowledge graphs, instead of encoding it with features or rules as that in conventional approaches. Moreover, since not all knowledge is helpful in certain contexts, to selectively use them, we propose a knowledge attention module, which learns to select and use informative knowledge based on contexts, to enhance our model. Experimental results on two datasets from different domains prove the validity and effectiveness of our model, where it outperforms state-of-the-art baselines by a large margin. Moreover, since our model learns to use external knowledge rather than only fitting the training data, it also demonstrates superior performance to baselines in the cross-domain setting.

pdf bib
Reinforced Training Data Selection for Domain Adaptation
Miaofeng Liu | Yan Song | Hongbin Zou | Tong Zhang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Supervised models suffer from the problem of domain shifting where distribution mismatch in the data across domains greatly affect model performance. To solve the problem, training data selection (TDS) has been proven to be a prospective solution for domain adaptation in leveraging appropriate data. However, conventional TDS methods normally requires a predefined threshold which is neither easy to set nor can be applied across tasks, and models are trained separately with the TDS process. To make TDS self-adapted to data and task, and to combine it with model training, in this paper, we propose a reinforcement learning (RL) framework that synchronously searches for training instances relevant to the target domain and learns better representations for them. A selection distribution generator (SDG) is designed to perform the selection and is updated according to the rewards computed from the selected data, where a predictor is included in the framework to ensure a task-specific model can be trained on the selected data and provides feedback to rewards. Experimental results from part-of-speech tagging, dependency parsing, and sentiment analysis, as well as ablation studies, illustrate that the proposed framework is not only effective in data selection and representation, but also generalized to accommodate different NLP tasks.

pdf bib
Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network
Kun Xu | Liwei Wang | Mo Yu | Yansong Feng | Yan Song | Zhiguo Wang | Dong Yu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Previous cross-lingual knowledge graph (KG) alignment studies rely on entity embeddings derived only from monolingual KG structural information, which may fail at matching entities that have different facts in two KGs. In this paper, we introduce the topic entity graph, a local sub-graph of an entity, to represent entities with their contextual information in KG. From this view, the KB-alignment task can be formulated as a graph matching problem; and we further propose a graph-attention based solution, which first matches all entities in two topic entity graphs, and then jointly model the local matching information to derive a graph-level matching vector. Experiments show that our model outperforms previous state-of-the-art methods by a large margin.

2018

pdf bib
Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts
Yingyi Zhang | Jing Li | Yan Song | Chengzhi Zhang
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Existing keyphrase extraction methods suffer from data sparsity problem when they are conducted on short and informal texts, especially microblog messages. Enriching context is one way to alleviate this problem. Considering that conversations are formed by reposting and replying messages, they provide useful clues for recognizing essential content in target posts and are therefore helpful for keyphrase identification. In this paper, we present a neural keyphrase extraction framework for microblog posts that takes their conversation context into account, where four types of neural encoders, namely, averaged embedding, RNN, attention, and memory networks, are proposed to represent the conversation context. Experimental results on Twitter and Weibo datasets show that our framework with such encoders outperforms state-of-the-art approaches.

pdf bib
Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings
Yan Song | Shuming Shi | Jing Li | Haisong Zhang
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

In this paper, we present directional skip-gram (DSG), a simple but effective enhancement of the skip-gram model by explicitly distinguishing left and right context in word prediction. In doing so, a direction vector is introduced for each word, whose embedding is thus learned by not only word co-occurrence patterns in its context, but also the directions of its contextual words. Theoretical and empirical studies on complexity illustrate that our model can be trained as efficient as the original skip-gram model, when compared to other extensions of the skip-gram model. Experimental results show that our model outperforms others on different datasets in semantic (word similarity measurement) and syntactic (part-of-speech tagging) evaluations, respectively.

pdf bib
A Joint Model of Conversational Discourse Latent Topics on Microblogs
Jing Li | Yan Song | Zhongyu Wei | Kam-Fai Wong
Computational Linguistics, Volume 44, Issue 4 - December 2018

Conventional topic models are ineffective for topic extraction from microblog messages, because the data sparseness exhibited in short messages lacking structure and contexts results in poor message-level word co-occurrence patterns. To address this issue, we organize microblog messages as conversation trees based on their reposting and replying relations, and propose an unsupervised model that jointly learns word distributions to represent: (1) different roles of conversational discourse, and (2) various latent topics in reflecting content information. By explicitly distinguishing the probabilities of messages with varying discourse roles in containing topical words, our model is able to discover clusters of discourse words that are indicative of topical content. In an automatic evaluation on large-scale microblog corpora, our joint model yields topics with better coherence scores than competitive topic models from previous studies. Qualitative analysis on model outputs indicates that our model induces meaningful representations for both discourse and topics. We further present an empirical study on microblog summarization based on the outputs of our joint model. The results show that the jointly modeled discourse and topic representations can effectively indicate summary-worthy content in microblog conversations.

pdf bib
Constructing a Chinese Medical Conversation Corpus Annotated with Conversational Structures and Actions
Nan Wang | Yan Song | Fei Xia
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
hyperdoc2vec: Distributed Representations of Hypertext Documents
Jialong Han | Yan Song | Wayne Xin Zhao | Shuming Shi | Haisong Zhang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life. Although being effective on plain documents, conventional text embedding methods suffer from information loss if directly adapted to hyper-documents. In this paper, we propose a general embedding approach for hyper-documents, namely, hyperdoc2vec, along with four criteria characterizing necessary information that hyper-document embedding models should preserve. Systematic comparisons are conducted between hyperdoc2vec and several competitors on two tasks, i.e., paper classification and citation recommendation, in the academic paper domain. Analyses and experiments both validate the superiority of hyperdoc2vec to other models w.r.t. the four criteria.

pdf bib
Coding Structures and Actions with the COSTA Scheme in Medical Conversations
Nan Wang | Yan Song | Fei Xia
Proceedings of the BioNLP 2018 workshop

This paper describes the COSTA scheme for coding structures and actions in conversation. Informed by Conversation Analysis, the scheme introduces an innovative method for marking multi-layer structural organization of conversation and a structure-informed taxonomy of actions. In addition, we create a corpus of naturally occurring medical conversations, containing 318 video-recorded and manually transcribed pediatric consultations. Based on the annotated corpus, we investigate 1) treatment decision-making process in medical conversations, and 2) effects of physician-caregiver communication behaviors on antibiotic over-prescribing. Although the COSTA annotation scheme is developed based on data from the task-specific domain of pediatric consultations, it can be easily extended to apply to more general domains and other languages.

pdf bib
Domain Adaptation for Disease Phrase Matching with Adversarial Networks
Miaofeng Liu | Jialong Han | Haisong Zhang | Yan Song
Proceedings of the BioNLP 2018 workshop

With the development of medical information management, numerous medical data are being classified, indexed, and searched in various systems. Disease phrase matching, i.e., deciding whether two given disease phrases interpret each other, is a basic but crucial preprocessing step for the above tasks. Being capable of relieving the scarceness of annotations, domain adaptation is generally considered useful in medical systems. However, efforts on applying it to phrase matching remain limited. This paper presents a domain-adaptive matching network for disease phrases. Our network achieves domain adaptation by adversarial training, i.e., preferring features indicating whether the two phrases match, rather than which domain they come from. Experiments suggest that our model has the best performance among the very few non-adaptive or adaptive methods that can benefit from out-of-domain annotations.

pdf bib
A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check
Dingmin Wang | Yan Song | Jing Li | Jialong Han | Haisong Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Chinese spelling check (CSC) is a challenging yet meaningful task, which not only serves as a preprocessing in many natural language processing(NLP) applications, but also facilitates reading and understanding of running texts in peoples’ daily lives. However, to utilize data-driven approaches for CSC, there is one major limitation that annotated corpora are not enough in applying algorithms and building models. In this paper, we propose a novel approach of constructing CSC corpus with automatically generated spelling errors, which are either visually or phonologically resembled characters, corresponding to the OCR- and ASR-based methods, respectively. Upon the constructed corpus, different models are trained and evaluated for CSC with respect to three standard test sets. Experimental results demonstrate the effectiveness of the corpus, therefore confirm the validity of our approach.

pdf bib
Topic Memory Networks for Short Text Classification
Jichuan Zeng | Jing Li | Yan Song | Cuiyun Gao | Michael R. Lyu | Irwin King
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Many classification models work poorly on short texts due to data sparsity. To address this issue, we propose topic memory networks for short text classification with a novel topic memory mechanism to encode latent topic representations indicative of class labels. Different from most prior work that focuses on extending features with external knowledge or pre-trained topics, our model jointly explores topic inference and text classification with memory networks in an end-to-end manner. Experimental results on four benchmark datasets show that our model outperforms state-of-the-art models on short text classification, meanwhile generates coherent topics.

pdf bib
Generating Classical Chinese Poems via Conditional Variational Autoencoder and Adversarial Training
Juntao Li | Yan Song | Haisong Zhang | Dongmin Chen | Shuming Shi | Dongyan Zhao | Rui Yan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

It is a challenging task to automatically compose poems with not only fluent expressions but also aesthetic wording. Although much attention has been paid to this task and promising progress is made, there exist notable gaps between automatically generated ones with those created by humans, especially on the aspects of term novelty and thematic consistency. Towards filling the gap, in this paper, we propose a conditional variational autoencoder with adversarial training for classical Chinese poem generation, where the autoencoder part generates poems with novel terms and a discriminator is applied to adversarially learn their thematic consistency with their titles. Experimental results on a large poetry corpus confirm the validity and effectiveness of our model, where its automatic and human evaluation scores outperform existing models.

pdf bib
Iterative Document Representation Learning Towards Summarization with Polishing
Xiuying Chen | Shen Gao | Chongyang Tao | Yan Song | Dongyan Zhao | Rui Yan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce Iterative Text Summarization (ITS), an iteration-based model for supervised extractive text summarization, inspired by the observation that it is often necessary for a human to read an article multiple times in order to fully understand and summarize its contents. Current summarization approaches read through a document only once to generate a document representation, resulting in a sub-optimal representation. To address this issue we introduce a model which iteratively polishes the document representation on many passes through the document. As part of our model, we also introduce a selective reading mechanism that decides more accurately the extent to which each sentence in the model should be updated. Experimental results on the CNN/DailyMail and DUC2002 datasets demonstrate that our model significantly outperforms state-of-the-art extractive systems when evaluated by machines and by humans.

2017

pdf bib
Learning Word Representations with Regularization from Prior Knowledge
Yan Song | Chia-Jung Lee | Fei Xia
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Conventional word embeddings are trained with specific criteria (e.g., based on language modeling or co-occurrence) inside a single information source, disregarding the opportunity for further calibration using external knowledge. This paper presents a unified framework that leverages pre-learned or external priors, in the form of a regularizer, for enhancing conventional language model-based embedding learning. We consider two types of regularizers. The first type is derived from topic distribution by running LDA on unlabeled data. The second type is based on dictionaries that are created with human annotation efforts. To effectively learn with the regularizers, we propose a novel data structure, trajectory softmax, in this paper. The resulting embeddings are evaluated by word similarity and sentiment classification. Experimental results show that our learning framework with regularization from prior knowledge improves embedding quality across multiple datasets, compared to a diverse collection of baseline methods.

pdf bib
Learning User Embeddings from Emails
Yan Song | Chia-Jung Lee
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Many important email-related tasks, such as email classification or search, highly rely on building quality document representations (e.g., bag-of-words or key phrases) to assist matching and understanding. Despite prior success on representing textual messages, creating quality user representations from emails was overlooked. In this paper, we propose to represent users using embeddings that are trained to reflect the email communication network. Our experiments on Enron dataset suggest that the resulting embeddings capture the semantic distance between users. To assess the quality of embeddings in a real-world application, we carry out auto-foldering task where the lexical representation of an email is enriched with user embedding features. Our results show that folder prediction accuracy is improved when embedding features are present across multiple settings.

2014

pdf bib
Modern Chinese Helps Archaic Chinese Processing: Finding and Exploiting the Shared Properties
Yan Song | Fei Xia
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Languages change over time and ancient languages have been studied in linguistics and other related fields. A main challenge in this research area is the lack of empirical data; for instance, ancient spoken languages often leave little trace of their linguistic properties. From the perspective of natural language processing (NLP), while the NLP community has created dozens of annotated corpora, very few of them are on ancient languages. As an effort toward bridging the gap, we have created a word segmented and POS tagged corpus for Archaic Chinese using articles from Huainanzi, a book written during China’s Western Han Dynasty (206 BC-9 AD). We then compare this corpus with the Chinese Penn Treebank (CTB), a well-known corpus for Modern Chinese, and report several interesting differences and similarities between the two corpora. Finally, we demonstrate that the CTB can be used to improve the performance of word segmenters and POS taggers for Archaic Chinese, but only through features that have similar behaviors in the two corpora.

2013

pdf bib
A Common Case of Jekyll and Hyde: The Synergistic Effect of Using Divided Source Training Data for Feature Augmentation
Yan Song | Fei Xia
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Non-Monotonic Sentence Alignment via Semisupervised Learning
Xiaojun Quan | Chunyu Kit | Yan Song
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation
Yan Song | Fei Xia
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Domain adaptation is an important topic for natural language processing. There has been extensive research on the topic and various methods have been explored, including training data selection, model combination, semi-supervised learning. In this study, we propose to use a goodness measure, namely, description length gain (DLG), for domain adaptation for Chinese word segmentation. We demonstrate that DLG can help domain adaptation in two ways: as additional features for supervised segmenters to improve system performance, and also as a similarity measure for selecting training data to better match a test set. We evaluated our systems on the Chinese Penn Treebank version 7.0, which has 1.2 million words from five different genres, and the Chinese Word Segmentation Bakeoff-3 data.

pdf bib
Entropy-based Training Data Selection for Domain Adaptation
Yan Song | Prescott Klassen | Fei Xia | Chunyu Kit
Proceedings of COLING 2012: Posters

2010

pdf bib
How Well Conditional Random Fields Can be Used in Novel Term Recognition
Xing Zhang | Yan Song | Alex Chengyu Fang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
An Empirical Study on Development Set Selection Strategy for Machine Translation Learning
Cong Hui | Hai Zhao | Bao-Liang Lu | Yan Song
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Reranking with Multiple Features for Better Transliteration
Yan Song | Chunyu Kit | Hai Zhao
Proceedings of the 2010 Named Entities Workshop

pdf bib
How Large a Corpus Do We Need: Statistical Method Versus Rule-based Method
Hai Zhao | Yan Song | Chunyu Kit
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We investigate the impact of input data scale in corpus-based learning using a study style of Zipf’s law. In our research, Chinese word segmentation is chosen as the study case and a series of experiments are specially conducted for it, in which two types of segmentation techniques, statistical learning and rule-based methods, are examined. The empirical results show that a linear performance improvement in statistical learning requires an exponential increasing of training corpus size at least. As for the rule-based method, an approximate negative inverse relationship between the performance and the size of the input lexicon can be observed.

2009

pdf bib
Cross Language Dependency Parsing using a Bilingual Lexicon
Hai Zhao | Yan Song | Chunyu Kit | Guodong Zhou
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Dependency Grammar Based English Subject-Verb Agreement Evaluation
Dongfeng Cai | Yonghua Hu | Xuelei Miao | Yan Song
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

pdf bib
Transliteration of Name Entity via Improved Statistical Translation on Character Sequences
Yan Song | Chunyu Kit | Xiao Chen
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

2006

pdf bib
Chinese Word Segmentation Based on an Approach of Maximum Entropy Modeling
Yan Song | Jiaqing Guo | Dongfeng Cai
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

pdf bib
HowNet Based Chinese Question Classification
Dongfeng Cai | Jingguang Sun | Guiping Zhang | Dexin Lv | Yanju Dong | Yan Song | Chao Yu
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

pdf bib
Research on concept-sememe tree and semantic relevance computation
GuiPing Zhang | Chao Yu | DongFeng Cai | Yan Song | JingGuang Sun
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation