Jun Zhao


2021

Classification, Extraction, and Normalization : CASIA_Unisound Team at the Social Media Mining for Health 2021 Shared Tasks
Tong Zhou | Zhucong Li | Zhen Gan | Baoli Zhang | Yubo Chen | Kun Niu | Jing Wan | Kang Liu | Jun Zhao | Yafei Shi | Weifeng Chong | Shengping Liu
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

This is the system description of the CASIA_Unisound team for Task 1, Task 7b, and Task 8 of the sixth Social Media Mining for Health Applications (SMM4H) shared task in 2021. To address two challenges shared across these tasks, colloquial text and imbalanced annotation, we apply a customized pre-trained language model and propose various training strategies. Experimental results show the effectiveness of our system. Moreover, we achieved an F1-score of 0.87 in Task 8, the highest among all participants.

Probing into the Root: A Dataset for Reason Extraction of Structural Events from Financial Documents
Pei Chen | Kang Liu | Yubo Chen | Taifeng Wang | Jun Zhao
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This paper proposes a new task regarding event reason extraction from document-level texts. Unlike the previous causality detection task, we do not assign target events in the text but only provide structural event descriptions, a setting that accords better with practical scenarios. Moreover, we annotate a large dataset, FinReason, for evaluation, which provides Reasons annotation for Financial events in company announcements. This task is challenging because it includes cases of multiple events, multiple reasons, and implicit reasons. In total, FinReason contains 8,794 documents, 12,861 financial events and 11,006 reason spans. We also provide the performance of existing canonical methods in event extraction and machine reading comprehension on this task. The results show a 7 percentage point F1 score gap between the best model and human performance, and existing methods are far from resolving this problem.

Knowledge Guided Metric Learning for Few-Shot Text Classification
Dianbo Sui | Yubo Chen | Binjie Mao | Delai Qiu | Kang Liu | Jun Zhao
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Humans can distinguish new categories very efficiently with few examples, largely because they can leverage knowledge obtained from relevant tasks. However, deep learning based text classification models tend to struggle to achieve satisfactory performance when labeled data are scarce. Inspired by human intelligence, we propose to introduce external knowledge into few-shot learning to imitate human knowledge. A novel parameter generator network is investigated to this end, which is able to use the external knowledge to generate different metrics for different tasks. Armed with this network, similar tasks can use similar metrics while different tasks use different metrics. Through experiments, we demonstrate that our method outperforms the SoTA few-shot text classification models.

A Large-Scale Chinese Multimodal NER Dataset with Speech Clues
Dianbo Sui | Zhengkun Tian | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper, we aim to explore an uncharted territory, which is Chinese multimodal named entity recognition (NER) with both textual and acoustic contents. To achieve this, we construct a large-scale human-annotated Chinese multimodal NER dataset, named CNERTA. In total, our corpus contains 42,987 annotated sentences accompanied by 71 hours of speech data. Based on this dataset, we propose a family of strong and representative baseline models, which can leverage textual features or multimodal features. Upon these baselines, to capture the natural monotonic alignment between the textual modality and the acoustic modality, we further propose a simple multimodal multitask model by introducing a speech-to-text alignment auxiliary task. Through extensive experiments, we observe that: (1) performance improves progressively as we move from unimodal to multimodal, verifying the necessity of integrating speech clues into Chinese NER; (2) our proposed model yields state-of-the-art (SoTA) results on CNERTA, demonstrating its effectiveness. For further research, the annotated dataset is publicly available at http://github.com/DianboWork/CNERTA.

LearnDA: Learnable Knowledge-Guided Data Augmentation for Event Causality Identification
Xinyu Zuo | Pengfei Cao | Yubo Chen | Kang Liu | Jun Zhao | Weihua Peng | Yuguang Chen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Modern models for event causality identification (ECI) are mainly based on supervised learning and are therefore prone to the data lacking problem. Unfortunately, the existing NLP-related augmentation methods cannot directly produce the data required for this task. To solve the data lacking problem, we introduce a new approach to augment training data for event causality identification, by iteratively generating new examples and classifying event causality in a dual learning framework. On the one hand, our approach is knowledge guided, which can leverage existing knowledge bases to generate well-formed new sentences. On the other hand, our approach employs a dual mechanism, which is a learnable augmentation framework, and can interactively adjust the generation process to generate task-related sentences. Experimental results on two benchmarks EventStoryLine and Causal-TimeBank show that 1) our method can augment suitable task-related training data for ECI; 2) our method outperforms previous methods on EventStoryLine and Causal-TimeBank (+2.5 and +2.1 points on F1 value respectively).

Knowledge-Enriched Event Causality Identification via Latent Structure Induction Networks
Pengfei Cao | Xinyu Zuo | Yubo Chen | Kang Liu | Jun Zhao | Yuguang Chen | Weihua Peng
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Identifying causal relations between events is an important task in the natural language processing area. However, the task is very challenging, because event causality is usually expressed in diverse forms that often lack explicit causal clues. Existing methods cannot handle the problem well, especially when training data is lacking. Nonetheless, humans can make a correct judgement based on their background knowledge, including descriptive knowledge and relational knowledge. Inspired by this, we propose a novel Latent Structure Induction Network (LSIN) to incorporate the external structural knowledge into this task. Specifically, to make use of the descriptive knowledge, we devise a Descriptive Graph Induction module to obtain and encode the graph-structured descriptive knowledge. To leverage the relational knowledge, we propose a Relational Graph Induction module which is able to automatically learn a reasoning structure for event causality reasoning. Experimental results on two widely used datasets indicate that our approach significantly outperforms previous state-of-the-art methods.

Alignment Rationale for Natural Language Inference
Zhongtao Jiang | Yuanzhe Zhang | Zhao Yang | Jun Zhao | Kang Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Deep learning models have achieved great success on the task of Natural Language Inference (NLI), though only a few attempts try to explain their behaviors. Existing explanation methods usually pick prominent features such as words or phrases from the input text. However, for NLI, alignments among words or phrases are more enlightening clues to explain the model. To this end, this paper presents AREC, a post-hoc approach to generate alignment rationale explanations for co-attention based models in NLI. The explanation is based on feature selection, which keeps few but sufficient alignments while maintaining the same prediction of the target model. Experimental results show that our method is more faithful and human-readable compared with many existing approaches. We further study and re-evaluate three typical models through our explanation beyond accuracy, and propose a simple method that greatly improves the model robustness.

Automatic ICD Coding via Interactive Shared Representation Networks with Self-distillation Mechanism
Tong Zhou | Pengfei Cao | Yubo Chen | Kang Liu | Jun Zhao | Kun Niu | Weifeng Chong | Shengping Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The ICD coding task aims at assigning codes of the International Classification of Diseases to clinical notes. Since manual coding is very laborious and prone to errors, many methods have been proposed for the automatic ICD coding task. However, existing works ignore either the long tail of code frequency or the noise in clinical notes. To address the above issues, we propose an Interactive Shared Representation Network with Self-Distillation Mechanism. Specifically, an interactive shared representation network targets building connections among codes while modeling the co-occurrence, consequently alleviating the long-tail problem. Moreover, to cope with the noisy text issue, we encourage the model to focus on the clinical note’s noteworthy part and extract valuable information through a self-distillation learning mechanism. Experimental results on two MIMIC datasets demonstrate the effectiveness of our method.

Document-level Event Extraction via Parallel Prediction Networks
Hang Yang | Dianbo Sui | Yubo Chen | Kang Liu | Jun Zhao | Taifeng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Document-level event extraction (DEE) is indispensable when events are described throughout a document. We argue that sentence-level extractors are ill-suited to the DEE task where event arguments always scatter across sentences and multiple events may co-exist in a document. It is a challenging task because it requires a holistic understanding of the document and an aggregated ability to assemble arguments across multiple sentences. In this paper, we propose an end-to-end model, which can extract structured events from a document in a parallel manner. Specifically, we first introduce a document-level encoder to obtain the document-aware representations. Then, a multi-granularity non-autoregressive decoder is used to generate events in parallel. Finally, to train the entire model, a matching loss function is proposed, which can bootstrap a global optimization. The empirical results on the widely used DEE dataset show that our approach significantly outperforms current state-of-the-art methods in the challenging DEE task. Code will be available at https://github.com/HangYang-NLP/DE-PPN.

CogIE: An Information Extraction Toolkit for Bridging Texts and CogNet
Zhuoran Jin | Yubo Chen | Dianbo Sui | Chenhao Wang | Zhipeng Xue | Jun Zhao
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

CogNet is a knowledge base that integrates three types of knowledge: linguistic knowledge, world knowledge and commonsense knowledge. In this paper, we propose an information extraction toolkit, called CogIE, which is a bridge connecting raw texts and CogNet. CogIE has three features: versatile, knowledge-grounded and extensible. First, CogIE is a versatile toolkit with a rich set of functional modules, including named entity recognition, entity typing, entity linking, relation extraction, event extraction and frame-semantic parsing. Second, as a knowledge-grounded toolkit, CogIE can ground the extracted facts to CogNet and leverage different types of knowledge to enrich extracted results. Third, for extensibility, owing to the design of three-tier architecture, CogIE is not only a plug-and-play toolkit for developers but also an extensible programming framework for researchers. We release an open-access online system to visually extract information from texts. Source code, datasets and pre-trained models are publicly available at GitHub, with a short instruction video.

TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Xiao Wang | Qin Liu | Tao Gui | Qi Zhang | Yicheng Zou | Xin Zhou | Jiacheng Ye | Yongxin Zhang | Rui Zheng | Zexiong Pang | Qinzhuo Wu | Zhengyan Li | Chong Zhang | Ruotian Ma | Zichu Fei | Ruijian Cai | Jun Zhao | Xingwu Hu | Zhiheng Yan | Yiding Tan | Yuan Hu | Qiyuan Bian | Zhihua Liu | Shan Qin | Bolin Zhu | Xiaoyu Xing | Jinlan Fu | Yue Zhang | Minlong Peng | Xiaoqing Zheng | Yaqian Zhou | Zhongyu Wei | Xipeng Qiu | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

TextFlint is a multilingual robustness evaluation toolkit for NLP tasks that incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analyses. This enables practitioners to automatically evaluate their models from various aspects or to customize their evaluations as desired with just a few lines of code. TextFlint also generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model in terms of its robustness. To guarantee acceptability, all the text transformations are linguistically based and all the transformed data selected (up to 100,000 texts) scored highly under human evaluation. To validate the utility, we performed large-scale empirical evaluations (over 67,000) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. The toolkit is already available at https://github.com/textflint with all the evaluation results demonstrated at textflint.io.

Improving Event Causality Identification via Self-Supervised Representation Learning on External Causal Statement
Xinyu Zuo | Pengfei Cao | Yubo Chen | Kang Liu | Jun Zhao | Weihua Peng | Yuguang Chen
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Named Entity Recognition via Noise Aware Training Mechanism with Data Filter
Xiusheng Huang | Yubo Chen | Shun Wu | Jun Zhao | Yuantao Xie | Weijian Sun
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Pre-trained Language Model Based Active Learning for Sentence Matching
Guirong Bai | Shizhu He | Kang Liu | Jun Zhao | Zaiqing Nie
Proceedings of the 28th International Conference on Computational Linguistics

Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning, it can provide linguistic criteria from the pre-trained language model to measure instances and help select more effective instances for annotation. Experiments demonstrate our approach can achieve greater accuracy with fewer labeled training instances.
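
For orientation, the entropy-based uncertainty criterion that this abstract contrasts itself with can be sketched as follows. This is a minimal illustration of that conventional baseline only, not the paper's pre-trained language model based criteria; the function names and the `(instance_id, probabilities)` pool layout are assumptions made for the example.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool, k):
    """Return the ids of the k unlabeled instances with the highest entropy.

    pool: list of (instance_id, class_probabilities) pairs (hypothetical layout).
    """
    ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)
    return [instance_id for instance_id, _ in ranked[:k]]

if __name__ == "__main__":
    pool = [("a", [0.5, 0.5]), ("b", [0.9, 0.1]), ("c", [0.6, 0.4])]
    # The near-uniform predictions are selected for annotation first.
    print(select_most_uncertain(pool, 2))
```

A criterion of this kind looks only at the classifier's output distribution, which is the limitation the abstract points to.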

KnowDis: Knowledge Enhanced Data Augmentation for Event Causality Detection via Distant Supervision
Xinyu Zuo | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 28th International Conference on Computational Linguistics

Modern models of event causality detection (ECD) are mainly based on supervised learning from small hand-labeled corpora. However, hand-labeled training data is expensive to produce, has low coverage of causal expressions, and is limited in size, which makes it hard for supervised methods to detect causal relations between events. To solve this data lacking problem, we investigate a data augmentation framework for ECD, dubbed Knowledge Enhanced Distant Data Augmentation (KnowDis). Experimental results on two benchmark datasets, the EventStoryLine corpus and Causal-TimeBank, show that 1) KnowDis can augment available training data assisted with the lexical and causal commonsense knowledge for ECD via distant supervision, and 2) our method outperforms previous methods by a large margin assisted with automatically labeled training data.

Graph-Based Knowledge Integration for Question Answering over Dialogue
Jian Liu | Dianbo Sui | Kang Liu | Jun Zhao
Proceedings of the 28th International Conference on Computational Linguistics

Question answering over dialogue, a specialized machine reading comprehension task, aims to comprehend a dialogue and to answer specific questions. Despite many advances, existing approaches for this task did not consider dialogue structure and background knowledge (e.g., relationships between speakers). In this paper, we introduce a new approach for the task, featured by its novelty in structuring dialogue and integrating background knowledge for reasoning. Specifically, different from previous “structure-less” approaches, our method organizes a dialogue as a “relational graph”, using edges to represent relationships between entities. To encode this relational graph, we devise a relational graph convolutional network (R-GCN), which can traverse the graph’s topological structure and effectively encode multi-relational knowledge for reasoning. Extensive experiments have justified the effectiveness of our approach over competitive baselines. Moreover, a deeper analysis shows that our model is better at tackling complex questions requiring relational reasoning and defending against adversarial attacks with distracting sentences.

HyperCore: Hyperbolic and Co-graph Representation for Automatic ICD Coding
Pengfei Cao | Yubo Chen | Kang Liu | Jun Zhao | Shengping Liu | Weifeng Chong
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The International Classification of Diseases (ICD) provides a standardized way of classifying diseases, which endows each disease with a unique code. ICD coding aims to assign proper ICD codes to a medical record. Since manual coding is very laborious and prone to errors, many methods have been proposed for the automatic ICD coding task. However, most existing methods independently predict each code, ignoring two important characteristics: Code Hierarchy and Code Co-occurrence. In this paper, we propose a Hyperbolic and Co-graph Representation method (HyperCore) to address the above problem. Specifically, we propose a hyperbolic representation method to leverage the code hierarchy. Moreover, we propose a graph convolutional network to utilize the code co-occurrence. Experimental results on two widely used datasets demonstrate that our proposed model outperforms previous state-of-the-art methods.
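
As background on why hyperbolic space suits hierarchies, the sketch below computes the standard distance in the Poincaré ball model. The abstract does not say which hyperbolic model HyperCore uses, so the choice of the Poincaré ball and the function name are assumptions for illustration only.

```python
import math

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball (Poincare model).

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sq_norm_u) * (1.0 - sq_norm_v)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)

if __name__ == "__main__":
    origin = [0.0, 0.0]
    # Equal Euclidean steps cost more hyperbolic distance near the boundary,
    # which leaves room for the many leaves of a tree-like code hierarchy.
    print(poincare_distance(origin, [0.5, 0.0]))
    print(poincare_distance(origin, [0.9, 0.0]))
```

Placing general codes near the origin and specific codes near the boundary is the usual intuition behind such embeddings.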

MIE: A Medical Information Extractor towards Medical Dialogues
Yuanzhe Zhang | Zhongtao Jiang | Tao Zhang | Shiwan Liu | Jiarun Cao | Kang Liu | Shengping Liu | Jun Zhao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Electronic Medical Records (EMRs) have become key components of modern medical care systems. Despite the merits of EMRs, many doctors suffer from writing them, which is time-consuming and tedious. We believe that automatically converting medical dialogues to EMRs can greatly reduce the burdens of doctors, and extracting information from medical dialogues is an essential step. To this end, we annotate online medical consultation dialogues in a window-sliding style, which is much easier than the sequential labeling annotation. We then propose a Medical Information Extractor (MIE) towards medical dialogues. MIE is able to extract mentioned symptoms, surgeries, tests, other information and their corresponding status. To tackle the particular challenges of the task, MIE uses a deep matching architecture, taking dialogue turn-interaction into account. The experimental results demonstrate MIE is a promising solution to extract medical information from doctor-patient dialogues.

Clinical-Coder: Assigning Interpretable ICD-10 Codes to Chinese Clinical Notes
Pengfei Cao | Chenwei Yan | Xiangling Fu | Yubo Chen | Kang Liu | Jun Zhao | Shengping Liu | Weifeng Chong
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

In this paper, we introduce Clinical-Coder, an online system aiming to assign ICD codes to Chinese clinical notes. ICD coding has been a research hotspot of clinical medicine, but the limited interpretability of predictions hinders its practical application. We exploit a Dilated Convolutional Attention network with N-gram Matching mechanism (DCANM) to capture semantic features for non-continuous words and continuous n-gram words, concentrating on explaining why each ICD code is predicted. The experiments demonstrate that our approach is effective and that our system is able to provide supporting information in clinical decision making.

Incremental Event Detection via Knowledge Consolidation Networks
Pengfei Cao | Yubo Chen | Jun Zhao | Taifeng Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Conventional approaches to event detection usually require a fixed set of pre-defined event types. Such a requirement is often challenged in real-world applications, as new events continually occur. Due to the huge computation cost and storage budget, it is infeasible to store all previous data and re-train the model with all previous data and new data every time new events arrive. We formulate such challenging scenarios as incremental event detection, which requires a model to learn new classes incrementally without performance degradation on previous classes. However, existing incremental learning methods cannot handle semantic ambiguity and training data imbalance problems between old and new classes in the task of incremental event detection. In this paper, we propose a Knowledge Consolidation Network (KCN) to address the above issues. Specifically, we devise two components, prototype enhanced retrospection and hierarchical distillation, to mitigate the adverse effects of semantic ambiguity and class imbalance, respectively. Experimental results demonstrate the effectiveness of the proposed method, outperforming the state-of-the-art model by 19% and 13.4% of whole F1 score on ACE benchmark and TAC KBP benchmark, respectively.

FedED: Federated Learning via Ensemble Distillation for Medical Relation Extraction
Dianbo Sui | Yubo Chen | Jun Zhao | Yantao Jia | Yuantao Xie | Weijian Sun
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Unlike other domains, medical texts are inevitably accompanied by private information, so sharing or copying these texts is strictly restricted. However, training a medical relation extraction model requires collecting these privacy-sensitive texts and storing them on one machine, which comes in conflict with privacy protection. In this paper, we propose a privacy-preserving medical relation extraction model based on federated learning, which enables training a central model with no single piece of private local data being shared or exchanged. Though federated learning has distinct advantages in privacy protection, it suffers from the communication bottleneck, which is mainly caused by the need to upload cumbersome local parameters. To overcome this bottleneck, we leverage a strategy based on knowledge distillation. Such a strategy uses the uploaded predictions of ensemble local models to train the central model without requiring uploading local parameters. Experiments on three publicly available medical relation extraction datasets demonstrate the effectiveness of our method.
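
The idea of training a central model from uploaded predictions rather than uploaded parameters can be sketched as below. The abstract does not specify how the local predictions are combined or which loss is used, so the simple averaging and the cross-entropy against soft targets are assumptions, and all names are hypothetical.

```python
import math

def ensemble_average(local_predictions):
    """Average the class distributions uploaded by the local models (assumed aggregation)."""
    n_models = len(local_predictions)
    n_classes = len(local_predictions[0])
    return [sum(p[c] for p in local_predictions) / n_models for c in range(n_classes)]

def distillation_loss(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy of the central (student) distribution against the ensemble soft labels."""
    return -sum(t * math.log(s + eps) for t, s in zip(teacher_probs, student_probs))

if __name__ == "__main__":
    # Each client uploads only its prediction for a shared example, never its weights.
    uploads = [[0.8, 0.2], [0.6, 0.4]]
    soft_label = ensemble_average(uploads)
    print(soft_label)
    print(distillation_loss(soft_label, [0.7, 0.3]))
    print(distillation_loss(soft_label, [0.2, 0.8]))
```

Under this sketch the upload size scales with the number of classes per example rather than with the number of model parameters, which is the communication saving the abstract describes.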

Scene Restoring for Narrative Machine Reading Comprehension
Zhixing Tian | Yuanzhe Zhang | Kang Liu | Jun Zhao | Yantao Jia | Zhicheng Sheng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

This paper focuses on machine reading comprehension for narrative passages. Narrative passages usually describe a chain of events. When reading this kind of passage, humans tend to restore a scene according to the text with their prior knowledge, which helps them understand the passage comprehensively. Inspired by this behavior of humans, we propose a method to let the machine imagine a scene while reading a narrative for better comprehension. Specifically, we build a scene graph by utilizing Atomic as the external knowledge and propose a novel Graph Dimensional-Iteration Network (GDIN) to encode the graph. We conduct experiments on ROCStories, a dataset of Story Cloze Test (SCT), and CosmosQA, a multiple-choice dataset. Our method achieves state-of-the-art results.

Reconstructing Event Regions for Event Extraction via Graph Attention Networks
Pei Chen | Hang Yang | Kang Liu | Ruihong Huang | Yubo Chen | Taifeng Wang | Jun Zhao
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Event information is usually scattered across multiple sentences within a document. The local sentence-level event extractors often yield many noisy event role filler extractions in the absence of a broader view of the document-level context. Filtering spurious extractions and aggregating event information in a document remains a challenging problem. Following the observation that a document has several relevant event regions densely populated with event role fillers, we build graphs with candidate role filler extractions enriched by sentential embeddings as nodes, and use graph attention networks to identify event regions in a document and aggregate event information. We characterize edges between candidate extractions in a graph into rich vector representations to facilitate event region identification. The experimental results on two datasets of two languages show that our approach yields new state-of-the-art performance for the challenging event extraction task.

Towards Causal Explanation Detection with Pyramid Salient-Aware Network
Xinyu Zuo | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Causal explanation analysis (CEA) can help us understand the reasons behind daily events, which has been found very helpful for understanding the coherence of messages. In this paper, we focus on Causal Explanation Detection, an important subtask of causal explanation analysis, which determines whether a causal explanation exists in one message. We design a Pyramid Salient-Aware Network (PSAN) to detect causal explanations in messages. PSAN can assist in causal explanation detection via capturing the salient semantics of discourses contained in their keywords with a bottom graph-based word-level salient network. Furthermore, PSAN can modify the dominance of discourses via a top attention-based discourse-level salient network to enhance explanatory semantics of messages. The experiments on the commonly used dataset of CEA show that PSAN outperforms the state-of-the-art method by 1.8% F1 value on the Causal Explanation Detection task.

Chinese Named Entity Recognition via Adaptive Multi-pass Memory Network with Hierarchical Tagging Mechanism
Pengfei Cao | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Named entity recognition (NER) aims to identify text spans that mention named entities and classify them into pre-defined categories. For the Chinese NER task, most of the existing methods are character-based sequence labeling models and achieve great success. However, these methods usually ignore lexical knowledge, which leads to false prediction of entity boundaries. Moreover, these methods have difficulties in capturing tag dependencies. In this paper, we propose an Adaptive Multi-pass Memory Network with Hierarchical Tagging Mechanism (AMMNHT) to address all of the above problems. Specifically, to reduce the errors of predicting entity boundaries, we propose an adaptive multi-pass memory network to exploit lexical knowledge. In addition, we propose a hierarchical tagging layer to learn tag dependencies. Experimental results on three widely used Chinese NER datasets demonstrate that our proposed model significantly outperforms other state-of-the-art methods.

2019

Learning the Extraction Order of Multiple Relational Facts in a Sentence with Reinforcement Learning
Xiangrong Zeng | Shizhu He | Daojian Zeng | Kang Liu | Shengping Liu | Jun Zhao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The multiple relation extraction task tries to extract all relational facts from a sentence. Existing works did not consider the extraction order of relational facts in a sentence. In this paper we argue that the extraction order is important in this task. To take the extraction order into consideration, we apply reinforcement learning to a sequence-to-sequence model. The proposed model can generate relational facts freely. Extensive experiments on two public datasets demonstrate the efficacy of the proposed method.

Neural Cross-Lingual Event Detection with Minimal Parallel Resources
Jian Liu | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The scarcity of annotated data poses a great challenge for event detection (ED). Cross-lingual ED aims to tackle this challenge by transferring knowledge between different languages to boost performance. However, previous cross-lingual methods for ED demonstrated a heavy dependency on parallel resources, which might limit their applicability. In this paper, we propose a new method for cross-lingual ED, demonstrating a minimal dependency on parallel resources. Specifically, to construct a lexical mapping between different languages, we devise a context-dependent translation method; to treat the word order difference problem, we propose a shared syntactic order event detector for multilingual co-training. The efficiency of our method is studied through extensive experiments on two standard datasets. Empirical results indicate that our method is effective in 1) performing cross-lingual transfer concerning different directions and 2) tackling the extremely annotation-poor scenario.

Generating Questions for Knowledge Bases via Incorporating Diversified Contexts and Answer-Aware Loss
Cao Liu | Kang Liu | Shizhu He | Zaiqing Nie | Jun Zhao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We tackle the task of question generation over knowledge bases. Conventional methods for this task neglect two crucial research issues: 1) the given predicate needs to be expressed; 2) the answer to the generated question needs to be definitive. In this paper, we address these two issues by incorporating diversified contexts and an answer-aware loss. Specifically, we propose a neural encoder-decoder model with multi-level copy mechanisms to generate such questions. Furthermore, the answer-aware loss is introduced so that generated questions correspond to more definitive answers. Experiments demonstrate that our model achieves state-of-the-art performance. Moreover, the generated questions express the given predicate and correspond to definitive answers.

Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network
Dianbo Sui | Yubo Chen | Kang Liu | Jun Zhao | Shengping Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The lack of word boundary information has been seen as one of the main obstacles to developing a high-performance Chinese named entity recognition (NER) system. Fortunately, automatically constructed lexicons contain rich word boundary information and word semantic information. However, integrating lexical knowledge into Chinese NER tasks still faces challenges when it comes to self-matched lexical words as well as the nearest contextual lexical words. We present a Collaborative Graph Network to solve these challenges. Experiments on various datasets show that our model not only outperforms the state-of-the-art (SOTA) results, but also runs six to fifteen times faster than the SOTA model.

Machine Reading Comprehension Using Structural Knowledge Graph-aware Network
Delai Qiu | Yuanzhe Zhang | Xinwei Feng | Xiangwen Liao | Wenbin Jiang | Yajuan Lyu | Kang Liu | Jun Zhao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Leveraging external knowledge is an emerging trend in machine comprehension tasks. Previous work usually utilizes knowledge graphs such as ConceptNet as external knowledge, and extracts triples from them to enhance the initial representation of the machine comprehension context. However, such methods cannot capture the structural information in the knowledge graph. To this end, we propose a Structural Knowledge Graph-aware Network (SKG) model, which constructs sub-graphs for entities in the machine comprehension context. Our method dynamically updates the representation of the knowledge according to the structural information of the constructed sub-graph. Experiments show that SKG achieves state-of-the-art performance on the ReCoRD dataset.

Incorporating Interlocutor-Aware Context into Response Generation on Multi-Party Chatbots
Cao Liu | Kang Liu | Shizhu He | Zaiqing Nie | Jun Zhao
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Conventional chatbots focus on two-party response generation, which simplifies the real dialogue scene. In this paper, we address a novel task of Response Generation on Multi-Party Chatbot (RGMPC), where the generated responses heavily rely on the interlocutors’ roles (e.g., speaker and addressee) and their utterances. Unfortunately, complex interactions among the interlocutors’ roles make it challenging to precisely capture conversational contexts and interlocutors’ information. To meet this challenge, we present a response generation model which incorporates Interlocutor-aware Contexts into Recurrent Encoder-Decoder frameworks (ICRED) for RGMPC. Specifically, we employ interactive representations to capture dialogue contexts for different interlocutors. Moreover, we leverage an addressee memory to enhance contextual interlocutor information for the target addressee. Finally, we construct a corpus for RGMPC based on an existing open-access dataset. Automatic and manual evaluations demonstrate that ICRED remarkably outperforms strong baselines.

Vocabulary Pyramid Network: Multi-Pass Encoding and Decoding with Multi-Level Vocabularies for Response Generation
Cao Liu | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We study the task of response generation. Conventional methods employ a fixed vocabulary and one-pass decoding, which not only makes them prone to safe and general responses but also leaves the first generated raw sequence without further refinement. To tackle these two problems, we present a Vocabulary Pyramid Network (VPN) which is able to incorporate multi-pass encoding and decoding with multi-level vocabularies into response generation. Specifically, the dialogue input and output are represented by multi-level vocabularies which are obtained from hierarchical clustering of raw words. Then, multi-pass encoding and decoding are conducted on the multi-level vocabularies. Since VPN is able to leverage rich encoding and decoding information with multi-level vocabularies, it has the potential to generate better responses. Experiments on English Twitter and Chinese Weibo datasets demonstrate that VPN remarkably outperforms strong baselines.

AdaNSP: Uncertainty-driven Adaptive Decoding in Neural Semantic Parsing
Xiang Zhang | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural semantic parsers utilize the encoder-decoder framework to learn an end-to-end model for semantic parsing that transduces a natural language sentence into a formal semantic representation. To keep the model aware of the underlying grammar in target sequences, many constrained decoders have been devised in a multi-stage paradigm, which first decode to sketches or abstract syntax trees and then decode to target semantic tokens. We instead propose an adaptive decoding method that avoids such intermediate representations. The decoder is guided by model uncertainty and automatically uses deeper computations when necessary. Thus it can predict tokens adaptively. Our model outperforms the state-of-the-art neural models while requiring no expert knowledge such as predefined grammars or sketches.

2018

Pattern-revising Enhanced Simple Question Answering over Knowledge Bases
Yanchao Hao | Hao Liu | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 27th International Conference on Computational Linguistics

Question Answering over Knowledge Bases (KB-QA), which automatically answers natural language questions based on the facts contained in a knowledge base, is one of the most important natural language processing (NLP) tasks. Simple questions constitute a large part of the questions queried on the web, yet they remain a challenge for QA systems. In this work, we propose to conduct pattern extraction and entity linking first, and put forward a pattern-revising procedure to mitigate the error propagation problem. In order to learn to rank candidate subject-predicate pairs so that the relevant facts can be retrieved for a given question, we propose joint fact selection enhanced by relation detection. Multi-level encodings and multi-dimension information are leveraged to strengthen the whole procedure. The experimental results demonstrate that our approach sets a new record in this task, outperforming the current state of the art by a large absolute margin.

Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism
Xiangrong Zeng | Daojian Zeng | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The relational facts in sentences are often complicated. Different relational triplets may overlap in a sentence. We divide the sentences into three types according to the degree of triplet overlap: Normal, EntityPairOverlap and SingleEntityOverlap. Existing methods mainly focus on the Normal class and fail to extract relational triplets precisely. In this paper, we propose an end-to-end model based on sequence-to-sequence learning with a copy mechanism, which can jointly extract relational facts from sentences of any of these classes. We adopt two different strategies in the decoding process: employing only one unified decoder or applying multiple separate decoders. We test our models on two public datasets, and our models outperform the baseline method significantly.

DCFEE: A Document-level Chinese Financial Event Extraction System based on Automatically Labeled Training Data
Hang Yang | Yubo Chen | Kang Liu | Yang Xiao | Jun Zhao
Proceedings of ACL 2018, System Demonstrations

We present an event extraction framework to detect event mentions and extract events from document-level financial news. Up to now, methods based on the supervised learning paradigm have achieved the highest performance on public datasets (such as ACE2005, KBP2015). These methods depend heavily on manually labeled training data. However, in particular areas, such as the financial, medical and judicial domains, there is not enough labeled data due to the high cost of the data labeling process. Moreover, most current methods focus on extracting events from one sentence, but an event is usually expressed by multiple sentences in one document. To solve these problems, we propose a Document-level Chinese Financial Event Extraction (DCFEE) system which can automatically generate large-scale labeled data and extract events from the whole document. Experimental results demonstrate its effectiveness.

Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism
Pengfei Cao | Yubo Chen | Kang Liu | Jun Zhao | Shengping Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Named entity recognition (NER) is an important task in natural language processing, which needs to determine entity boundaries and classify entities into pre-defined categories. For the Chinese NER task, only a very small amount of annotated data is available. The Chinese NER task and the Chinese word segmentation (CWS) task share many similar word boundaries, but each task also has its own specificities. However, existing methods for Chinese NER either do not exploit word boundary information from CWS or cannot filter out the specific information of CWS. In this paper, we propose a novel adversarial transfer learning framework to make full use of task-shared boundary information and prevent interference from the task-specific features of CWS. Besides, since any character can provide important cues when predicting the entity type, we exploit self-attention to explicitly capture long-range dependencies between two tokens. Experimental results on two different widely used datasets show that our proposed model significantly and consistently outperforms other state-of-the-art methods.

Collective Event Detection via a Hierarchical and Bias Tagging Networks with Gated Multi-level Attention Mechanisms
Yubo Chen | Hang Yang | Kang Liu | Jun Zhao | Yantao Jia
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Traditional approaches to the task of ACE event detection primarily regard multiple events in one sentence as independent and recognize them separately using sentence-level information. However, events in one sentence are usually interdependent, and sentence-level information is often insufficient to resolve ambiguities for some types of events. This paper proposes a novel framework dubbed Hierarchical and Bias Tagging Networks with Gated Multi-level Attention Mechanisms (HBTNGMA) to solve the two problems simultaneously. Firstly, we propose hierarchical and bias tagging networks to detect multiple events in one sentence collectively. Then, we devise a gated multi-level attention to automatically extract and dynamically fuse the sentence-level and document-level information. The experimental results on the widely used ACE 2005 dataset show that our approach significantly outperforms other state-of-the-art methods.

2017

Generating Natural Answers by Incorporating Copying and Retrieving Mechanisms in Sequence-to-Sequence Learning
Shizhu He | Cao Liu | Kang Liu | Jun Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Generating answers as natural language sentences is very important in real-world question answering systems, which need to produce a correct answer as well as a coherent natural response. In this paper, we propose an end-to-end question answering system called COREQA based on sequence-to-sequence learning, which incorporates copying and retrieving mechanisms to generate natural answers within an encoder-decoder framework. Specifically, in COREQA, the semantic units (words, phrases and entities) in a natural answer are dynamically predicted from the vocabulary, copied from the given question and/or retrieved from the corresponding knowledge base jointly. Our empirical study on both synthetic and real-world datasets demonstrates the efficiency of COREQA, which is able to generate correct, coherent and natural answers for knowledge inquired questions.

An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge
Yanchao Hao | Yuanzhe Zhang | Kang Liu | Shizhu He | Zhanyi Liu | Hua Wu | Jun Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the rapid growth of knowledge bases (KBs) on the web, how to take full advantage of them becomes increasingly important. Question answering over knowledge base (KB-QA) is one of the promising approaches to accessing this substantial knowledge. Meanwhile, as neural network-based (NN-based) methods develop, NN-based KB-QA has already achieved impressive results. However, previous work did not put enough emphasis on question representation, and the question is converted into a fixed vector regardless of its candidate answers. This simple representation strategy makes it hard to express the proper information in the question. Hence, we present an end-to-end neural network model to represent the questions and their corresponding scores dynamically according to the various candidate answer aspects via a cross-attention mechanism. In addition, we leverage the global knowledge inside the underlying KB, aiming at integrating the rich KB information into the representation of the answers. As a result, it can alleviate the out-of-vocabulary (OOV) problem, which helps the cross-attention model represent the question more precisely. The experimental results on WebQuestions demonstrate the effectiveness of the proposed approach.

Handling Cold-Start Problem in Review Spam Detection by Jointly Embedding Texts and Behaviors
Xuepeng Wang | Kang Liu | Jun Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Solving the cold-start problem in review spam detection is an urgent and significant task. It can help online review websites limit the damage done by spammers in time, but it has never been investigated in previous work. This paper proposes a novel neural network model to detect review spam in the cold-start setting, by learning to represent the new reviewers’ reviews with jointly embedded textual and behavioral information. Experimental results show that the proposed model performs effectively and has good domain adaptability. It is also applicable to large-scale datasets in an unsupervised way.

Automatically Labeled Data Generation for Large Scale Event Extraction
Yubo Chen | Shulin Liu | Xiang Zhang | Kang Liu | Jun Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modern models of event extraction for tasks like ACE are based on supervised learning of events from small hand-labeled datasets. However, hand-labeled training data is expensive to produce, covers few event types, and is limited in size, which makes it hard for supervised methods to extract events at large scale for knowledge base population. To solve the data labeling problem, we propose to automatically label training data for event extraction via world knowledge and linguistic knowledge, which can detect key arguments and trigger words for each event type and employ them to label events in texts automatically. The experimental results show that the quality of our large-scale automatically labeled data is competitive with elaborately human-labeled data. Moreover, our automatically labeled data can be combined with human-labeled data to improve the performance of models learned from these data.

Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms
Shulin Liu | Yubo Chen | Kang Liu | Jun Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper tackles the task of event detection (ED), which involves identifying and categorizing events. We argue that arguments provide significant clues for this task, but they are either completely ignored or exploited only indirectly in existing detection approaches. In this work, we propose to exploit argument information explicitly for ED via supervised attention mechanisms. Specifically, we systematically investigate the proposed model under the supervision of different attention strategies. Experimental results show that our approach advances the state of the art and achieves the best F1 score on the ACE 2005 dataset.

IJCNLP-2017 Task 5: Multi-choice Question Answering in Examinations
Shangmin Guo | Kang Liu | Shizhu He | Cao Liu | Jun Zhao | Zhuoyu Wei
Proceedings of the IJCNLP 2017, Shared Tasks

The IJCNLP-2017 Multi-choice Question Answering (MCQA) task aims at exploring the performance of current Question Answering (QA) techniques via real-world complex questions collected from Chinese Senior High School Entrance Examination papers and the CK12 website. The questions are all 4-way multi-choice questions, written in Chinese and English respectively, that cover a wide range of subjects, e.g. Biology, History, and Life Science. All questions are restricted to the elementary and middle school level. Over the course of this task, 7 teams submitted 323 runs in total. This paper describes the collected data, the format and size of these questions, formal run statistics and results, and an overview and performance statistics of different methods.

Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks?
Shangmin Guo | Xiangrong Zeng | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

As one of the most important tests in China, Gaokao is designed to be difficult enough to distinguish excellent high school students. In this work, we describe the Gaokao History Multiple Choice Questions (GKHMC) in detail and propose two different approaches to address them using various resources. One approach is based on an entity search technique (IR approach); the other is based on text entailment, for which we specifically employ deep neural networks (NN approach). Experimental results on our collected real Gaokao questions show that the two approaches are good at different categories of questions: the IR approach performs much better on entity questions (EQs), while the NN approach shows its advantage on sentence questions (SQs). We achieve state-of-the-art performance and show that it is indispensable to apply a hybrid method when participating in real-world tests.

2016

Learning to Represent Review with Tensor Decomposition for Spam Detection
Xuepeng Wang | Kang Liu | Shizhu He | Jun Zhao
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Mining Inference Formulas by Goal-Directed Random Walks
Zhuoyu Wei | Jun Zhao | Kang Liu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Inner Attention based Recurrent Neural Networks for Answer Selection
Bingning Wang | Kang Liu | Jun Zhao
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Leveraging FrameNet to Improve Automatic Event Detection
Shulin Liu | Yubo Chen | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Book Review: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions by Bing Liu
Jun Zhao | Kang Liu | Liheng Xu
Computational Linguistics, Volume 42, Issue 3 - September 2016

2015

Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks
Daojian Zeng | Kang Liu | Yubo Chen | Jun Zhao
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Short Text Clustering via Convolutional Neural Networks
Jiaming Xu | Peng Wang | Guanhua Tian | Bo Xu | Jun Zhao | Fangyuan Wang | Hongwei Hao
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks
Yubo Chen | Liheng Xu | Kang Liu | Daojian Zeng | Jun Zhao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering
Guangyou Zhou | Tingting He | Jun Zhao | Po Hu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Sentiment-Aspect Extraction based on Restricted Boltzmann Machines
Linlin Wang | Kang Liu | Zhu Cao | Jun Zhao | Gerard de Melo
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Knowledge Graph Embedding via Dynamic Mapping Matrix
Guoliang Ji | Shizhu He | Liheng Xu | Kang Liu | Jun Zhao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Group Non-negative Matrix Factorization with Natural Categories for Question Retrieval in Community Question Answer Archives
Guangyou Zhou | Yubo Chen | Daojian Zeng | Jun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Joint Opinion Relation Detection Using One-Class Deep Neural Network
Liheng Xu | Kang Liu | Jun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Sentiment Classification with Graph Co-Regularization
Guangyou Zhou | Jun Zhao | Daojian Zeng
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Exploring Fine-grained Entity Type Constraints for Distantly Supervised Relation Extraction
Yang Liu | Kang Liu | Liheng Xu | Jun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Relation Classification via Convolutional Deep Neural Network
Daojian Zeng | Kang Liu | Siwei Lai | Guangyou Zhou | Jun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Question Answering over Linked Data Using First-order Logic
Shizhu He | Kang Liu | Yuanzhe Zhang | Liheng Xu | Jun Zhao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Extracting Opinion Targets and Opinion Words from Online Reviews with Graph Co-ranking
Kang Liu | Liheng Xu | Jun Zhao
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Product Feature Mining: Semantic Clues versus Syntactic Constituents
Liheng Xu | Kang Liu | Siwei Lai | Jun Zhao
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

A Weakly Supervised Bayesian Model for Violence Detection in Social Media
Amparo Elizabeth Cano Basave | Yulan He | Kang Liu | Jun Zhao
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Attribute Relation Extraction from Template-inconsistent Semi-structured Text by Leveraging Site-level Knowledge
Yang Liu | Fang Liu | Siwei Lai | Kang Liu | Guangyou Zhou | Jun Zhao
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization
Guangyou Zhou | Fang Liu | Yang Liu | Shizhu He | Jun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews
Kang Liu | Liheng Xu | Jun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Mining Opinion Words and Opinion Targets in a Two-Stage Framework
Liheng Xu | Kang Liu | Siwei Lai | Yubo Chen | Jun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Joint Inference for Heterogeneous Dependency Parsing
Guangyou Zhou | Jun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Opinion Target Extraction Using Word-Based Translation Model
Kang Liu | Liheng Xu | Jun Zhao
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering
Guangyou Zhou | Kang Liu | Jun Zhao
Proceedings of COLING 2012

2011

Improving Dependency Parsing with Fined-Grained Features
Guangyou Zhou | Li Cai | Kang Liu | Jun Zhao
Proceedings of 5th International Joint Conference on Natural Language Processing

Learning the Latent Topics for Question Retrieval in Community QA
Li Cai | Guangyou Zhou | Kang Liu | Jun Zhao
Proceedings of 5th International Joint Conference on Natural Language Processing

Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives
Guangyou Zhou | Li Cai | Jun Zhao | Kang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
Guangyou Zhou | Jun Zhao | Kang Liu | Li Cai
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
Xianpei Han | Jun Zhao
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

A Chinese-English Organization Name Translation System Using Heuristic Web Mining and Asymmetric Alignment
Fan Yang | Jun Zhao | Kang Liu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Adding Redundant Features for CRFs-based Sentence Sentiment Classification
Jun Zhao | Kang Liu | Gen Wang
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

CRFs-Based Named Entity Recognition Incorporated with Heuristic Entity List Searching
Fan Yang | Jun Zhao | Bo Zou
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
Fan Yang | Jun Zhao | Bo Zou | Kang Liu | Feifan Liu
Proceedings of ACL-08: HLT

2007

Probabilistic Parsing Action Models for Multi-Lingual Dependency Parsing
Xiangyu Duan | Jun Zhao | Bo Xu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Cluster-Based Language Model for Sentence Retrieval in Chinese Question Answering
Youzheng Wu | Jun Zhao | Bo Xu
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

A Hybrid Approach to Chinese Base Noun Phrase Chunking
Fang Xu | Chengqing Zong | Jun Zhao
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

Multi-feature Based Chinese-English Named Entity Extraction from Comparable Corpora
Min Lu | Jun Zhao
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

2005

Chinese Named Entity Recognition with Multiple Features
Youzheng Wu | Jun Zhao | Bo Xu | Hao Yu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Product Named Entity Recognition Based on Hierarchical Hidden Markov Model
Feifan Liu | Jun Zhao | Bibo Lv | Bo Xu | Hao Yu
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing

2003

Chinese Named Entity Recognition Combining Statistical Model with Human Knowledge
Youzheng Wu | Jun Zhao | Bo Xu
Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition

2000

An Information-Theory-Based Feature Type Analysis for the Modeling of Statistical Parsing
Zhifang Sui | Jun Zhao | Dekai Wu
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1999

An Information-Theoretic Empirical Analysis of Dependency-Based Feature Types for Word Prediction Models
Dekai Wu | Jun Zhao | Zhifang Sui
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

A Quasi-Dependency Model for Structural Analysis of Chinese BaseNPs
Jun Zhao | Changning Huang
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

A Quasi-Dependency Model for the Structural Analysis of Chinese BaseNPs
Jun Zhao | Changning Huang
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1
