Jun Yan


pdf bib
RockNER: A Simple Method to Create Adversarial Examples for Evaluating the Robustness of Named Entity Recognition Models
Bill Yuchen Lin | Wenyang Gao | Jun Yan | Ryan Moreno | Xiang Ren
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

To audit the robustness of named entity recognition (NER) models, we propose RockNER, a simple yet effective method to create natural adversarial examples. Specifically, at the entity level, we replace target entities with other entities of the same semantic class in Wikidata; at the context level, we use pre-trained language models (e.g., BERT) to generate word substitutions. Together, the two levels of at- tack produce natural adversarial examples that result in a shifted distribution from the training data on which our target models have been trained. We apply the proposed method to the OntoNotes dataset and create a new benchmark named OntoRock for evaluating the robustness of existing NER models via a systematic evaluation protocol. Our experiments and analysis reveal that even the best model has a significant performance drop, and these models seem to memorize in-domain entity patterns instead of reasoning from the context. Our work also studies the effects of a few simple data augmentation methods to improve the robustness of NER models.

pdf bib
Learning Contextualized Knowledge Structures for Commonsense Reasoning
Jun Yan | Mrigank Raman | Aaron Chan | Tianyu Zhang | Ryan Rossi | Handong Zhao | Sungchul Kim | Nedim Lipka | Xiang Ren
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding
Jun Yan | Nasser Zalmout | Yan Liang | Christan Grant | Xiang Ren | Xin Luna Dong
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Automatic extraction of product attribute values is an important enabling technology in e-Commerce platforms. This task is usually modeled using sequence labeling architectures, with several extensions to handle multi-attribute extraction. One line of previous work constructs attribute-specific models, through separate decoders or entirely separate models. However, this approach constrains knowledge sharing across different attributes. Other contributions use a single multi-attribute model, with different techniques to embed attribute information. But sharing the entire network parameters across all attributes can limit the model’s capacity to capture attribute-specific characteristics. In this paper we present AdaTag, which uses adaptive decoding to handle extraction. We parameterize the decoder with pretrained attribute embeddings, through a hypernetwork and a Mixture-of-Experts (MoE) module. This allows for separate, but semantically correlated, decoders to be generated on the fly for different attributes. This approach facilitates knowledge sharing, while maintaining the specificity of each attribute. Our experiments on a real-world e-Commerce dataset show marked improvements over previous methods.


pdf bib
Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering
Yanlin Feng | Xinyue Chen | Bill Yuchen Lin | Peifeng Wang | Jun Yan | Xiang Ren
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing work on augmenting question answering (QA) models with external knowledge (e.g., knowledge graphs) either struggle to model multi-hop relations efficiently, or lack transparency into the model’s prediction rationale. In this paper, we propose a novel knowledge-aware approach that equips pre-trained language models (PTLMs) has with a multi-hop relational reasoning module, named multi-hop graph relation network (MHGRN). It performs multi-hop, multi-relational reasoning over subgraphs extracted from external knowledge graphs. The proposed reasoning module unifies path-based reasoning methods and graph neural networks to achieve better interpretability and scalability. We also empirically show its effectiveness and scalability on CommonsenseQA and OpenbookQA datasets, and interpret its behaviors with case studies, with the code for experiments released.


pdf bib
A Deep Learning-Based System for PharmaCoNER
Ying Xiong | Yedan Shen | Yuanhang Huang | Shuai Chen | Buzhou Tang | Xiaolong Wang | Qingcai Chen | Jun Yan | Yi Zhou
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

The Biological Text Mining Unit at BSC and CNIO organized the first shared task on chemical & drug mention recognition from Spanish medical texts called PharmaCoNER (Pharmacological Substances, Compounds and proteins and Named Entity Recognition track) in 2019, which includes two tracks: one for NER offset and entity classification (track 1) and the other one for concept indexing (track 2). We developed a pipeline system based on deep learning methods for this shared task, specifically, a subsystem based on BERT (Bidirectional Encoder Representations from Transformers) for NER offset and entity classification and a subsystem based on Bpool (Bi-LSTM with max/mean pooling) for concept indexing. Evaluation conducted on the shared task data showed that our system achieves a micro-average F1-score of 0.9105 on track 1 and a micro-average F1-score of 0.8391 on track 2.

pdf bib
HITSZ-ICRC: A Report for SMM4H Shared Task 2019-Automatic Classification and Extraction of Adverse Effect Mentions in Tweets
Shuai Chen | Yuanhang Huang | Xiaowei Huang | Haoming Qin | Jun Yan | Buzhou Tang
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

This is the system description of the Harbin Institute of Technology Shenzhen (HITSZ) team for the first and second subtasks of the fourth Social Media Mining for Health Applications (SMM4H) shared task in 2019. The two subtasks are automatic classification and extraction of adverse effect mentions in tweets. The systems for the two subtasks are based on bidirectional encoder representations from transformers (BERT), and achieves promising results. Among the systems we developed for subtask1, the best F1-score was 0.6457, for subtask2, the best relaxed F1-score and the best strict F1-score were 0.614 and 0.407 respectively. Our system ranks first among all systems on subtask1.


pdf bib
Language Modeling with Sparse Product of Sememe Experts
Yihong Gu | Jun Yan | Hao Zhu | Zhiyuan Liu | Ruobing Xie | Maosong Sun | Fen Lin | Leyu Lin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Most language modeling methods rely on large-scale data to statistically learn the sequential patterns of words. In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. Inspired by HowNet, we use sememes, the minimum semantic units in human languages, to represent the implicit semantics behind words for language modeling, named Sememe-Driven Language Model (SDLM). More specifically, to predict the next word, SDLM first estimates the sememe distribution given textual context. Afterwards, it regards each sememe as a distinct semantic expert, and these experts jointly identify the most probable senses and the corresponding word. In this way, SDLM enables language models to work beyond word-level manipulation to fine-grained sememe-level semantics, and offers us more powerful tools to fine-tune language models and improve the interpretability as well as the robustness of language models. Experiments on language modeling and the downstream application of headline generation demonstrate the significant effectiveness of SDLM.


pdf bib
Active Sentiment Domain Adaptation
Fangzhao Wu | Yongfeng Huang | Jun Yan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Domain adaptation is an important technology to handle domain dependence problem in sentiment analysis field. Existing methods usually rely on sentiment classifiers trained in source domains. However, their performance may heavily decline if the distributions of sentiment features in source and target domains have significant difference. In this paper, we propose an active sentiment domain adaptation approach to handle this problem. Instead of the source domain sentiment classifiers, our approach adapts the general-purpose sentiment lexicons to target domain with the help of a small number of labeled samples which are selected and annotated in an active learning mode, as well as the domain-specific sentiment similarities among words mined from unlabeled samples of target domain. A unified model is proposed to fuse different types of sentiment information and train sentiment classifier for target domain. Extensive experiments on benchmark datasets show that our approach can train accurate sentiment classifier with less labeled samples.


pdf bib
Towards Accurate Distant Supervision for Relational Facts Extraction
Xingxing Zhang | Jianwen Zhang | Junyu Zeng | Jun Yan | Zheng Chen | Zhifang Sui
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)