Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
The cloze test for Chinese idioms is a new challenge in machine reading comprehension: given a sentence with a blank, the task is to choose the candidate Chinese idiom that matches the context. Chinese idioms are a type of Chinese idiomatic expression. Because Chinese idioms are commonly misused, corpora contain errors, which in turn cause errors in the learned semantic representations of the idioms. In this paper, we introduce definitions written by Chinese experts to correct such misuse. We propose a model for the Chinese idiom cloze test that effectively integrates various sources of information. We also propose an attention mechanism, called Attribute Attention, to balance the weights of different attributes across the different descriptions of a Chinese idiom. Because Chinese idioms are unique and specific, in addition to choosing from the given candidates for each blank, we also try to choose the answer from all Chinese idioms that appear in the dataset and use this as an extra loss. In experiments, our model outperforms the state-of-the-art model.
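The abstract does not give the formulas for Attribute Attention or the extra loss, so the following is only a minimal sketch under stated assumptions: attributes of one idiom description are scored by a dot product with a context vector and combined by a softmax-weighted sum, and the extra loss is a cross-entropy over all idioms in the dataset added to the usual cross-entropy over the given candidates with an assumed weight `extra_weight`. All function names here are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def attribute_attention(context: Sequence[float],
                        attributes: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical attribute attention: score each attribute vector of one
    idiom description against the context vector, normalize the scores with
    a softmax, and return the attention-weighted sum of the attributes."""
    weights = softmax([dot(context, attr) for attr in attributes])
    dim = len(attributes[0])
    return [sum(w * attr[d] for w, attr in zip(weights, attributes))
            for d in range(dim)]

def cross_entropy(scores: Sequence[float], gold: int) -> float:
    # Negative log-probability of the gold index under a softmax over scores.
    return -math.log(softmax(scores)[gold])

def cloze_loss(candidate_scores: Sequence[float], gold_candidate: int,
               all_idiom_scores: Sequence[float], gold_idiom: int,
               extra_weight: float = 0.5) -> float:
    """Main loss over the blank's given candidates plus an extra loss over
    every idiom in the dataset. The weighting scheme is an assumption."""
    main = cross_entropy(candidate_scores, gold_candidate)
    extra = cross_entropy(all_idiom_scores, gold_idiom)
    return main + extra_weight * extra
```

For example, `attribute_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` puts roughly 0.73 of the weight on the first attribute, the one most aligned with the context.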
This paper proposes an architecture for the relation extraction task that integrates semantic information with knowledge base modeling in a novel manner.
Studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples: perturbed inputs that cause DNN-based models to produce incorrect results. One strong adversarial attack in the NLP domain is synonym substitution, in which the adversary replaces words with their synonyms. Since synonym substitution perturbations aim to satisfy all lexical, grammatical, and semantic constraints, they are difficult to detect both with automatic syntax checks and by humans. In this paper, we propose a structure-free defensive method that improves the performance of DNN-based models on both clean and adversarial data. Our findings show that replacing the embeddings of the important words in the input samples with the average of their synonyms' embeddings can significantly improve the generalization of DNN-based classifiers. By doing so, we reduce model sensitivity to particular words in the input samples. Our results indicate that the proposed defense not only defends against adversarial attacks but also improves the performance of DNN-based models on benign data. On average, the proposed defense improved the classification accuracy of the CNN and Bi-LSTM models by 41.30% and 55.66%, respectively, when tested under adversarial attacks. Further investigation shows that our defensive method can also improve the robustness of non-neural models, increasing classification accuracy by an average of 17.62% and 22.93% on the SVM and XGBoost models, respectively. The proposed defense also yielded an average classification accuracy improvement of 26.60% when tested with the well-known BERT model. Our algorithm is generic enough to be applied in any NLP domain and to any model trained on any natural language.
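The core of the defense, replacing the embedding of each important word with the average of its synonyms' embeddings, can be sketched as below. The abstract does not say how important words are selected or where the synonyms come from, so both are supplied by the caller here (the `important` set and the `synonyms` map are hypothetical inputs), and falling back to the original vector when no synonym has an embedding is an assumption.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]

def mean_vector(vectors: Sequence[Vector]) -> Vector:
    # Element-wise average of equally sized vectors.
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def defended_embeddings(tokens: Sequence[str],
                        embeddings: Dict[str, Vector],
                        synonyms: Dict[str, List[str]],
                        important: Set[str]) -> List[Vector]:
    """Replace the embedding of each important token with the average of its
    synonyms' embeddings; all other tokens keep their own embedding."""
    out: List[Vector] = []
    for tok in tokens:
        vec = list(embeddings[tok])
        if tok in important:
            # Only synonyms that have an embedding can contribute.
            syn_vecs = [embeddings[s] for s in synonyms.get(tok, [])
                        if s in embeddings]
            # Assumption: with no usable synonyms, keep the original vector.
            if syn_vecs:
                vec = mean_vector(syn_vecs)
        out.append(vec)
    return out
```

With toy embeddings where "good" has the synonyms "great" ([0.8, 0.2]) and "fine" ([0.6, 0.4]), an important "good" is represented by approximately [0.7, 0.3], while an unimportant token such as "movie" keeps its own vector.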
In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and they are especially challenging in low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for fine-tuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We use several metrics that evaluate important aspects of the generated text, including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves up to a peak at approximately three times the amount of original data.
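The abstract does not specify which character-level noise operations are used. As a minimal sketch, assume each alphabetic character is perturbed with probability `p` by inserting a random letter after it, deleting it, or substituting it; `char_noise`, `augment_corpus`, and `copies_per_text` are hypothetical names, and `copies_per_text=3` would roughly correspond to adding three times the original amount of data.

```python
import random
import string
from typing import List, Sequence

def char_noise(text: str, p: float, rng: random.Random) -> str:
    """Perturb each alphabetic character with probability p using one of three
    assumed operations: insert a random letter, delete, or substitute."""
    out: List[str] = []
    for ch in text:
        # Non-letters (spaces, punctuation) and unselected letters pass through.
        if not ch.isalpha() or rng.random() >= p:
            out.append(ch)
            continue
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert":
            out.append(ch)
            out.append(rng.choice(string.ascii_lowercase))
        elif op == "substitute":
            out.append(rng.choice(string.ascii_lowercase))
        # "delete": append nothing
    return "".join(out)

def augment_corpus(texts: Sequence[str], copies_per_text: int,
                   p: float = 0.05, seed: int = 0) -> List[str]:
    """Return the original texts followed by copies_per_text noisy copies of
    each text; a fixed seed makes the augmentation reproducible."""
    rng = random.Random(seed)
    augmented = list(texts)
    for text in texts:
        for _ in range(copies_per_text):
            augmented.append(char_noise(text, p, rng))
    return augmented
```

Keeping the originals in the returned list and appending noisy copies makes the augmentation ratio explicit, which is convenient when sweeping the amount of augmentation as the abstract describes.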
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Anne Lauscher | Olga Majewska | Leonardo F. R. Ribeiro | Iryna Gurevych | Nikolai Rozanov | Goran Glavaš
Following the major success of neural language models (LMs) such as BERT or GPT-2 on a variety of language understanding tasks, recent work has focused on injecting (structured) knowledge from external resources into these models. On the one hand, joint pre-training (i.e., training from scratch while adding objectives based on external knowledge to the primary LM objective) may be prohibitively expensive computationally; on the other hand, post-hoc fine-tuning on external knowledge may lead to catastrophic forgetting of distributional knowledge. In this work, we investigate models that complement the distributional knowledge of BERT with conceptual knowledge from ConceptNet and from its corresponding Open Mind Common Sense (OMCS) corpus, respectively, using adapter training. While overall results on the GLUE benchmark paint an inconclusive picture, a deeper analysis reveals that our adapter-based models substantially outperform BERT (by up to 15-20 performance points) on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS. We open-source all our experiments and relevant code at: https://github.com/wluper/retrograph.
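The abstract does not detail the adapter architecture. A common design, assumed here, is a bottleneck adapter in the style of Houlsby et al. (2019): a down-projection, a nonlinearity, an up-projection, and a residual connection, inserted into the Transformer layers while BERT's original weights stay frozen so that only the adapter is trained on the external knowledge. The pure-Python sketch below shows only the adapter computation; the class name and the zero-initialized up-projection are illustrative assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    # Multiply a row-major matrix by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]

class BottleneckAdapter:
    """Bottleneck adapter (assumed design): h + W_up * relu(W_down * h).
    Only `down` and `up` would be trained; the host model stays frozen."""

    def __init__(self, down: Matrix, up: Matrix) -> None:
        self.down = down  # shape: bottleneck_dim x hidden_dim
        self.up = up      # shape: hidden_dim x bottleneck_dim

    @classmethod
    def near_identity(cls, hidden_dim: int,
                      bottleneck_dim: int) -> "BottleneckAdapter":
        # Assumption: a zero-initialized up-projection makes the adapter start
        # as the identity, so inserting it leaves the model's outputs unchanged.
        down = [[0.01] * hidden_dim for _ in range(bottleneck_dim)]
        up = [[0.0] * bottleneck_dim for _ in range(hidden_dim)]
        return cls(down, up)

    def __call__(self, h: Vector) -> Vector:
        delta = matvec(self.up, relu(matvec(self.down, h)))
        # Residual connection: the adapter only adds a learned correction.
        return [x + d for x, d in zip(h, delta)]
```

Because the residual path carries the frozen model's representation through unchanged, the adapter can only add knowledge on top of it, which is the intuition for why adapter training sidesteps catastrophic forgetting.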
Entity-attribute relations are a fundamental component for building large-scale knowledge bases, which are widely employed in modern search engines. However, most such knowledge bases are manually curated and cover only a small fraction of all attributes, even for common entities. To improve the precision of model-based entity-attribute extraction, we propose attribute-aware embeddings, which embed entities and attributes in the same space based on the similarity of their attributes. Our model, EANET, learns these embeddings by representing entities as a weighted sum of their attributes and concatenates these embeddings with mention-level features. EANET achieves up to 91% classification accuracy, outperforming strong baselines, and achieves 83% precision on manually labeled high-confidence extractions, outperforming Biperpedia (Gupta et al., 2014), a previous state-of-the-art system for large-scale entity-attribute extraction.
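Representing an entity as a weighted sum of its attributes and concatenating the result with mention-level features can be sketched as follows. The abstract does not say how the weights are obtained; here they are caller-supplied (for example, attribute frequencies) and normalized to sum to one, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]

def entity_embedding(attribute_weights: Dict[str, float],
                     attribute_vectors: Dict[str, Vector]) -> Vector:
    """Represent an entity as the normalized weighted sum of the embeddings of
    its known attributes (weights are assumed, e.g. attribute frequencies)."""
    total = sum(attribute_weights.values())
    dim = len(next(iter(attribute_vectors.values())))
    emb = [0.0] * dim
    for attr, weight in attribute_weights.items():
        vec = attribute_vectors[attr]
        for d in range(dim):
            emb[d] += (weight / total) * vec[d]
    return emb

def classifier_input(entity_vec: Vector, attribute_vec: Vector,
                     mention_features: Sequence[float]) -> Vector:
    # Concatenate the shared-space embeddings with mention-level features.
    return list(entity_vec) + list(attribute_vec) + list(mention_features)
```

Since entities are built from attribute vectors, both live in the same space, so an entity with the attribute "capital" weighted 3 and "population" weighted 1 lands at [0.75, 0.25] when those attributes are the unit vectors [1, 0] and [0, 1].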
Deep neural networks have demonstrated high performance on many natural language processing (NLP) tasks that can be answered directly from text, but have struggled with NLP tasks that require external (e.g., world) knowledge. In this paper, we present OSCR (Ontology-based Semantic Composition Regularization), a method for injecting task-agnostic knowledge from an ontology or knowledge graph into a neural network during pre-training. We evaluated BERT pre-trained on Wikipedia with and without OSCR by fine-tuning it on two question answering tasks involving world knowledge and causal reasoning and one requiring domain (healthcare) knowledge, and obtained accuracy improvements of 33.3%, 18.6%, and 4% compared to pre-training BERT without OSCR.
Medical concept normalization (MCN), i.e., mapping colloquial medical phrases to standard concepts, is an essential step in the analysis of medical social media text. The main drawback of the existing state-of-the-art approach (Kalyan and Sangeetha, 2020b) is that it learns target concept vector representations from scratch, which requires more training instances. Our model is based on RoBERTa and target concept embeddings. In our model, we integrate a) target concept information in the form of target concept vectors generated by encoding target concept descriptions using SRoBERTa, a state-of-the-art RoBERTa-based sentence embedding model, and b) domain lexicon knowledge by enriching target concept vectors with synonym relationship knowledge using the retrofitting algorithm. This is the first attempt in MCN to exploit both target concept information and domain lexicon knowledge in the form of retrofitted target concept vectors. Our model outperforms all existing models, with accuracy improvements of up to 1.36% on three standard datasets. Further, our model, when trained only on mapping lexicon synonyms, achieves up to 4.87% improvement in accuracy.
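The abstract names the retrofitting algorithm without giving its update rule. Assuming the standard retrofitting of Faruqui et al. (2015), each vector is repeatedly set to q_i = (α·q̂_i + Σ_j β_ij·q_j) / (α + Σ_j β_ij), where q̂_i is the original vector and j ranges over its lexicon neighbours (here, synonyms), with α = 1 and β_ij = 1/degree(i) as the usual defaults. In the MCN setting the inputs would be the SRoBERTa-encoded target concept vectors and a synonym lexicon; the toy data and the function name below are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

def retrofit(vectors: Dict[str, Vector], lexicon: Dict[str, List[str]],
             iterations: int = 10, alpha: float = 1.0) -> Dict[str, Vector]:
    """Retrofitting in the style of Faruqui et al. (2015): pull each vector
    toward its lexicon neighbours while keeping it close to its original value.
    beta = 1 / degree is the usual choice and is assumed here."""
    original = {w: list(v) for w, v in vectors.items()}
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue  # nothing to retrofit against
            beta = 1.0 / len(nbrs)
            denom = alpha + beta * len(nbrs)
            # Updates are applied in place, so later words see earlier updates.
            new[word] = [
                (alpha * original[word][d]
                 + sum(beta * new[n][d] for n in nbrs)) / denom
                for d in range(len(new[word]))
            ]
    return new
```

The input dictionary is copied, so the original concept vectors remain available for comparison; words absent from the lexicon keep their original vectors.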
Pretrained language models have recently excelled at many NLP tasks; however, their social intelligence is still unsatisfactory. To improve it, machines need a more general understanding of our complicated world and the ability to perform commonsense reasoning beyond fitting specific downstream tasks. External commonsense knowledge graphs (KGs), such as ConceptNet, provide rich information about words and their relationships. Thus, towards general commonsense learning, we propose two approaches to implicitly and explicitly infuse such KGs into pretrained language models. We demonstrate that our proposed methods perform well on SocialIQA, a social commonsense reasoning task, in both limited and full training data regimes.
In this work, we present our empirical attempt to identify the proper strategy for using Transformer language models to identify sentences consistent with commonsense. We tackle the first two tasks of the ComVE competition. The starting point for our work is the BERT assumption that a large number of NLP tasks can be solved with pre-trained Transformers without substantial task-specific changes to the architecture. However, our experiments show that the encoding strategy can have a great impact on the quality of fine-tuning. The combination of cross-encoding and multi-input models worked better than a single cross-encoder and allowed us to achieve results comparable to the state of the art without using any external data.
We demonstrate the complementary nature of neural knowledge graph embedding, fine-grained entity type prediction, and neural language modeling. We show that a language-model-inspired knowledge graph embedding approach yields both improved knowledge graph embeddings and improved fine-grained entity type representations. Our work also shows that jointly modeling structured knowledge tuples and language improves both.