Although deep neural networks have been widely employed and have proven effective in sentiment analysis tasks, it remains challenging for model developers to detect erroneous predictions that their models might make before deployment. Once a model is deployed, emergent errors can be hard to identify at prediction time and impossible to trace back to their sources. To address these gaps, we propose an error detection framework for sentiment analysis based on explainable features. We perform global-level feature validation with human-in-the-loop assessment, followed by an integrated analysis of global- and local-level feature contributions. Experimental results show that, with limited human-in-the-loop intervention, our method identifies erroneous model predictions on unseen data with high precision.
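The abstract does not spell out how global and local feature contributions are combined. One plausible reading, shown here purely as an illustration, flags a prediction when most of its local attribution mass falls on features that the human-in-the-loop global validation marked as error-prone. The function name, the 0.5 threshold, and the per-feature contribution scores are hypothetical assumptions, not the paper's method.

```python
from typing import Dict, List, Set


def flag_suspect_predictions(
    contributions: List[Dict[str, float]],
    error_prone_features: Set[str],
    threshold: float = 0.5,
) -> List[bool]:
    """Hypothetical sketch: flag predictions dominated by error-prone features.

    contributions: one dict per prediction, mapping feature name to its local
        contribution score (e.g. from a linear surrogate or an attribution method).
    error_prone_features: assumed output of the global, human-validated step.
    threshold: assumed fraction of absolute contribution mass that triggers a flag.
    """
    flags = []
    for contrib in contributions:
        total = sum(abs(v) for v in contrib.values())
        if total == 0:
            flags.append(False)  # no attribution evidence either way
            continue
        suspect = sum(abs(v) for f, v in contrib.items() if f in error_prone_features)
        flags.append(suspect / total > threshold)
    return flags


# Made-up local contributions for two sentiment predictions.
preds = [
    {"not": -0.8, "great": 0.6},       # negation carries most of the mass
    {"excellent": 0.9, "movie": 0.1},  # no error-prone feature involved
]
print(flag_suspect_predictions(preds, {"not"}))  # [True, False]
```

Using the share of absolute contribution, rather than the raw sum, keeps the rule independent of the scale of the attribution method.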
Customers of machine learning systems demand accountability from the companies that employ these algorithms for various prediction tasks. Accountability requires an understanding of a system's limits and of the conditions under which it makes erroneous predictions: customers are often interested in understanding incorrect predictions, while model developers seek methods that yield incremental improvements to an existing system. We therefore propose an accountable error characterization method, AEC, to understand when and where errors occur within existing black-box models. Because AEC is constructed from human-understandable linguistic features, it allows model developers to automatically identify the main sources of error in a given classification system. It can also be used to sample the most informative input points for the next round of training. As a case study, we apply AEC to error detection in a sentiment analysis task. Our results show that AEC characterizes erroneous predictions into human-understandable categories and achieves promising results in selecting erroneous samples compared with uncertainty-based sampling.
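AEC is compared against uncertainty-based sampling. As a reference point, here is a minimal sketch of one common variant of that baseline, entropy-based uncertainty sampling, which selects the k inputs whose predicted class distributions have the highest Shannon entropy. The abstract does not state which uncertainty measure was used, so entropy is an assumption here, and the probabilities in the example are made up.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_sample(prob_matrix: List[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k inputs with the most uncertain predictions."""
    ranked = sorted(
        range(len(prob_matrix)),
        key=lambda i: entropy(prob_matrix[i]),
        reverse=True,  # highest entropy first
    )
    return ranked[:k]


# Hypothetical [positive, negative] probabilities for four inputs.
probs = [
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # near the decision boundary
    [0.10, 0.90],  # confident
    [0.50, 0.50],  # maximally uncertain
]
print(uncertainty_sample(probs, 2))  # [3, 1]
```

This baseline only looks at the model's confidence, whereas the abstract positions AEC as selecting samples through human-understandable linguistic features.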
Document-level Relation Extraction (RE) is particularly challenging due to complex semantic interactions among multiple entities in a document. Among existing approaches, Graph Convolutional Networks (GCNs) are among the most effective for document-level RE. However, a traditional GCN simply takes word nodes and an adjacency matrix to represent the graph, which makes it difficult to establish direct connections between distant entity pairs. In this paper, we propose Global Context-enhanced Graph Convolutional Networks (GCGCN), a novel model that uses entities as nodes and the context of entity pairs as edges between nodes to capture rich global context information about the entities in a document. Two hierarchical blocks, Context-aware Attention Guided Graph Convolution (CAGGC) for partially connected graphs and Multi-head Attention Guided Graph Convolution (MAGGC) for fully connected graphs, take progressively more global context into account. Meanwhile, we leverage a large-scale distantly supervised dataset to pre-train a GCGCN model with curriculum learning, and then fine-tune it on the human-annotated dataset to further improve document-level RE performance. Experimental results on DocRED show that our model effectively captures rich global context information in the document, achieving a state-of-the-art result. Our code is available at https://github.com/Huiweizhou/GCGCN.
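To make the MAGGC idea more concrete, the following is a simplified sketch of a multi-head attention-guided graph convolution over a fully connected entity graph: each head derives a soft adjacency matrix from scaled dot-product attention between entity nodes and then convolves the node features with it. This is an illustration under stated assumptions, not the GCGCN implementation (see the repository above for that): it ignores the entity-pair context edges, the CAGGC block, and any residual or dense connections, and the weight shapes and random inputs are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_guided_gcn_layer(H, Wq, Wk, Wv):
    """Simplified multi-head attention-guided graph convolution (sketch).

    H: (n_entities, d) entity node features.
    Wq, Wk, Wv: (heads, d, d_head) per-head projection matrices (assumed shapes).
    Returns: (n_entities, heads * d_head) updated node features.
    """
    outputs = []
    for q, k, v in zip(Wq, Wk, Wv):  # iterate over heads
        # Soft adjacency over a fully connected graph: every entity attends
        # to every entity, and each row sums to 1.
        A = softmax((H @ q) @ (H @ k).T / np.sqrt(q.shape[1]))
        # Graph convolution with the learned adjacency, followed by ReLU.
        outputs.append(np.maximum(A @ H @ v, 0.0))
    return np.concatenate(outputs, axis=-1)


rng = np.random.default_rng(0)
n, d, heads, d_head = 4, 8, 2, 4  # hypothetical sizes
H = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(heads, d, d_head)) for _ in range(3))
out = attention_guided_gcn_layer(H, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because the adjacency is computed from the node features rather than fixed in advance, distant entity pairs are connected directly in a single layer, which is the limitation of word-node GCNs that the abstract highlights.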
In the medical domain, it is difficult to manually select the most relevant information for a given medical question from a large number of search results. BioNLP 2019 proposed a Question Answering (QA) task that encourages the use of text mining technology to automatically judge whether a search result answers a medical question. The main challenge of the QA task is mining the semantic relation between question and answer. To tackle this challenge, we propose a BioBERT Transformer model, which applies Transformers to extract semantic relations between words in questions and answers and uses BioBERT to encode medical domain-specific contextualized word representations. Our method achieves an accuracy of 76.24% and a Spearman correlation of 17.12% on the BioNLP 2019 QA task.
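Spearman's rank correlation, one of the two reported metrics, measures how well a system's ordering of candidate answers agrees with a reference ordering. The sketch computes it from scratch as the Pearson correlation of average ranks, where tied values share the mean of their ranks; the reference and predicted scores are hypothetical, and in practice `scipy.stats.spearmanr` computes the same statistic.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical reference relevance and predicted scores for four answers
# (higher means more relevant in both lists).
gold = [4, 2, 3, 1]
pred = [0.9, 0.6, 0.3, 0.2]
print(round(spearman(gold, pred), 4))  # 0.8
```

Only the ordering matters: rescaling the predicted scores monotonically leaves the result unchanged.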