In natural language processing, correcting performance assessments for chance agreement plays a crucial role in evaluating the reliability of annotations. However, little research has addressed chance correction for assessing the reliability of sequence annotation tasks, despite their prevalence in the field. To address this gap, this paper introduces a novel model for generating random annotations, which serves as the foundation for estimating chance agreement in sequence annotation tasks. Using the proposed randomization model and a related comparison approach, we derive the analytical form of the distribution, which enables us to compute the probable location of each annotated text segment and, subsequently, to estimate chance agreement. Through a combination of simulation and corpus-based evaluation, we assess the model's applicability and validate its accuracy and efficacy.
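The general idea of chance correction for segment annotations can be illustrated with a small Monte Carlo sketch. It is not the analytical model described above. Instead, it assumes a hypothetical randomization model in which each annotator's segment lengths are kept and the segments are placed uniformly at random, without overlap, in a text of `n` tokens. It also measures agreement as F1 over exactly matching spans (another assumption), and it applies the usual correction (A_o - A_e) / (1 - A_e). All function names are illustrative.

```python
import random
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def span_f1(a: List[Span], b: List[Span]) -> float:
    """Agreement as F1 over exactly matching spans (an assumed measure)."""
    if not a and not b:
        return 1.0
    matches = len(set(a) & set(b))
    return 2 * matches / (len(a) + len(b))


def random_placement(lengths: List[int], n: int, rng: random.Random) -> List[Span]:
    """Hypothetical randomization model: keep the segment lengths, shuffle their
    order, and place them without overlap uniformly at random in n tokens."""
    order = lengths[:]
    rng.shuffle(order)
    k = len(order)
    free = n - sum(order)
    if free < 0:
        raise ValueError("segments do not fit in the text")
    # Stars and bars: choose which k of the (free + k) slots hold a segment.
    slots = sorted(rng.sample(range(free + k), k))
    spans, used = [], 0
    for i, (slot, length) in enumerate(zip(slots, order)):
        start = (slot - i) + used  # free tokens before it + earlier segment tokens
        spans.append((start, start + length))
        used += length
    return spans


def chance_corrected(a: List[Span], b: List[Span], n: int,
                     trials: int = 1000, seed: int = 0) -> float:
    """(A_o - A_e) / (1 - A_e), with A_e estimated by Monte Carlo sampling."""
    rng = random.Random(seed)
    len_a = [end - start for start, end in a]
    len_b = [end - start for start, end in b]
    observed = span_f1(a, b)
    expected = sum(
        span_f1(random_placement(len_a, n, rng), random_placement(len_b, n, rng))
        for _ in range(trials)
    ) / trials
    if expected >= 1.0:
        raise ValueError("expected agreement is 1; the correction is undefined")
    return (observed - expected) / (1.0 - expected)
```

For example, `chance_corrected([(2, 5), (10, 12)], [(2, 5), (11, 13)], n=50)` has an observed F1 of 0.5, and the corrected value is pulled slightly below that by the small amount of agreement expected from random placement. An analytical derivation, as in the abstract, avoids the sampling noise and cost of this approach.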
Knowledge graph embedding (KGE) has been well studied in general domains but has not been examined for food computing. To fill this gap, we perform knowledge representation learning over a food knowledge graph (KG). We employ a pre-trained language model to encode entities and relations, thus emphasizing contextual information in food KGs. The model is trained on two tasks: predicting a masked entity in a given triple from the KG, and predicting the plausibility of a triple. Analyzing food substitutions can inform dietary choices and thereby encourage healthier eating behaviors. Previous work on food substitutions mainly focuses on semantic similarity while ignoring context. Substitutions are also hard to evaluate, both because an adequate validation set is lacking and because evaluation is subjective, depending on the perceived purpose of the substitution. To tackle this problem, we propose a collection of adversarial sample generation strategies for different food substitutions over our learned KGE. We propose multiple strategies to generate high-quality, context-aware recipe and ingredient substitutions, and we also provide generalized ingredient substitutions to meet different user needs. The effectiveness and efficiency of the proposed knowledge graph learning method and the subsequent attack strategies are verified by extensive evaluations on a large-scale food KG.
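The two training tasks can be made concrete by showing how training examples might be built from KG triples. This is a minimal sketch under assumptions the abstract does not state. Triples are serialized as text with BERT-style `[MASK]` and `[SEP]` tokens, the masked-entity task hides either the head or the tail, and negatives for the plausibility task replace the tail with a random entity that does not form a known triple (a closed-world assumption). The food triples and all function names are hypothetical.

```python
import random
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
MASK, SEP = "[MASK]", "[SEP]"  # assumed BERT-style special tokens


def serialize(h: str, r: str, t: str) -> str:
    """Turn a triple into a text sequence for the language model (assumed format)."""
    return f"{h} {SEP} {r} {SEP} {t}"


def masked_entity_examples(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Task 1: hide the head or the tail; the model must predict the hidden entity."""
    examples = []
    for h, r, t in triples:
        examples.append((serialize(MASK, r, t), h))
        examples.append((serialize(h, r, MASK), t))
    return examples


def plausibility_examples(triples: List[Triple], num_neg: int = 1,
                          seed: int = 0) -> List[Tuple[str, int]]:
    """Task 2: label KG triples 1 and tail-corrupted triples 0."""
    rng = random.Random(seed)
    known: Set[Triple] = set(triples)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    examples = [(serialize(*tr), 1) for tr in triples]
    for h, r, _ in triples:
        # Only corrupt into triples absent from the KG (closed-world assumption).
        candidates = [e for e in entities if (h, r, e) not in known]
        for _ in range(num_neg):
            if candidates:
                examples.append((serialize(h, r, rng.choice(candidates)), 0))
    return examples


triples = [("pancake", "contains", "egg"),
           ("pancake", "contains", "flour"),
           ("omelette", "contains", "egg")]
```

Corrupting only the tail keeps the sketch short. Corrupting heads or relations as well is a common alternative when generating negatives.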
Event extraction in the biomedical domain is more challenging than in the general news domain because it requires broader acquisition of domain-specific knowledge and deeper understanding of complex contexts. To better encode contextual information and external background knowledge, we propose a novel knowledge base (KB)-driven tree-structured long short-term memory network (Tree-LSTM) framework that incorporates two new types of features: (1) dependency structures to capture wide contexts; and (2) entity properties (types and category descriptions) drawn from external ontologies via entity linking. We evaluate our approach on the Genia dataset from the BioNLP shared task and achieve a new state-of-the-art result. In addition, both quantitative and qualitative studies demonstrate the benefits of the Tree-LSTM and the external knowledge representation for biomedical event extraction.
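A Tree-LSTM composes a sentence bottom-up along its dependency tree rather than left to right. The following minimal sketch uses the Child-Sum Tree-LSTM of Tai et al. (2015). The abstract does not say which variant is used, so that choice is an assumption. The sketch also assumes that KB features enter by concatenating a one-hot entity-type vector to each word embedding. The weights are random and untrained, and the example sentence, dimensions and class name are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vec = List[float]
State = Tuple[Vec, Vec]  # (hidden state h, memory cell c)


def matvec(m: List[Vec], v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs: Vec) -> Vec:
    return [sum(xs) for xs in zip(*vs)]


def mul(a: Vec, b: Vec) -> Vec:
    return [x * y for x, y in zip(a, b)]


def sigmoid(v: Vec) -> Vec:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v: Vec) -> Vec:
    return [math.tanh(x) for x in v]


class ChildSumTreeLSTM:
    """Child-Sum Tree-LSTM cell with random, untrained weights."""

    def __init__(self, in_dim: int, hid: int, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> List[Vec]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hid = hid
        # Input weights W, recurrent weights U and bias b for gates i, f, o, u.
        self.W = {g: rand(hid, in_dim) for g in "ifou"}
        self.U = {g: rand(hid, hid) for g in "ifou"}
        self.b = {g: [0.0] * hid for g in "ifou"}

    def _gate(self, g: str, x: Vec, h: Vec) -> Vec:
        return add(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])

    def node(self, x: Vec, children: List[State]) -> State:
        h_sum = add([0.0] * self.hid, *[h for h, _ in children])  # sum of child states
        i = sigmoid(self._gate("i", x, h_sum))
        o = sigmoid(self._gate("o", x, h_sum))
        u = tanh(self._gate("u", x, h_sum))
        c = mul(i, u)
        for h_k, c_k in children:
            f_k = sigmoid(self._gate("f", x, h_k))  # separate forget gate per child
            c = add(c, mul(f_k, c_k))
        return mul(o, tanh(c)), c

    def encode(self, tree: Dict[int, List[int]], inputs: Dict[int, Vec], root: int) -> State:
        """Encode bottom-up: all dependents first, then their head word."""
        children = [self.encode(tree, inputs, k) for k in tree.get(root, [])]
        return self.node(inputs[root], children)


# Hypothetical inputs for "IL-2 activates NF-kB": a 2-dim word embedding
# concatenated with a 2-dim one-hot KB entity type [Protein, Other].
inputs = {0: [0.3, -0.1, 1.0, 0.0], 1: [0.5, 0.2, 0.0, 1.0], 2: [-0.2, 0.4, 1.0, 0.0]}
tree = {1: [0, 2]}  # the head "activates" governs both protein mentions
h_root, c_root = ChildSumTreeLSTM(in_dim=4, hid=3).encode(tree, inputs, root=1)
```

The per-child forget gates let the cell weight each dependent differently, which is one reason tree-structured models suit dependency-based features.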
In this paper, we tackle two distinct challenges in biomedical relation extraction. The first is that the contextual information between two entity mentions often involves sophisticated syntactic structures. We propose a novel graph convolutional network (GCN) model that incorporates dependency parsing and contextualized embeddings to effectively capture comprehensive contextual information. The second is that most benchmark datasets for this task are highly imbalanced: more than 80% of mention pairs are negative instances (i.e., they express no relation). We propose a multi-task learning framework that jointly models relation identification and classification so that each task propagates supervision signals to the other, and we apply a focal loss to focus training on ambiguous mention pairs. Experiments show that, with these two strategies, our model achieves a state-of-the-art F-score on the 2013 drug-drug interaction extraction task.
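The focal loss (Lin et al., 2017) multiplies cross-entropy by (1 - p_t)^γ, where p_t is the predicted probability of the gold class. The effect is that confidently classified examples, such as the many easy negative pairs, contribute little, and training concentrates on ambiguous pairs. The sketch below assumes the standard multi-class form with γ = 2 and optional per-class weights α. The abstract does not give the exact hyperparameters, and the class labels in the example are hypothetical.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    return -math.log(softmax(logits)[target])


def focal_loss(logits: List[float], target: int, gamma: float = 2.0,
               alpha: Optional[List[float]] = None) -> float:
    """Multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    With gamma = 0 and alpha = None it reduces to plain cross-entropy;
    larger gamma shrinks the loss of confidently classified examples.
    """
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical 3-way logits over [no-relation, mechanism, effect]; gold label 0.
easy_negative = [4.0, 0.0, 0.0]   # confidently predicted "no relation"
ambiguous_pair = [0.0, 1.0, 0.0]  # the model leans toward "mechanism"
```

Comparing `focal_loss(x, 0) / cross_entropy(x, 0)` for the two inputs shows the intended effect: the easy negative's loss is scaled down far more than the ambiguous pair's.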