Generating coherent and credible explanations remains a significant challenge in AI. In recent years, researchers have explored entailment trees as a way to represent explanations: an entailment tree shows the reasoning process by which a hypothesis is derived from supporting facts. However, existing models often neglect to generate intermediate conclusions that follow logically from the given facts. This leads to inaccurate conclusions and undermines the overall credibility of the entailment trees. To address this limitation, we propose the logical pattern memory pre-trained model (LMPM). LMPM incorporates an external memory structure that learns and stores latent representations of logical patterns, which helps it generate logically consistent conclusions. Furthermore, to mitigate the influence of logically irrelevant domain knowledge in the Wikipedia-based data, we introduce an entity abstraction approach to construct the pre-training dataset for LMPM. Experimental results show that our approach improves the quality of entailment tree generation. By leveraging logical entailment patterns, our model produces more coherent and reasonable conclusions that closely align with the underlying premises.
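The idea behind entity abstraction can be illustrated with a minimal sketch. The abstract does not say how entities are detected or which placeholders are used. This example therefore assumes the entity mentions are already known (for instance, from an external NER tool) and replaces each distinct entity with a generic token such as `ENT1`. The function name `abstract_entities` and the placeholder format are hypothetical. Because the same entity always maps to the same placeholder, the logical link between premises and conclusion survives while the domain-specific names disappear.

```python
import re


def abstract_entities(sentences, entities):
    """Replace known entity mentions with shared placeholders (hypothetical scheme).

    Placeholder numbers follow the order in which `entities` is given, and the
    same entity maps to the same placeholder in every sentence.
    """
    if not entities:
        return list(sentences)
    mapping = {ent: f"ENT{i}" for i, ent in enumerate(entities, start=1)}
    # Longer entities are tried first so "solar system" wins over "solar".
    alternatives = sorted(entities, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(e) for e in alternatives) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], s) for s in sentences]


premises_and_conclusion = [
    "the sun is a kind of star",
    "a star produces light",
    "the sun produces light",
]
print(abstract_entities(premises_and_conclusion, ["sun", "star", "light"]))
# ['the ENT1 is a kind of ENT2', 'a ENT2 produces ENT3', 'the ENT1 produces ENT3']
```

After abstraction, the example keeps only its logical shape (X is a kind of Y; Y produces Z; therefore X produces Z). A pattern-learning model could then pick up that shape without relying on facts about the sun or stars.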
Much of the commonsense knowledge in the real world takes the form of procedures, that is, sequences of steps to achieve particular goals. In recent years, knowledge extraction from procedural documents has attracted considerable attention. However, existing work often focuses on procedural text alone and ignores a common multimodal scenario in the real world. Images and text can complement each other semantically, alleviating the semantic ambiguity that affects the text-only modality. Motivated by this, we explore the problem of grounded multimodal procedural entity recognition (GMPER). GMPER aims to detect entities together with their corresponding bounding-box groundings in images (i.e., visual entities). We build a new dataset (Wiki-GMPER) and conduct extensive experiments to evaluate the effectiveness of our proposed model.
Current structured and semi-structured knowledge bases mainly focus on representing descriptive knowledge but ignore another kind of commonsense knowledge: procedural knowledge. To structure procedural knowledge, existing methods automatically generate flow graphs from procedural documents. They focus on extracting sequential dependencies between sentences but neglect two other important dependencies in procedural documents (i.e., inclusion dependency and constraint dependency). In this paper, we explore the problem of automatically generating procedural graphs with multiple dependency relations, which extends the flow graphs constructed by existing methods. We also propose a procedural graph construction method that uses syntactic information and discourse structures. We build a new dataset (WHPG) and conduct extensive experiments to evaluate the effectiveness of our proposed model.
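A minimal sketch of a data structure can make the extended graph more concrete. In this graph, each edge between procedural sentences carries one of the three dependency types named above. The class and method names (`ProceduralGraph`, `add_dependency`, `flow_graph_edges`) and the example steps are hypothetical. The readings of inclusion (a step contains a sub-step) and constraint (a condition restricts a step) are assumptions and may not match the exact definitions used for WHPG.

```python
from collections import defaultdict
from enum import Enum


class Dependency(Enum):
    SEQUENTIAL = "sequential"  # step A is followed by step B
    INCLUSION = "inclusion"    # step A includes sub-step B (assumed reading)
    CONSTRAINT = "constraint"  # B constrains how/when A is performed (assumed reading)


class ProceduralGraph:
    def __init__(self):
        self.nodes = []                 # sentence texts, in document order
        self.edges = defaultdict(list)  # source index -> [(target index, Dependency)]

    def add_node(self, text):
        self.nodes.append(text)
        return len(self.nodes) - 1

    def add_dependency(self, src, dst, dep):
        self.edges[src].append((dst, dep))

    def neighbors(self, src, dep=None):
        """Targets of edges leaving `src`, optionally filtered by dependency type."""
        return [dst for dst, t in self.edges[src] if dep is None or t is dep]

    def flow_graph_edges(self):
        """Keep only sequential edges, i.e. project onto a conventional flow graph."""
        return [
            (src, dst)
            for src, outs in self.edges.items()
            for dst, t in outs
            if t is Dependency.SEQUENTIAL
        ]


g = ProceduralGraph()
prep = g.add_node("Prepare the soil.")
weed = g.add_node("Remove weeds.")
plant = g.add_node("Plant the seeds.")
frost = g.add_node("Only plant after the last frost.")
g.add_dependency(prep, weed, Dependency.INCLUSION)
g.add_dependency(prep, plant, Dependency.SEQUENTIAL)
g.add_dependency(plant, frost, Dependency.CONSTRAINT)
print(g.flow_graph_edges())  # [(0, 2)]
```

Projecting onto sequential edges recovers what a flow-graph method would keep. The inclusion and constraint edges are the extra structure that a multi-relation procedural graph adds.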
Sentiment analysis on social media is challenging because posts are short and lack context. Stickers are a popular way to express emotion on social media. When attached to posts, they can supply missing sentiment and help identify sentiment precisely. However, stickers have received little research attention. To this end, we present a Chinese sticker-based multimodal dataset for sentiment analysis (CSMSA). Previous multimodal datasets are built on real-world photos. In contrast, CSMSA focuses on stickers, which convey more vivid and evocative emotions. The sticker-based multimodal sentiment analysis task is challenging in three respects: the inherent multimodality of stickers, significant variation between sticker series, and complex multimodal sentiment fusion. We propose SAMSAM to address these three challenges. Our model introduces a flexible masked self-attention mechanism that allows dynamic interaction between post texts and stickers. Experimental results indicate that our model achieves the best performance among the compared models. More research is needed in this field. The dataset is publicly available at https://github.com/Logos23333/CSMSA.
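The abstract does not specify how the mask in SAMSAM is built. As a hedged illustration of the general mechanism only, the sketch below computes single-head scaled dot-product self-attention over a concatenated sequence of text-token and sticker feature vectors. A boolean mask decides which positions may attend to which. Changing the mask, for example by allowing or blocking text-to-sticker attention, changes how the two modalities interact. The function names, the mask-construction rule, and the omission of learned query/key/value projections are all assumptions made to keep the example small.

```python
import math


def build_mask(n_text, n_sticker, text_sees_sticker=True, sticker_sees_text=True):
    """mask[i][j] is True if position i may attend to position j (hypothetical rule).

    Positions 0..n_text-1 are text tokens; the rest are sticker features.
    Attention within the same modality (including to oneself) is always allowed.
    """
    n = n_text + n_sticker
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if (i < n_text) == (j < n_text):
                mask[i][j] = True
            elif i < n_text:
                mask[i][j] = text_sees_sticker
            else:
                mask[i][j] = sticker_sees_text
    return mask


def masked_self_attention(x, mask):
    """Single-head scaled dot-product self-attention; queries = keys = values = x."""
    d = len(x[0])
    scale = math.sqrt(d)
    out = []
    for i, q in enumerate(x):
        # Masked positions get -inf so their softmax weight becomes exactly 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / scale if mask[i][j] else -math.inf
            for j, k in enumerate(x)
        ]
        top = max(scores)  # finite because mask[i][i] is always True
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) / total for c in range(d)])
    return out


features = [[1.0, 0.0], [0.0, 1.0]]  # one text token, one sticker feature
isolated = masked_self_attention(features, build_mask(1, 1, False, False))
print(isolated)  # [[1.0, 0.0], [0.0, 1.0]]: each modality sees only itself
```

With cross-modal attention blocked, each output equals its input. With the default (full) mask, the text position mixes in the sticker feature. How SAMSAM chooses or adapts its mask is not stated in the abstract.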
Relation classification (RC) plays an important role in natural language processing (NLP). Conventional supervised and distantly supervised RC models make a closed-world assumption, which ignores the emergence of novel relations in open environments. Two existing solutions for incrementally recognizing novel relations (i.e., re-training and lifelong learning) suffer from the lack of large-scale labeled data for novel relations. Meanwhile, prototypical networks perform well in both deep supervised learning and few-shot learning. However, they still suffer from the incompatible feature embedding problem when novel relations arrive. Motivated by these observations, we propose a two-phase prototypical network with prototype attention alignment and triplet loss. It dynamically recognizes novel relations from a few support instances without catastrophic forgetting. Extensive experiments are conducted to evaluate the effectiveness of our proposed model.
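Prototypical networks classify a query by comparing its embedding with one prototype per class, usually the mean of that class's support embeddings. A triplet loss pulls an anchor toward a positive example and pushes it away from a negative one by a margin. The sketch below shows only these two standard ingredients on precomputed embeddings. It is not the proposed two-phase model and omits the encoder and prototype attention alignment. The function names, the squared Euclidean distance, the relation labels, and the default margin of 1.0 are illustrative assumptions.

```python
import math


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype(support):
    """Mean of the support embeddings for one relation."""
    n = len(support)
    return [sum(col) / n for col in zip(*support)]


def classify(query, prototypes):
    """Nearest-prototype label plus softmax probabilities over negative distances."""
    labels = list(prototypes)
    logits = [-squared_euclidean(query, prototypes[r]) for r in labels]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    probs = {r: e / total for r, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(
        0.0,
        squared_euclidean(anchor, positive) - squared_euclidean(anchor, negative) + margin,
    )


protos = {
    "founded_by": prototype([[1.0, 0.0], [0.8, 0.2]]),
    "located_in": prototype([[0.0, 1.0], [0.2, 0.8]]),
}
# A novel relation arrives with a single support instance.
protos["member_of"] = prototype([[1.0, 1.0]])
label, probs = classify([0.9, 0.95], protos)
print(label)  # member_of
```

In this simplified setting the encoder is fixed, so a novel relation is added by computing a prototype from its few support instances while existing prototypes stay unchanged. The incompatible-embedding problem the abstract mentions arises when the encoder itself is updated.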