Xurui Li


2024

Knowledge Triplets Derivation from Scientific Publications via Dual-Graph Resonance
Kai Zhang | Pengcheng Li | Kaisong Song | Xurui Li | Yangyang Kang | Xuhong Zhang | Xiaozhong Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Scientific Information Extraction (SciIE) is a vital task and is increasingly being adopted in biomedical data mining to conceptualize and epitomize knowledge triplets from the scientific literature. Existing relation extraction methods aim to extract explicit triplet knowledge from documents; however, they can hardly perceive unobserved factual relations. Recent generative methods have more flexibility, but their generated relations raise trustworthiness problems. In this paper, we first propose a novel Extraction-Contextualization-Derivation (ECD) strategy to generate a document-specific and entity-expanded dynamic graph from a shared static knowledge graph. Then, we propose a novel Dual-Graph Resonance Network (DGRN) which can generate richer explicit and implicit relations under the guidance of static and dynamic knowledge topologies. Experiments conducted on a public PubMed corpus validate the superiority of our method against several state-of-the-art baselines.

PDAMeta: Meta-Learning Framework with Progressive Data Augmentation for Few-Shot Text Classification
Xurui Li | Kaisong Song | Tianqianjin Lin | Yangyang Kang | Fubang Zhao | Changlong Sun | Xiaozhong Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recently, we have witnessed breakthroughs in meta-learning for few-shot learning scenarios. Data augmentation is essential for meta-learning, particularly in situations where data is extremely scarce. However, existing text data augmentation methods cannot ensure the diversity and quality of the generated data, which leads to sub-optimal performance. Inspired by the recent success of large language models (LLMs), which demonstrate improved language comprehension abilities, we propose a Meta-learning framework with Progressive Data Augmentation (PDAMeta) for few-shot text classification, which contains a two-stage data augmentation strategy. First, the prompt-based data augmentation enriches the diversity of the training instances from a global perspective. Second, the attention-based data augmentation further improves the data quality from a local perspective. Last, we propose a dual-stream contrastive meta-learning strategy to learn discriminative text representations from both original and augmented instances. Extensive experiments conducted on four public few-shot text classification datasets show that PDAMeta significantly outperforms several state-of-the-art models and shows better robustness.

2023

STINMatch: Semi-Supervised Semantic-Topological Iteration Network for Financial Risk Detection via News Label Diffusion
Xurui Li | Yue Qin | Rui Zhu | Tianqianjin Lin | Yongming Fan | Yangyang Kang | Kaisong Song | Fubang Zhao | Changlong Sun | Haixu Tang | Xiaozhong Liu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Commercial news provides rich semantics and timely information for automated financial risk detection. However, the prohibitive cost of large-scale annotation and the sparseness of training data hinder the full exploitation of commercial news in risk detection. To address this problem, we propose a semi-supervised Semantic-Topological Iteration Network, STINMatch, along with a news-enterprise knowledge graph (NEKG) to support improved risk detection. The proposed model incorporates a label correlation matrix and interactive consistency regularization techniques into the iterative joint learning framework of text and graph modules. The carefully designed framework takes full advantage of the labeled and unlabeled data as well as their interrelations, enabling deep label diffusion coordination between article-level semantics and label correlations following the topological structure. Extensive experiments demonstrate the superior effectiveness and generalization ability of STINMatch.