Siqiang Luo

2025

pdf bib abs
Permitted Knowledge Boundary: Evaluating the Knowledge-Constrained Responsiveness of Large Language Models
Wenrui Bao | Kai Wang | Siqiang Luo | Xiang Li
Findings of the Association for Computational Linguistics: EMNLP 2025

With the advancement of large language models (LLMs), recent research has raised concerns about their controllability.. In this paper, we argue for the importance of Knowledge-Constrained Responsiveness (KCR), ensuring that LLMs comply with human-defined constraints. However, KCR is an implicit and unobservable capability of LLMs, functioning as a black box that currently eludes quantitative assessment. To address this issue, we first introduce the definition of “permitted boundary” and define the “boundary bias” to depict KCR. We propose six metrics to quantify the boundary bias of LLMs and subsequently assess the KCR. Furthermore, we establish a benchmark with two new datasets, KCR-SimpleQA and KCR-WebNLG, to evaluate the performance of LLMs. Our extensive experiments show that several tested LLMs still struggle to varying degrees when adhering to constraints, especially without the corresponding knowledge.

2024

pdf bib abs
LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs
Kai Wang | Yuwei Xu | Zhiyong Wu | Siqiang Luo
Findings of the Association for Computational Linguistics: ACL 2024

Knowledge Graph (KG) inductive reasoning, which aims to infer missing facts from new KGs that are not seen during training, has been widely adopted in various applications. One critical challenge of KG inductive reasoning is handling low-resource scenarios with scarcity in both textual and structural aspects. In this paper, we attempt to address this challenge with Large Language Models (LLMs). Particularly, we utilize the state-of-the-art LLMs to generate a graph-structural prompt to enhance the pre-trained Graph Neural Networks (GNNs), which brings us new methodological insights into the KG inductive reasoning methods, as well as high generalizability in practice. On the methodological side, we introduce a novel pretraining and prompting framework ProLINK, designed for low-resource inductive reasoning across arbitrary KGs without requiring additional training. On the practical side, we experimentally evaluate our approach on 36 low-resource KG datasets and find that ProLINK outperforms previous methods in three-shot, one-shot, and zero-shot reasoning tasks, exhibiting average performance improvements by 20%, 45%, and 147%, respectively. Furthermore, ProLINK demonstrates strong robustness for various LLM promptings as well as full-shot scenarios.

pdf bib abs
StructAM: Enhancing Address Matching through Semantic Understanding of Structure-aware Information
Zhaoqi Zhang | Pasquale Balsebre | Siqiang Luo | Zhen Hai | Jiangping Huang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The task of address matching involves linking unstructured addresses to standard ones in a database. The challenges presented by this task are manifold: misspellings, incomplete information, and variations in address content are some examples. While there have been previous studies on entity matching in natural language processing, for the address matching solution, existing approaches still rely on string-based similarity matching or manually-designed rules. In this paper, we propose StructAM, a novel method based on pre-trained language models (LMs) and graph neural networks to extract the textual and structured information of the addresses. The proposed method leverages the knowledge acquired by large language models during the pre-training phase, and refines it during the fine-tuning process on the address domain, to obtain address-specific semantic features. Meanwhile, it also applies an attribute attention mechanism based on Graph Sampling and Aggregation (GraphSAGE) module to capture internal hierarchy information of the address text. To further enhance the accuracy of our algorithm in dirty settings, we incorporate spatial coordinates and contextual information from the surrounding area as auxiliary guidance. We conduct extensive experiments on real-world datasets from four different countries and the results show that StructAM outperforms state-of-the-art baseline approaches for address matching.

Co-authors

Venues

Fix author