Pasquale Balsebre


2024

pdf bib
StructAM: Enhancing Address Matching through Semantic Understanding of Structure-aware Information
Zhaoqi Zhang | Pasquale Balsebre | Siqiang Luo | Zhen Hai | Jiangping Huang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The task of address matching involves linking unstructured addresses to standard ones in a database. The challenges presented by this task are manifold: misspellings, incomplete information, and variations in address content are some examples. While there have been previous studies on entity matching in natural language processing, for the address matching solution, existing approaches still rely on string-based similarity matching or manually-designed rules. In this paper, we propose StructAM, a novel method based on pre-trained language models (LMs) and graph neural networks to extract the textual and structured information of the addresses. The proposed method leverages the knowledge acquired by large language models during the pre-training phase, and refines it during the fine-tuning process on the address domain, to obtain address-specific semantic features. Meanwhile, it also applies an attribute attention mechanism based on Graph Sampling and Aggregation (GraphSAGE) module to capture internal hierarchy information of the address text. To further enhance the accuracy of our algorithm in dirty settings, we incorporate spatial coordinates and contextual information from the surrounding area as auxiliary guidance. We conduct extensive experiments on real-world datasets from four different countries and the results show that StructAM outperforms state-of-the-art baseline approaches for address matching.