DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching

Kasra Hosseini, Federico Nanni, Mariona Coll Ardanuy


Abstract
We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach is especially useful where only limited training examples are available. The learned DeezyMatch models can be used to generate rich vector representations from string inputs. The candidate ranker component in DeezyMatch uses these vector representations to find, for a given query, the best matching candidates in a knowledge base. It uses an adaptive searching algorithm applicable to large knowledge bases and query sets. We describe DeezyMatch’s functionality, design and implementation, accompanied by a use case in toponym matching and candidate ranking in realistic noisy datasets.
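The candidate-ranking idea described in the abstract (turn strings into vectors, then retrieve the knowledge-base entries closest to a query) can be illustrated with a minimal sketch. It does not use the DeezyMatch API or its learned models. As a stated assumption, character-bigram counts stand in for the learned vector representations, and an exhaustive cosine-similarity scan stands in for the adaptive search. The names `embed`, `cosine`, `rank_candidates` and the toy knowledge base are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text, n=2):
    """Stand-in embedding (assumption): counts of character n-grams.

    A learned model would produce a dense vector here instead.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query, knowledge_base, top_k=2):
    """Return the top_k knowledge-base entries most similar to the query.

    This is an exhaustive scan, not an adaptive search.
    """
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(name)), name) for name in knowledge_base]
    # Sort by descending similarity, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Toy toponym knowledge base and a noisy (misspelled) query.
knowledge_base = ["London", "Londonderry", "Lisbon", "Paris"]
ranked = rank_candidates("Lundon", knowledge_base)
print(ranked)
```

With this toy data, the misspelled query "Lundon" ranks "London" first and "Londonderry" second. An exhaustive scan like this compares the query against every entry, which is the cost an adaptive search over a large knowledge base is meant to avoid.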
Anthology ID: 2020.emnlp-demos.9
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: October
Year: 2020
Address: Online
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 62–69
URL: https://aclanthology.org/2020.emnlp-demos.9
DOI: 10.18653/v1/2020.emnlp-demos.9
PDF: https://aclanthology.org/2020.emnlp-demos.9.pdf
Code: Living-with-machines/DeezyMatch