string2string: A Modern Python Library for String-to-String Algorithms

Mirac Suzgun, Stuart Shieber, Dan Jurafsky


Abstract
We introduce **string2string**, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis�along with several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods. Notable algorithms featured in the library include the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fischer algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. In addition, it wraps existing efficient and widely-used implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE. Overall, the library aims to provide extensive coverage and increased flexibility in comparison to existing libraries for strings. It can be used for many downstream applications, tasks, and problems in natural-language processing, bioinformatics, and computational social sciences. It is implemented in Python, easily installable via pip, and accessible through a simple API. Source code, documentation, and tutorials are all available on our GitHub page: https://github.com/stanfordnlp/string2string* Documentation: https://string2string.readthedocs.io/en/latest/* GitHub page: https://github.com/stanfordnlp/string2string* Short video: https://drive.google.com/file/d/1IT-pBACDVUoEHewk__5Pz5mU5oAMq5k_/view?usp=sharing
Anthology ID:
2024.luhme-demos.26
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Yixin Cao, Yang Feng, Deyi Xiong
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
278–285
Language:
URL:
https://aclanthology.org/2024.luhme-demos.26/
DOI:
10.18653/v1/2024.acl-demos.26
Bibkey:
Cite (ACL):
Mirac Suzgun, Stuart Shieber, and Dan Jurafsky. 2024. string2string: A Modern Python Library for String-to-String Algorithms. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 278–285, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
string2string: A Modern Python Library for String-to-String Algorithms (Suzgun et al., ACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.acl-demos.26.pdf