Structured Extraction of Terms and Conditions from German and English Online Shops

Tobias Schamel, Daniel Braun, Florian Matthes


Abstract
The automated analysis of Terms and Conditions has gained attention in recent years, mainly due to its relevance to consumer protection. Well-structured data sets are the basis for every analysis. While content extraction in general is a well-researched field and many open-source libraries are available, our evaluation shows that existing solutions cannot extract Terms and Conditions in sufficient quality, mainly because of their special structure. In this paper, we present an approach to extract the content and hierarchy of Terms and Conditions from German and English online shops. Our evaluation shows that the approach outperforms the current state of the art. A Python implementation of the approach is made available under an open license.
Anthology ID:
2022.ecnlp-1.21
Volume:
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Shervin Malmasi, Oleg Rokhlenko, Nicola Ueffing, Ido Guy, Eugene Agichtein, Surya Kallumadi
Venue:
ECNLP
Publisher:
Association for Computational Linguistics
Pages:
181–190
URL:
https://aclanthology.org/2022.ecnlp-1.21
DOI:
10.18653/v1/2022.ecnlp-1.21
Cite (ACL):
Tobias Schamel, Daniel Braun, and Florian Matthes. 2022. Structured Extraction of Terms and Conditions from German and English Online Shops. In Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5), pages 181–190, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Structured Extraction of Terms and Conditions from German and English Online Shops (Schamel et al., ECNLP 2022)
PDF:
https://aclanthology.org/2022.ecnlp-1.21.pdf
Video:
https://aclanthology.org/2022.ecnlp-1.21.mp4
Code:
sebischair/lowestcommonancestorextractor