Privacy Pitfalls of Online Service Terms and Conditions: a Hybrid Approach for Classification and Summarization

Emilia Lukose, Suparna De, Jon Johnson


Abstract
Verbose and complicated legal terminology in online service terms and conditions (T&C) means that users typically do not read these documents before accepting the terms of such unilateral service contracts. With such services becoming part of mainstream digital life, highlighting the Terms of Service (ToS) clauses that affect user privacy and the collection and use of user data is an important concern. Advances in text summarization can help to create informative and concise summaries of the terms, but existing approaches geared towards news and microblogging corpora are not directly applicable to the ToS domain, which is hindered by a lack of T&C-relevant resources for training and evaluation. This paper presents a ToS model built as a hybrid extractive-classifier-abstractive pipeline that highlights the privacy and data collection/use-related sections in a ToS document and paraphrases these into concise and informative sentences. Relying on significantly less training data (4,313 training pairs) than previous representative works (287,226 pairs), our model outperforms extractive baselines by at least 50% in ROUGE-1 score and 54% in METEOR score. The paper also contributes to existing community efforts by curating a dataset of online service T&C, collected using a web scraping tool developed for this purpose.
Anthology ID:
2022.nllp-1.6
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
65–75
URL:
https://aclanthology.org/2022.nllp-1.6
DOI:
10.18653/v1/2022.nllp-1.6
Cite (ACL):
Emilia Lukose, Suparna De, and Jon Johnson. 2022. Privacy Pitfalls of Online Service Terms and Conditions: a Hybrid Approach for Classification and Summarization. In Proceedings of the Natural Legal Language Processing Workshop 2022, pages 65–75, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Privacy Pitfalls of Online Service Terms and Conditions: a Hybrid Approach for Classification and Summarization (Lukose et al., NLLP 2022)
PDF:
https://aclanthology.org/2022.nllp-1.6.pdf
Video:
 https://aclanthology.org/2022.nllp-1.6.mp4