APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation

Pengyun Zhu; Qiheng Sun; Long Wen; Yanbo Wang; Yang Cao; Junxu Liu; Deyi Xiong (德意 熊); Jinfei Liu; Zhibo Wang; Kui Ren

Abstract

Privacy policies are essential for users to understand how service providers handle their personal data. However, these documents are often long, complex, and filled with technobabble and legalese, causing users to unknowingly accept terms that may even contradict the law. While summarizing and interpreting these privacy policies is crucial, there is a lack of high-quality English parallel corpora optimized for legal clarity and readability. To address this issue, we introduce APPSI-139, a high-quality English privacy policy corpus meticulously annotated by domain experts and specifically designed for summarization and interpretation tasks. The corpus includes 139 English privacy policies, 15,692 rewritten parallel corpora, and 36,351 fine-grained annotation labels across 11 data practice categories. We also propose TCSI-pp-V2, a hybrid privacy policy summarization and interpretation framework that employs an alternating training strategy and coordinates multiple expert modules to balance computational efficiency and accuracy. Experimental results show that the hybrid summarization system built on the APPSI-139 corpus and the TCSI-pp-V2 framework outperforms large language models, such as GPT-4o and LLaMA-3-70B, in terms of readability and reliability. The source code and dataset are available at https://github.com/EnlightenedAI/APPSI-139.

Anthology ID: 2026.acl-long.168
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 3681–3706
URL: https://aclanthology.org/2026.acl-long.168/
Cite (ACL): Pengyun Zhu, Qiheng Sun, Long Wen, Yanbo Wang, Yang Cao, Junxu Liu, Deyi Xiong, Jinfei Liu, Zhibo Wang, and Kui Ren. 2026. APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3681–3706, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation (Zhu et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.168.pdf
Checklist: 2026.acl-long.168.checklist.pdf