Nine Ways to Break Copyright Law and Why Our LLM Won’t: A Fair Use Aligned Generation Framework

Aakash Sen Sharma, Debdeep Sanyal, Priyansh Srivastava, Sundar Athreya H, Shirish Karande, Mohan Kankanhalli, Murari Mandal


Abstract
Large language models (LLMs) commonly risk copyright infringement by reproducing protected content verbatim or with insufficient transformative modifications, posing significant ethical, legal, and practical concerns. Current inference-time safeguards predominantly rely on restrictive refusal-based filters, often compromising the practical utility of these models. To address this, we collaborated closely with intellectual property experts to develop LAW-LM (Legally Aware Language Model), a legally grounded framework explicitly designed to align LLM outputs with fair-use doctrine. Central to our method is FairUseDB, a carefully constructed dataset containing 18,000 expert-validated examples covering nine realistic infringement scenarios. Leveraging this dataset, we apply Direct Preference Optimization (DPO) to fine-tune open-source LLMs, encouraging them to produce legally compliant and practically useful alternatives rather than resorting to blunt refusal. Recognizing the shortcomings of traditional evaluation metrics, we propose two new measures, Weighted Penalty Utility and Compliance Aware Harmonic Mean (CAH), that balance infringement risk against response utility. Extensive quantitative experiments coupled with expert evaluations confirm that LAW-LM substantially reduces problematic outputs compared to state-of-the-art approaches, while preserving real-world usability.
Anthology ID:
2025.findings-emnlp.423
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7993–8023
URL:
https://aclanthology.org/2025.findings-emnlp.423/
Cite (ACL):
Aakash Sen Sharma, Debdeep Sanyal, Priyansh Srivastava, Sundar Athreya H, Shirish Karande, Mohan Kankanhalli, and Murari Mandal. 2025. Nine Ways to Break Copyright Law and Why Our LLM Won’t: A Fair Use Aligned Generation Framework. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 7993–8023, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Nine Ways to Break Copyright Law and Why Our LLM Won’t: A Fair Use Aligned Generation Framework (Sharma et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.423.pdf
Checklist:
 2025.findings-emnlp.423.checklist.pdf