Machine Extraction of Tax Laws from Legislative Texts

Elliott Ash, Malka Guillot, Luyang Han


Abstract
Using a corpus of compiled codes from U.S. states containing labeled tax-law sections, we train text classifiers to automatically tag tax-law documents and, further, to identify the associated revenue source (e.g., income, property, or sales). After evaluating classifier performance on held-out test data, we apply the classifiers to a historical corpus of U.S. state legislation to extract the flow of relevant laws over the years 1910 through 2010. We document that the classifiers are effective in the historical corpus, for example by automatically detecting the establishment of state personal income taxes. The trained models and replication code are published at https://github.com/luyang521/tax-classification.
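The two-stage pipeline described in the abstract can be sketched as follows. The pipeline first tags a section as tax law or not, then assigns a revenue source to each tagged section, and finally tallies tagged laws per year. This is a minimal illustration under stated assumptions, not the authors' method. It swaps in a from-scratch multinomial Naive Bayes classifier. The training sentences and "historical" acts are invented toy examples, not text from the paper's corpus. The names `NaiveBayes` and `extract_tax_laws` are hypothetical. The published models and replication code are in the repository linked in the abstract.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical stand-in classifier for illustration; not the paper's model.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log(
                        (self.word_counts[label][w] + 1) / (self.totals[label] + v)
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def extract_tax_laws(statutes, is_tax_clf, source_clf):
    """Hypothetical helper: count predicted tax laws per (year, revenue source)."""
    flow = Counter()
    for year, text in statutes:
        if is_tax_clf.predict(text) == "tax":  # stage 1: tax vs. other
            flow[(year, source_clf.predict(text))] += 1  # stage 2: revenue source
    return flow


# Invented toy training sections (not from the paper's corpus).
tax_docs = [
    ("a tax is imposed on the net income of every resident individual", "income"),
    ("the assessor shall value all real property for taxation", "property"),
    ("a sales tax of two percent is levied on retail sales of tangible goods", "sales"),
]
other_docs = [
    "the board of education shall appoint a superintendent of schools",
    "no person shall operate a motor vehicle without a license",
]

is_tax_clf = NaiveBayes().fit(
    [d for d, _ in tax_docs] + other_docs,
    ["tax"] * len(tax_docs) + ["other"] * len(other_docs),
)
# The revenue-source classifier is trained only on tax sections.
source_clf = NaiveBayes().fit([d for d, _ in tax_docs], [s for _, s in tax_docs])

# Invented "historical" acts, tagged by year of enactment.
history = [
    (1911, "an act imposing a tax on the income of residents"),
    (1911, "an act requiring a license to operate a motor vehicle"),
    (1920, "an act to assess real property for taxation"),
    (1933, "an act levying a retail sales tax on goods"),
]
flow = extract_tax_laws(history, is_tax_clf, source_clf)
print(sorted(flow.items()))
# [((1911, 'income'), 1), ((1920, 'property'), 1), ((1933, 'sales'), 1)]
```

Separating a tax/non-tax filter from the revenue-source classifier mirrors the two steps in the abstract. With real data, one would train on the labeled code sections and check accuracy on a held-out split before applying the models to the historical corpus.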
Anthology ID:
2021.nllp-1.7
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
76–85
URL:
https://aclanthology.org/2021.nllp-1.7
DOI:
10.18653/v1/2021.nllp-1.7
Cite (ACL):
Elliott Ash, Malka Guillot, and Luyang Han. 2021. Machine Extraction of Tax Laws from Legislative Texts. In Proceedings of the Natural Legal Language Processing Workshop 2021, pages 76–85, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Machine Extraction of Tax Laws from Legislative Texts (Ash et al., NLLP 2021)
PDF:
https://aclanthology.org/2021.nllp-1.7.pdf
Software:
2021.nllp-1.7.Software.zip
Code:
luyang521/tax-classification