HICode: Hierarchical Inductive Coding with LLMs

Mian Zhong; Pristina Wang; Anjalie Field

Abstract

Despite numerous applications for fine-grained corpus analysis, researchers continue to rely on manual labeling, which does not scale, or statistical tools like topic modeling, which are difficult to control. We propose that LLMs have the potential to scale the nuanced analyses that researchers typically conduct manually to large text corpora. To this effect, inspired by qualitative research methods, we develop HICode, a two-part pipeline that first inductively generates labels directly from analysis data and then hierarchically clusters them to surface emergent themes. We validate this approach across three diverse datasets by measuring alignment with human-constructed themes and demonstrating its robustness through automated and human evaluations. Finally, we conduct a case study of litigation documents related to the ongoing opioid crisis in the U.S., revealing aggressive marketing strategies employed by pharmaceutical companies and demonstrating HICode’s potential for facilitating nuanced analyses in large-scale data.

Anthology ID: 2025.emnlp-main.1580
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 31048–31066
URL: https://aclanthology.org/2025.emnlp-main.1580/
Cite (ACL): Mian Zhong, Pristina Wang, and Anjalie Field. 2025. HICode: Hierarchical Inductive Coding with LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31048–31066, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): HICode: Hierarchical Inductive Coding with LLMs (Zhong et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1580.pdf
Checklist: 2025.emnlp-main.1580.checklist.pdf
