Hunter Heidenreich
2025
Page Stream Segmentation with LLMs: Challenges and Applications in Insurance Document Automation
Hunter Heidenreich
|
Ratish Dalvi
|
Nikhil Verma
|
Yosheb Getachew
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Page Stream Segmentation (PSS) is critical for automating document processing in industries like insurance, where unstructured document collections are common. This paper explores the use of large language models (LLMs) for PSS, applying parameter-efficient fine-tuning to real-world insurance data. Our experiments show that LLMs outperform baseline models in page- and stream-level segmentation accuracy. However, stream-level calibration remains challenging, especially for high-stakes applications. We evaluate post-hoc calibration and Monte Carlo dropout, finding limited improvement. Future work will integrate active learning to enhance model calibration and support deployment in practical settings.
2019
Latent semantic network induction in the context of linked example senses
Hunter Heidenreich
|
Jake Williams
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
The Princeton WordNet is a powerful tool for studying language and developing natural language processing algorithms. With significant work developing it further, one line considers its extension through aligning its expert-annotated structure with other lexical resources. In contrast, this work explores a completely data-driven approach to network construction, forming a wordnet using the entirety of the open-source, noisy, user-annotated dictionary, Wiktionary. Comparing baselines to WordNet, we find compelling evidence that our network induction process constructs a network with useful semantic structure. With thousands of semantically-linked examples that demonstrate sense usage from basic lemmas to multiword expressions (MWEs), we believe this work motivates future research.