@inproceedings{le-etal-2025-docie,
title = "{D}oc{IE}@{XLLM}25: {Z}ero{S}emble - Robust and Efficient Zero-Shot Document Information Extraction with Heterogeneous Large Language Model Ensembles",
author = "Pham Hoang Le, Nguyen and
Dinh Thien, An and
T. Luu, Son and
Van Nguyen, Kiet",
editor = "Fei, Hao and
Tu, Kewei and
Zhang, Yuhui and
Hu, Xiang and
Han, Wenjuan and
Jia, Zixia and
Zheng, Zilong and
Cao, Yixin and
Zhang, Meishan and
Lu, Wei and
Siddharth, N. and
{\O}vrelid, Lilja and
Xue, Nianwen and
Zhang, Yue",
booktitle = "Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)",
month = aug,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.xllm-1.25/",
doi = "10.18653/v1/2025.xllm-1.25",
pages = "288--297",
ISBN = "979-8-89176-286-2",
abstract = "The schematization of knowledge, including the extraction of entities and relations from documents, poses significant challenges to traditional approaches because of the document{'}s ambiguity, heterogeneity, and high cost domain-specific training. Although Large Language Models (LLMs) allow for extraction without prior training on the dataset, the requirement of fine-tuning along with low precision, especially in relation extraction, serves as an obstacle. In absence of domain-specific training, we present a new zero-shot ensemble approach using DeepSeek-R1-Distill-Llama-70B, Llama-3.3-70B, and Qwen-2.5-32B. Our key innovation is a two-stage pipeline that first consolidates high-confidence entities through ensemble techniques, then leverages Qwen-2.5-32B with engineered prompts to generate precise semantic triples. This approach effectively resolves the low precision problem typically encountered in relation extraction. Experiments demonstrate significant gains in both accuracy and efficiency across diverse domains, with our method ranking in the top 2 on the official leaderboard in Shared Task-IV of The 1st Joint Workshop on Large Language Models and Structure Modeling. This competitive performance validates our approach as a compelling solution for practitioners seeking robust document-level information extraction without the burden of task-specific fine-tuning. Our code can be found at https://github.com/dinhthienan33/ZeroSemble."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="le-etal-2025-docie">
<titleInfo>
<title>DocIE@XLLM25: ZeroSemble - Robust and Efficient Zero-Shot Document Information Extraction with Heterogeneous Large Language Model Ensembles</title>
</titleInfo>
<name type="personal">
<namePart type="given">Nguyen</namePart>
<namePart type="family">Pham Hoang Le</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">An</namePart>
<namePart type="family">Dinh Thien</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Son</namePart>
<namePart type="family">T. Luu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kiet</namePart>
<namePart type="family">Van Nguyen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hao</namePart>
<namePart type="family">Fei</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kewei</namePart>
<namePart type="family">Tu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuhui</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiang</namePart>
<namePart type="family">Hu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wenjuan</namePart>
<namePart type="family">Han</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zixia</namePart>
<namePart type="family">Jia</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zilong</namePart>
<namePart type="family">Zheng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yixin</namePart>
<namePart type="family">Cao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Meishan</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wei</namePart>
<namePart type="family">Lu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">N</namePart>
<namePart type="family">Siddharth</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lilja</namePart>
<namePart type="family">Øvrelid</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nianwen</namePart>
<namePart type="family">Xue</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yue</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-286-2</identifier>
</relatedItem>
<abstract>The schematization of knowledge, including the extraction of entities and relations from documents, poses significant challenges to traditional approaches because of the document’s ambiguity and heterogeneity, and the high cost of domain-specific training. Although Large Language Models (LLMs) allow for extraction without prior training on the dataset, the requirement of fine-tuning along with low precision, especially in relation extraction, serves as an obstacle. In the absence of domain-specific training, we present a new zero-shot ensemble approach using DeepSeek-R1-Distill-Llama-70B, Llama-3.3-70B, and Qwen-2.5-32B. Our key innovation is a two-stage pipeline that first consolidates high-confidence entities through ensemble techniques, then leverages Qwen-2.5-32B with engineered prompts to generate precise semantic triples. This approach effectively resolves the low precision problem typically encountered in relation extraction. Experiments demonstrate significant gains in both accuracy and efficiency across diverse domains, with our method ranking in the top 2 on the official leaderboard in Shared Task-IV of The 1st Joint Workshop on Large Language Models and Structure Modeling. This competitive performance validates our approach as a compelling solution for practitioners seeking robust document-level information extraction without the burden of task-specific fine-tuning. Our code can be found at https://github.com/dinhthienan33/ZeroSemble.</abstract>
<identifier type="citekey">le-etal-2025-docie</identifier>
<identifier type="doi">10.18653/v1/2025.xllm-1.25</identifier>
<location>
<url>https://aclanthology.org/2025.xllm-1.25/</url>
</location>
<part>
<date>2025-08</date>
<extent unit="page">
<start>288</start>
<end>297</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T DocIE@XLLM25: ZeroSemble - Robust and Efficient Zero-Shot Document Information Extraction with Heterogeneous Large Language Model Ensembles
%A Pham Hoang Le, Nguyen
%A Dinh Thien, An
%A T. Luu, Son
%A Van Nguyen, Kiet
%Y Fei, Hao
%Y Tu, Kewei
%Y Zhang, Yuhui
%Y Hu, Xiang
%Y Han, Wenjuan
%Y Jia, Zixia
%Y Zheng, Zilong
%Y Cao, Yixin
%Y Zhang, Meishan
%Y Lu, Wei
%Y Siddharth, N.
%Y Øvrelid, Lilja
%Y Xue, Nianwen
%Y Zhang, Yue
%S Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)
%D 2025
%8 August
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-286-2
%F le-etal-2025-docie
%X The schematization of knowledge, including the extraction of entities and relations from documents, poses significant challenges to traditional approaches because of the document’s ambiguity and heterogeneity, and the high cost of domain-specific training. Although Large Language Models (LLMs) allow for extraction without prior training on the dataset, the requirement of fine-tuning along with low precision, especially in relation extraction, serves as an obstacle. In the absence of domain-specific training, we present a new zero-shot ensemble approach using DeepSeek-R1-Distill-Llama-70B, Llama-3.3-70B, and Qwen-2.5-32B. Our key innovation is a two-stage pipeline that first consolidates high-confidence entities through ensemble techniques, then leverages Qwen-2.5-32B with engineered prompts to generate precise semantic triples. This approach effectively resolves the low precision problem typically encountered in relation extraction. Experiments demonstrate significant gains in both accuracy and efficiency across diverse domains, with our method ranking in the top 2 on the official leaderboard in Shared Task-IV of The 1st Joint Workshop on Large Language Models and Structure Modeling. This competitive performance validates our approach as a compelling solution for practitioners seeking robust document-level information extraction without the burden of task-specific fine-tuning. Our code can be found at https://github.com/dinhthienan33/ZeroSemble.
%R 10.18653/v1/2025.xllm-1.25
%U https://aclanthology.org/2025.xllm-1.25/
%U https://doi.org/10.18653/v1/2025.xllm-1.25
%P 288-297
Markdown (Informal)
[DocIE@XLLM25: ZeroSemble - Robust and Efficient Zero-Shot Document Information Extraction with Heterogeneous Large Language Model Ensembles](https://aclanthology.org/2025.xllm-1.25/) (Pham Hoang Le et al., XLLM 2025)
ACL
Nguyen Pham Hoang Le, An Dinh Thien, Son T. Luu, and Kiet Van Nguyen. 2025. DocIE@XLLM25: ZeroSemble - Robust and Efficient Zero-Shot Document Information Extraction with Heterogeneous Large Language Model Ensembles. In Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025), pages 288–297, Vienna, Austria. Association for Computational Linguistics.
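
The abstract above describes a two-stage zero-shot pipeline: stage one collects entity candidates from several LLMs and keeps those a majority agree on; stage two prompts a single model (Qwen-2.5-32B in the paper) for (subject, relation, object) triples restricted to the consolidated entities. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation: the `query_llm` helper, the prompts, and the voting threshold are illustrative assumptions. The actual code is in the linked repository (https://github.com/dinhthienan33/ZeroSemble).

```python
# Hypothetical sketch of a two-stage zero-shot extraction pipeline in the
# spirit of the abstract above. `query_llm(model, prompt)` stands in for
# whatever client calls the hosted models and is assumed to return the raw
# text completion; it is not part of the paper's code.
import json
from collections import Counter
from typing import Callable


def consolidate_entities(document: str, models: list[str],
                         query_llm: Callable[[str, str], str]) -> list[str]:
    """Stage 1: ask each model for entities, keep those a majority propose."""
    prompt = ("List the named entities in the following document as a "
              "JSON array of strings.\n\n" + document)
    votes: Counter = Counter()
    for model in models:
        try:
            entities = json.loads(query_llm(model, prompt))
        except (json.JSONDecodeError, TypeError):
            continue  # skip malformed completions
        # Count each surface form at most once per model.
        votes.update({e.strip() for e in entities if isinstance(e, str)})
    threshold = len(models) // 2 + 1  # simple majority vote (an assumption)
    return [entity for entity, count in votes.items() if count >= threshold]


def extract_triples(document: str, entities: list[str],
                    query_llm: Callable[[str, str], str]) -> list[tuple[str, str, str]]:
    """Stage 2: prompt one model for triples restricted to the entity list."""
    prompt = ("Using only these entities: " + ", ".join(entities) + "\n"
              "return the relations in the document as a JSON array of "
              "[subject, relation, object] triples.\n\n" + document)
    try:
        raw = json.loads(query_llm("qwen-2.5-32b", prompt))
    except (json.JSONDecodeError, TypeError):
        return []
    allowed = set(entities)
    return [(s, r, o) for item in raw
            if isinstance(item, list) and len(item) == 3
            for s, r, o in [item]
            if s in allowed and o in allowed]  # drop triples with unknown arguments
```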