Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?

Kai Sun; Yin Huang; Srishti Mehra; Mohammad Kachuee; Xilun Chen; Renjie Tao; Zhaojiang Lin; Andrea Jessee; Nirav Shah; Alex L Betty; Yue Liu; Anuj Kumar; Wen-tau Yih; Xin Luna Dong

Abstract

The advent of Large Language Models (LLMs) has significantly advanced web-based Question Answering (QA) systems over semi-structured content, raising questions about the continued utility of knowledge extraction for question answering. This paper investigates the value of triple extraction in this new paradigm by extending an existing benchmark with knowledge extraction annotations and evaluating commercial and open-source LLMs of varying sizes. Our results show that web-scale knowledge extraction remains a challenging task for LLMs. Despite achieving high QA accuracy, LLMs can still benefit from knowledge extraction, through augmentation with extracted triples and multi-task learning. These findings provide insights into the evolving role of knowledge triple extraction in web-based QA and highlight strategies for maximizing LLM effectiveness across different model sizes and resource settings.

Anthology ID: 2026.eacl-long.91
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 2055–2074
URL: https://aclanthology.org/2026.eacl-long.91/
Cite (ACL): Kai Sun, Yin Huang, Srishti Mehra, Mohammad Kachuee, Xilun Chen, Renjie Tao, Zhaojiang Lin, Andrea Jessee, Nirav Shah, Alex L Betty, Yue Liu, Anuj Kumar, Wen-tau Yih, and Xin Luna Dong. 2026. Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2055–2074, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs? (Sun et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-long.91.pdf
Checklist: 2026.eacl-long.91.checklist.pdf
