Analysis of Automated Document Relevance Annotation for Information Retrieval in Oil and Gas Industry

João Vitor Mariano Correia; Murilo Missano Bell; João Vitor Robiatti Amorim; Jonas Queiroz; Daniel Pedronette; Ivan Rizzo Guilherme; Felipe Lima de Oliveira

doi:10.18653/v1/2025.emnlp-industry.132

Abstract

The lack of high-quality test collections challenges Information Retrieval (IR) in specialized domains. This work addresses this issue by comparing supervised classifiers against zero-shot Large Language Models (LLMs) for automated relevance annotation in the oil and gas industry, using human expert judgments as a benchmark. A supervised classifier, trained on limited expert data, outperforms LLMs, achieving an F1-score that surpasses even that of a second human annotator. The study also empirically confirms that LLMs are prone to unfairly favoring technologically similar retrieval systems. While LLMs lack precision in this context, a well-engineered classifier offers an accurate and practical path to scaling evaluation datasets within a human-in-the-loop framework that empowers, rather than replaces, human expertise.

Anthology ID: 2025.emnlp-industry.132
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1878–1889
URL: https://aclanthology.org/2025.emnlp-industry.132/
DOI: 10.18653/v1/2025.emnlp-industry.132
Cite (ACL): João Vitor Mariano Correia, Murilo Missano Bell, João Vitor Robiatti Amorim, Jonas Queiroz, Daniel Pedronette, Ivan Rizzo Guilherme, and Felipe Lima de Oliveira. 2025. Analysis of Automated Document Relevance Annotation for Information Retrieval in Oil and Gas Industry. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1878–1889, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): Analysis of Automated Document Relevance Annotation for Information Retrieval in Oil and Gas Industry (Correia et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-industry.132.pdf
