Fathom: A Fast and Modular RAG Pipeline for Fact-Checking

Farrukh Bin Rashid; Saqib Hakak

doi:10.18653/v1/2025.fever-1.20

Abstract

We present Fathom, a Retrieval-Augmented Generation (RAG) pipeline for automated fact-checking, built entirely using lightweight open-source language models. The system begins with HyDE-style question generation to expand the context around each claim, followed by a dual-stage retrieval process using BM25 and semantic similarity to gather relevant evidence. Finally, a lightweight LLM performs veracity prediction, producing both a verdict and supporting rationale. Despite relying on smaller models, our system achieved an AVeriTeC score of 0.2043 on the test set, a 0.99% absolute improvement over the baseline and 0.378 on the dev set, marking a 27.7% absolute improvement.
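
The dual-stage retrieval described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`bm25_scores`, `retrieve`), the parameters `k1`, `b`, `first_k`, and `final_k`, and the sample documents are all hypothetical. The second stage here uses bag-of-words cosine similarity as a dependency-free stand-in for the semantic similarity model, which this record does not specify. The HyDE-style question generation and the LLM veracity prediction stages are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, first_k=3, final_k=2):
    """Stage 1: keep the top first_k documents by BM25.
    Stage 2: rerank those candidates by cosine similarity
    (an assumed stand-in for an embedding-based similarity model)."""
    if not docs:
        return []
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    candidates = candidates[:first_k]
    q_vec = Counter(tokenize(query))
    reranked = sorted(
        candidates,
        key=lambda i: cosine(q_vec, Counter(tokenize(docs[i]))),
        reverse=True,
    )
    return [docs[i] for i in reranked[:final_k]]
```

In a fuller pipeline, the cheap lexical stage would narrow a large evidence pool before the more expensive similarity stage runs on the survivors; swapping `cosine` for an embedding model would only change the second sort key.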

Anthology ID: 2025.fever-1.20
Volume: Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Mubashara Akhtar, Rami Aly, Christos Christodoulopoulos, Oana Cocarascu, Zhijiang Guo, Arpit Mittal, Michael Schlichtkrull, James Thorne, Andreas Vlachos
Venues: FEVER | WS
Publisher: Association for Computational Linguistics
Pages: 258–265
URL: https://aclanthology.org/2025.fever-1.20/
DOI: 10.18653/v1/2025.fever-1.20
Cite (ACL): Farrukh Bin Rashid and Saqib Hakak. 2025. Fathom: A Fast and Modular RAG Pipeline for Fact-Checking. In Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER), pages 258–265, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Fathom: A Fast and Modular RAG Pipeline for Fact-Checking (Rashid & Hakak, FEVER 2025)
PDF: https://aclanthology.org/2025.fever-1.20.pdf

Export citation

BibTeX

@inproceedings{rashid-hakak-2025-fathom,
    title = "Fathom: A Fast and Modular {RAG} Pipeline for Fact-Checking",
    author = "Rashid, Farrukh Bin  and
      Hakak, Saqib",
    editor = "Akhtar, Mubashara  and
      Aly, Rami  and
      Christodoulopoulos, Christos  and
      Cocarascu, Oana  and
      Guo, Zhijiang  and
      Mittal, Arpit  and
      Schlichtkrull, Michael  and
      Thorne, James  and
      Vlachos, Andreas",
    booktitle = "Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.fever-1.20/",
    doi = "10.18653/v1/2025.fever-1.20",
    pages = "258--265",
    ISBN = "978-1-959429-53-1",
    abstract = "We present Fathom, a Retrieval-Augmented Generation (RAG) pipeline for automated fact-checking, built entirely using lightweight open-source language models. The system begins with HyDE-style question generation to expand the context around each claim, followed by a dual-stage retrieval process using BM25 and semantic similarity to gather relevant evidence. Finally, a lightweight LLM performs veracity prediction, producing both a verdict and supporting rationale. Despite relying on smaller models, our system achieved an AVeriTeC score of 0.2043 on the test set, a 0.99{\%} absolute improvement over the baseline and 0.378 on the dev set, marking a 27.7{\%} absolute improvement."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="rashid-hakak-2025-fathom">
    <titleInfo>
        <title>Fathom: A Fast and Modular RAG Pipeline for Fact-Checking</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Farrukh</namePart>
        <namePart type="given">Bin</namePart>
        <namePart type="family">Rashid</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Saqib</namePart>
        <namePart type="family">Hakak</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2025-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Mubashara</namePart>
            <namePart type="family">Akhtar</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Rami</namePart>
            <namePart type="family">Aly</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Christos</namePart>
            <namePart type="family">Christodoulopoulos</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Oana</namePart>
            <namePart type="family">Cocarascu</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Zhijiang</namePart>
            <namePart type="family">Guo</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Arpit</namePart>
            <namePart type="family">Mittal</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Michael</namePart>
            <namePart type="family">Schlichtkrull</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">James</namePart>
            <namePart type="family">Thorne</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Andreas</namePart>
            <namePart type="family">Vlachos</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Vienna, Austria</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">978-1-959429-53-1</identifier>
    </relatedItem>
    <abstract>We present Fathom, a Retrieval-Augmented Generation (RAG) pipeline for automated fact-checking, built entirely using lightweight open-source language models. The system begins with HyDE-style question generation to expand the context around each claim, followed by a dual-stage retrieval process using BM25 and semantic similarity to gather relevant evidence. Finally, a lightweight LLM performs veracity prediction, producing both a verdict and supporting rationale. Despite relying on smaller models, our system achieved an AVeriTeC score of 0.2043 on the test set, a 0.99% absolute improvement over the baseline and 0.378 on the dev set, marking a 27.7% absolute improvement.</abstract>
    <identifier type="citekey">rashid-hakak-2025-fathom</identifier>
    <identifier type="doi">10.18653/v1/2025.fever-1.20</identifier>
    <location>
        <url>https://aclanthology.org/2025.fever-1.20/</url>
    </location>
    <part>
        <date>2025-07</date>
        <extent unit="page">
            <start>258</start>
            <end>265</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T Fathom: A Fast and Modular RAG Pipeline for Fact-Checking
%A Rashid, Farrukh Bin
%A Hakak, Saqib
%Y Akhtar, Mubashara
%Y Aly, Rami
%Y Christodoulopoulos, Christos
%Y Cocarascu, Oana
%Y Guo, Zhijiang
%Y Mittal, Arpit
%Y Schlichtkrull, Michael
%Y Thorne, James
%Y Vlachos, Andreas
%S Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 978-1-959429-53-1
%F rashid-hakak-2025-fathom
%X We present Fathom, a Retrieval-Augmented Generation (RAG) pipeline for automated fact-checking, built entirely using lightweight open-source language models. The system begins with HyDE-style question generation to expand the context around each claim, followed by a dual-stage retrieval process using BM25 and semantic similarity to gather relevant evidence. Finally, a lightweight LLM performs veracity prediction, producing both a verdict and supporting rationale. Despite relying on smaller models, our system achieved an AVeriTeC score of 0.2043 on the test set, a 0.99% absolute improvement over the baseline and 0.378 on the dev set, marking a 27.7% absolute improvement.
%R 10.18653/v1/2025.fever-1.20
%U https://aclanthology.org/2025.fever-1.20/
%U https://doi.org/10.18653/v1/2025.fever-1.20
%P 258-265

Markdown (Informal)

[Fathom: A Fast and Modular RAG Pipeline for Fact-Checking](https://aclanthology.org/2025.fever-1.20/) (Rashid & Hakak, FEVER 2025)

ACL

Farrukh Bin Rashid and Saqib Hakak. 2025. Fathom: A Fast and Modular RAG Pipeline for Fact-Checking. In Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER), pages 258–265, Vienna, Austria. Association for Computational Linguistics.