AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction

Fabien Maury; Solène Grosdidier; Maud De Dieuleveult; Adrien Coulet

Abstract

Despite advances in information extraction driven by deep learning and large language models, performance gaps remain in highly specialized biomedical fields, where domain-specific complexity poses challenges for generalist models. In this work, we focus on the domain of autoimmunity, where the main entities of interest are autoimmune diseases, autoantibodies (i.e., molecules that may mark or cause these diseases), their molecular targets, their location in the body, and the associated clinical signs. We present AAbAAC (AutoAntibodies and Autoimmunity Annotated Corpus), a corpus of 115 abstracts selected from PubMed that we manually annotated for those entities and their relationships. We used AAbAAC first to evaluate several methods on the task of named entity recognition (NER), and second to fine-tune NER models. Our study demonstrates the utility of AAbAAC for information extraction in the domain of autoimmunity, showing the expected improvement in NER performance after fine-tuning. This illustrates the value of small-scale annotation efforts for specialized domains and contributes to the computational study of autoimmunity. The AAbAAC corpus is available at: https://github.com/f-maury/AAbAAC

Anthology ID: 2026.bionlp-1.65
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 791–800
URL: https://aclanthology.org/2026.bionlp-1.65/
Cite (ACL): Fabien Maury, Solène Grosdidier, Maud De Dieuleveult, and Adrien Coulet. 2026. AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction. In BioNLP 2026, pages 791–800, San Diego, California. Association for Computational Linguistics.
Cite (Informal): AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction (Maury et al., BioNLP 2026)
PDF: https://aclanthology.org/2026.bionlp-1.65.pdf
