How can I access data from the Anthology?
In addition to its papers, the Anthology publishes extensive bibliographic content in a number of formats. We also maintain a Python library that provides clean programmatic access to the Anthology’s metadata and content.
Bibliographic data
Bibliographic data is available for individual papers and in bulk.
For individual papers, each paper's page has buttons for citation data in several formats, including BibTeX, MODS XML, EndNote, and an informal citation string. Each format can be downloaded as a file or copied to the clipboard.
For bulk downloads, we provide consolidated BibTeX files in the following variations:
- anthology+abstracts.bib.gz contains citations for all papers that exist in the Anthology, including abstracts.
- anthology.bib.gz contains all citations but removes abstracts, to save on space.
- anthology.bib is the same as the above, but provided uncompressed, for convenience.
- anthology-1.bib, anthology-2.bib etc. are sharded variants that are under 50 MB each, suitable for direct import into Overleaf repositories.
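As a rough illustration of working with the compressed bulk file, the sketch below streams a locally downloaded copy of anthology.bib.gz and yields the entry type and citation key of each record without decompressing the whole file to disk. The function name iter_entry_keys and the line-based regular expression are illustrative assumptions, not part of any Anthology tooling; the sketch assumes each entry opens on its own line in the form @type{key, and a full BibTeX parser would be more robust.

```python
import gzip
import re
from typing import Iterator, Tuple

# Assumption: every entry starts at the beginning of a line as "@type{key,".
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,")


def iter_entry_keys(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (entry_type, citation_key) for each entry in a gzipped BibTeX file."""
    # "rt" opens the gzip stream in text mode, so lines are decoded lazily.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            match = ENTRY_START.match(line)
            if match:
                yield match.group(1).lower(), match.group(2)
```

For example, sum(1 for _ in iter_entry_keys("anthology.bib.gz")) counts the entries in a copy saved in the current directory.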
Finally, we also offer an XML paper feed, which is useful in tools like Zotero and Mendeley.
Python library
The acl-anthology Python library is the preferred way to access the Anthology's content programmatically. It can be installed via pip:
pip install acl-anthology
API documentation for the library is hosted on Read the Docs, and the source code is available in our GitHub repository.
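As a minimal sketch of what programmatic access might look like once the package is installed, the snippet below looks up one paper by its Anthology ID and returns its title. The helper paper_title and the function demo are hypothetical names introduced here. The calls Anthology.from_repo(), anthology.get(...), and paper.title reflect our understanding of the library's interface and should be checked against the API documentation; the helper also assumes that get returns None for an unknown ID.

```python
from typing import Any, Optional


def paper_title(anthology: Any, anthology_id: str) -> Optional[str]:
    """Return the title of the paper with the given ID, or None if not found."""
    # Assumption: the anthology object exposes get(id), which returns a paper
    # object with a `title` attribute, or None when the ID is unknown.
    paper = anthology.get(anthology_id)
    return None if paper is None else str(paper.title)


def demo() -> None:
    # Imported lazily so the helper above can be used without the package.
    from acl_anthology import Anthology

    # Assumption: from_repo() fetches the Anthology data from the GitHub
    # repository, so it needs network access and may take a while on first use.
    anthology = Anthology.from_repo()
    print(paper_title(anthology, "C92-1025"))
```

Calling demo() performs the lookup against the real data; paper_title itself only relies on the get method and title attribute, so it can be exercised with any object that provides them.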