Arabic Citation Parsing using Part of Speech and Named Entity Recognition

Youssef Karout, Hadi Hammoud, Fadi Zaraket


Abstract
This paper introduces an industry-level citation element extractor for Arabic text. Citation element extraction enables editorial task automation for publishing houses, the creation of citation networks, and automatic citation analytics for impact analysis firms. Citation library tools help users manage their citations, but for Arabic they lack basic support for identifying and extracting citation elements. Consequently, researchers, editors, and reviewers manage Arabic citation tasks manually. We present a novel Arabic citation element dataset, use it to train a citation element extraction model, and apply named entity recognition, morphological analysis, and keyword detection to improve the results for practical use. The paper reports industry-ready performance, with F1 scores ranging between 0.80 and 0.95 for the citation elements of interest.
Anthology ID:
2026.abjadnlp-1.33
Volume:
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
AbjadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
245–252
URL:
https://aclanthology.org/2026.abjadnlp-1.33/
Cite (ACL):
Youssef Karout, Hadi Hammoud, and Fadi Zaraket. 2026. Arabic Citation Parsing using Part of Speech and Named Entity Recognition. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 245–252, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Arabic Citation Parsing using Part of Speech and Named Entity Recognition (Karout et al., AbjadNLP 2026)
PDF:
https://aclanthology.org/2026.abjadnlp-1.33.pdf