Youssef Karout

2026

Arabic Citation Parsing using Part of Speech and Named Entity Recognition
Youssef Karout | Hadi Hammoud | Fadi Zaraket
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

This paper introduces an industry-level citation element extractor for Arabic text. Citation element extraction enables the automation of editorial tasks for publishing houses, the creation of citation networks, and automatic citation analytics for impact analysis firms. Citation library tools help users manage their citations; however, for Arabic, these tools lack basic support for identifying and extracting citation elements. Consequently, researchers, editors, and reviewers handle Arabic citation tasks manually. We present a novel Arabic citation element dataset, use it to train a citation element extraction model, and apply named entity recognition, morphological analysis, and keyword detection to improve the results for practical use. The paper reports industry-ready performance, with F1 scores ranging from 0.80 to 0.95 for the citation elements of interest.

Co-authors

Hadi Hammoud (1)
Fadi A. Zaraket (1)

Venues

AbjadNLP (1)
WS (1)
