AraSAS: The Open Source Arabic Semantic Tagger

Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson, Nizar Habash


Abstract
This paper presents AraSAS, the first open-source Arabic semantic analysis tagging system. AraSAS is a software framework that provides full semantic tagging of text written in Arabic. It is based on the UCREL Semantic Analysis System (USAS), which was originally developed to semantically tag English text. Like USAS, AraSAS uses a hierarchical semantic tag set that contains 21 major discourse fields and 232 fine-grained semantic field tags. The paper describes the creation, validation and evaluation of AraSAS. In addition, we present a first case study illustrating the affordances of applying the USAS and AraSAS semantic taggers to the Zayed University Arabic-English Bilingual Undergraduate Corpus (ZAEBUC) (Palfreyman and Habash, 2022): we run both taggers on Arabic and English essays on different topics and compare their coverage. The analysis extends to comparing the taggers on Arabic and English texts written by the same writer, and on texts written by male and by female students. Variables for comparison include the frequency of use of particular semantic sub-domains and the diversity of semantic elements within a text.
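As a rough illustration of the two comparison variables named above, the sketch below assumes that each token has already been assigned a single USAS-style tag string (e.g. `A1.1.1`, `S2mf`). It treats the leading uppercase letter of a tag as its major discourse field and computes relative field frequencies plus a diversity score. The function names are hypothetical. The diversity measure (Shannon entropy over fine-grained tags) is a swapped-in choice for illustration only; the abstract does not say which diversity measure the paper actually used.

```python
from collections import Counter
import math


def major_field(tag: str) -> str:
    # Assumption: a USAS-style tag begins with one uppercase letter naming
    # its major discourse field, e.g. "A1.1.1" -> "A", "S2mf" -> "S".
    return tag[0]


def field_frequencies(tags: list[str]) -> dict[str, float]:
    # Relative frequency of each major discourse field in a tagged text.
    counts = Counter(major_field(t) for t in tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {field: n / total for field, n in counts.items()}


def tag_diversity(tags: list[str]) -> float:
    # Hypothetical diversity score: Shannon entropy (in bits) over the
    # fine-grained tags exactly as given (modifiers such as "mf" or "+" kept).
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical tag sequence for a five-token text.
    tags = ["A1.1.1", "S2mf", "S2mf", "Z5", "A5.1+"]
    print(field_frequencies(tags))
    print(round(tag_diversity(tags), 4))
```

Comparing these two numbers across the Arabic and English versions of an essay, or across groups of writers, gives one simple way to operationalise the comparisons described in the abstract.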
Anthology ID:
2022.osact-1.3
Volume:
Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Hend Al-Khalifa, Tamer Elsayed, Hamdy Mubarak, Abdulmohsen Al-Thubaity, Walid Magdy, Kareem Darwish
Venue:
OSACT
Publisher:
European Language Resources Association
Pages:
23–31
URL:
https://aclanthology.org/2022.osact-1.3
Cite (ACL):
Mahmoud El-Haj, Elvis de Souza, Nouran Khallaf, Paul Rayson, and Nizar Habash. 2022. AraSAS: The Open Source Arabic Semantic Tagger. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 23–31, Marseille, France. European Language Resources Association.
Cite (Informal):
AraSAS: The Open Source Arabic Semantic Tagger (El-Haj et al., OSACT 2022)
PDF:
https://aclanthology.org/2022.osact-1.3.pdf