Context-Gloss Augmentation for Improving Arabic Target Sense Verification

Sanad Malaysha; Mustafa Jarrar; Mohammed Khalilia

doi:10.18653/v1/2023.gwc-1.31

Abstract

The Arabic language lacks semantic datasets and sense inventories. The most common semantically labeled dataset for Arabic is ArabGlossBERT, a relatively small dataset of 167K context-gloss pairs (about 60K positive and 107K negative pairs) collected from Arabic dictionaries. This paper enriches the ArabGlossBERT dataset by augmenting it with machine back-translation (Arabic-English-Arabic). Augmentation increased the dataset size to 352K pairs (149K positive and 203K negative pairs). We measure the impact of augmentation by fine-tuning BERT on the target sense verification (TSV) task under different data configurations. Overall, accuracy ranges from 78% to 84% across data configurations. Although our approach performed on par with the baseline, we observed improvements for some POS tags in some experiments. Furthermore, our fine-tuned models are trained on a larger dataset covering a larger vocabulary and more contexts. We provide an in-depth analysis of the accuracy for each part of speech (POS).
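
The back-translation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Pair` record, the `translate` callable and its signature, and the choice to paraphrase both the context and the gloss while carrying the label over unchanged are all assumptions made for illustration. Any machine translation system could be plugged in as `translate`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical signature: translate(text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


@dataclass(frozen=True)
class Pair:
    context: str  # sentence containing the target word
    gloss: str    # candidate sense definition
    label: int    # 1 = gloss matches the sense used in context, 0 = it does not


def back_translate(text: str, translate: Translator) -> str:
    """Round-trip Arabic -> English -> Arabic to obtain a paraphrase."""
    english = translate(text, "ar", "en")
    return translate(english, "en", "ar")


def augment(pairs: Iterable[Pair], translate: Translator) -> List[Pair]:
    """Return the original pairs plus their back-translated variants.

    Assumption: both context and gloss are paraphrased and the label is
    kept as-is. Variants that duplicate an existing pair are skipped.
    """
    out: List[Pair] = []
    seen = set()
    for pair in pairs:
        variant = Pair(
            back_translate(pair.context, translate),
            back_translate(pair.gloss, translate),
            pair.label,
        )
        for candidate in (pair, variant):
            if candidate not in seen:
                seen.add(candidate)
                out.append(candidate)
    return out
```

Skipping duplicates means a round trip that returns the text unchanged adds nothing to the dataset, so the augmented size depends on how often the translator actually paraphrases.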

Anthology ID: 2023.gwc-1.31
Volume: Proceedings of the 12th Global Wordnet Conference
Month: January
Year: 2023
Address: University of the Basque Country, Donostia - San Sebastian, Basque Country
Editors: German Rigau, Francis Bond, Alexandre Rademaker
Venue: GWC
SIG: SIGLEX
Publisher: Global Wordnet Association
Pages: 254–262
URL: https://aclanthology.org/2023.gwc-1.31/
DOI: 10.18653/v1/2023.gwc-1.31
Cite (ACL): Sanad Malaysha, Mustafa Jarrar, and Mohammed Khalilia. 2023. Context-Gloss Augmentation for Improving Arabic Target Sense Verification. In Proceedings of the 12th Global Wordnet Conference, pages 254–262, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.
Cite (Informal): Context-Gloss Augmentation for Improving Arabic Target Sense Verification (Malaysha et al., GWC 2023)
PDF: https://aclanthology.org/2023.gwc-1.31.pdf
