Simon O’Keefe


2022

pdf bib
Domain Adaptation for Arabic Crisis Response
Reem Alrashdi | Simon O’Keefe
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Deep learning algorithms can identify related tweets to reduce the information overload that prevents humanitarian organisations from using valuable Twitter posts. However, they rely heavily on human-labelled data, which are unavailable for emerging crises. Because each crisis has its own features, such as location, time and social media response, current models are known to suffer from generalising to unseen disaster events when pre-trained on past ones. Tweet classifiers for low-resource languages like Arabic has the additional issue of limited labelled data duplicates caused by the absence of good language resources. Thus, we propose a novel domain adaptation approach that employs distant supervision to automatically label tweets from emerging Arabic crisis events to be used to train a model along with available human-labelled data. We evaluate our work on data from seven 2018–2020 Arabic events from different crisis types (flood, explosion, virus and storm). Results show that our method outperforms self-training in identifying crisis-related tweets in real-time scenarios and can be seen as a robust Arabic tweet classifier.

2019

pdf bib
En-Ar Bilingual Word Embeddings without Word Alignment: Factors Effects
Taghreed Alqaisi | Simon O’Keefe
Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper introduces the first attempt to investigate morphological segmentation on En-Ar bilingual word embeddings using bilingual word embeddings model without word alignment (BilBOWA). We investigate the effect of sentence length and embedding size on the learning process. Our experiment shows that using the D3 segmentation scheme improves the accuracy of learning bilingual word embeddings up to 10 percentage points compared to the ATB and D0 schemes in all different training settings.