Alexandra O’Neil

Also published as: Alexandra O’neil


2023

pdf bib
Comparing methods of orthographic conversion for Bàsàá, a language of Cameroon
Alexandra O’neil | Daniel Swanson | Robert Pugh | Francis Tyers | Emmanuel Ngue Um
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)

Orthographical standardization is a milestone in a language’s documentation and the development of its resources. However, texts written in former orthographies remain relevant to the language’s history and development and therefore must be converted to the standardized orthography. Ensuring a language has access to the orthographically standardized version of all of its recorded texts is important in the development of resources as it provides additional textual resources for training, supports contribution of authors using former writing systems, and provides information about the development of the language. This paper evaluates the performance of natural language processing methods, specifically Finite State Transducers and Long Short-term Memory networks, for the orthographical conversion of Bàsàá texts from the Protestant missionary orthography to the now-standard AGLC orthography, with the conclusion that LSTMs are somewhat more effective in the absence of explicit lexical information.

2021

pdf bib
On the Interaction between Annotation Quality and Classifier Performance in Abusive Language Detection
Holly Lopez Long | Alexandra O’Neil | Sandra Kübler
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Abusive language detection has become an important tool for the cultivation of safe online platforms. We investigate the interaction of annotation quality and classifier performance. We use a new, fine-grained annotation scheme that allows us to distinguish between abusive language and colloquial uses of profanity that are not meant to harm. Our results show a tendency of crowd workers to overuse the abusive class, which creates an unrealistic class balance and affects classification accuracy. We also investigate different methods of distinguishing between explicit and implicit abuse and show lexicon-based approaches either over- or under-estimate the proportion of explicit abuse in data sets.