Anietie Andy

2025

Ibom NLP: A Step Toward Inclusive Natural Language Processing for Nigeria’s Minority Languages
Oluwadara Kalejaiye | Luel Hagos Beyene | David Ifeoluwa Adelani | Mmekut-mfon Gabriel Edet | Aniefon Daniel Akpan | Eno-Abasi Urua | Anietie Andy
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Nigeria is the most populous country in Africa with a population of more than 200 million people. More than 500 languages are spoken in Nigeria and it is one of the most linguistically diverse countries in the world. Despite this, natural language processing (NLP) research has mostly focused on the following four languages: Hausa, Igbo, Nigerian-Pidgin, and Yoruba (i.e <1% of the languages spoken in Nigeria). This is in part due to the unavailability of textual data in these languages to train and apply NLP algorithms. In this work, we introduce Ibom—a dataset for machine translation and topic classification in four Coastal Nigerian languages from the Akwa Ibom State region: Anaang, Efik, Ibibio, and Oro. These languages are not represented in Google Translate or in major benchmarks such as Flores-200 or SIB-200. We focus on extending Flores-200 benchmark to these languages, and further align the translated texts with topic labels based on SIB-200 classification dataset. Our evaluation shows that current LLMs perform poorly on machine translation for these languages in both zero-and-few shot settings. However, we find the few-shot samples to steadily improve topic classification with more shots.

2024

Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, while our method demonstrates worse accuracy but better fluency in the language focused.

2022

pdf bib abs

A Survey of Machine Translation Tasks on Nigerian Languages
Ebelechukwu Nwafor | Anietie Andy
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Machine translation is an active area of research that has received a significant amount of attention over the past decade. With the advent of deep learning models, the translation of several languages has been performed with high accuracy and precision. In spite of the development in machine translation techniques, there is very limited work focused on translating low-resource African languages, particularly Nigerian languages. Nigeria is one of the most populous countries in Africa with diverse language and ethnic groups. In this paper, we survey the current state of the art of machine translation research on Nigerian languages with a major emphasis on neural machine translation techniques. We outline the limitations of research in machine translation on Nigerian languages and propose future directions in increasing research and participation.

pdf bib abs

Did that happen? Predicting Social Media Posts that are Indicative of what happened in a scene: A case study of a TV show
Anietie Andy | Reno Kriz | Sharath Chandra Guntuku | Derry Tanti Wijaya | Chris Callison-Burch
Proceedings of the Thirteenth Language Resources and Evaluation Conference

While popular Television (TV) shows are airing, some users interested in these shows publish social media posts about the show. Analyzing social media posts related to a TV show can be beneficial for gaining insights about what happened during scenes of the show. This is a challenging task partly because a significant number of social media posts associated with a TV show or event may not clearly describe what happened during the event. In this work, we propose a method to predict social media posts (associated with scenes of a TV show) that are indicative of what transpired during the scenes of the show. We evaluate our method on social media (Twitter) posts associated with an episode of a popular TV show, Game of Thrones. We show that for each of the identified scenes, with high AUC’s, our method can predict posts that are indicative of what happened in a scene from those that are not-indicative. Based on Twitters policy, we will make the Tweeter ID’s of the Twitter posts used for this work publicly available.

2021

pdf bib abs

Understanding Social Support Expressed in a COVID-19 Online Forum
Anietie Andy | Brian Chu | Ramie Fathy | Barrington Bennett | Daniel Stokes | Sharath Chandra Guntuku
Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis

In online forums focused on health and wellbeing, individuals tend to seek and give the following social support: emotional and informational support. Understanding the expressions of these social supports in an online COVID- 19 forum is important for: (a) the forum and its members to provide the right type of support to individuals and (b) determining the long term effects of the COVID-19 pandemic on the well-being of the public, thereby informing interventions. In this work, we build four machine learning models to measure the extent of the following social supports expressed in each post in a COVID-19 online forum: (a) emotional support given (b) emotional support sought (c) informational support given, and (d) informational support sought. Using these models, we aim to: (i) determine if there is a correlation between the different social supports expressed in posts e.g. when members of the forum give emotional support in posts, do they also tend to give or seek informational support in the same post? (ii) determine how these social supports sought and given changes over time in published posts. We find that (i) there is a positive correlation between the informational support given in posts and the emotional support given and emotional support sought, respectively, in these posts and (ii) over time, users tended to seek more emotional support and give less emotional support.

2020

pdf bib abs

Does Social Support (Expressed in Post Titles) Elicit Comments in Online Substance Use Recovery Forums?
Anietie Andy | Sharath Chandra Guntuku
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Individuals recovering from substance use often seek social support (emotional and informational) on online recovery forums, where they can both write and comment on posts, expressing their struggles and successes. A common challenge in these forums is that certain posts (some of which may be support seeking) receive no comments. In this work, we use data from two Reddit substance recovery forums: /r/Leaves and /r/OpiatesRecovery, to determine the relationship between the social supports expressed in the titles of posts and the number of comments they receive. We show that the types of social support expressed in post titles that elicit comments vary from one substance use recovery forum to the other.

pdf bib abs

Resolving Pronouns in Twitter Streams: Context can Help!
Anietie Andy | Chris Callison-Burch | Derry Tanti Wijaya
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

Many people live-tweet televised events like Presidential debates and popular TV-shows and discuss people or characters in the event. Naturally, many tweets make pronominal reference to these people/characters. We propose an algorithm for resolving personal pronouns that make reference to people involved in an event, in tweet streams collected during the event.

2019

pdf bib abs

Winter is here: Summarizing Twitter Streams related to Pre-Scheduled Events
Anietie Andy | Derry Tanti Wijaya | Chris Callison-Burch
Proceedings of the Second Workshop on Storytelling

Pre-scheduled events, such as TV shows and sports games, usually garner considerable attention from the public. Twitter captures large volumes of discussions and messages related to these events, in real-time. Twitter streams related to pre-scheduled events are characterized by the following: (1) spikes in the volume of published tweets reflect the highlights of the event and (2) some of the published tweets make reference to the characters involved in the event, in the context in which they are currently portrayed in a subevent. In this paper, we take advantage of these characteristics to identify the highlights of pre-scheduled events from tweet streams and we demonstrate a method to summarize these highlights. We evaluate our algorithm on tweets collected around 2 episodes of a popular TV show, Game of Thrones, Season 7.

2017

pdf bib abs

Constructing an Alias List for Named Entities during an Event
Anietie Andy | Mark Dredze | Mugizi Rwebangira | Chris Callison-Burch
Proceedings of the 3rd Workshop on Noisy User-generated Text

In certain fields, real-time knowledge from events can help in making informed decisions. In order to extract pertinent real-time knowledge related to an event, it is important to identify the named entities and their corresponding aliases related to the event. The problem of identifying aliases of named entities that spike has remained unexplored. In this paper, we introduce an algorithm, EntitySpike, that identifies entities that spike in popularity in tweets from a given time period, and constructs an alias list for these spiked entities. EntitySpike uses a temporal heuristic to identify named entities with similar context that occur in the same time period (within minutes) during an event. Each entity is encoded as a vector using this temporal heuristic. We show how these entity-vectors can be used to create a named entity alias list. We evaluated our algorithm on a dataset of temporally ordered tweets from a single event, the 2013 Grammy Awards show. We carried out various experiments on tweets that were published in the same time period and show that our algorithm identifies most entity name aliases and outperforms a competitive baseline.

2016

pdf bib abs

Name Variation in Community Question Answering Systems
Anietie Andy | Satoshi Sekine | Mugizi Rwebangira | Mark Dredze
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

Name Variation in Community Question Answering Systems Abstract Community question answering systems are forums where users can ask and answer questions in various categories. Examples are Yahoo! Answers, Quora, and Stack Overflow. A common challenge with such systems is that a significant percentage of asked questions are left unanswered. In this paper, we propose an algorithm to reduce the number of unanswered questions in Yahoo! Answers by reusing the answer to the most similar past resolved question to the unanswered question, from the site. Semantically similar questions could be worded differently, thereby making it difficult to find questions that have shared needs. For example, “Who is the best player for the Reds?” and “Who is currently the biggest star at Manchester United?” have a shared need but are worded differently; also, “Reds” and “Manchester United” are used to refer to the soccer team Manchester United football club. In this research, we focus on question categories that contain a large number of named entities and entity name variations. We show that in these categories, entity linking can be used to identify relevant past resolved questions with shared needs as a given question by disambiguating named entities and matching these questions based on the disambiguated entities, identified entities, and knowledge base information related to these entities. We evaluated our algorithm on a new dataset constructed from Yahoo! Answers. The dataset contains annotated question pairs, (Qgiven, [Qpast, Answer]). We carried out experiments on several question categories and show that an entity-based approach gives good performance when searching for similar questions in entity rich categories.

pdf bib abs

An Entity-Based approach to Answering Recurrent and Non-Recurrent Questions with Past Answers
Anietie Andy | Mugizi Rwebangira | Satoshi Sekine
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

An Entity-based approach to Answering recurrent and non-recurrent questions with Past Answers Abstract Community question answering (CQA) systems such as Yahoo! Answers allow registered-users to ask and answer questions in various question categories. However, a significant percentage of asked questions in Yahoo! Answers are unanswered. In this paper, we propose to reduce this percentage by reusing answers to past resolved questions from the site. Specifically, we propose to satisfy unanswered questions in entity rich categories by searching for and reusing the best answers to past resolved questions with shared needs. For unanswered questions that do not have a past resolved question with a shared need, we propose to use the best answer to a past resolved question with similar needs. Our experiments on a Yahoo! Answers dataset shows that our approach retrieves most of the past resolved questions that have shared and similar needs to unanswered questions.

Venues

WS1

Anietie Andy

2025

2024

2022

2021

2020

2019

2017

2016

Co-authors

Venues