Seid Muhie Yimam - ACL Anthology

Seid Muhie Yimam

Also published as: Seid Yimam, Seid Muhie Yimam

2026

Full Fine-Tuning vs. Parameter-Efficient Adaptation for Low-Resource African ASR: A Controlled Study with Whisper-Small
Sukairaj Hafiz Imam | Muhammad Yahuza Bello | Hadiza Ali Umar | Tadesse Destaw Belay | Idris Abdulmumin | Seid Muhie Yimam | Shamsuddeen Hassan Muhammad
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)

Automatic speech recognition (ASR) for African low-resource languages (LRLs) is often limited by scarce labelled data and the high cost of adapting large foundation models. This study evaluates whether parameter-efficient fine-tuning (PEFT) can serve as a practical alternative to full fine-tuning (FFT) for adapting Whisper-Small with limited labelled speech and constrained compute. We used a 10-hour subset of NaijaVoices covering Hausa, Yorùbá, and Igbo, and we compared FFT with several PEFT strategies under a fixed evaluation protocol. DoRA attains a 22.0% macro-average WER, closely aligning with the 22.1% achieved by FFT while updating only 4M parameters rather than 240M, and this difference remains within run-to-run variation across random seeds. Yorùbá consistently yields the lowest word error rates, whereas Igbo remains the most challenging, indicating that PEFT can deliver near-FFT accuracy with substantially lower training and storage requirements for low-resource African ASR.
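
The macro-average WER quoted in this abstract can be illustrated with a minimal sketch. It assumes that "macro-average" means the unweighted mean of per-language corpus WERs and that text is tokenized on whitespace; the function names and the toy transcripts are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """WER over (reference, hypothesis) pairs; assumes non-empty references."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


def macro_average_wer(per_language: Dict[str, List[Tuple[str, str]]]) -> float:
    """Unweighted mean of per-language WERs (assumed meaning of macro-average)."""
    scores = [corpus_wer(pairs) for pairs in per_language.values()]
    return sum(scores) / len(scores)


# Toy placeholder data, not real transcripts.
toy = {
    "lang_a": [("one two three four", "one two three five")],
    "lang_b": [("x y", "x y")],
}
macro = macro_average_wer(toy)
```

Under this definition each language contributes equally to the final score regardless of how many utterances it has, which is one reason a macro-average is often preferred when test sets differ in size.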

2025

AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Shamsuddeen Hassan Muhammad | Idris Abdulmumin | Abinew Ali Ayele | David Ifeoluwa Adelani | Ibrahim Said Ahmad | Saminu Mohammad Aliyu | Paul Röttger | Abigail Oppong | Andiswa Bukula | Chiamaka Ijeoma Chukwuneke | Ebrahim Chekol Jibril | Elyas Abdi Ismail | Esubalew Alemneh | Hagos Tesfahun Gebremichael | Lukman Jibril Aliyu | Meriem Beloucif | Oumaima Hourrane | Rooweither Mabuya | Salomey Osei | Samuel Rutunda | Tadesse Destaw Belay | Tadesse Kebede Guge | Tesfa Tegegne Asfaw | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Seid Muhie Yimam | Nedjma Ousidhoum
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Hate speech and abusive language are global phenomena that need socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) absence of moderation and (2) censorship due to the reliance on keyword spotting out of context. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is a tweet annotated by native speakers familiar with the regional culture. We report the challenges related to the construction of the datasets and present various classification baseline results with and without using LLMs. We find that model performance highly depends on the language and that multilingual models can help boost performance in low-resource settings.

CULEMO: Cultural Lenses on Emotion - Benchmarking LLMs for Cross-Cultural Emotion Understanding
Tadesse Destaw Belay | Ahmed Haj Ahmed | Alvin Grissom II | Iqra Ameer | Grigori Sidorov | Olga Kolesnikova | Seid Muhie Yimam
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

NLP research has increasingly focused on subjective tasks such as emotion analysis. However, existing emotion benchmarks suffer from two major shortcomings: (1) they largely rely on keyword-based emotion recognition, overlooking crucial cultural dimensions required for deeper emotion understanding, and (2) many are created by translating English-annotated data into other languages, leading to potentially unreliable evaluation. To address these issues, we introduce Cultural Lenses on Emotion (CuLEmo), the first benchmark designed to evaluate culture-aware emotion prediction across six languages: Amharic, Arabic, English, German, Hindi, and Spanish. CuLEmo comprises 400 crafted questions per language, each requiring nuanced cultural reasoning and understanding. We use this benchmark to evaluate several state-of-the-art LLMs on culture-aware emotion prediction and sentiment analysis tasks. Our findings reveal that (1) emotion conceptualizations vary significantly across languages and cultures, (2) LLM performance likewise varies by language and cultural context, and (3) prompting in English with explicit country context often outperforms in-language prompts for culture-aware emotion and sentiment understanding. The dataset and evaluation code are available.

A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge’ez Script.
Hellina Hailu Nigatu | Atnafu Lambebo Tonja | Henok Biadglign Ademtew | Hizkiel Mitiku Alemayehu | Negasi Haile Abadi | Tadesse Destaw Belay | Seid Muhie Yimam
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Homophone normalization, where characters that have the same sound in a writing script are mapped to one character, is a pre-processing step applied in Amharic Natural Language Processing (NLP) literature. While this may improve performance reported by automatic metrics, it also results in models that are unable to effectively process different forms of writing in a single language. Further, there might be impacts in transfer learning, where models trained on normalized data do not generalize well to other languages. In this paper, we experiment with monolingual training and cross-lingual transfer to understand the impacts of normalization on languages that use the Ge’ez script. We then propose a post-inference intervention in which normalization is applied to model predictions instead of training data. With our simple scheme of post-inference normalization, we show that we can achieve an increase in BLEU score of up to 1.03 while preserving language features in training.
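
The idea of post-inference normalization described in this abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the character map is a small illustrative subset covering only base-order characters, the references are normalized alongside the predictions, and a simple exact-match rate stands in for BLEU so that the example stays self-contained. All names are hypothetical.

```python
from typing import List

# Illustrative subset only: base-order Ge'ez characters commonly treated as
# homophones in Amharic. A full table would also cover the other vowel orders
# of each character. This mapping is an assumption, not the paper's table.
HOMOPHONE_MAP = {
    "ሐ": "ሀ",
    "ኀ": "ሀ",
    "ሠ": "ሰ",
    "ዐ": "አ",
    "ፀ": "ጸ",
}
_TABLE = str.maketrans(HOMOPHONE_MAP)


def normalize(text: str) -> str:
    """Map each homophone variant to a single canonical character."""
    return text.translate(_TABLE)


def exact_match_rate(predictions: List[str], references: List[str],
                     post_normalize: bool) -> float:
    """Toy metric standing in for BLEU: share of predictions equal to references."""
    if post_normalize:
        # Normalization happens only here, after inference; the training data
        # (not shown) would keep its original spelling variants.
        predictions = [normalize(p) for p in predictions]
        references = [normalize(r) for r in references]
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


# Hypothetical model outputs that differ from the references only in
# homophone spelling.
predictions = ["ሠላም", "ፀሐይ"]
references = ["ሰላም", "ጸሀይ"]
raw_score = exact_match_rate(predictions, references, post_normalize=False)
normalized_score = exact_match_rate(predictions, references, post_normalize=True)
```

In this toy case the raw score penalizes spelling variants that sound identical, while the normalized score does not, which mirrors the motivation for moving normalization from training data to model outputs.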

HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation
Naquee Rizwan | Seid Muhie Yimam | Daryna Dementieva | Dr. Florian Skupin | Tim Fischer | Daniil Moskovskiy | Aarushi Ajay Borkar | Robert Geislinger | Punyajoy Saha | Sarthak Roy | Martin Semmann | Alexander Panchenko | Chris Biemann | Animesh Mukherjee
Findings of the Association for Computational Linguistics: ACL 2025

Despite regulations imposed by nations and social media platforms, e.g. (Government of India, 2021; European Parliament and Council of the European Union, 2022), inter alia, hateful content persists as a significant challenge. Existing approaches primarily rely on reactive measures such as blocking or suspending offensive messages, with emerging strategies focusing on proactive measures like detoxification and counterspeech. In our work, which we call HATEPRISM, we conduct a comprehensive examination of hate speech regulations and strategies from three perspectives: country regulations, social platform policies, and NLP research datasets. Our findings reveal significant inconsistencies in hate speech definitions and moderation practices across jurisdictions and platforms, alongside a lack of alignment with research efforts. Based on these insights, we suggest ideas and research directions for further exploration of a unified framework for automated hate speech moderation incorporating diverse strategies.

Multilingual and Explainable Text Detoxification with Parallel Corpora
Daryna Dementieva | Nikolay Babakov | Amit Ronen | Abinew Ali Ayele | Naquee Rizwan | Florian Schneider | Xintong Wang | Seid Muhie Yimam | Daniil Moskovskiy | Elisei Stakovskii | Eran Kaufman | Ashraf Elnagar | Animesh Mukherjee | Alexander Panchenko
Proceedings of the 31st International Conference on Computational Linguistics

Even with various regulations in place across countries and social media platforms (Government of India, 2021; European Parliament and Council of the European Union, 2022), digital abusive speech remains a significant issue. One potential approach to address this challenge is automatic text detoxification, a text style transfer (TST) approach that transforms toxic language into a more neutral or non-toxic form. To date, the availability of parallel corpora for the text detoxification task (Logacheva et al., 2022; Atwell et al., 2022; Dementieva et al., 2024a) has proven to be crucial for state-of-the-art approaches. With this work, we extend the parallel text detoxification corpus to new languages (German, Chinese, Arabic, Hindi, and Amharic) and test TST baselines in an extensive multilingual setup. Next, we conduct a first-of-its-kind automated, explainable analysis of the descriptive features of both toxic and non-toxic sentences, diving deeply into the nuances, similarities, and differences of toxicity and detoxification across 9 languages. Finally, based on the obtained insights, we experiment with a novel text detoxification method inspired by the Chain-of-Thoughts reasoning approach, enhancing the prompting process through clustering on relevant descriptive attributes.

Adaption and Evaluation of Generative Large Language Models for German Medical Information Extraction
Sören Spiegel | Seid Muhie Yimam | Philipp Breitfeld | Frank Ückert
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers

LECTURE4ALL: A Lightweight Approach to Precise Timestamp Detection in Online Lecture Videos
Torben Hannemann | Frank Hammerschmidt | Simon Kazemi | Gregor Stange | Viktoria Wrobel | Robert Geislinger | Seid Muhie Yimam
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

This paper presents LECTURE4ALL, a web application developed to improve the search experience of educational video platforms. Lecture2Go provides a vast collection of recorded lectures, but locating specific content within videos can be time-consuming. LECTURE4ALL addresses this issue by leveraging a vector database and a streamlined user interface to enable direct retrieval of precise video timestamps. By enhancing search accuracy and efficiency, LECTURE4ALL significantly improves the accessibility and usability of educational video platforms.

We present our shared task on text-based emotion detection, covering more than 30 languages from seven distinct language families. These languages are predominantly low-resource and spoken across various continents. The data instances are multi-labeled into six emotional classes, with additional datasets in 11 languages annotated for emotion intensity. Participants were asked to predict labels in three tracks: (a) emotion labels in monolingual settings, (b) emotion intensity scores, and (c) emotion labels in cross-lingual settings.

FASCIST-O-METER: Classifier for Neo-fascist Discourse Online
Rudy Alexandro Garrido Veliz | Martin Semmann | Chris Biemann | Seid Muhie Yimam
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers

AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text
Tadesse Destaw Belay | Israel Abebe Azime | Ibrahim Said Ahmad | David Ifeoluwa Adelani | Idris Abdulmumin | Abinew Ali Ayele | Shamsuddeen Hassan Muhammad | Seid Muhie Yimam
Findings of the Association for Computational Linguistics: EMNLP 2025

Language models built from various sources are the foundation of today’s NLP progress. However, for many low-resource languages, the diversity of domains is often limited and skewed toward the religious domain, which impacts their performance when evaluated on distant and rapidly evolving domains such as social media. Domain adaptive pre-training (DAPT) and task-adaptive pre-training (TAPT) are popular techniques to reduce this bias through continual pre-training for BERT-based models, but they have not been explored for African multilingual encoders. In this paper, we explore DAPT and TAPT continual pre-training approaches for the social media domain in African languages. We introduce AfriSocial, a large-scale social media and news domain corpus for continual pre-training on several African languages. Leveraging AfriSocial, we show that DAPT consistently improves performance (from 1% to 30% F1 score) on three subjective tasks: sentiment analysis, multi-label emotion, and hate speech classification, covering 19 languages. Similarly, leveraging TAPT on the data from one task enhances performance on other related tasks. For example, training with unlabeled sentiment data (source) for a fine-grained emotion classification task (target) improves the baseline results by an F1 score ranging from 0.55% to 15.11%. Combining these two methods (i.e. DAPT + TAPT) further improves the overall performance. The data and model resources are available at HuggingFace.

Evaluating the Capabilities of Large Language Models for Multi-label Emotion Understanding
Tadesse Destaw Belay | Israel Abebe Azime | Abinew Ali Ayele | Grigori Sidorov | Dietrich Klakow | Philip Slusallek | Olga Kolesnikova | Seid Muhie Yimam
Proceedings of the 31st International Conference on Computational Linguistics

Large Language Models (LLMs) show promising learning and reasoning abilities. Compared to other NLP tasks, multilingual and multi-label emotion evaluation tasks are under-explored in LLMs. In this paper, we present EthioEmo, a multi-label emotion classification dataset for four Ethiopian languages, namely, Amharic (amh), Afan Oromo (orm), Somali (som), and Tigrinya (tir). We perform extensive experiments with an additional English multi-label emotion dataset from SemEval 2018 Task 1. Our evaluation includes encoder-only, encoder-decoder, and decoder-only language models. We compare zero- and few-shot approaches of LLMs with fine-tuning smaller language models. The results show that multi-label emotion classification accuracy is still insufficient even for high-resource languages such as English, and there is a large gap between the performance of high-resource and low-resource languages. The results also show varying performance levels depending on the language and model type. EthioEmo is available publicly to further improve the understanding of emotions in language models and how people convey emotions through various languages.

Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions
Sukairaj Hafiz Imam | Babangida Sani | Dawit Ketema Gete | Bedru Yimam Ahmed | Ibrahim Said Ahmad | Idris Abdulmumin | Seid Muhie Yimam | Muhammad Yahuza Bello | Shamsuddeen Hassan Muhammad
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)

Automatic Speech Recognition (ASR) technologies have transformed human-computer interaction; however, low-resource languages in Africa remain significantly underrepresented in both research and practical applications. This study investigates the major challenges hindering the development of ASR systems for these languages, which include data scarcity, linguistic complexity, limited computational resources, acoustic variability, and ethical concerns surrounding bias and privacy. The primary goal is to critically analyze these barriers and identify practical, inclusive strategies to advance ASR technologies within the African context. Recent advances and case studies emphasize promising strategies such as community-driven data collection, self-supervised and multilingual learning, lightweight model architectures, and techniques that prioritize privacy. Evidence from pilot projects involving various African languages showcases the feasibility and impact of customized solutions, which encompass morpheme-based modeling and domain-specific ASR applications in sectors like healthcare and education. The findings highlight the importance of interdisciplinary collaboration and sustained investment to tackle the distinct linguistic and infrastructural challenges faced by the continent. This study offers a progressive roadmap for creating ethical, efficient, and inclusive ASR systems that not only safeguard linguistic diversity but also improve digital accessibility and promote socioeconomic participation for speakers of African languages.

2024

Exploring and quantifying semantic relatedness is central to representing language and holds significant implications across various NLP tasks. While earlier NLP research primarily focused on semantic similarity, often within the English language context, we instead investigate the broader phenomenon of semantic relatedness. In this paper, we present SemRel, a new semantic relatedness dataset collection annotated by native speakers across 13 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia – regions characterised by a relatively limited availability of NLP resources. Each instance in the SemRel datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. The scores are obtained using a comparative annotation framework. We describe the data collection and annotation processes, challenges when building the datasets, baseline experiments, and their impact and utility in NLP.

EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation
Atnafu Lambebo Tonja | Israel Abebe Azime | Tadesse Destaw Belay | Mesay Gemeda Yigezu | Moges Ahmed Ah Mehamed | Abinew Ali Ayele | Ebrahim Chekol Jibril | Michael Melese Woldeyohannis | Olga Kolesnikova | Philipp Slusallek | Dietrich Klakow | Seid Muhie Yimam
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) have gained popularity recently due to their outstanding performance in various downstream Natural Language Processing (NLP) tasks. However, low-resource languages are still lagging behind current state-of-the-art (SOTA) developments in the field of NLP due to insufficient resources to train LLMs. Ethiopian languages exhibit remarkable linguistic diversity, encompassing a wide array of scripts, and are imbued with profound religious and cultural significance. This paper introduces EthioLLM – multilingual large language models for five Ethiopian languages (Amharic, Ge’ez, Afan Oromo, Somali, and Tigrinya) and English, and Ethiobenchmark – a new benchmark dataset for various downstream NLP tasks. We evaluate the performance of these models across five downstream NLP tasks. We open-source our multilingual language models, new benchmark datasets for various downstream tasks, and task-specific fine-tuned language models and discuss the performance of the models. Our dataset and models are available at the https://huggingface.co/EthioNLP repository.

Detecting Hate Speech in Amharic Using Multimodal Analysis of Social Media Memes
Melese Ayichlie Jigar | Abinew Ali Ayele | Seid Muhie Yimam | Chris Biemann
Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024

In contemporary society, the proliferation of hate speech is increasingly prevalent across various social media platforms, with a notable trend of incorporating memes to amplify its visual impact and reach. The conventional text-based detection approaches frequently fail to address the complexities introduced by memes, thereby aggravating the challenges, particularly in low-resource languages such as Amharic. We develop Amharic meme hate speech detection models using 2,000 memes collected from Facebook, Twitter, and Telegram over four months. We employ native Amharic speakers to annotate each meme using a web-based tool, yielding a Fleiss’ kappa score of 0.50. We utilize different feature extraction techniques, namely VGG16 for images and Word2Vec for textual content, and build unimodal and multimodal models such as LSTM, BiLSTM, and CNN. The BiLSTM model shows the best performance, achieving 63% accuracy for text and 75% for multimodal features. In image-only experiments, the CNN model achieves 69% accuracy. Multimodal models demonstrate superior performance in detecting Amharic hate speech in memes, showcasing their potential to address the unique challenges posed by meme-based hate speech on social media.

Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets
Israel Abebe Azime | Atnafu Lambebo Tonja | Tadesse Destaw Belay | Mitiku Yohannes Fuge | Aman Kassahun Wassie | Eyasu Shiferaw Jada | Yonas Chanie | Walelign Tewabe Sewunetie | Seid Muhie Yimam
Findings of the Association for Computational Linguistics: EMNLP 2024

Large language models (LLMs) have received a lot of attention in natural language processing (NLP) research because of their exceptional performance in understanding and generating human languages. However, low-resource languages are left behind due to the unavailability of resources. In this work, we focus on enhancing the LLaMA-2-Amharic model by integrating task-specific and generative datasets to improve language model performance for Amharic. We compile an Amharic instruction fine-tuning dataset and fine-tune the LLaMA-2-Amharic model. The fine-tuned model shows promising results in different NLP tasks. We also explore the effectiveness of translated instruction datasets compared to the dataset we created. Our dataset creation pipeline, along with instruction datasets, trained models, and evaluation outputs, is made publicly available to encourage research in language-specific models.

We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia – regions characterised by the relatively limited availability of NLP resources. Each instance in the datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. Participating systems were asked to rank sentence pairs by their closeness in meaning (i.e., their degree of semantic relatedness) in the 14 languages in three main tracks: (a) supervised, (b) unsupervised, and (c) crosslingual. The task attracted 163 participants. We received 70 submissions in total (across all tasks) from 51 different teams, and 38 system description papers. We report on the best-performing systems as well as the most common and the most effective approaches for the three different tracks.

Gender Bias Evaluation in Machine Translation for Amharic, Tigrigna, and Afaan Oromoo
Walelign Sewunetie | Atnafu Tonja | Tadesse Belay | Hellina Hailu Nigatu | Gashaw Gebremeskel | Zewdie Mossie | Hussien Seid | Seid Yimam
Proceedings of the 2nd International Workshop on Gender-Inclusive Translation Technologies

While Machine Translation (MT) research has progressed over the years, translation systems still suffer from biases, including gender bias. While an active line of research studies the existence and mitigation strategies of gender bias in machine translation systems, there is limited research exploring this phenomenon for low-resource languages. The limited availability of linguistic and computational resources, compounded by the lack of benchmark datasets, makes studying bias for low-resource languages that much more difficult. In this paper, we construct benchmark datasets to evaluate gender bias in machine translation for three low-resource languages: Afaan Oromoo (Orm), Amharic (Amh), and Tigrinya (Tir). Building on prior work, we collected 2400 gender-balanced sentences translated in parallel into the three languages. From human evaluations of the dataset we collected, we found that about 93% of Afaan Oromoo, 80% of Tigrinya, and 72% of Amharic sentences exhibited gender bias. In addition to providing benchmarks for improving gender bias mitigation research in the three languages, we hope the careful documentation of our work will help researchers working on other low-resource languages extend our approach to their languages.

UHH at AVeriTeC: RAG for Fact-Checking with Real-World Claims
Özge Sevgili | Irina Nikishina | Seid Muhie Yimam | Martin Semmann | Chris Biemann
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)

This paper presents UHH’s approach developed for the AVeriTeC shared task. The goal of the challenge is to verify given real-world claims with evidence from the Web. In this shared task, we investigate a Retrieval-Augmented Generation (RAG) model, which mainly contains retrieval, generation, and augmentation components. We start by selecting the top 10k pieces of evidence via BM25 scores, and continue with two approaches to retrieve the most similar evidence: (1) retrieve the top 10 pieces of evidence through vector similarity, generate questions for them, and rerank them, or (2) generate questions for the claim and retrieve the most similar evidence, again through vector similarity. After retrieving the top evidence, a Large Language Model (LLM) is prompted using the claim along with either all evidence or individual pieces of evidence to predict the label. Our system submission, UHH, using the first approach and individual evidence prompts, ranks 6th out of 23 systems.
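
The two retrieval stages mentioned in this abstract (a BM25 shortlist followed by vector-similarity ranking) can be sketched roughly as follows. This is a simplified illustration, not the UHH system: the BM25 variant and its parameters `k1` and `b` are common defaults chosen here as assumptions, the bag-of-words `embed` function is a stand-in for a real sentence encoder, and question generation and LLM prompting are omitted. All names and documents are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: List[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with a standard BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def embed(text: str) -> Counter:
    """Stand-in embedding: raw term counts instead of a learned encoder."""
    return Counter(tokenize(text))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(claim: str, docs: List[str], first_k: int, final_k: int) -> List[str]:
    """Stage 1: BM25 shortlist. Stage 2: rank the shortlist by vector similarity."""
    lexical = bm25_scores(claim, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: lexical[i], reverse=True)[:first_k]
    query_vec = embed(claim)
    ranked = sorted(shortlist, key=lambda i: cosine(query_vec, embed(docs[i])), reverse=True)
    return [docs[i] for i in ranked[:final_k]]


# Toy documents, not AVeriTeC data.
docs = [
    "the moon is made of cheese",
    "the earth orbits the sun",
    "cheese is made from milk",
    "paris is the capital of france",
]
top = retrieve("the moon is made of cheese", docs, first_k=3, final_k=1)
```

The cheap lexical pass keeps the more expensive similarity computation confined to a small shortlist, which is the usual motivation for a cascade of this shape.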

Exploring Boundaries and Intensities in Offensive and Hate Speech: Unveiling the Complex Spectrum of Social Media Discourse
Abinew Ali Ayele | Esubalew Alemneh Jalew | Adem Chanie Ali | Seid Muhie Yimam | Chris Biemann
Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024

The prevalence of digital media and evolving sociopolitical dynamics have significantly amplified the dissemination of hateful content. Existing studies mainly focus on classifying texts into binary categories, often overlooking the continuous spectrum of offensiveness and hatefulness inherent in the text. In this research, we present an extensive benchmark dataset for Amharic, comprising 8,258 tweets annotated for three distinct tasks: category classification, identification of hate targets, and rating offensiveness and hatefulness intensities. Our study highlights that a considerable majority of tweets belong to the less offensive and less hateful intensity levels, underscoring the need for early interventions by stakeholders. The prevalence of ethnic and political hatred targets, with significant overlaps in our dataset, emphasizes the complex relationships within Ethiopia’s sociopolitical landscape. We build classification and regression models and investigate the efficacy of models in handling these tasks. Our results reveal that hate and offensive speech cannot be addressed by a simplistic binary classification, instead manifesting as variables across a continuous range of values. The Afro-XLMR-large model exhibits the best performance, achieving F1-scores of 75.30%, 70.59%, and 29.42% for the category, target, and regression tasks, respectively. The 80.22% correlation coefficient of the Afro-XLMR-large model indicates strong alignment.

2023

Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities
Atnafu Lambebo Tonja | Tadesse Destaw Belay | Israel Abebe Azime | Abinew Ali Ayele | Moges Ahmed Mehamed | Olga Kolesnikova | Seid Muhie Yimam
Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)

This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we identify key challenges and opportunities for NLP research in Ethiopia. Furthermore, we provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This repository can be updated periodically with contributions from other researchers. Our objective is to disseminate information to NLP researchers interested in Ethiopian languages and encourage future research in this domain.

Africa is home to over 2,000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yoruba) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task (with over 200 participants, see website: https://afrisenti-semeval.github.io). We describe the data collection methodology, annotation process, and the challenges we dealt with when curating each dataset. We further report baseline experiments conducted on the AfriSenti datasets and discuss their usefulness.

Exploring Amharic Hate Speech Data Collection and Classification Approaches
Abinew Ali Ayele | Seid Muhie Yimam | Tadesse Destaw Belay | Tesfa Asfaw | Chris Biemann
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

In this paper, we present a study of efficient data selection and annotation strategies for Amharic hate speech. We also build various classification models and investigate the challenges of hate speech data selection, annotation, and classification for the Amharic language. From a total of over 18 million tweets in our Twitter corpus, 15.1k tweets are annotated by two independent native speakers, and a Cohen’s kappa score of 0.48 is achieved. A third annotator, a curator, is also employed to decide on the final gold labels. We employ both classical machine learning and deep learning approaches, which include fine-tuning AmFLAIR and AmRoBERTa contextual embedding models. Among all the models, AmFLAIR achieves the best performance with an F1-score of 72%. We publicly release the annotation guidelines, keywords/lexicon entries, datasets, models, and associated scripts with a permissive license.
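
The inter-annotator agreement statistic reported in this abstract, Cohen's kappa for two annotators, can be computed as in the sketch below. The labels are toy values chosen for illustration and are not taken from the dataset; the function name is hypothetical.

```python
from collections import Counter
from typing import List


def cohens_kappa(labels_a: List[str], labels_b: List[str]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).

    Assumes both lists are aligned item by item and that chance agreement is
    below 1 (otherwise the denominator is zero).
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(counts_a) | set(counts_b)) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy annotations from two hypothetical annotators.
annotator_1 = ["hate", "hate", "normal", "normal"]
annotator_2 = ["hate", "normal", "normal", "normal"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items (0.75 observed agreement), but half of that agreement is expected by chance given their label distributions, so kappa is lower than raw agreement.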

SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Shamsuddeen Hassan Muhammad | Idris Abdulmumin | Seid Muhie Yimam | David Ifeoluwa Adelani | Ibrahim Said Ahmad | Nedjma Ousidhoum | Abinew Ali Ayele | Saif Mohammad | Meriem Beloucif | Sebastian Ruder
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We present the first Africentric SemEval shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval). The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023. AfriSenti-SemEval is a sentiment classification challenge in 14 African languages: Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá (Muhammad et al., 2023), using data labeled with 3 sentiment classes. We present three subtasks: (1) Task A: monolingual classification, which received 44 submissions; (2) Task B: multilingual classification, which received 32 submissions; and (3) Task C: zero-shot classification, which received 34 submissions. The best performance for tasks A and B was achieved by the NLNDE team with 71.31 and 75.06 weighted F1, respectively. UCAS-IIE-NLP achieved the best average score for task C with 58.15 weighted F1. We describe the various approaches adopted by the top 10 systems.

CodeAnno: Extending WebAnno with Hierarchical Document Level Annotation and Automation
Florian Schneider | Seid Muhie Yimam | Fynn Petersen-Frey | Gerret von Nordheim | Katharina Kleinen-von Königslöw | Chris Biemann
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

WebAnno is one of the most popular annotation tools that supports generic annotation types and distributive annotation with multiple user roles. However, WebAnno focuses on annotating span-level mentions and relations among them, making document-level annotation complicated. The annotation and analysis of social science materials usually involves the creation of codes to categorize a given document. The codes, which are collected in codebooks, are typically hierarchical, which makes it possible to code the document either with a general category or with more fine-grained subcategories. CodeAnno is forked from WebAnno and designed to solve the coding problems faced by many social science researchers with the following main functionalities: 1) creation of hierarchical codebooks, with functionality to move and sort categories in the hierarchy; 2) an interactive UI for codebook annotation; 3) import and export of annotations in CSV format, hence being compatible with existing annotations conducted using spreadsheet applications; 4) integration of an external automation component to facilitate coding using machine learning; and 5) project templating that allows duplicating a project structure without copying the actual documents. We present different use-cases to demonstrate the capability of CodeAnno. A short demonstration video of the system is available here: https://www.youtube.com/watch?v=RmCdTghBe-s

Multilingual Racial Hate Speech Detection Using Transfer Learning
Abinew Ali Ayele | Skadi Dinter | Seid Muhie Yimam | Chris Biemann
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

The rise of social media eases the spread of hateful content, especially racist content with severe consequences. In this paper, we analyze the tweets targeting the death of George Floyd in May 2020, as the event accelerated debates on racism globally. We focus on the tweets published in French during the month following Floyd's death. Using the Yandex Toloka platform, we annotate the tweets as hate, offensive, or normal. Tweets that are offensive or hateful are further annotated as racial or non-racial. We build French hate speech detection models based on the multilingual BERT and CamemBERT and apply transfer learning by fine-tuning the HateXplain model. We compare different approaches to resolve annotation ties and find that the detection model based on CamemBERT yields the best results in our experiments.

Multi-Modal Learning Application – Support Language Learners with NLP Techniques and Eye-Tracking
Robert Geislinger | Ali Ebrahimi Pourasad | Deniz Gül | Daniel Djahangir | Seid Muhie Yimam | Steffen Remus | Chris Biemann
Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing

2022

More Like This: Semantic Retrieval with Linguistic Information
Steffen Remus | Gregor Wiedemann | Saba Anwar | Fynn Petersen-Frey | Seid Muhie Yimam | Chris Biemann
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

Elvis vs. M. Jackson: Who has More Albums? Classification and Identification of Elements in Comparative Questions
Meriem Beloucif | Seid Muhie Yimam | Steffen Stahlhacke | Chris Biemann
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Comparative Question Answering (cQA) is the task of providing concrete and accurate responses to queries such as: “Is Lyft cheaper than a regular taxi?” or “What makes a mortgage different from a regular loan?”. In this paper, we propose two new open-domain real-world datasets for identifying and labeling comparative questions. While the first dataset contains instances of English questions labeled as comparative vs. non-comparative, the second dataset provides additional labels including the objects and the aspects of comparison. We conduct several experiments that evaluate the soundness of our datasets. The evaluation of our datasets using various classifiers shows promising results that reach close-to-human results on a binary classification task with a neural model using ALBERT embeddings. When approaching the unsupervised sequence labeling task, some headroom remains.

Question Answering Classification for Amharic Social Media Community Based Questions
Tadesse Destaw Belay | Seid Muhie Yimam | Abinew Ayele | Chris Biemann
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages

In this work, we build a Question Answering (QA) classification dataset from a social media platform, namely the Telegram public channel called @AskAnythingEthiopia. The channel has more than 78k subscribers and has existed since May 31, 2019. The platform allows asking questions that belong to various domains, like politics, economics, health, education, and so on. Questions are posted in Amharic, English, or Amharic written in Latin script. Since the questions are posed in mixed code, we apply different strategies to pre-process the dataset. As part of the pre-processing tools, we build a Latin to Ethiopic script transliteration tool. We collect 8k Amharic and 24k transliterated questions and develop deep learning-based question answering classifiers that attain an F-score as high as 57.29 across 20 different question classes or categories. The datasets and pre-processing scripts are open-sourced to facilitate further research on Amharic community-based question answering.

2021

ActiveAnno: General-Purpose Document-Level Annotation Tool with Active Learning Integration
Max Wiechmann | Seid Muhie Yimam | Chris Biemann
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

ActiveAnno is an annotation tool focused on document-level annotation tasks developed both for industry and research settings. It is designed to be a general-purpose tool with a wide variety of use cases. It features a modern and responsive web UI for creating annotation projects, conducting annotations, adjudicating disagreements, and analyzing annotation results. ActiveAnno embeds a highly configurable and interactive user interface. The tool also integrates a RESTful API that enables integration into other software systems, including an API for machine learning integration. ActiveAnno is built with extensible design and easy deployment in mind, all to enable users to perform annotation tasks with high efficiency and high-quality annotation results.

MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9

We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.

How Hateful are Movies? A Study and Prediction on Movie Subtitles
Niklas von Boguszewski | Sana Moin | Anirban Bhowmick | Seid Muhie Yimam | Chris Biemann
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

The Development of Pre-processing Tools and Pre-trained Embedding Models for Amharic
Tadesse Destaw Belay | Abinew Ayele | Seid Muhie Yimam
Proceedings of the Fifth Workshop on Widening Natural Language Processing

Amharic is the second most spoken Semitic language after Arabic and serves as the official working language of Ethiopia. While Amharic NLP research has recently been getting wider attention, the main bottleneck is that the resources and related tools are not publicly released, which keeps it a low-resource language. For this reason, we observe that different researchers repeat the same NLP research again and again. In this work, we investigate the existing approaches in Amharic NLP and take the first step to publicly release tools, datasets, and models to advance Amharic NLP research. We build Python-based preprocessing tools for Amharic (tokenizer, sentence segmenter, and text cleaner) that can easily be used and integrated for the development of NLP applications. Furthermore, we compile the first moderately large-scale Amharic text corpus (6.8m sentences) along with the word2Vec, fastText, RoBERTa, and FLAIR embedding models. Finally, we compile benchmark datasets and build classification models for the named entity recognition task.

Word Complexity is in the Eye of the Beholder
Sian Gooding | Ekaterina Kochmar | Seid Muhie Yimam | Chris Biemann
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Lexical complexity is a highly subjective notion, yet this factor is often neglected in lexical simplification and readability systems which use a “one-size-fits-all” approach. In this paper, we investigate which aspects contribute to the notion of lexical complexity in various groups of readers, focusing on native and non-native speakers of English, and how the notion of complexity changes depending on the proficiency level of a non-native reader. To facilitate reproducibility of our approach and foster further research into these aspects, we release a dataset of complex words annotated by readers with different backgrounds.

SCoT: Sense Clustering over Time: a tool for the analysis of lexical change
Christian Haase | Saba Anwar | Seid Muhie Yimam | Alexander Friedrich | Chris Biemann
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

We present Sense Clustering over Time (SCoT), a novel network-based tool for analysing lexical change. SCoT represents the meanings of a word as clusters of similar words. It visualises their formation, change, and demise. There are two main approaches to the exploration of dynamic networks: the discrete one compares a series of clustered graphs from separate points in time. The continuous one analyses the changes of one dynamic network over a time-span. SCoT offers a new hybrid solution. First, it aggregates time-stamped documents into intervals and calculates one sense graph per discrete interval. Then, it merges the static graphs to a new type of dynamic semantic neighbourhood graph over time. The resulting sense clusters offer uniquely detailed insights into lexical change over continuous intervals with model transparency and provenance. SCoT has been successfully used in a European study on the changing meaning of ‘crisis’.

2020

Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System
Seid Muhie Yimam | Gopalakrishnan Venkatesh | John Lee | Chris Biemann
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present the first approach to automatically building resources for academic writing. The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing. On top of existing academic resources, such as the Corpus of Contemporary American English (COCA) academic Word List, the New Academic Word List, and the Academic Collocation List, we also explore how to dynamically build such resources that would be used to automatically identify informal or non-academic words or phrases. The resources are compiled using different generic approaches that can be extended for different domains and languages. We describe the evaluation of resources with a system implementation. The system consists of an informal word identification (IWI), academic candidate paraphrase generation, and paraphrase ranking components. To generate candidates and rank them in context, we have used the PPDB and WordNet paraphrase resources. We use the Concepts in Context (CoInCO) “All-Words” lexical substitution dataset both for the informal word identification and paraphrase generation experiments. Our informal word identification component achieves an F-1 score of 82%, significantly outperforming a stratified classifier baseline. The main contribution of this work is a domain-independent methodology to build targeted resources for writing aids.

UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection
Gregor Wiedemann | Seid Muhie Yimam | Chris Biemann
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks. Typically, fine-tuning is performed on task-specific training datasets in a supervised manner. One can also fine-tune in an unsupervised manner beforehand by further pre-training on the masked language modeling (MLM) task. Here, in-domain data for unsupervised MLM resembling the actual classification target dataset allows for domain adaptation of the model. In this paper, we compare current pre-trained transformer networks with and without MLM fine-tuning on their performance for offensive language detection. Our MLM fine-tuned RoBERTa-based classifier officially ranks 1st in the SemEval 2020 Shared Task 12 for the English language. Further experiments with the ALBERT model even surpass this result.

Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models
Seid Muhie Yimam | Hizkiel Mitiku Alemayehu | Abinew Ayele | Chris Biemann
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents a study of sentiment analysis for Amharic social media texts. As the number of social media users is ever-increasing, social media platforms would like to understand the latent meaning and sentiments of a text to enhance decision-making procedures. However, low-resource languages such as Amharic have received less attention for several reasons, such as the lack of well-annotated datasets, unavailability of computing resources, and few or no expert researchers in the area. This research addresses three main research questions. We first explore the suitability of existing tools for the sentiment analysis task. Annotation tools that support large-scale annotation tasks in Amharic are scarce. Also, the existing crowdsourcing platforms do not support Amharic text annotation. Hence, we build a social-network-friendly annotation tool called ‘ASAB’ using the Telegram bot. We collect 9.4k tweets, where each tweet is annotated by three Telegram users. Moreover, we explore the suitability of machine learning approaches for Amharic sentiment analysis. The FLAIR deep learning text classifier, based on network embeddings that are computed from a distributional thesaurus, outperforms other supervised classifiers. We further investigate the challenges in building a sentiment analysis system for Amharic and find that the widespread use of sarcasm and figurative speech is the main issue in dealing with the problem. To advance sentiment analysis research in Amharic and other related low-resource languages, we release the dataset, the annotation tool, source code, and models publicly under a permissive license.

2018

A Multilingual Information Extraction Pipeline for Investigative Journalism
Gregor Wiedemann | Seid Muhie Yimam | Chris Biemann
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We introduce an advanced information extraction pipeline to automatically process very large collections of unstructured textual data for the purpose of investigative journalism. The pipeline serves as a new input processor for the upcoming major release of our New/s/leak 2.0 software, which we develop in cooperation with a large German news organization. The use case is that journalists receive a large collection of files up to several Gigabytes containing unknown contents. Collections may originate either from official disclosures of documents, e.g. Freedom of Information Act requests, or unofficial data leaks.

Par4Sim – Adaptive Paraphrasing for Text Simplification
Seid Muhie Yimam | Chris Biemann
Proceedings of the 27th International Conference on Computational Linguistics

Learning from a real-world data stream and continuously updating the model without explicit supervision is a new challenge for NLP applications with machine learning components. In this work, we have developed an adaptive learning system for text simplification, which improves the underlying learning-to-rank model from usage data, i.e. how users have employed the system for the task of simplification. Our experimental results show that, over a period of time, the performance of the embedded paraphrase ranking model increases steadily, improving from 62.88% to 75.70% on the NDCG@10 evaluation metric. To our knowledge, this is the first study where an NLP component is adaptively improved through usage.

A Report on the Complex Word Identification Shared Task 2018
Seid Muhie Yimam | Chris Biemann | Shervin Malmasi | Gustavo Paetzold | Lucia Specia | Sanja Štajner | Anaïs Tack | Marcos Zampieri
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT’2018. The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks: English monolingual, German monolingual, Spanish monolingual, and a multilingual track with a French test set, and two tasks: binary classification and probabilistic classification. A total of 12 teams submitted their results in different task/track combinations and 11 of them wrote system description papers that are referred to in this report and appear in the BEA workshop proceedings.

Demonstrating Par4Sem - A Semantic Writing Aid with Adaptive Paraphrasing
Seid Muhie Yimam | Chris Biemann
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In this paper, we present Par4Sem, a semantic writing aid tool based on adaptive paraphrasing. Unlike many annotation tools that are primarily used to collect training examples, Par4Sem is integrated into a real-world application, in this case a writing aid tool, in order to collect training examples from usage data. Par4Sem supports an adaptive, iterative, and interactive process where the underlying machine learning models are updated in each iteration using new training examples from usage data. After motivating the use of ever-learning tools in NLP applications, we evaluate Par4Sem by adapting it to a text simplification task through mere usage.

2017

Multilingual and Cross-Lingual Complex Word Identification
Seid Muhie Yimam | Sanja Štajner | Martin Riedl | Chris Biemann
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Complex Word Identification (CWI) is an important task in lexical simplification and text accessibility. Due to the lack of CWI datasets, previous works largely depend on Simple English Wikipedia and edit histories for obtaining ‘gold standard’ annotations, which are of doubtful quality and limited to English. We collect complex words/phrases (CP) for English, German and Spanish, annotated by both native and non-native speakers, and propose language-independent features that can be used to train multilingual and cross-lingual CWI models. We show that the performance of cross-lingual CWI systems (using a model trained on one language and applying it to the other languages) is comparable to the performance of monolingual CWI systems.

IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Question Answering and Implicit Dialogue Identification
Titas Nandi | Chris Biemann | Seid Muhie Yimam | Deepak Gupta | Sarah Kohail | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we present the system for Answer Selection and Ranking in Community Question Answering, which we build as part of our participation in SemEval-2017 Task 3. We develop a Support Vector Machine (SVM) based system that makes use of textual, domain-specific, word-embedding and topic-modeling features. In addition, we propose a novel method for dialogue chain identification in comment threads. Our primary submission won subtask C, outperforming other systems in all the primary evaluation metrics. We performed well in other English subtasks, ranking third in subtask A and eighth in subtask B. We also developed open source toolkits for all the three English subtasks by the name cQARank [https://github.com/TitasNandi/cQARank].

Entity-Centric Information Access with Human in the Loop for the Biomedical Domain
Seid Muhie Yimam | Steffen Remus | Alexander Panchenko | Andreas Holzinger | Chris Biemann
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017

In this paper, we describe the concept of entity-centric information access for the biomedical domain. With entity recognition technologies approaching acceptable levels of accuracy, we put forward a paradigm of document browsing and searching where the entities of the domain and their relations are explicitly modeled to provide users the possibility of collecting exhaustive information on relations of interest. We describe three working prototypes along these lines: NEW/S/LEAK, which was developed for investigative journalists who need a quick overview of large leaked document collections; STORYFINDER, which is a personalized organizer for information found in web pages that allows adding entities as well as relations, and is capable of personalized information management; and adaptive annotation capabilities of WEBANNO, which is a general-purpose linguistic annotation tool. We will discuss future steps towards the adaptation of these tools to biomedical data, which is subject to a recently started project on biomedical knowledge acquisition. A key difference to other approaches is the centering around the user in a Human-in-the-Loop machine learning approach, where users define and extend categories and enable the system to improve via feedback and interaction.

CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups
Seid Muhie Yimam | Sanja Štajner | Martin Riedl | Chris Biemann
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Complex word identification (CWI) is an important task in text accessibility. However, due to the scarcity of CWI datasets, previous studies have only addressed this problem on Wikipedia sentences and have solely taken into account the needs of non-native English speakers. We collect a new CWI dataset (CWIG3G2) covering three text genres (News, WikiNews, and Wikipedia) annotated by both native and non-native English speakers. Unlike previous datasets, we cover single words, as well as complex phrases, and present them for judgment in a paragraph context. We present the first study on cross-genre and cross-group CWI, showing measurable influences in native language and genre types.

2016

new/s/leak – Information Extraction and Visualization for Investigative Data Journalists
Seid Muhie Yimam | Heiner Ulrich | Tatiana von Landesberger | Marcel Rosenbach | Michaela Regneri | Alexander Panchenko | Franziska Lehmann | Uli Fahrer | Chris Biemann | Kathrin Ballweg
Proceedings of ACL-2016 System Demonstrations

Learning Paraphrasing for Multiword Expressions
Seid Muhie Yimam | Héctor Martínez Alonso | Martin Riedl | Chris Biemann
Proceedings of the 12th Workshop on Multiword Expressions

A Web-based Tool for the Integrated Annotation of Semantic and Syntactic Structures
Richard Eckart de Castilho | Éva Mújdricza-Maydt | Seid Muhie Yimam | Silvana Hartmann | Iryna Gurevych | Anette Frank | Chris Biemann
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

We introduce the third major release of WebAnno, a generic web-based annotation tool for distributed teams. New features in this release focus on semantic annotation tasks (e.g. semantic role labelling or event annotation) and allow the tight integration of semantic annotations with syntactic annotations. In particular, we introduce the concept of slot features, a novel constraint mechanism that allows modelling the interaction between semantic and syntactic annotations, as well as a new annotation user interface. The new features were developed and used in an annotation project for semantic roles on German texts. The paper briefly introduces this project and reports on experiences performing annotations with the new tool. In a comparative evaluation, our tool reaches significant speedups over WebAnno 2 for a semantic annotation task.

2015

Narrowing the Loop: Integration of Resources and Linguistic Dataset Development with Interactive Machine Learning
Seid Muhie Yimam
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

2014

Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno
Seid Muhie Yimam | Chris Biemann | Richard Eckart de Castilho | Iryna Gurevych
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2013

WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
Seid Muhie Yimam | Iryna Gurevych | Richard Eckart de Castilho | Chris Biemann
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Co-authors

Ibrahim Said Ahmad 7

Meriem Beloucif 7

David Ifeoluwa Adelani 6

Israel Abebe Azime 6

Nedjma Ousidhoum 6

Saif Mohammad 5

Alexander Panchenko 5

Oumaima Hourrane 4

Olga Kolesnikova 4

Atnafu Lambebo Tonja 4

Vladimir Araujo 3

Richard Eckart De Castilho 3

Robert Geislinger 3

Iryna Gurevych 3

Steffen Remus 3

Sebastian Ruder 3

Samuel Rutunda 3

Martin Semmann 3

Nirmal Surange 3

Gregor Wiedemann 3

Christine de Kock 3

Sanja Štajner 3

Mohamed Abdalla 2

Sanchit Ahuja 2

Alham Fikri Aji 2

Hizkiel Mitiku Alemayehu 2

Tadesse Belay 2

Muhammad Yahuza Bello 2

Daryna Dementieva 2

Hagos Tesfahun Gebremichael 2

Sukairaj Hafiz Imam 2

Ebrahim Chekol Jibril 2

Dietrich Klakow 2

Daniil Moskovskiy 2

Animesh Mukherjee 2

Hellina Hailu Nigatu 2

Fynn Petersen-Frey 2

Naquee Rizwan 2

Florian Schneider 2

Manish Shrivastava 2

Grigori Sidorov 2

Thamar Solorio 2

Krishnapriya Vishnubhotla 2

Negasi Haile Abadi 1

Henok Biadglign Ademtew 1

Tosin Adewumi 1

Mofetoluwa Adeyemi 1

Orevaoghene Ahia 1

Ibrahim Sa’id Ahmad 1

Ahmed Haj Ahmed 1

Bedru Yimam Ahmed 1

Adewale Akinfaderin 1

Victor Akinode 1

Jesujoba Alabi 1

Esubalew Alemneh 1

Felermino Dário Mário António Ali 1

Felermino Dario Mario Ali 1

Adem Chanie Ali 1

Saminu Mohammad Aliyu 1

Lukman Jibril Aliyu 1

Emmanuel Anebi 1

Aremu Anuoluwapo 1

Stephen Arthur 1

Tesfa Tegegne Asfaw 1

Ayodele Awokoya 1

Nikolay Babakov 1

Hailu Beshada Balcha 1

Kathrin Ballweg 1

Pavan Baswani 1

Tobius Saul Bateesa 1

Bello Shehu Bello 1

Pushpak Bhattacharyya 1

Anirban Bhowmick 1

Aarushi Ajay Borkar 1

Sofia Bourhim 1

Pavel Brazdil 1

Philipp Breitfeld 1

Andiswa Bukula 1

Happy Buzaaba 1

Sisay Adugna Chala 1

Chiamaka Ijeoma Chukwuneke 1

Chiamaka Chukwuneke 1

Thierno Ibrahima DIOP 1

Abdoulaye Diallo 1

Daniel Djahangir 1

Bonaventure F. P. Dossou 1

Daniel D’souza 1

Ashraf Elnagar 1

Chris Chinenye Emezue 1

Ignatius Ezeani 1

Abdoulaye Faye 1

Alexander Friedrich 1

Mitiku Yohannes Fuge 1

Rudy Alexandro Garrido Veliz 1

Gashaw Gebremeskel 1

Dibora Gebreyohannes 1

Dawit Ketema Gete 1

Catherine Gitau 1

Alvin Grissom II 1

Tadesse Kebede Guge 1

Tajuddeen Gwadabe 1

Tajuddeen Rabiu Gwadabe 1

Christian Haase 1

Frank Hammerschmidt 1

Torben Hannemann 1

Silvana Hartmann 1

Andreas Holzinger 1

Falalu Ibrahim 1

Elyas Abdi Ismail 1

Eyasu Shiferaw Jada 1

Esubalew Alemneh Jalew 1

Melese Ayichlie Jigar 1

Gopichand Kanumolu 1

Maurice Katusiime 1

Katharina Kleinen-von Königslöw 1

Ekaterina Kochmar 1

Julia Kreutzer 1

John S. Y. Lee 1

Franziska Lehmann 1

Constantine Lignos 1

Terry Lima Ruas 1

Mouhamadane MBOUP 1

Rooweither Mabuya 1

Lokesh Madasu 1

Shervin Malmasi 1

Tendai Marengereke 1

Héctor Martínez Alonso 1

Stephen Mayhew 1

Derguene Mbaye 1

Moges Ahmed Mehamed 1

Moges Ahmed Ah Mehamed 1

Wendimu Baye Messelle 1

Zewdie Mossie 1

Jonathan Mukiibi 1

Gerald Muriuki 1

Éva Mújdricza-Maydt 1

Deborah Nabagereka 1

Joyce Nakatumba-Nabende 1

Graham Neubig 1

Irina Nikishina 1

Rubungo Andre Niyongabo 1

Kelechi Nwaike 1

Kelechi Ogueji 1

Temilola Oloyede 1

Nelson Odhiambo Onyango 1

Bernard Opoku 1

Abigail Oppong 1

Verrah Akinyi Otiende 1

Samuel Oyerinde 1

Gustavo Paetzold 1

Chester Palen-Michel 1

Ali Ebrahimi Pourasad 1

Michaela Regneri 1

Shruti Rijhwani 1

Marcel Rosenbach 1

Paul Röttger 1

Punyajoy Saha 1

Babangida Sani 1

Özge Sevgili 1

Walelign Tewabe Sewunetie 1

Walelign Sewunetie 1

Blessing Kudzaishe Sibanda 1

Clemencia Siro 1

Dr. Florian Skupin 1

Philipp Slusallek 1

Philip Slusallek 1

Sören Spiegel 1

Steffen Stahlhacke 1

Elisei Stakovskii 1

Gregor Stange 1

Daniela Teodorescu 1

Hailegnaw Tilaye 1

Heiner Ulrich 1

Hadiza Ali Umar 1

Gopalakrishnan Venkatesh 1

Jan Philip Wahle 1

Eric Peter Wairagala 1

Yvonne Wambui 1

Lilian Diana Awuor Wanzare 1

Aman Kassahun Wassie 1

Max Wiechmann 1

Genta Indra Winata 1

Michael Melese Woldeyohannis 1

Viktoria Wrobel 1

Mesay Gemeda Yigezu 1

Marcos Zampieri 1

Niklas von Boguszewski 1

Tatiana von Landesberger 1

Gerret von Nordheim 1

Frank Ückert 1

Venues