Paul Buitelaar

2026

We investigate the role of large language models (LLMs) in promoting gender-inclusive language by evaluating their ability to rewrite biased text and generate counterfactual narratives across multiple languages. We introduce a shared task with two subtasks: gender-inclusive rewriting and counterfactual generation. The task covers five languages (English, German, Spanish, Tamil, and Kannada), reflecting diverse grammatical gender systems and sociocultural contexts. We release curated word-level and sentence-level datasets to support controlled inclusive generation. A total of 50 teams registered for the shared task, and around 8 teams submitted results. Submissions are evaluated using a hybrid framework combining rubric-based automatic scoring with expert human judgment. Finally, we provide an overview of participating systems and discuss key findings and challenges observed across languages.

Findings of Shared Task on Counter Narrative Generation on Homophobic and Transphobic Comments
Prasanna Kumar Kumaresan | Praveen Prasannan | Tanay Singh | Ruba Priyadharshini | Subalalitha Chinnaudayar Navaneethakrishnan | Saranya Rajiakodi | Paul Buitelaar | Bharathi Raja Chakravarthi
Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion

Online platforms continue to witness harmful expressions targeting LGBTQ+ individuals, particularly in the form of homophobic and transphobic comments. While detection of such content has received substantial attention, generating constructive counter-narratives remains comparatively underexplored. In this shared task, we focus on counter-narrative generation in English and Tamil. Participants were provided with social media comments labeled as homophobic or transphobic and were required to generate respectful, contextually appropriate responses that challenge prejudice and promote empathy. Systems were evaluated using both reference-based metrics (Distinct-2 and BERTScore-F1) and rubric-based human evaluation metrics measuring politeness (PRS), quality (QS), and contextual coherence (CCNC). The results demonstrate variation in system performance across languages, with English systems showing stronger lexical diversity and Tamil systems excelling in politeness and contextual coherence. This paper presents dataset statistics, evaluation methodology, system performance analysis, and key observations from the shared task.

A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization
Janak Kapuriya | Ali Hatami | Paul Buitelaar
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)

Recent advancements in text-to-image generative models have improved narrative consistency in story visualization. However, current story visualization models often overlook cultural dimensions, resulting in visuals that lack cultural fidelity. In this study, we present a progressive evaluation framework for story visualization. We validate this framework on current text-to-image models across three languages (English, Hindi, and Chinese) on two datasets (VIST and FlintstonesSV). The proposed framework introduces three levels of cultural analysis as evaluation rubrics: 1) Basic Cultural Criteria, 2) Cultural Dimension Guidance, and 3) Cultural Examples Grounding. We evaluate story visualization by use of a novel MLLM-as-Jury approach across all three rubrics and a small-scale human evaluation only on the third rubric. We implement an MLLM-as-jury approach by aggregating scores from three different families of MLLM-as-Judge models. In our experiments, real-world stories generally receive higher cultural appropriateness scores than animated ones, with English tending to score higher than Hindi and Chinese across the evaluated models. Some examples also exhibited culturally inconsistent or stereotypical elements noted by annotators. The proposed progressive evaluation framework has therefore been shown to provide early insights into cultural misalignments in story visualization. Code for this work is made available on https://github.com/janak11111/Cultural_Eval_For_StoryViz

2025

Leveraging Visual Scene Graph to Enhance Translation Quality in Multimodal Machine Translation
Ali Hatami | Mihael Arcan | Paul Buitelaar
Proceedings of Machine Translation Summit XX: Volume 1

Despite significant advancements in Multimodal Machine Translation, understanding and effectively utilising visual scenes within multimodal models remains a complex challenge. Extracting comprehensive and relevant visual features requires extensive and detailed input data to ensure the model accurately captures objects, their attributes, and relationships within a scene. In this paper, we explore using visual scene graphs extracted from images to enhance the performance of translation models. We investigate this approach for integrating Visual Scene Graph information into translation models, focusing on representing this information in a semantic structure rather than relying on raw image data. The performance of our approach was evaluated on the Multi30K dataset for English into German, French, and Czech translations using BLEU, chrF2, TER and COMET metrics. Our results demonstrate that utilising visual scene graph information improves translation performance. Using information on semantic structure can improve the multimodal baseline model, leading to better contextual understanding and translation accuracy.

Overview of Homophobia and Transphobia Span Detection in Social Media Comments
Prasanna Kumar Kumaresan | Bharathi Raja Chakravarthi | Ruba Priyadharshini | Paul Buitelaar | Malliga Subramanian | Kishore Kumar Ponnusamy
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

The rise and intensity of harassment and hate speech against LGBTQ+ communities on social media platforms are a growing concern. This work is an initiative to address this problem by conducting a shared task focused on the detection of homophobic and transphobic content in multilingual settings. The task comprises two subtasks: (1) multi-class classification of content into Homophobia, Transphobia, or Non-anti-LGBT+ categories across eight languages and (2) span-level detection to identify specific toxic segments within comments in English, Tamil, and Marathi. This initiative supports the development of explainable and socially responsible AI tools for combating identity-based harm in digital spaces. Multiple teams registered for the task; however, only two teams submitted their results, which were evaluated using the macro F1 score.

The increasing prevalence of misogynistic content in online memes has raised concerns about its impact on digital discourse. The culture-specific images and informal use of text in memes present considerable challenges for automatic detection systems, especially in low-resource languages. While previous shared tasks have addressed misogyny detection in English and several European languages, misogynistic meme detection in Chinese has remained largely unexplored. To address this gap, we introduced a shared task focused on binary classification of Chinese-language memes as misogynistic or non-misogynistic. The task featured memes collected from Chinese social media and annotated by native speakers. A total of 45 teams registered, with 8 teams submitting predictions from their multimodal models integrating textual and visual features through diverse fusion strategies. The best-performing system achieved a macro F1-score of 0.93035, highlighting the effectiveness of lightweight pretrained encoder fusion. This system used Chinese BERT and DenseNet-121 for text and image feature extraction, respectively. A feedforward network was trained as a classifier on the concatenated text and image features.

Personalized recommender systems play a crucial role in direct marketing, particularly in financial services, where delivering relevant content can enhance customer engagement and promote informed decision-making. This study explores interpretable knowledge graph (KG)-based recommender systems by proposing two distinct approaches for personalized article recommendations within a multinational financial services firm. The first approach leverages Reinforcement Learning (RL) to traverse a KG constructed from both structured (tabular) and unstructured (textual) data, enabling interpretability through Path Directed Reasoning (PDR). The second approach employs the XGBoost algorithm, with post-hoc explainability techniques such as SHAP and ELI5 to enhance transparency. By integrating machine learning with automatically generated KGs, our methods not only improve recommendation accuracy but also provide interpretable insights, facilitating more informed decision-making in customer relationship management.

Towards Semantic Integration of Opinions: Unified Opinion Concepts Ontology and Extraction Task
Gaurav Negi | Dhairya Dalal | Omnia Zayed | Paul Buitelaar
Proceedings of the 5th Conference on Language, Data and Knowledge

This paper introduces the Unified Opinion Concepts (UOC) ontology to integrate opinions within their semantic context. The UOC ontology bridges the gap between the semantic representation of opinion across different formulations. It is a unified conceptualisation based on the facets of opinions studied extensively in NLP and semantic structures described through symbolic descriptions. We further propose the Unified Opinion Concept Extraction (UOCE) task of extracting opinions from the text with enhanced expressivity. Additionally, we provide a manually extended and re-annotated evaluation dataset for this task and tailored evaluation metrics to assess the adherence of extracted opinions to UOC semantics. Finally, we establish baseline performance for the UOCE task using state-of-the-art generative models.

DiaSafety-CC: Annotating Dialogues with Safety Labels and Reasons for Cross-Cultural Analysis
Tunde Oluwaseyi Ajayi | Mihael Arcan | Paul Buitelaar
Proceedings of the 5th Conference on Language, Data and Knowledge

A dialogue dataset developed in one language can receive diverse safety annotations when presented to raters from different cultures. What is considered acceptable in one culture can be perceived as offensive in another. Cultural differences in dialogue safety annotation are yet to be fully explored. In this work, we use the geopolitical entity, Country, as our base for cultural study. We extend DiaSafety, an existing English dialogue safety dataset that was originally annotated by raters from Western culture, to create a new dataset, DiaSafety-CC. In our work, three raters each from Nigeria and India reannotate the DiaSafety dataset and provide reasons for their choice of labels. We perform pairwise comparisons of the annotations across the cultures studied. Furthermore, we compare the representative labels of each rater group to those of an existing large language model (LLM). Due to the subjectivity of the dialogue annotation task, 32.6% of the considered dialogues achieve unanimous annotation consensus across the labels of DiaSafety and the six raters. In our analyses, we observe that the Unauthorized Expertise and Biased Opinion categories have dialogues with the highest label disagreement ratio across the cultures studied. On manual inspection of the reasons provided for the choice of labels, we observe that raters across the cultures in DiaSafety-CC are more sensitive to dialogues directed at target groups than to dialogues directed at individuals. We also observe that GPT-4o annotation shows a more positive agreement with DiaSafety labels in terms of F1 score and phi coefficient.

Findings of the Shared Task on Misogyny Meme Detection: DravidianLangTech@NAACL 2025
Bharathi Raja Chakravarthi | Rahul Ponnusamy | Saranya Rajiakodi | Shunmuga Priya Muthusamy Chinnan | Paul Buitelaar | Bhuvaneswari Sivagnanam | Anshid Kizhakkeparambil
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The rapid expansion of social media has facilitated communication but also enabled the spread of misogynistic memes, reinforcing gender stereotypes and toxic online environments. Detecting such content is challenging due to the multimodal nature of memes, where meaning emerges from the interplay of text and images. The Misogyny Meme Detection shared task at DravidianLangTech@NAACL 2025 focused on Tamil and Malayalam, encouraging the development of multimodal approaches. With 114 teams registered and 23 submitting predictions, participants leveraged various pretrained language models and vision models through fusion techniques. The best models achieved high macro F1 scores (0.83682 for Tamil, 0.87631 for Malayalam), highlighting the effectiveness of multimodal learning. Despite these advances, challenges such as bias in the data set, class imbalance, and cultural variations persist. Future research should refine multimodal detection methods to improve accuracy and adaptability, fostering safer and more inclusive online spaces.

This overview paper presents the findings of the Shared Task on Abusive Tamil and Malayalam Text Targeting Women on Social Media, organized as part of DravidianLangTech@NAACL 2025. The task aimed to encourage the development of robust systems to detect abusive content targeting women in Tamil and Malayalam, two low-resource Dravidian languages. Participants were provided with annotated datasets containing abusive and non-abusive text curated from YouTube comments. We present an overview of the approaches and analyse the results of the shared task submissions. We believe the findings presented in this paper will be useful to researchers working in Dravidian language technology.

LUCE: A Dynamic Framework and Interactive Dashboard for Opinionated Text Analysis
Omnia Zayed | Gaurav Negi | Sampritha Hassan Manjunath | Devishree Pillai | Paul Buitelaar
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations

We introduce LUCE, an advanced dynamic framework with an interactive dashboard for analysing opinionated text, aiming to understand people-centred communication. The framework features computational modules for text classification and extraction explicitly designed for analysing different elements of opinions, e.g., sentiment/emotion, suggestion, figurative language, hate/toxic speech, and topics. We designed the framework using a modular architecture, allowing scalability and extensibility with the aim of supporting other NLP tasks in subsequent versions. LUCE comprises trained models, Python-based APIs, and a user-friendly dashboard, ensuring an intuitive user experience. LUCE has been validated in a relevant environment, and its capabilities and performance have been demonstrated through initial prototypes and pilot studies.

2024

English-to-Low-Resource Translation: A Multimodal Approach for Hindi, Malayalam, Bengali, and Hausa
Ali Hatami | Shubhanker Banerjee | Mihael Arcan | Paul Buitelaar | John Philip McCrae
Proceedings of the Ninth Conference on Machine Translation

Multimodal machine translation leverages multiple data modalities to enhance translation quality, particularly for low-resourced languages. This paper uses a multimodal model that integrates visual information with textual data to improve translation accuracy from English to Hindi, Malayalam, Bengali, and Hausa. This approach employs a gated fusion mechanism to effectively combine the outputs of textual and visual encoders, enabling more nuanced translations that consider both language and contextual visual cues. The performance of the multimodal model was evaluated against the text-only machine translation model based on BLEU, ChrF2 and TER. Experimental results demonstrate that the multimodal approach consistently outperforms the text-only baseline, highlighting the potential of integrating visual information in low-resourced language translation tasks.

This paper describes the structure and findings of the WILDRE 2024 shared task on Code-mixed Less-resourced Sentiment Analysis for Indo-Aryan Languages. The participants were asked to submit their final predictions on the test data on CodaLab. A total of fourteen teams registered for the shared task. Only four participants submitted a system for evaluation on CodaLab, and only two teams submitted a system description paper. All systems show rather promising performance and outperform the baseline scores.

Cross-lingual Transfer and Multilingual Learning for Detecting Harmful Behaviour in African Under-Resourced Language Dialogue
Tunde Oluwaseyi Ajayi | Mihael Arcan | Paul Buitelaar
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Most harmful dialogue detection models are developed for high-resourced languages. Consequently, users who speak under-resourced languages cannot fully benefit from these models in terms of usage, development, detection and mitigation of harmful dialogue utterances. Our work aims at detecting harmful utterances in under-resourced African languages. We leverage transfer learning using pretrained models trained with multilingual embeddings to develop a cross-lingual model capable of detecting harmful content across various African languages. We first fine-tune a harmful dialogue detection model on a selected African dialogue dataset. Additionally, we fine-tune a model on a combined dataset in some African languages to develop a multilingual harmful dialogue detection model. We then evaluate the cross-lingual model’s ability to generalise to an unseen African language by performing harmful dialogue detection in an under-resourced language not present during pretraining or fine-tuning. We evaluate our models on the test datasets. We show that our best performing models achieve impressive results in terms of F1 score. Finally, we discuss the results and limitations of our work.

Using Information Retrieval Techniques to Automatically Repurpose Existing Dialogue Datasets for Safe Chatbot Development
Tunde Oluwaseyi Ajayi | Gaurav Negi | Mihael Arcan | Paul Buitelaar
Proceedings of Safety4ConvAI: The Third Workshop on Safety for Conversational AI @ LREC-COLING 2024

There has been notable progress in the development of open-domain dialogue systems (chatbots), especially with the rapid advancement of the capabilities of Large Language Models. Chatbots excel at holding conversations in a manner that keeps a user interested and engaged. However, their responses can be unsafe, as they can respond in an offensive manner or offer harmful professional advice. To mitigate this issue, recent work crowdsources datasets with exemplary responses or annotates dialogue safety datasets, which are relatively scarce compared to casual dialogues. Despite the quality of data obtained from crowdsourcing, it can be expensive and time-consuming. This work proposes an effective pipeline, using information retrieval, to automatically repurpose existing dialogue datasets for safe chatbot development, as a way to address the aforementioned challenges. We select an existing dialogue dataset and revise its unsafe responses to obtain a dataset with safer responses to unsafe user inputs. We then fine-tune dialogue models on the original and revised datasets and generate responses to evaluate the safeness of the models.

This paper provides a comprehensive summary of the “Homophobia and Transphobia Detection in Social Media Comments” shared task, which was held at LT-EDI@EACL 2024. The objective of this task was to develop systems capable of identifying instances of homophobia and transphobia within social media comments. The challenge covered ten languages: English, Tamil, Malayalam, Telugu, Kannada, Gujarati, Hindi, Marathi, Spanish, and Tulu. Each comment in the dataset was annotated with one of three categories. The shared task attracted significant interest, with over 60 teams participating through the CodaLab platform. The predictions submitted by the participants were evaluated using the macro F1 score.

Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Bharathi Raja Chakravarthi | Bharathi B | Paul Buitelaar | Thenmozhi Durairaj | György Kovács | Miguel Ángel García Cumbreras
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

From Laughter to Inequality: Annotated Dataset for Misogyny Detection in Tamil and Malayalam Memes
Rahul Ponnusamy | Kathiravan Pannerselvam | Saranya Rajiakodi | Prasanna Kumar Kumaresan | Sajeetha Thavareesan | Bhuvaneswari Sivagnanam | Anshid K.A | Susminu S Kumar | Paul Buitelaar | Bharathi Raja Chakravarthi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this digital era, memes have become a prevalent form of online expression, humor, sarcasm, and social commentary. However, beneath their surface lie concerning issues such as the propagation of misogyny, gender-based bias, and harmful stereotypes. To address these issues, we introduce MDMD (Misogyny Detection Meme Dataset) in this paper. This article focuses on creating an annotated dataset with detailed annotation guidelines to delve into online misogyny within the Tamil and Malayalam-speaking communities. Through analyzing memes, we uncover the intricate world of gender bias and stereotypes in these communities, shedding light on their manifestations and impact. This dataset, along with its comprehensive annotation guidelines, is a valuable resource for understanding the prevalence, origins, and manifestations of misogyny in various contexts, aiding researchers, policymakers, and organizations in developing effective strategies to combat gender-based discrimination and promote equality and inclusivity. It enables a deeper understanding of the issue and provides insights that can inform strategies for cultivating a more equitable and secure online environment. This work represents a crucial step in raising awareness and addressing gender-based discrimination in the digital space.

A Hybrid Approach to Aspect Based Sentiment Analysis Using Transfer Learning
Gaurav Negi | Rajdeep Sarkar | Omnia Zayed | Paul Buitelaar
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Aspect-Based Sentiment Analysis (ABSA) aims to identify terms or multiword expressions (MWEs) on which sentiments are expressed and the sentiment polarities associated with them. The development of supervised models has been at the forefront of research in this area. However, training these models requires manually annotated datasets, which are both expensive and time-consuming to produce. Furthermore, the available annotated datasets are tailored to a specific domain, language, and text type. In this work, we address this notable challenge in current state-of-the-art ABSA research. We propose a hybrid approach for Aspect-Based Sentiment Analysis using transfer learning. The approach focuses on generating weakly-supervised annotations by exploiting the strengths of both large language models (LLMs) and traditional syntactic dependencies. We utilise syntactic dependency structures of sentences to complement the annotations generated by LLMs, as they may overlook domain-specific aspect terms. Extensive experimentation on multiple datasets is performed to demonstrate the efficacy of our hybrid method for the tasks of aspect term extraction and aspect sentiment classification.

Dataset for Identification of Homophobia and Transphobia for Telugu, Kannada, and Gujarati
Prasanna Kumar Kumaresan | Rahul Ponnusamy | Dhruv Sharma | Paul Buitelaar | Bharathi Raja Chakravarthi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Users of social media platforms are negatively affected by the proliferation of hate or abusive content. There has been a rise in homophobic and transphobic content in recent years targeting LGBT+ individuals. The increasing levels of homophobia and transphobia online can make online platforms harmful and threatening for LGBT+ persons, potentially inhibiting equality, diversity, and inclusion. We introduce a new dataset for three languages, namely Telugu, Kannada, and Gujarati. Additionally, we have created an expert-labeled dataset to automatically identify homophobic and transphobic content within comments collected from YouTube. We provided comprehensive annotation rules to educate annotators in this process. We collected approximately 10,000 comments from YouTube for all three languages. As this is the first dataset in these languages for this task, we also developed a baseline model with pre-trained transformers.

Enhancing Translation Quality by Leveraging Semantic Diversity in Multimodal Machine Translation
Ali Hatami | Mihael Arcan | Paul Buitelaar
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Despite advancements in neural machine translation, word sense disambiguation remains challenging, particularly with limited textual context. Multimodal Machine Translation enhances text-only models by integrating visual information, but its impact varies across translations. This study focuses on ambiguous sentences to investigate the effectiveness of utilizing visual information. By prioritizing these sentences, which benefit from visual cues, we aim to enhance hybrid multimodal and text-only translation approaches. We utilize Latent Semantic Analysis and Sentence-BERT to extract context vectors from the British National Corpus, enabling the assessment of semantic diversity. Our approach enhances translation quality for English-German and English-French on Multi30k, assessed through metrics including BLEU, chrF2, and TER.

Inference to the Best Explanation in Large Language Models
Dhairya Dalal | Marco Valentino | Andre Freitas | Paul Buitelaar
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts on Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs’ explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features including: consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked to select the most plausible causal explanation amongst competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy (≈ 27% above random), improving upon a GPT 3.5-as-a-Judge baseline (≈+17%) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for future development of automated explanation verification tools.

2023

A Filtering Approach to Object Region Detection in Multimodal Machine Translation
Ali Hatami | Paul Buitelaar | Mihael Arcan
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

Recent studies in Multimodal Machine Translation (MMT) have explored the use of visual information in a multimodal setting to analyze its redundancy with textual information. The aim of this work is to develop a more effective approach to incorporating relevant visual information into the translation process and improve the overall performance of MMT models. This paper proposes an object-level filtering approach in Multimodal Machine Translation, where the approach is applied to object regions extracted from an image to filter out irrelevant objects based on the image captions to be translated. Using the filtered image helps the model to consider only relevant objects and their relative locations to each other. Different matching methods, including string matching and word embeddings, are employed to identify relevant objects. Gaussian blurring is used to soften irrelevant objects from the image and to evaluate the effect of object filtering on translation quality. The performance of the filtering approaches was evaluated on the Multi30K dataset in English to German, French, and Czech translations, based on BLEU, ChrF2, and TER metrics.

Overview of Second Shared Task on Homophobia and Transphobia Detection in Social Media Comments
Bharathi Raja Chakravarthi | Rahul Ponnusamy | Malliga Subramanian | Paul Buitelaar | Miguel Ángel García-Cumbreras | Salud María Jiménez-Zafra | José Antonio García-Díaz | Rafael Valencia-García | Nitesh Jindal
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

We present an overview of the second shared task on homophobia/transphobia detection in social media comments. Given a comment, a system must predict whether or not it contains any form of homophobia/transphobia. The shared task included five languages: English, Spanish, Tamil, Hindi, and Malayalam. The data was provided for two tasks: Task A used three labels, and Task B used seven fine-grained labels. In total, 75 teams enrolled for the shared task on CodaLab. For Task A, 12 teams submitted systems for English, eight teams for Tamil, eight teams for Spanish, and seven teams for Hindi. For Task B, nine teams submitted for English, seven teams for Tamil, and six teams for Malayalam. We present and analyze all submissions in this paper.

Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
Bharathi R. Chakravarthi | B. Bharathi | Joephine Griffith | Kalika Bali | Paul Buitelaar
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

CURED4NLG: A Dataset for Table-to-Text Generation
Nivranshu Pasricha | Mihael Arcan | Paul Buitelaar
Proceedings of the 4th Conference on Language, Data and Knowledge

Multimodal Offensive Meme Classification with Natural Language Inference
Shardul Suryawanshi | Mihael Arcan | Suzanne Little | Paul Buitelaar
Proceedings of the 4th Conference on Language, Data and Knowledge

Identifying FrameNet Lexical Semantic Structures for Knowledge Graph Extraction from Financial Customer Interactions
Cécile Robin | Atharva Kulkarni | Paul Buitelaar
Proceedings of the 12th Global Wordnet Conference

We explore the use of the well-established lexical resource and theory of the Berkeley FrameNet project to support the creation of a domain-specific knowledge graph in the financial domain, more precisely from financial customer interactions. We introduce a domain-independent and unsupervised method that can be used across multiple applications, and test it on the financial domain. We use an existing tool for term extraction and taxonomy generation in combination with information taken from FrameNet. By using principles from frame semantic theory, we show that we can connect domain-specific terms with their semantic concepts (semantic frames) and their properties (frame elements) to enrich knowledge about these terms, in order to improve the customer experience in customer-agent dialogue settings.

pdf bib abs

CALM-Bench: A Multi-task Benchmark for Evaluating Causality-Aware Language Models
Dhairya Dalal | Paul Buitelaar | Mihael Arcan
Findings of the Association for Computational Linguistics: EACL 2023

Causal reasoning is a critical component of human cognition and is required across a range of question-answering (QA) tasks (such as abductive reasoning, commonsense QA, and procedural reasoning). Research on causal QA has been underdefined, task-specific, and limited in complexity. Recent advances in foundation language models (such as BERT, ERNIE, and T5) have shown the efficacy of pre-trained models across diverse QA tasks. However, there is limited research exploring the causal reasoning capabilities of those language models and no standard evaluation benchmark. To unify causal QA research, we propose CALM-Bench, a multi-task benchmark for evaluating causality-aware language models (CALM). We present a standardized definition of causal QA tasks and show empirically that causal reasoning can be generalized and transferred across different QA tasks. Additionally, we share a strong multi-task baseline model which outperforms single-task fine-tuned models on the CALM-Bench tasks.

2022

pdf bib abs

Towards Bootstrapping a Chatbot on Industrial Heritage through Term and Relation Extraction
Mihael Arcan | Rory O’Halloran | Cécile Robin | Paul Buitelaar
Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities

We describe initial work in developing a methodology for the automatic generation of a conversational agent or ‘chatbot’ through term and relation extraction from a relevant corpus of language data. We develop our approach in the domain of industrial heritage in the 18th and 19th centuries, and more specifically on the industrial history of canals and mills in Ireland. We collected a corpus of relevant newspaper reports and Wikipedia articles, which we deemed representative of a layman’s understanding of this topic. We used the Saffron toolkit to extract relevant terms and relations between the terms from the corpus and leveraged the extracted knowledge to query the British Library Digital Collection and the Project Gutenberg library. We leveraged the extracted terms and relations in identifying possible answers for a constructed set of questions based on the extracted terms, by matching them with sentences in the British Library Digital Collection and the Project Gutenberg library. In a final step, we then took this data set of question-answer pairs to train a chatbot. We evaluate our approach by manually assessing the appropriateness of the generated answers for a random sample, each of which is judged by four annotators.

pdf bib abs

Analysing the Correlation between Lexical Ambiguity and Translation Quality in a Multimodal Setting using WordNet
Ali Hatami | Paul Buitelaar | Mihael Arcan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop

Multimodal Neural Machine Translation focuses on using visual information to translate sentences in the source language into the target language. The main idea is to utilise information from visual modalities to promote the output quality of the text-based translation model. Although the recent multimodal strategies extract the most relevant visual information in images, the effectiveness of using visual information on translation quality changes based on the text dataset. Due to this, this work studies the impact of leveraging visual information in multimodal translation models of ambiguous sentences. Our experiments analyse the Multi30k evaluation dataset and calculate ambiguity scores of sentences based on the WordNet hierarchical structure. To calculate the ambiguity of a sentence, we extract the ambiguity scores for all nouns based on the number of senses in WordNet. The main goal is to find the sentences in which visual content can improve the text-based translation model. We report the correlation between the ambiguity scores and translation quality extracted for all sentences in the English-German dataset.

pdf bib abs

Overview of The Shared Task on Homophobia and Transphobia Detection in Social Media Comments
Bharathi Raja Chakravarthi | Ruba Priyadharshini | Durairaj Thenmozhi | John Philip McCrae | Paul Buitelaar | Rahul Ponnusamy | Prasanna Kumar Kumaresan
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Homophobia and Transphobia Detection is the task of identifying homophobia, transphobia, and non-anti-LGBT+ content from the given corpus. Homophobia and transphobia are both forms of toxic language directed at LGBTQ+ individuals that are described as hate speech. This paper summarizes our findings on the “Homophobia and Transphobia Detection in social media comments” shared task held at LT-EDI 2022 - ACL 2022. This shared task focused on three sub-tasks for Tamil, English, and Tamil-English (code-mixed) languages. It received 10 systems for Tamil, 13 systems for English, and 11 systems for Tamil-English. The best systems for Tamil, English, and Tamil-English scored 0.570, 0.870, and 0.610, respectively, on average macro F1-score.

pdf bib

Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Bharathi Raja Chakravarthi | B Bharathi | John P McCrae | Manel Zarrouk | Kalika Bali | Paul Buitelaar
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

pdf bib abs

Linghub2: Language Resource Discovery Tool for Language Technologies
Cécile Robin | Gautham Vadakkekara Suresh | Víctor Rodríguez Doncel | John McCrae | Paul Buitelaar
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Language resources are a key component of natural language processing and related research and applications. Users of language resources have different needs in terms of format, language, topics, etc. for the data they need to use. Linghub (McCrae and Cimiano, 2015) was first developed for this purpose, using the capabilities of linked data to represent metadata, and tackling the heterogeneous metadata issue. Linghub aimed at helping language resources and technology users to easily find and retrieve relevant data, and identify important information on access, topics, etc. This work describes a rejuvenation and modernisation of the 2015 platform into using a popular open source data management system, DSpace, as foundation. The new platform, Linghub2, contains updated and extended resources, more languages offered, and continues the work towards homogenisation of metadata through conversions, through linkage to standardisation strategies and community groups, such as the Open Digital Rights Language (ODRL) community group.

2021

pdf bib

Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion
Bharathi Raja Chakravarthi | John P. McCrae | Manel Zarrouk | Kalika Bali | Paul Buitelaar
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

pdf bib abs

NUIG-DSI’s submission to The GEM Benchmark 2021
Nivranshu Pasricha | Mihael Arcan | Paul Buitelaar
Proceedings of the First Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

This paper describes the submission by NUIG-DSI to the GEM benchmark 2021. We participate in the modeling shared task where we submit outputs on four datasets for data-to-text generation, namely, DART, WebNLG (en), E2E and CommonGen. We follow an approach similar to the one described in the GEM benchmark paper where we use the pre-trained T5-base model for our submission. We train this model on additional monolingual data where we experiment with different masking strategies specifically focused on masking entities, predicates and concepts as well as a random masking strategy for pre-training. In our results we find that random masking performs the best in terms of automatic evaluation metrics, though the results are not statistically significantly different compared to other masking strategies.

pdf bib abs

Enhancing Multiple-Choice Question Answering with Causal Knowledge
Dhairya Dalal | Mihael Arcan | Paul Buitelaar
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

The task of causal question answering aims to reason about causes and effects over a provided real or hypothetical premise. Recent approaches have converged on using transformer-based language models to solve question answering tasks. However, pretrained language models often struggle when external knowledge is not present in the premise or when additional context is required to answer the question. To the best of our knowledge, no prior work has explored the efficacy of augmenting pretrained language models with external causal knowledge for multiple-choice causal question answering. In this paper, we present novel strategies for the representation of causal knowledge. Our empirical results demonstrate the efficacy of augmenting pretrained models with external causal knowledge. We show improved performance on the COPA (Choice of Plausible Alternatives) and WIQA (What If Reasoning Over Procedural Text) benchmark tasks. On the WIQA benchmark, our approach is competitive with the state-of-the-art and exceeds it within the evaluation subcategories of In-Paragraph and Out-of-Paragraph perturbations.

2020

pdf bib abs

Social media are interactive platforms that facilitate the creation or sharing of information, ideas or other forms of expression among people. This exchange is not free from offensive, trolling or malicious contents targeting users or communities. One way of trolling is by making memes, which in most cases combine an image with a concept or catchphrase. The challenge of dealing with memes is that they are region-specific and their meaning is often obscured in humour or sarcasm. To facilitate the computational modelling of trolling in memes for Indian languages, we created a meme dataset for Tamil (TamilMemes). We annotated and released the dataset containing suspected troll and not-troll memes. In this paper, we use image classification to address the difficulties involved in the classification of troll memes with the existing methods. We found that the identification of a troll meme with such an image classifier is not feasible, which has been corroborated with precision, recall and F1-score.

pdf bib abs

Utilising Knowledge Graph Embeddings for Data-to-Text Generation
Nivranshu Pasricha | Mihael Arcan | Paul Buitelaar
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

Data-to-text generation has recently seen a move away from modular and pipeline architectures towards end-to-end architectures based on neural networks. In this work, we employ knowledge graph embeddings and explore their utility for end-to-end approaches in a data-to-text generation task. Our experiments show that using knowledge graph embeddings can yield an improvement of up to 2–3 BLEU points for seen categories on the WebNLG corpus without modifying the underlying neural network architecture.

pdf bib abs

NUIG-DSI at the WebNLG+ challenge: Leveraging Transfer Learning for RDF-to-text generation
Nivranshu Pasricha | Mihael Arcan | Paul Buitelaar
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

This paper describes the system submitted by NUIG-DSI to the WebNLG+ challenge 2020 in the RDF-to-text generation task for the English language. For this challenge, we leverage transfer learning by adopting the T5 model architecture for our submission and fine-tune the model on the WebNLG+ corpus. Our submission ranks among the top five systems for most of the automatic evaluation metrics achieving a BLEU score of 51.74 over all categories with scores of 58.23 and 45.57 across seen and unseen categories respectively.

pdf bib abs

Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text
Shardul Suryawanshi | Bharathi Raja Chakravarthi | Mihael Arcan | Paul Buitelaar
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying

A meme is a form of media that spreads an idea or emotion across the internet. As posting memes has become a new form of communication on the web, and due to the multimodal nature of memes, postings of hateful memes or related events like trolling and cyberbullying are increasing day by day. Hate speech, offensive content and aggression content detection have been extensively explored in a single modality such as text or image. However, combining two modalities to detect offensive content is still a developing area. Memes make it even more challenging since they express humour and sarcasm in an implicit way, because of which the meme may not be offensive if we only consider the text or the image. Therefore, it is necessary to combine both modalities to identify whether a given meme is offensive or not. Since there was no publicly available dataset for multimodal offensive meme content detection, we leveraged the memes related to the 2016 U.S. presidential election and created the MultiOFF multimodal meme dataset for offensive content detection. We subsequently developed a classifier for this task using the MultiOFF dataset. We use an early fusion technique to combine the image and text modality and compare it with a text- and an image-only baseline to investigate its effectiveness. Our results show improvements in terms of Precision, Recall, and F-Score. The code and dataset for this paper are published at https://github.com/bharathichezhiyan/Multimodal-Meme-Classification-Identifying-Offensive-Content-in-Image-and-Text

pdf bib abs

NUIG at SemEval-2020 Task 12: Pseudo Labelling for Offensive Content Classification
Shardul Suryawanshi | Mihael Arcan | Paul Buitelaar
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This work addresses the classification problem defined by sub-task A (English only) of the OffensEval 2020 challenge. We used a semi-supervised approach to classify given tweets into an offensive (OFF) or not-offensive (NOT) class. As the OffensEval 2020 dataset is loosely labelled with confidence scores given by unsupervised models, we used last year’s offensive language identification dataset (OLID) to label the OffensEval 2020 dataset. Our approach uses a pseudo-labelling method to annotate the current dataset. We trained four text classifiers on the OLID dataset and the classifier with the highest macro-averaged F1-score has been used to pseudo label the OffensEval 2020 dataset. The same model which performed best amongst four text classifiers on OLID dataset has been trained on the combined dataset of OLID and pseudo labelled OffensEval 2020. We evaluated the classifiers with precision, recall and macro-averaged F1-score as the primary evaluation metric on the OLID and OffensEval 2020 datasets.

pdf bib abs

Figure Me Out: A Gold Standard Dataset for Metaphor Interpretation
Omnia Zayed | John P. McCrae | Paul Buitelaar
Proceedings of the Twelfth Language Resources and Evaluation Conference

Metaphor comprehension and understanding is a complex cognitive task that requires interpreting metaphors by grasping the interaction between the meaning of their target and source concepts. This is very challenging for humans, let alone computers. Thus, automatic metaphor interpretation is understudied in part due to the lack of publicly available datasets. The creation and manual annotation of such datasets is a demanding task which requires huge cognitive effort and time. Moreover, there will always be a question of accuracy and consistency of the annotated data due to the subjective nature of the problem. This work addresses these issues by presenting an annotation scheme to interpret verb-noun metaphoric expressions in text. The proposed approach is designed with the goal of reducing the workload on annotators and maintaining consistency. Our methodology employs an automatic retrieval approach which utilises external lexical resources, word embeddings and semantic similarity to generate possible interpretations of identified metaphors in order to enable quick and accurate annotation. We validate our proposed approach by annotating around 1,500 metaphors in tweets which were annotated by six native English speakers. As a result of this work, we publish as linked data the first gold standard dataset for metaphor interpretation which will facilitate research in this area.

pdf bib abs

Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph
Georgeta Bordea | Stefano Faralli | Fleur Mougin | Paul Buitelaar | Gayo Diallo
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver standard taxonomies which can only be automatically extracted for a small subset of domains rooted in relatively focused nodes, placed at an intermediate level in the knowledge graphs. In this work, we propose an iterative methodology to extract an application-specific gold standard dataset from a knowledge graph and an evaluation framework to comparatively assess the quality of noisy automatically extracted taxonomies. We employ an existing state of the art algorithm in an iterative manner and we propose several sampling strategies to reduce the amount of manual work needed for evaluation. A first gold standard dataset is released to the research community for this task along with a companion evaluation framework. This dataset addresses a real-world application from the medical domain, namely the extraction of food-drug and herb-drug interactions.

pdf bib abs

A Term Extraction Approach to Survey Analysis in Health Care
Cécile Robin | Mona Isazad Mashinchi | Fatemeh Ahmadi Zeleti | Adegboyega Ojo | Paul Buitelaar
Proceedings of the Twelfth Language Resources and Evaluation Conference

The voice of the customer has for a long time been a key focus of businesses in all domains. It has received a lot of attention from the research community in Natural Language Processing (NLP), resulting in many approaches to analyzing customer feedback ((aspect-based) sentiment analysis, topic modeling, etc.). In the health domain, public and private bodies are increasingly prioritizing patient engagement for assessing the quality of the service given at each stage of the care. Patient and customer satisfaction analysis relate in many ways. In the domain of health particularly, a more precise and insightful analysis is needed to help practitioners locate potential issues and plan actions accordingly. We introduce here an approach to patient experience with the analysis of free text questions from the 2017 Irish National Inpatient Survey campaign using term extraction as a means to highlight important and insightful subject matters raised by patients. We evaluate the results by mapping them to a manually constructed framework following the Activity, Resource, Context (ARC) methodology (Ordenes, 2014) and specific to the health care environment, and compare our results against manual annotations done on the full 2017 dataset based on those categories.

pdf bib abs

Contextual Modulation for Relation-Level Metaphor Identification
Omnia Zayed | John P. McCrae | Paul Buitelaar
Findings of the Association for Computational Linguistics: EMNLP 2020

Identifying metaphors in text is very challenging and requires comprehending the underlying comparison. The automation of this cognitive process has gained wide attention lately. However, the majority of existing approaches concentrate on word-level identification by treating the task as either single-word classification or sequential labelling without explicitly modelling the interaction between the metaphor components. On the other hand, while existing relation-level approaches implicitly model this interaction, they ignore the context where the metaphor occurs. In this work, we address these limitations by introducing a novel architecture for identifying relation-level metaphoric expressions of certain grammatical relations based on contextual modulation. In a methodology inspired by works in visual reasoning, our approach is based on conditioning the neural network computation on the deep contextualised features of the candidate expressions using feature-wise linear modulation. We demonstrate that the proposed architecture achieves state-of-the-art results on benchmark datasets. The proposed methodology is generic and could be applied to other textual classification problems that benefit from contextual interaction.

pdf bib abs

Adaptation of Word-Level Benchmark Datasets for Relation-Level Metaphor Identification
Omnia Zayed | John P. McCrae | Paul Buitelaar
Proceedings of the Second Workshop on Figurative Language Processing

Metaphor processing and understanding has attracted the attention of many researchers recently with an increasing number of computational approaches. A common factor among these approaches is utilising existing benchmark datasets for evaluation and comparisons. The availability, quality and size of the annotated data are among the main difficulties facing the growing research area of metaphor processing. The majority of current approaches pertaining to metaphor processing concentrate on word-level processing due to data availability. On the other hand, approaches that process metaphors on the relation-level ignore the context where the metaphoric expression occurs. This is due to the nature and format of the available data. Word-level annotation is poorly grounded theoretically and is harder to use in downstream tasks such as metaphor interpretation. The conversion from word-level to relation-level annotation is non-trivial. In this work, we attempt to fill this research gap by adapting three benchmark datasets, namely the VU Amsterdam metaphor corpus, the TroFi dataset and the TSV dataset, to suit relation-level metaphor identification. We publish the adapted datasets to facilitate future research in relation-level metaphor processing.

2019

pdf bib abs

SemEval-2019 Task 9: Suggestion Mining from Online Reviews and Forums
Sapna Negi | Tobias Daudert | Paul Buitelaar
Proceedings of the 13th International Workshop on Semantic Evaluation

We present the pilot SemEval task on Suggestion Mining. The task consists of subtasks A and B, where we created labeled data from feedback forum and hotel reviews respectively. Subtask A provides training and test data from the same domain, while Subtask B evaluates the system on a test dataset from a different domain than the available training data. 33 teams participated in the shared task, with a total of 50 members. We summarize the problem definition, benchmark dataset preparation, and methods used by the participating teams, providing details of the methods used by the top ranked systems. The dataset is made freely available to help advance the research in suggestion mining, and reproduce the systems submitted under this task.

2018

pdf bib abs

Linking News Sentiment to Microblogs: A Distributional Semantics Approach to Enhance Microblog Sentiment Classification
Tobias Daudert | Paul Buitelaar
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Social media’s popularity in society and research is gaining momentum and simultaneously increasing the importance of short textual content such as microblogs. Microblogs are affected by many factors including the news media; therefore, we exploit sentiments conveyed from news to detect and classify sentiment in microblogs. Given that texts can deal with the same entity but might not be vastly related when it comes to sentiment, it becomes necessary to introduce further measures ensuring the relatedness of texts while leveraging the contained sentiments. This paper describes ongoing research introducing distributional semantics to improve the exploitation of news-contained sentiment to enhance microblog sentiment classification.

pdf bib abs

Leveraging News Sentiment to Improve Microblog Sentiment Classification in the Financial Domain
Tobias Daudert | Paul Buitelaar | Sapna Negi
Proceedings of the First Workshop on Economics and Natural Language Processing

With the rising popularity of social media in society and in research, analysing texts short in length, such as microblogs, becomes an increasingly important task. As a medium of communication, microblogs carry people's sentiments and express them to the public. Given that sentiments are driven by multiple factors including the news media, the question arises if the sentiment expressed in news and the news articles themselves can be leveraged to detect and classify sentiment in microblogs. Prior research has highlighted the impact of sentiments and opinions on the market dynamics, making the financial domain a prime case study for this approach. Therefore, this paper describes ongoing research dealing with the exploitation of news-contained sentiment to improve microblog sentiment classification in a financial context.

pdf bib abs

Phrase-Level Metaphor Identification Using Distributed Representations of Word Meaning
Omnia Zayed | John P. McCrae | Paul Buitelaar
Proceedings of the Workshop on Figurative Language Processing

Metaphor is an essential element of human cognition which is often used to express ideas and emotions that might be difficult to express using literal language. Processing metaphoric language is a challenging task for a wide range of applications ranging from text simplification to psychotherapy. Despite the variety of approaches that are trying to process metaphor, there is still a need for better models that mimic the human cognition while exploiting fewer resources. In this paper, we present an approach based on distributional semantics to identify metaphors on the phrase-level. We investigated the use of different word embeddings models to identify verb-noun pairs where the verb is used metaphorically. Several experiments are conducted to show the performance of the proposed approach on benchmark datasets.

pdf bib

Teanga: A Linked Data based platform for Natural Language Processing
Housam Ziad | John P. McCrae | Paul Buitelaar
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib

A supervised approach to taxonomy extraction using word embeddings
Rajdeep Sarkar | John P. McCrae | Paul Buitelaar
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib

A Comparison Of Emotion Annotation Schemes And A New Annotated Data Set
Ian D. Wood | John P. McCrae | Vladimir Andryushechkin | Paul Buitelaar
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib

Automatic Enrichment of Terminological Resources: the IATE RDF Example
Mihael Arcan | Elena Montiel-Ponsoda | John P. McCrae | Paul Buitelaar
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib

A Study of Suggestions in Opinionated Texts and their Automatic Detection
Sapna Negi | Kartik Asooja | Shubham Mehrotra | Paul Buitelaar
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

pdf bib

SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)
Georgeta Bordea | Els Lefever | Paul Buitelaar
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib

NUIG-UNLP at SemEval-2016 Task 1: Soft Alignment and Deep Learning for Semantic Textual Similarity
John P. McCrae | Kartik Asooja | Nitish Aggarwal | Paul Buitelaar
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib abs

Generating a Large-Scale Entity Linking Dictionary from Wikipedia Link Structure and Article Text
Ravindra Harige | Paul Buitelaar
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Wikipedia has been increasingly used as a knowledge base for open-domain Named Entity Linking and Disambiguation. In this task, a dictionary with entity surface forms plays an important role in finding a set of candidate entities for the mentions in text. Existing dictionaries mostly rely on the Wikipedia link structure, like anchor texts, redirect links and disambiguation links. In this paper, we introduce a dictionary for Entity Linking that includes name variations extracted from Wikipedia article text, in addition to name variations derived from the Wikipedia link structure. With this approach, we show an increase in the coverage of entities and their mentions in the dictionary in comparison to other Wikipedia based dictionaries.

pdf bib abs

IRIS: English-Irish Machine Translation System
Mihael Arcan | Caoilfhionn Lane | Eoin Ó Droighneáin | Paul Buitelaar
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe IRIS, a statistical machine translation (SMT) system for translating from English into Irish and vice versa. Since Irish is considered an under-resourced language with a limited amount of machine-readable text, building a machine translation system that produces reasonable translations is rather challenging. As translation is a difficult task, current research in SMT focuses on obtaining statistics either from a large amount of parallel, monolingual or other multilingual resources. Nevertheless, we collected available English-Irish data and developed an SMT system aimed at supporting human translators and enabling cross-lingual language technology tasks.

pdf bib abs

Forecasting Emerging Trends from Scientific Literature
Kartik Asooja | Georgeta Bordea | Gabriela Vulcu | Paul Buitelaar
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Text analysis methods for the automatic identification of emerging technologies by analyzing scientific publications are gaining attention because of their socio-economic impact. The approaches so far have been mainly focused on retrospective analysis by mapping scientific topic evolution over time. We propose regression based approaches to predict future keyword distribution. The prediction is based on historical data of the keywords, which in our case, are LREC conference proceedings. Considering the insufficient number of data points available from LREC proceedings, we do not employ standard time series forecasting methods. We form a dataset by extracting the keywords from previous year proceedings and quantify their yearly relevance using tf-idf scores. This dataset additionally contains ranked lists of related keywords and experts for each keyword.

pdf bib abs

Expanding wordnets to new languages with multilingual sense disambiguation
Mihael Arcan | John P. McCrae | Paul Buitelaar
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Princeton WordNet is one of the most important resources for natural language processing, but is only available for English. While it has been translated using the expand approach to many other languages, this is an expensive manual process. Therefore it would be beneficial to have a high-quality automatic translation approach that would support NLP techniques, which rely on WordNet in new languages. The translation of wordnets is fundamentally complex because of the need to translate all senses of a word including low frequency senses, which is very challenging for current machine translation approaches. For this reason we leverage existing translations of WordNet in other languages to identify contextual information for wordnet senses from a large set of generic parallel corpora. We evaluate our approach using 10 translated wordnets for European languages. Our experiment shows a significant improvement over translation without any contextual information. Furthermore, we evaluate how the choice of pivot languages affects performance of multilingual word sense disambiguation.

2015

pdf bib

MixedEmotions: Social Semantic Emotion Analysis for Innovative Multilingual Big Data Analytics Markets
Mihael Arcan | Paul Buitelaar
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib

Curse or Boon? Presence of Subjunctive Mood in Opinionated Text
Sapna Negi | Paul Buitelaar
Proceedings of the 11th International Conference on Computational Semantics

pdf bib

SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)
Georgeta Bordea | Paul Buitelaar | Stefano Faralli | Roberto Navigli
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib

Non-Orthogonal Explicit Semantic Analysis
Nitish Aggarwal | Kartik Asooja | Georgeta Bordea | Paul Buitelaar
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

pdf bib

Knowledge Portability with Semantic Expansion of Ontology Labels
Mihael Arcan | Marco Turchi | Paul Buitelaar
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib

Towards the Extraction of Customer-to-Customer Suggestions from Reviews
Sapna Negi | Paul Buitelaar
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib

Identification of Bilingual Terms from Monolingual Documents for Statistical Machine Translation
Mihael Arcan | Claudio Giuliano | Marco Turchi | Paul Buitelaar
Proceedings of the 4th International Workshop on Computational Terminology (Computerm)

pdf bib

Using Distributional Semantics to Trace Influence and Imitation in Romantic Orientalist Poetry
Nitish Aggarwal | Justin Tonra | Paul Buitelaar
Proceedings of the First AHA!-Workshop on Information Discovery in Text

pdf bib

INSIGHT Galway: Syntactic and Lexical Features for Aspect Based Sentiment Analysis
Sapna Negi | Paul Buitelaar
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib

Exploring ESA to Improve Word Relatedness
Nitish Aggarwal | Kartik Asooja | Paul Buitelaar
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

bib abs

Hot Topics and Schisms in NLP: Community and Trend Analysis with Saffron on ACL and LREC Proceedings
Paul Buitelaar | Georgeta Bordea | Barry Coughlan
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present a comparative analysis of two series of conferences in the field of Computational Linguistics, the LREC conference and the ACL conference. Conference proceedings were analysed using Saffron by performing term extraction and topical hierarchy construction with the goal of analysing topic trends and research communities. The system aims to provide insight into a research community and to guide publication and participation strategies, especially of novice researchers.

bib abs

Missed opportunities in translation memory matching
Friedel Wolff | Laurette Pretorius | Paul Buitelaar
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

A translation memory system stores a data set of source-target pairs of translations. It attempts to respond to a query in the source language with a useful target text from the data set to assist a human translator. Such systems estimate the usefulness of a target text suggestion according to the similarity of its associated source text to the source text query. This study analyses two data sets in two language pairs each to find highly similar target texts, which would be useful mutual suggestions. We further investigate which of these useful suggestions cannot be selected through source text similarity, and we analyse these cases thoroughly to categorise and quantify them. This analysis provides insight into areas where the recall of translation memory systems can be improved. Specifically, source texts with an omission and semantically very similar source texts are among the more frequent cases with useful target text suggestions that are not selected by the baseline approach of simple edit distance between the source texts.

pdf bib abs

Enhancing statistical machine translation with bilingual terminology in a CAT environment
Mihael Arcan | Marco Turchi | Sara Topelli | Paul Buitelaar
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

In this paper, we address the problem of extracting and integrating bilingual terminology into a Statistical Machine Translation (SMT) system for a Computer Aided Translation (CAT) tool scenario. We develop a framework that, taking as input a small amount of parallel in-domain data, gathers domain-specific bilingual terms and injects them in an SMT system to enhance the translation productivity. Therefore, we investigate several strategies to extract and align bilingual terminology, and to embed it into the SMT. We compare two embedding methods that can be easily used at run-time without altering the normal activity of an SMT system: XML markup and the cache-based model. We tested our framework on two different domains showing improvements up to 15% BLEU score points.

2013

pdf bib

Linguistic Linked Data for Sentiment Analysis
Paul Buitelaar | Mihael Arcan | Carlos Iglesias | Fernando Sánchez-Rada | Carlo Strapparava
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data

pdf bib

Ontology Label Translation
Mihael Arcan | Paul Buitelaar
Proceedings of the 2013 NAACL HLT Student Research Workshop

pdf bib

Translating the FINREP Taxonomy using a Domain-specific Corpus
Mihael Arcan | Susan Marie Thomas | Derek De Brandt | Paul Buitelaar
Proceedings of Machine Translation Summit XIV: Posters

pdf bib

MONNET: Multilingual Ontologies for Networked Knowledge
Mihael Arcan | Paul Buitelaar
Proceedings of Machine Translation Summit XIV: European projects

2012

pdf bib

Using Domain-specific and Collaborative Resources for Term Translation
Mihael Arcan | Christian Federmann | Paul Buitelaar
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib

DERI&UPM: Pushing Corpus Based Relatedness to Similarity: Shared Task System Description
Nitish Aggarwal | Kartik Asooja | Paul Buitelaar
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

bib abs

Expertise Mining for Enterprise Content Management
Georgeta Bordea | Sabrina Kirrane | Paul Buitelaar | Bianca Pereira
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Enterprise content analysis and platform configuration for enterprise content management are often carried out by external consultants who are not necessarily domain experts. In this paper, we propose a set of methods for automatic content analysis that allow users to gain a high-level view of the enterprise content. Here, a main concern is the automatic identification of key stakeholders who should ideally be involved in analysis interviews. The proposed approach employs recent advances in term extraction, semantic term grounding, expert profiling and expert finding in an enterprise content management setting. Extracted terms are evaluated using human judges, while term grounding is evaluated using a manually created gold standard for the DBpedia data source.

bib abs

Semi-Supervised Technical Term Tagging With Minimal User Feedback
Behrang QasemiZadeh | Paul Buitelaar | Tianqi Chen | Georgeta Bordea
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we address the problem of extracting technical terms automatically from an unannotated corpus. We introduce a technology term tagger that is based on Liblinear Support Vector Machines and employs linguistic features, including Part of Speech tags and Dependency Structures, in addition to user feedback to identify technology-related terms. Our experiments show the applicability of our approach, as witnessed by acceptable results on precision and recall.

pdf bib

Experiments with Term Translation
Mihael Arcan | Christian Federmann | Paul Buitelaar
Proceedings of COLING 2012

This paper is motivated by the demand for more linguistic resources for the study of languages and the improvement of those already existing. The first step in our work is the selection of the most significant frames in the English FrameNet according to a representative medical corpus. These frames were subsequently attached to different EuroWordNet synsets and translated into Spanish. Results show that the translation was made with high accuracy (95.9% of correct words). In addition, the original English lexical units were augmented with new units by 120%.

bib abs

Ontology Search with the OntoSelect Ontology Library
Paul Buitelaar | Thomas Eigner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

OntoSelect is a dynamic web-based ontology library that harvests, analyzes and organizes ontologies published on the Semantic Web. OntoSelect allows searching as well as browsing of ontologies according to size (number of classes, properties), representation format (DAML, RDFS, OWL), connectedness (score over the number of included and referring ontologies) and human languages used for class- and object property-labels. Ontology search in OntoSelect is based on a combined measure of coverage, structure and connectedness. Further, and in contrast to other ontology search engines, OntoSelect provides ontology search based on a complete web document instead of one or more keywords only.

2006

pdf bib

Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge
Paul Buitelaar | Philipp Cimiano | Berenike Loos
Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge

bib abs

Ontology-based Information Extraction with SOBA
Paul Buitelaar | Philipp Cimiano | Stefania Racioppa | Melanie Siegel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.

pdf bib