John Kelleher - ACL Anthology

John Kelleher

Also published as: John D. Kelleher

2026

When LLMs Annotate: Reliability Challenges in Low-Resource NLI
Solmaz Panahi | John Kelleher | Vasudevan Nedumpozhimana
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)

This paper systematically evaluates LLM reliability on the complex semantic task of Natural Language Inference (NLI) in Farsi, assessing six prominent models across eight prompt variations through a multi-dimensional framework that measures accuracy, prompt sensitivity, and intra-class consistency. Our results demonstrate that prompt design—particularly the order of premise and hypothesis—significantly impacts prediction stability. Proprietary models (Claude-Opus-4, GPT-4o) exhibit superior stability and accuracy compared to open-weight alternatives. Across all models, the ’Neutral’ class emerges as the most challenging and least stable category. Crucially, we redefine model instability as a diagnostic tool for benchmark quality, demonstrating that observed disagreement often reflects valid challenges to ambiguous or erroneous gold-standard labels.

2025

Iterative Layer Pruning for Efficient Translation Inference
Yasmin Moslem | Muhammad Hazim Al Farouq | John Kelleher
Proceedings of the Tenth Conference on Machine Translation

Large language models (LLMs) have transformed many areas of natural language processing, including machine translation. However, efficient deployment of LLMs remains challenging due to their intensive computational requirements. In this paper, we address this challenge and present our submissions to the Model Compression track at the Conference on Machine Translation (WMT 2025). In our experiments, we investigate iterative layer pruning guided by layer importance analysis. We evaluate this method using the Aya-Expanse-8B model for translation from Czech to German, and from English to Egyptian Arabic. Our approach achieves substantial reductions in model size and inference time, while maintaining the translation quality of the baseline models.

2024

ReproHum #1018-09: Reproducing Human Evaluations of Redundancy Errors in Data-To-Text Systems
Filip Klubička | John D. Kelleher
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024

This paper describes a reproduction of a human evaluation study evaluating redundancies generated in automatically generated text from a data-to-text system. While the scope of the original study is broader, a human evaluation—a manual error analysis—is included as part of the system evaluation. We attempt a reproduction of this human evaluation, however while the authors annotate multiple properties of the generated text, we focus exclusively on a single quality criterion, that of redundancy. In focusing our study on a single minimal reproducible experimental unit, with the experiment being fairly straightforward and all data made available by the authors, we encountered no challenges with our reproduction and were able to reproduce the trend found in the original experiment. However, while still confirming the general trend, we found that both our annotators identified twice as many errors in the dataset than the original authors.

2023

On the role of resources in the age of large language models
Simon Dobnik | John Kelleher
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)

We evaluate the role of expert-based domain knowledge and resources in relation to training large language models by referring to our work on training and evaluating neural models, also in under-resourced scenarios which we believe also informs training models for “well-resourced” languages and domains. We argue that our community needs both large-scale datasets and small but high-quality data based on expert knowledge and that both activities should work hand-in-hand.

Adaptive Machine Translation with Large Language Models
Yasmin Moslem | Rejwanul Haque | John D. Kelleher | Andy Way
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and adapt to corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, real-time adaptation remains challenging. Large-scale language models (LLMs) have recently shown interesting capabilities of in-context learning, where they learn to replicate certain input-output text generation patterns, without further fine-tuning. By feeding an LLM at inference time with a prompt that consists of a list of translation pairs, it can then simulate the domain and style characteristics. This work aims to investigate how we can utilize in-context learning to improve real-time adaptive MT. Our extensive experiments show promising results at translation time. For example, GPT-3.5 can adapt to a set of in-domain sentence pairs and/or terminology while translating a new sentence. We observe that the translation quality with few-shot in-context learning can surpass that of strong encoder-decoder MT systems, especially for high-resource languages. Moreover, we investigate whether we can combine MT from strong encoder-decoder models with fuzzy matches, which can further improve translation quality, especially for less supported languages. We conduct our experiments across five diverse language pairs, namely English-to-Arabic (EN-AR), English-to-Chinese (EN-ZH), English-to-French (EN-FR), English-to-Kinyarwanda (EN-RW), and English-to-Spanish (EN-ES).

Using MT for multilingual covid-19 case load prediction from social media texts
Maja Popovic | Vasudevan Nedumpozhimana | Meegan Gower | Sneha Rautmare | Nishtha Jain | John Kelleher
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

In the context of an epidemiological study involving multilingual social media, this paper reports on the ability of machine translation systems to preserve content relevant for a document classification task designed to determine whether the social media text is related to covid. The results indicate that machine translation does provide a feasible basis for scaling epidemiological social media surveillance to multiple languages. Moreover, a qualitative error analysis revealed that the majority of classification errors are not caused by MT errors.

Probing Taxonomic and Thematic Embeddings for Taxonomic Information
Filip Klubička | John Kelleher
Proceedings of the 12th Global Wordnet Conference

Modelling taxonomic and thematic relatedness is important for building AI with comprehensive natural language understanding. The goal of this paper is to learn more about how taxonomic information is structurally encoded in embeddings. To do this, we design a new hypernym-hyponym probing task and perform a comparative probing study of taxonomic and thematic SGNS and GloVe embeddings. Our experiments indicate that both types of embeddings encode some taxonomic information, but the amount, as well as the geometric properties of the encodings, are independently related to both the encoder architecture, as well as the embedding training data. Specifically, we find that only taxonomic embeddings carry taxonomic information in their norm, which is determined by the underlying distribution in the data.

HumEval’23 Reproduction Report for Paper 0040: Human Evaluation of Automatically Detected Over- and Undertranslations
Filip Klubička | John D. Kelleher
Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems

This report describes a reproduction of a human evaluation study evaluating automatically detected over- and undertranslations obtained using neural machine translation approaches. While the scope of the original study is much broader, a human evaluation is included as part of its system evaluation. We attempt an exact reproduction of this human evaluation, pertaining to translations on the the English-German language pair. While encountering minor logistical challenges, with all the source material being publicly available and some additional instructions provided by the original authors, we were able to reproduce the original experiment with only minor differences in the results.

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Anya Belz | Craig Thomson | Ehud Reiter | Gavin Abercrombie | Jose M. Alonso-Moral | Mohammad Arvan | Anouck Braggaar | Mark Cieliebak | Elizabeth Clark | Kees van Deemter | Tanvi Dinkar | Ondřej Dušek | Steffen Eger | Qixiang Fang | Mingqi Gao | Albert Gatt | Dimitra Gkatzia | Javier González-Corbelle | Dirk Hovy | Manuela Hürlimann | Takumi Ito | John D. Kelleher | Filip Klubicka | Emiel Krahmer | Huiyuan Lai | Chris van der Lee | Yiru Li | Saad Mahamood | Margot Mieskes | Emiel van Miltenburg | Pablo Mosteiro | Malvina Nissim | Natalie Parde | Ondřej Plátek | Verena Rieser | Jie Ruan | Joel Tetreault | Antonio Toral | Xiaojun Wan | Leo Wanner | Lewis Watson | Diyi Yang
Proceedings of the Fourth Workshop on Insights from Negative Results in NLP

We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.

Instance-Based Domain Adaptation for Improving Terminology Translation
Prashanth Nayak | John Kelleher | Rejwanul Haque | Andy Way
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

Terms are essential indicators of a domain, and domain term translation is dealt with priority in any translation workflow. Translation service providers who use machine translation (MT) expect term translation to be unambiguous and consistent with the context and domain in question. Although current state-of-the-art neural MT (NMT) models are able to produce high-quality translations for many languages, they are still not at the level required when it comes to translating domain-specific terms. This study presents a terminology-aware instance- based adaptation method for improving terminology translation in NMT. We conducted our experiments for French-to-English and found that our proposed approach achieves a statistically significant improvement over the baseline NMT system in translating domain-specific terms. Specifically, the translation of multi-word terms is improved by 6.7% compared to the strong baseline.

Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space
Filip Klubička | Vasudevan Nedumpozhimana | John Kelleher
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings, using a structural probing method. We repurpose an existing English verbal multi-word expression (MWE) dataset to suit the probing framework and perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings. Our experiments indicate that both encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm, leaving this an open question. We also identify some limitations of the used dataset and highlight important directions for future work in improving its suitability for a probing analysis.

Medical Concept Mention Identification in Social Media Posts Using a Small Number of Sample References
Vasudevan Nedumpozhimana | Sneha Rautmare | Meegan Gower | Nishtha Jain | Maja Popović | Patricia Buffini | John Kelleher
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Identification of mentions of medical concepts in social media text can provide useful information for caseload prediction of diseases like Covid-19 and Measles. We propose a simple model for the automatic identification of the medical concept mentions in the social media text. We validate the effectiveness of the proposed model on Twitter, Reddit, and News/Media datasets.

Domain Terminology Integration into Machine Translation: Leveraging Large Language Models
Yasmin Moslem | Gianfranco Romani | Mahdi Molaei | John D. Kelleher | Rejwanul Haque | Andy Way
Proceedings of the Eighth Conference on Machine Translation

This paper discusses the methods that we used for our submissions to the WMT 2023 Terminology Shared Task for German-to-English (DE-EN), English-to-Czech (EN-CS), and Chinese-to-English (ZH-EN) language pairs. The task aims to advance machine translation (MT) by challenging participants to develop systems that accurately translate technical terms, ultimately enhancing communication and understanding in specialised domains. To this end, we conduct experiments that utilise large language models (LLMs) for two purposes: generating synthetic bilingual terminology-based data, and post-editing translations generated by an MT model through incorporating pre-approved terms. Our system employs a four-step process: (i) using an LLM to generate bilingual synthetic data based on the provided terminology, (ii) fine-tuning a generic encoder-decoder MT model, with a mix of the terminology-based synthetic data generated in the first step and a randomly sampled portion of the original generic training data, (iii) generating translations with the fine-tuned MT model, and (iv) finally, leveraging an LLM for terminology-constrained automatic post-editing of the translations that do not include the required terms. The results demonstrate the effectiveness of our proposed approach in improving the integration of pre-approved terms into translations. The number of terms incorporated into the translations of the blind dataset increases from an average of 36.67% with the generic model to an average of 72.88% by the end of the process. In other words, successful utilisation of terms nearly doubles across the three language pairs.

2022

Domain-Specific Text Generation for Machine Translation
Yasmin Moslem | Rejwanul Haque | John Kelleher | Andy Way
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Preservation of domain knowledge from the source to target is crucial in any translation workflow. It is common in the translation industry to receive highly-specialized projects, where there is hardly any parallel in-domain data. In such scenarios where there is insufficient in-domain data to fine-tune Machine Translation (MT) models, producing translations that are consistent with the relevant context is challenging. In this work, we propose leveraging state-of-the-art pretrained language models (LMs) for domain-specific data augmentation for MT, simulating the domain characteristics of either (a) a small bilingual dataset, or (b) the monolingual source text to be translated. Combining this idea with back-translation, we can generate huge amounts of synthetic bilingual in-domain data for both use cases. For our investigation, we used the state-of-the-art MT architecture, Transformer. We employed mixed fine-tuning to train models that significantly improve translation of in-domain texts. More specifically, our proposed methods achieved improvements of approximately 5-6 BLEU and 2-3 BLEU, respectively, on Arabic-to-English and English-to-Arabic language pairs. Furthermore, the outcome of human evaluation corroborates the automatic evaluation results.

Probing with Noise: Unpicking the Warp and Weft of Embeddings
Filip Klubicka | John Kelleher
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Improving our understanding of how information is encoded in vector space can yield valuable interpretability insights. Alongside vector dimensions, we argue that it is possible for the vector norm to also carry linguistic information. We develop a method to test this: an extension of the probing framework which allows for relative intrinsic interpretations of probing results. It relies on introducing noise that ablates information encoded in embeddings, grounded in random baselines and confidence intervals. We apply the method to well-established probing tasks and find evidence that confirms the existence of separate information containers in English GloVe and BERT embeddings. Our correlation analysis aligns with the experimental findings that different encoders use the norm to encode different kinds of information: GloVe stores syntactic and sentence length information in the vector norm, while BERT uses it to encode contextual incongruity.

Domain-Informed Probing of wav2vec 2.0 Embeddings for Phonetic Features
Patrick Cormac English | John D. Kelleher | Julie Carson-Berndsen
Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

In recent years large transformer model architectures have become available which provide a novel means of generating high-quality vector representations of speech audio. These transformers make use of an attention mechanism to generate representations enhanced with contextual and positional information from the input sequence. Previous works have explored the capabilities of these models with regard to performance in tasks such as speech recognition and speaker verification, but there has not been a significant inquiry as to the manner in which the contextual information provided by the transformer architecture impacts the representation of phonetic information within these models. In this paper, we report the results of a number of probing experiments on the representations generated by the wav2vec 2.0 model’s transformer component, with regard to the encoding of phonetic categorization information within the generated embeddings. We find that the contextual information generated by the transformer’s operation results in enhanced capture of phonetic detail by the model, and allows for distinctions to emerge in acoustic data that are otherwise difficult to separate.

Traitement Automatique des Langues, Volume 63, Numéro 2 : Traitement automatique des langues intermodal et multimodal [Cross-modal and multimodal natural language processing]
Gwénolé Lecorvé | John D. Kelleher
Traitement Automatique des Langues, Volume 63, Numéro 2 : Traitement automatique des langues intermodal et multimodal [Cross-modal and multimodal natural language processing]

Introduction to the special issue on cross-modal and multimodal natural language processing
Gwénolé Lecorvé | John D. Kelleher
Traitement Automatique des Langues, Volume 63, Numéro 2 : Traitement automatique des langues intermodal et multimodal [Cross-modal and multimodal natural language processing]

2021

Poisoning Knowledge Graph Embeddings via Relation Inference Patterns
Peru Bhardwaj | John Kelleher | Luca Costabello | Declan O’Sullivan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We study the problem of generating data poisoning attacks against Knowledge Graph Embedding (KGE) models for the task of link prediction in knowledge graphs. To poison KGE models, we propose to exploit their inductive abilities which are captured through the relationship patterns like symmetry, inversion and composition in the knowledge graph. Specifically, to degrade the model’s prediction confidence on target facts, we propose to improve the model’s prediction confidence on a set of decoy facts. Thus, we craft adversarial additions that can improve the model’s prediction confidence on decoy facts through different inference patterns. Our experiments demonstrate that the proposed poisoning attacks outperform state-of-art baselines on four KGE models for two publicly available datasets. We also find that the symmetry pattern based attacks generalize across all model-dataset combinations which indicates the sensitivity of KGE models to this pattern.

Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods
Peru Bhardwaj | John Kelleher | Luca Costabello | Declan O’Sullivan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Despite the widespread use of Knowledge Graph Embeddings (KGE), little is known about the security vulnerabilities that might disrupt their intended behaviour. We study data poisoning attacks against KGE models for link prediction. These attacks craft adversarial additions or deletions at training time to cause model failure at test time. To select adversarial deletions, we propose to use the model-agnostic instance attribution methods from Interpretable Machine Learning, which identify the training instances that are most influential to a neural model’s predictions on test instances. We use these influential triples as adversarial deletions. We further propose a heuristic method to replace one of the two entities in each influential triple to generate adversarial additions. Our experiments show that the proposed strategies outperform the state-of-art data poisoning attacks on KGE models and improve the MRR degradation due to the attacks by up to 62% over the baselines.

Finding BERT’s Idiomatic Key
Vasudevan Nedumpozhimana | John Kelleher
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

Sentence embeddings encode information relating to the usage of idioms in a sentence. This paper reports a set of experiments that combine a probing methodology with input masking to analyse where in a sentence this idiomatic information is taken from, and what form it takes. Our results indicate that BERT’s idiomatic key is primarily found within an idiomatic expression, but also draws on information from the surrounding context. Also, BERT can distinguish between the disruption in a sentence caused by words missing and the incongruity caused by idiomatic usage.

2020

Language-Driven Region Pointer Advancement for Controllable Image Captioning
Annika Lindh | Robert Ross | John Kelleher
Proceedings of the 28th International Conference on Computational Linguistics

Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model implementing this technique improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size.

Style versus Content: A distinction without a (learnable) difference?
Somayeh Jafaritazehjani | Gwénolé Lecorvé | Damien Lolive | John Kelleher
Proceedings of the 28th International Conference on Computational Linguistics

Textual style transfer involves modifying the style of a text while preserving its content. This assumes that it is possible to separate style from content. This paper investigates whether this separation is possible. We use sentiment transfer as our case study for style transfer analysis. Our experimental methodology frames style transfer as a multi-objective problem, balancing style shift with content preservation and fluency. Due to the lack of parallel data for style transfer we employ a variety of adversarial encoder-decoder networks in our experiments. Also, we use of a probing methodology to analyse how these models encode style-related features in their latent spaces. The results of our experiments which are further confirmed by a human evaluation reveal the inherent trade-off between the multiple style transfer objectives which indicates that style cannot be usefully separated from content within these style-transfer systems.

Proceedings of the 13th International Conference on Natural Language Generation
Brian Davis | Yvette Graham | John Kelleher | Yaji Sripada
Proceedings of the 13th International Conference on Natural Language Generation

English WordNet Random Walk Pseudo-Corpora
Filip Klubička | Alfredo Maldonado | Abhijit Mahalunkar | John Kelleher
Proceedings of the Twelfth Language Resources and Evaluation Conference

This is a resource description paper that describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of different pseudo-corpora. We find that different combinations of parameters result in varying statistical properties of the generated pseudo-corpora. We have published a total of 81 pseudo-corpora that we have used in our previous research, but have not exhausted all possible combinations of hyperparameters, which is why we have also published a codebase that allows the generation of additional WordNet taxonomic pseudo-corpora as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.

Energy-based Neural Modelling for Large-Scale Multiple Domain Dialogue State Tracking
Anh Duong Trinh | Robert J. Ross | John D. Kelleher
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Scaling up dialogue state tracking to multiple domains is challenging due to the growth in the number of variables being tracked. Furthermore, dialog state tracking models do not yet explicitly make use of relationships between dialogue variables, such as slots across domains. We propose using energy-based structure prediction methods for large-scale dialogue state tracking task in two multiple domain dialogue datasets. Our results indicate that: (i) modelling variable dependencies yields better results; and (ii) the structured prediction output aligns with the dialogue slot-value constraint principles. This leads to promising directions to improve state-of-the-art models by incorporating variable dependencies into their prediction process.

2019

Synthetic, yet natural: Properties of WordNet random walk corpora and the impact of rare words on embedding performance
Filip Klubička | Alfredo Maldonado | Abhijit Mahalunkar | John Kelleher
Proceedings of the 10th Global Wordnet Conference

Creating word embeddings that reflect semantic relationships encoded in lexical knowledge resources is an open challenge. One approach is to use a random walk over a knowledge graph to generate a pseudo-corpus and use this corpus to train embeddings. However, the effect of the shape of the knowledge graph on the generated pseudo-corpora, and on the resulting word embeddings, has not been studied. To explore this, we use English WordNet, constrained to the taxonomic (tree-like) portion of the graph, as a case study. We investigate the properties of the generated pseudo-corpora, and their impact on the resulting embeddings. We find that the distributions in the psuedo-corpora exhibit properties found in natural corpora, such as Zipf’s and Heaps’ law, and also observe that the proportion of rare words in a pseudo-corpus affects the performance of its embeddings on word similarity.

Persistence pays off: Paying Attention to What the LSTM Gating Mechanism Persists
Giancarlo Salton | John Kelleher
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Recurrent Neural Network Language Models composed of LSTM units, especially those augmented with an external memory, have achieved state-of-the-art results in Language Modeling. However, these models still struggle to process long sequences which are more likely to contain long-distance dependencies because of information fading. In this paper we demonstrate an effective mechanism for retrieving information in a memory augmented LSTM LM based on attending to information in memory in proportion to the number of timesteps the LSTM gating mechanism persisted the information.

Bigger versus Similar: Selecting a Background Corpus for First Story Detection Based on Distributional Similarity
Fei Wang | Robert J. Ross | John D. Kelleher
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The current state of the art for First Story Detection (FSD) are nearest neighbour-based models with traditional term vector representations; however, one challenge faced by FSD models is that the document representation is usually defined by the vocabulary and term frequency from a background corpus. Consequently, the ideal background corpus should arguably be both large-scale to ensure adequate term coverage, and similar to the target domain in terms of the language distribution. However, given these two factors cannot always be mutually satisfied, in this paper we examine whether the distributional similarity of common terms is more important than the scale of common terms for FSD. As a basis for our analysis we propose a set of metrics to quantitatively measure the scale of common terms and the distributional similarity between corpora. Using these metrics we rank different background corpora relative to a target corpus. We also apply models based on different background corpora to the FSD task. Our results show that term distributional similarity is more predictive of good FSD performance than the scale of common terms; and, thus we demonstrate that a smaller recent domain-related corpus will be more suitable than a very large-scale general corpus for FSD.

Multi-Element Long Distance Dependencies: Using SPk Languages to Explore the Characteristics of Long-Distance Dependencies
Abhijit Mahalunkar | John Kelleher
Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges

In order to successfully model Long Distance Dependencies (LDDs) it is necessary to under-stand the full-range of the characteristics of the LDDs exhibited in a target dataset. In this paper, we use Strictly k-Piecewise languages to generate datasets with various properties. We then compute the characteristics of the LDDs in these datasets using mutual information and analyze the impact of factors such as (i) k, (ii) length of LDDs, (iii) vocabulary size, (iv) forbidden strings, and (v) dataset size. This analysis reveal that the number of interacting elements in a dependency is an important characteristic of LDDs. This leads us to the challenge of modelling multi-element long-distance dependencies. Our results suggest that attention mechanisms in neural networks may aide in modeling datasets with multi-element long-distance dependencies. However, we conclude that there is a need to develop more efficient attention mechanisms to address this issue.

Energy-Based Modelling for Dialogue State Tracking
Anh Duong Trinh | Robert Ross | John Kelleher
Proceedings of the First Workshop on NLP for Conversational AI

The uncertainties of language and the complexity of dialogue contexts make accurate dialogue state tracking one of the more challenging aspects of dialogue processing. To improve state tracking quality, we argue that relationships between different aspects of dialogue state must be taken into account as they can often guide a more accurate interpretation process. To this end, we present an energy-based approach to dialogue state tracking as a structured classification task. The novelty of our approach lies in the use of an energy network on top of a deep learning architecture to explore more signal correlations between network variables including input features and output labels. We demonstrate that the energy-based approach improves the performance of a deep learning dialogue state tracker towards state-of-the-art results without the need for many of the other steps required by current state-of-the-art methods.

Capturing Dialogue State Variable Dependencies with an Energy-based Neural Dialogue State Tracker
Anh Duong Trinh | Robert J. Ross | John D. Kelleher
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Dialogue state tracking requires the population and maintenance of a multi-slot frame representation of the dialogue state. Frequently, dialogue state tracking systems assume independence between slot values within a frame. In this paper we argue that treating the prediction of each slot value as an independent prediction task may ignore important associations between the slot values, and, consequently, we argue that treating dialogue state tracking as a structured prediction problem can help to improve dialogue state tracking performance. To support this argument, the research presented in this paper is structured into three stages: (i) analyzing variable dependencies in dialogue data; (ii) applying an energy-based methodology to model dialogue state tracking as a structured prediction task; and (iii) evaluating the impact of inter-slot relationships on model performance. Overall we demonstrate that modelling the associations between target slots with an energy-based formalism improves dialogue state tracking performance in a number of ways.

2018

Is it worth it? Budget-related evaluation metrics for model selection
Filip Klubička | Giancarlo D. Salton | John D. Kelleher
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Exploring the Functional and Geometric Bias of Spatial Relations Using Neural Language Models
Simon Dobnik | Mehdi Ghanimifard | John Kelleher
Proceedings of the First International Workshop on Spatial Language Understanding

The challenge for computational models of spatial descriptions for situated dialogue systems is the integration of information from different modalities. The semantics of spatial descriptions are grounded in at least two sources of information: (i) a geometric representation of space and (ii) the functional interaction of related objects that. We train several neural language models on descriptions of scenes from a dataset of image captions and examine whether the functional or geometric bias of spatial descriptions reported in the literature is reflected in the estimated perplexity of these models. The results of these experiments have implications for the creation of models of spatial lexical semantics for human-robot dialogue systems. Furthermore, they also provide an insight into the kinds of the semantic knowledge captured by neural language models trained on spatial descriptions, which has implications for image captioning systems.

2017

Attentive Language Models
Giancarlo Salton | Robert Ross | John Kelleher
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper, we extend Recurrent Neural Network Language Models (RNN-LMs) with an attention mechanism. We show that an “attentive” RNN-LM (with 11M parameters) achieves a better perplexity than larger RNN-LMs (with 66M parameters) and achieves performance comparable to an ensemble of 10 similar sized RNN-LMs. We also show that an “attentive” RNN-LM needs less contextual information to achieve similar results to the state-of-the-art on the wikitext2 dataset.

Idiom Type Identification with Smoothed Lexical Features and a Maximum Margin Classifier
Giancarlo Salton | Robert Ross | John Kelleher
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In our work we address limitations in the state-of-the-art in idiom type identification. We investigate different approaches for a lexical fixedness metric, a component of the state-of the-art model. We also show that our Machine Learning based approach to the idiom type identification task achieves an F1-score of 0.85, an improvement of 11 points over the state-of the-art.

2016

Idiom Token Classification using Sentential Distributed Semantics
Giancarlo Salton | Robert Ross | John Kelleher
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2014

DIT: Summarisation and Semantic Expansion in Evaluating Semantic Similarity
Magdalena Kacmajor | John D. Kelleher
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

TCDSCSS: Dimensionality Reduction to Evaluate Texts of Varying Lengths - an IR Approach
Arun Kumar Jayapal | Martin Emms | John Kelleher
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Evaluation of a Substitution Method for Idiom Transformation in Statistical Machine Translation
Giancarlo Salton | Robert Ross | John Kelleher
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

An Empirical Study of the Impact of Idioms on Phrase Based Statistical Machine Translation of English to Brazilian-Portuguese
Giancarlo Salton | Robert Ross | John Kelleher
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

The Effect of Sensor Errors in Situated Human-Computer Dialogue
Niels Schütte | John Kelleher | Brian Mac Namee
Proceedings of the Third Workshop on Vision and Language

Exploration of functional semantics of prepositions from corpora of descriptions of visual scenes
Simon Dobnik | John Kelleher
Proceedings of the Third Workshop on Vision and Language

2013

Proceedings of the IWCS 2013 Workshop on Computational Models of Spatial Language Interpretation and Generation (CoSLI-3)
John Kelleher | Robert Ross | Simon Dobnik
Proceedings of the IWCS 2013 Workshop on Computational Models of Spatial Language Interpretation and Generation (CoSLI-3)

2010

Proceedings of the 6th International Natural Language Generation Conference
John Kelleher | Brian Mac Namee | Ielka van der Sluis
Proceedings of the 6th International Natural Language Generation Conference

2009

Applying Computational Models of Spatial Prepositions to Visually Situated Dialog
John D. Kelleher | Fintan J. Costello
Computational Linguistics, Volume 35, Number 2, June 2009 - Special Issue on Prepositions

2008

Referring Expression Generation Challenge 2008 DIT System Descriptions (DIT-FBI, DIT-TVAS, DIT-CBSR, DIT-RBR, DIT-FBI-CBSR, DIT-TVAS-RBR)
John D. Kelleher | Brian Mac Namee
Proceedings of the Fifth International Natural Language Generation Conference

2007

Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions
Fintan Costello | John Kelleher | Martin Volk
Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions

2006

Proximity in Context: An Empirically Grounded Computational Model of Proximity for Processing Topological Spatial Expressions
John D. Kelleher | Geert-Jan M. Kruijff | Fintan J. Costello
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Incremental Generation of Spatial Referring Expressions in Situated Dialog
John D. Kelleher | Geert-Jan M. Kruijff
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Spatial Prepositions in Context: The Semantics of near in the Presence of Distractor Objects
Fintan J. Costello | John D. Kelleher
Proceedings of the Third ACL-SIGSEM Workshop on Prepositions

2005

A Context-dependent Algorithm for Generating Locative Expressions in Physically Situated Environments
John Kelleher | Geert-Jan Kruijff
Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)

Co-authors

Rejwanul Haque 4

Yasmin Moslem 4

Geert-Jan M. Kruijff 3

Gwénolé Lecorvé 3

Brian Mac Namee 3

Abhijit Mahalunkar 3

Robert J. Ross 3

Anh Duong Trinh 3

Peru Bhardwaj 2

Luca Costabello 2

Alfredo Maldonado 2

Declan O’Sullivan 2

Maja Popović 2

Sneha Rautmare 2

Gavin Abercrombie 1

Jose M. Alonso-Moral 1

Mohammad Arvan 1

Anouck Braggaar 1

Patricia Buffini 1

Julie Carson-Berndsen 1

Mark Cieliebak 1

Elizabeth Clark 1

Patrick Cormac English 1

Ondřej Dušek 1

Muhammad Hazim Al Farouq 1

Mehdi Ghanimifard 1

Dimitra Gkatzia 1

Javier González Corbelle 1

Yvette Graham 1

Manuela Huerlimann 1

Somayeh Jafaritazehjani 1

Arun Kumar Jayapal 1

Magdalena Kacmajor 1

Emiel Krahmer 1

Damien Lolive 1

Saad Mahamood 1

Margot Mieskes 1

Pablo Mosteiro 1

Prashanth Nayak 1

Malvina Nissim 1

Solmaz Panahi 1

Natalie Parde 1

Ondřej Plátek 1

Verena Rieser 1

Gianfranco Romani 1

Niels Schütte 1

Ielka van der Sluis 1

Joel Tetreault 1

Craig Thomson 1

Antonio Toral 1

Emiel Van Miltenburg 1

Kees van Deemter 1

Chris van der Lee 1

Venues