Andrew Gargett

2026

Beyond Single Words: MWE Identification in Bioinformatics Research Articles and Dispersion Profiling Across IMRaD
Jurgi Giraud | Andrew Gargett
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)

Multiword Expressions (MWEs) are pervasive in scientific writing, and in specialized domains they include both multiword terminology (e.g., noun compounds) and recurrent academic phrasing. This study profiles MWEs in a large corpus of bioinformatics research articles segmented by IMRaD sections. Building on recent multi-method approaches to scientific MWE identification, we extract MWEs using complementary automated strategies (semantic matching, dependency parsing, controlled vocabularies, and academic formula lists) and compare the resulting inventories by size, form, and IMRaD section distribution. We further quantify cross-document dispersion using document frequency and Gries’ DP to distinguish widely reused expressions from items concentrated in a small subset of articles. Results show that bioinformatics MWEs are predominantly short and nominal, but that extraction methods differ in the extent to which they recover discourse and reporting phraseology. Dispersion is strongly long-tailed across sections with most MWEs being document-specific, while a smaller recurrent core aligns with section function and is enriched for conventional templates and standardized multiword terms. Overall, the findings argue for combining complementary identification methods with dispersion profiling to characterize domain "multiwordness" in a principled and section-sensitive way.
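As a reader's aid, the two dispersion measures named in the abstract can be sketched as follows. This is a minimal illustration of the standard definition of Gries' DP (half the summed absolute difference between each corpus part's observed share of an item's occurrences and that part's expected share by size), not the authors' implementation; the function names and the toy counts are hypothetical.

```python
from typing import Sequence


def document_frequency(freqs: Sequence[int]) -> int:
    """Number of documents in which the expression occurs at least once."""
    return sum(1 for v in freqs if v > 0)


def gries_dp(freqs: Sequence[int], part_sizes: Sequence[int]) -> float:
    """Gries' DP for one expression.

    freqs[i]      -- occurrences of the expression in document i
    part_sizes[i] -- size of document i (e.g. in tokens)
    Values near 0 mean the expression is spread in proportion to document
    size; values near 1 mean it is concentrated in few documents.
    """
    if len(freqs) != len(part_sizes):
        raise ValueError("freqs and part_sizes must have the same length")
    total_freq = sum(freqs)
    total_size = sum(part_sizes)
    if total_freq == 0 or total_size == 0:
        raise ValueError("expression frequency and corpus size must be non-zero")
    return 0.5 * sum(
        abs(v / total_freq - s / total_size)
        for v, s in zip(freqs, part_sizes)
    )


# Toy counts (hypothetical): four equally sized documents.
even = gries_dp([5, 5, 5, 5], [100, 100, 100, 100])      # evenly dispersed
clumped = gries_dp([10, 0, 0, 0], [100, 100, 100, 100])  # one document only
```

With these toy counts, the evenly spread expression has a DP of 0, while the expression confined to a single document scores much higher, matching the abstract's contrast between a recurrent core and document-specific items.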

2020

Building the Emirati Arabic FrameNet
Andrew Gargett | Tommi Leung
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet

The Emirati Arabic FrameNet (EAFN) project aims to initiate a FrameNet for Emirati Arabic, utilizing the Emirati Arabic Corpus. The goal is to create a resource comparable to the initial stages of the Berkeley FrameNet. The project is divided into manual and automatic tracks, based on the predominant techniques being used to collect frames in each track. Work on the EAFN is progressing, and we here report on initial results for annotations and evaluation. The EAFN project aims to provide a general semantic resource for the Arabic language, sure to be of interest to researchers from general linguistics to natural language processing. As we report here, the EAFN is well on target for the first release of data in the coming year.

2018

Learning Neural Word Salience Scores
Krasen Samardzhiev | Andrew Gargett | Danushka Bollegala
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Measuring the salience of a word is an essential step in numerous NLP tasks. Heuristic approaches such as tf-idf have been used so far to estimate the salience of words. We propose Neural Word Salience (NWS) scores that, unlike heuristics, are learnt from a corpus. Specifically, we learn word salience scores such that, using pre-trained word embeddings as the input, they can accurately predict the words that appear in a sentence, given the words that appear in the sentences preceding or succeeding that sentence. Experimental results on sentence similarity prediction show that the learnt word salience scores perform comparably to or better than some of the state-of-the-art approaches for representing sentences on benchmark datasets for sentence similarity, while using only a fraction of the training and prediction times required by prior methods. Moreover, our NWS scores positively correlate with psycholinguistic measures such as concreteness and imageability, implying a close connection to salience as perceived by humans.

2015

Modeling the interaction between sensory and affective meanings for detecting metaphor
Andrew Gargett | John Barnden
Proceedings of the Third Workshop on Metaphor in NLP

2014

Mining Online Discussion Forums for Metaphors
Andrew Gargett | John Barnden
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present an approach to mining online forums for figurative language such as metaphor. We target in particular online discussions within the illness and the political conflict domains, with a view to constructing corpora of Metaphor in Illness Discussion and Metaphor in Political Conflict Discussion. This paper reports on our ongoing efforts to combine manual and automatic detection strategies for labelling the corpora, and presents some initial results from our work showing that metaphor use is not independent of illness domain.

DiVE-Arabic: Gulf Arabic Dialogue in a Virtual Environment
Andrew Gargett | Sam Hellmuth | Ghazi AlGethami
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Documentation of communicative behaviour across languages seems to be at a crossroads. While methods for collecting data on spoken or written communication, backed up by computational techniques, are evolving, the actual data being collected remain largely the same. Inspired by the efforts of some innovative researchers who are directly tackling the various obstacles to investigating language in the field (e.g. see various papers collected in Enfield & Stivers 2007), we report here on ongoing work to solve the general problem of collecting in situ data for situated linguistic interaction. The initial stages of this project have involved employing a portable format designed to increase the range and flexibility of such collections in the field. Our motivation is to combine this with a parallel data set for a typologically distinct language, in order to contribute a parallel corpus of situated language use.

Dimensions of Metaphorical Meaning
Andrew Gargett | Josef Ruppenhofer | John Barnden
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

2011

Report on the Second Second Challenge on Generating Instructions in Virtual Environments (GIVE-2.5)
Kristina Striegnitz | Alexandre Denis | Andrew Gargett | Konstantina Garoufi | Alexander Koller | Mariët Theune
Proceedings of the 13th European Workshop on Natural Language Generation

2010

The GIVE-2 Corpus of Giving Instructions in Virtual Environments
Andrew Gargett | Konstantina Garoufi | Alexander Koller | Kristina Striegnitz
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present the GIVE-2 Corpus, a new corpus of human instruction giving. The corpus was collected by asking one person in each pair of subjects to guide the other person towards completing a task in a virtual 3D environment with typed instructions. This is the same setting as that of the recent GIVE Challenge, and thus the corpus can serve as a source of data and as a point of comparison for NLG systems that participate in the GIVE Challenge. The instruction-giving data we collect is multilingual (45 German and 63 English dialogues), and can easily be extended to further languages by using our software, which we have made available. We analyze the corpus to study the effects of learning by repeated participation in the task and the effects of the participants' spatial navigation abilities. Finally, we present a novel annotation scheme for situated referring expressions and compare the referring expressions in the German and English data.
