Sian Gooding

2023

Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization
Ondrej Skopek | Rahul Aralikatte | Sian Gooding | Victor Carbune
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

Despite recent advances, evaluating how well large language models (LLMs) follow user instructions remains an open problem. While evaluation methods of language models have seen a rise in prompt-based approaches, limited work on the correctness of these methods has been conducted. In this work, we perform a meta-evaluation of a variety of metrics to quantify how accurately they measure the instruction-following abilities of LLMs. Our investigation is performed on grounded query-based summarization by collecting a new short-form, real-world dataset riSum, containing 300 document-instruction pairs with 3 answers each. All 900 answers are rated by 3 human annotators. Using riSum, we analyze the agreement between evaluation methods and human judgment. Finally, we propose new LLM-based reference-free evaluation methods that improve upon established baselines and perform on par with costly reference-based metrics that require high-quality summaries.

pdf bib abs

A Study on Annotation Interfaces for Summary Comparison
Sian Gooding | Lucas Werner | Victor Cărbune
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

The task of summarisation is notoriously difficult to evaluate, with agreement even between expert raters unlikely to be perfect. One technique for summary evaluation relies on collecting comparison data by presenting annotators with generated summaries and tasking them with selecting the best one. This paradigm is currently being exploited in reinforcement learning using human feedback, whereby a reward function is trained using pairwise choice data. Comparisons are an easier way to elicit human feedback for summarisation, however, such decisions can be bottle necked by the usability of the annotator interface. In this paper, we present the results of a pilot study exploring how the user interface impacts annotator agreement when judging summary quality.

2022

pdf bib abs

One Size Does Not Fit All: The Case for Personalised Word Complexity Models
Sian Gooding | Manuel Tragut
Findings of the Association for Computational Linguistics: NAACL 2022

Complex Word Identification (CWI) aims to detect words within a text that a reader may find difficult to understand. It has been shown that CWI systems can improve text simplification, readability prediction and vocabulary acquisition modelling. However, the difficulty of a word is a highly idiosyncratic notion that depends on a reader’s first language, proficiency and reading experience. In this paper, we show that personal models are best when predicting word complexity for individual readers. We use a novel active learning framework that allows models to be tailored to individuals and release a dataset of complexity annotations and models as a benchmark for further research.

pdf bib abs

On the Ethical Considerations of Text Simplification
Sian Gooding
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)

This paper outlines the ethical implications of text simplification within the framework of assistive systems. We argue that a distinction should be made between the technologies that perform text simplification and the realisation of these in assistive technologies. When using the latter as a motivation for research, it is important that the subsequent ethical implications be carefully considered. We provide guidelines for the framing of text simplification independently of assistive systems, as well as suggesting directions for future research and discussion based on the concerns raised.

2021

pdf bib abs

Predicting Text Readability from Scrolling Interactions
Sian Gooding | Yevgeni Berzak | Tony Mak | Matt Sharifi
Proceedings of the 25th Conference on Computational Natural Language Learning

Judging the readability of text has many important applications, for instance when performing text simplification or when sourcing reading material for language learners. In this paper, we present a 518 participant study which investigates how scrolling behaviour relates to the readability of English texts. We make our dataset publicly available and show that (1) there are statistically significant differences in the way readers interact with text depending on the text level, (2) such measures can be used to predict the readability of text, and (3) the background of a reader impacts their reading interactions and the factors contributing to text difficulty.

pdf bib abs

Word Complexity is in the Eye of the Beholder
Sian Gooding | Ekaterina Kochmar | Seid Muhie Yimam | Chris Biemann
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Lexical complexity is a highly subjective notion, yet this factor is often neglected in lexical simplification and readability systems which use a ”one-size-fits-all” approach. In this paper, we investigate which aspects contribute to the notion of lexical complexity in various groups of readers, focusing on native and non-native speakers of English, and how the notion of complexity changes depending on the proficiency level of a non-native reader. To facilitate reproducibility of our approach and foster further research into these aspects, we release a dataset of complex words annotated by readers with different backgrounds.

2020

pdf bib abs

Detecting Multiword Expression Type Helps Lexical Complexity Assessment
Ekaterina Kochmar | Sian Gooding | Matthew Shardlow
Proceedings of the Twelfth Language Resources and Evaluation Conference

Multiword expressions (MWEs) represent lexemes that should be treated as single lexical units due to their idiosyncratic nature. Multiple NLP applications have been shown to benefit from MWE identification, however the research on lexical complexity of MWEs is still an under-explored area. In this work, we re-annotate the Complex Word Identification Shared Task 2018 dataset of Yimam et al. (2017), which provides complexity scores for a range of lexemes, with the types of MWEs. We release the MWE-annotated dataset with this paper, and we believe this dataset represents a valuable resource for the text simplification community. In addition, we investigate which types of expressions are most problematic for native and non-native readers. Finally, we show that a lexical complexity assessment system benefits from the information about MWE types.

pdf bib abs

SeCoDa: Sense Complexity Dataset
David Strohmaier | Sian Gooding | Shiva Taslimipoor | Ekaterina Kochmar
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Sense Complexity Dataset (SeCoDa) provides a corpus that is annotated jointly for complexity and word senses. It thus provides a valuable resource for both word sense disambiguation and the task of complex word identification. The intention is that this dataset will be used to identify complexity at the level of word senses rather than word tokens. For word sense annotation SeCoDa uses a hierarchical scheme that is based on information available in the Cambridge Advanced Learner’s Dictionary. This way we can offer more coarse-grained senses than directly available in WordNet.

pdf bib abs

Incorporating Multiword Expressions in Phrase Complexity Estimation
Sian Gooding | Shiva Taslimipoor | Ekaterina Kochmar
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Multiword expressions (MWEs) were shown to be useful in a number of NLP tasks. However, research on the use of MWEs in lexical complexity assessment and simplification is still an under-explored area. In this paper, we propose a text complexity assessment system for English, which incorporates MWE identification. We show that detecting MWEs using state-of-the-art systems improves predicting complexity on an established lexical complexity dataset.

2019

pdf bib abs

Recursive Context-Aware Lexical Simplification
Sian Gooding | Ekaterina Kochmar
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper presents a novel architecture for recursive context-aware lexical simplification, REC-LS, that is capable of (1) making use of the wider context when detecting the words in need of simplification and suggesting alternatives, and (2) taking previous simplification steps into account. We show that our system outputs lexical simplifications that are grammatically correct and semantically appropriate, and outperforms the current state-of-the-art systems in lexical simplification.

pdf bib abs

Complex Word Identification as a Sequence Labelling Task
Sian Gooding | Ekaterina Kochmar
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Complex Word Identification (CWI) is concerned with detection of words in need of simplification and is a crucial first step in a simplification pipeline. It has been shown that reliable CWI systems considerably improve text simplification. However, most CWI systems to date address the task on a word-by-word basis, not taking the context into account. In this paper, we present a novel approach to CWI based on sequence modelling. Our system is capable of performing CWI in context, does not require extensive feature engineering and outperforms state-of-the-art systems on this task.

pdf bib abs

Comparative judgments are more consistent than binary classification for labelling word complexity
Sian Gooding | Ekaterina Kochmar | Advait Sarkar | Alan Blackwell
Proceedings of the 13th Linguistic Annotation Workshop

Lexical simplification systems replace complex words with simple ones based on a model of which words are complex in context. We explore how users can help train complex word identification models through labelling more efficiently and reliably. We show that using an interface where annotators make comparative rather than binary judgments leads to more reliable and consistent labels, and explore whether comparative judgments may provide a faster way for collecting labels.

pdf bib

Active Learning for Financial Investment Reports
Sian Gooding | Ted Briscoe
Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019)

2018

pdf bib abs

CAMB at CWI Shared Task 2018: Complex Word Identification with Ensemble-Based Voting
Sian Gooding | Ekaterina Kochmar
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper presents the winning systems we submitted to the Complex Word Identification Shared Task 2018. We describe our best performing systems’ implementations and discuss our key findings from this research. Our best-performing systems achieve an F1 score of 0.8792 on the NEWS, 0.8430 on the WIKINEWS and 0.8115 on the WIKIPEDIA test sets in the monolingual English binary classification track, and a mean absolute error of 0.0558 on the NEWS, 0.0674 on the WIKINEWS and 0.0739 on the WIKIPEDIA test sets in the probabilistic track.