Matthew Shardlow - ACL Anthology

Matthew Shardlow

2025

Learn, Achieve, Predict, Propose, Forget, Suffer: Analysing and Classifying Anthropomorphisms of LLMs
Matthew Shardlow | Ashley Williams | Charlie Roadhouse | Filippos Ventirozos | Piotr Przybyła
Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models

Anthropomorphism is a literary device where human-like characteristics are used to refer to non-human entities. However, the use of anthropomorphism in the scientific description and public communication of large language models could lead to misunderstanding amongst scientists and lay-people regarding the technical capabilities and limitations of these models. In this study, we present an analysis of anthropomorphised language commonly used to describe LLMs, showing that the presence of terms such as ‘learn’, ‘achieve’, ‘predict’ and ‘can’ are typically correlated with human labels of anthropomorphism. We also perform experiments to develop a classification system for anthropomorphic descriptions of LLMs in scientific writing at the sentence level. We find that whilst a supervised Roberta-based system identifies anthropomorphisms with F1-score of 0.564, state-of-the-art LLM-based approaches regularly overfit to the task.

Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)
Matthew Shardlow | Fernando Alva-Manchego | Kai North | Regina Stodden | Horacio Saggion | Nouran Khallaf | Akio Hayakawa
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

GPT-Based Lexical Simplification for Multi-Word Expressions Using Prompt Engineering
Sardar Khan Khayamkhani | Matthew Shardlow
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Multiword Lexical Simplification (MWLS) is the task of replacing a complex phrase in a sentence with a simpler alternative. Whereas previous approaches to MWLS made use of the BERT language model, we make use of the Generative Pre-trained Transformer architecture. Our approach employs Large Language Models in an auto-regressive format, making use of prompt engineering and few-shot learning to develop new strategies for the MWLS task. We experiment with several GPT-based models and differing experimental settings including varying the number of requested examples, changing the base model type, adapting the prompt and zero-shot, one-shot and k-shot in-context learning. We show that a GPT-4o model with k-shot in-context learning (k=6) demonstrates state-of-the-art performance for the MWLS1 dataset with NDCG=0.3143, PREC@5=0.1048, beating the previous Bert-based approach by a wide margin on several metrics and consistently across subsets. Our findings indicate that GPT-based models are superior to BERT-based models for the MWLS task.

Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models
Piotr Przybyła | Matthew Shardlow | Clara Colombatto | Nanna Inie
Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models

Aspect–Sentiment Quad Prediction with Distilled Large Language Models
Filippos Ventirozos | Peter Appleby | Matthew Shardlow
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Aspect-based sentiment analysis offers detailed insights by pinpointing specific product aspects in a text that are associated with sentiments. This study explores it through the prediction of quadruples, comprising aspect, category, opinion, and polarity. We evaluated in-context learning strategies using recently released distilled large language models, ranging from zero to full-dataset demonstrations. Our findings reveal that the performance of these models now positions them between the current state-of-the-art and significantly higher than their earlier generations. Additionally, we experimented with various chain-of-thought prompts, examining sequences such as aspect to category to sentiment in different orders. Our results indicate that the optimal sequence differs from previous assumptions. Additionally, we found that for quadruple prediction, few-shot demonstrations alone yield better performance than chain-of-thought prompting.

Differential Robustness in Transformer Language Models: Empirical Evaluation under Adversarial Text Attacks
Taniya Gidatkar | Oluwaseun Ajao | Matthew Shardlow
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

This study evaluates the resilience of large language models (LLMs) against adversarial attacks, specifically focusing on Flan-T5, BERT, and RoBERTa-Base. Using systematically designed adversarial tests through TextFooler and BERTAttack, we found significant variations in model robustness. RoBERTa-Base and Flan-T5 demonstrated remarkable resilience, maintaining accuracy even when subjected to sophisticated attacks, with attack success rates of 0%. In contrast, BERT-Base showed considerable vulnerability, with TextFooler achieving a 93.75% success rate in reducing model accuracy from 48% to just 3%. Our research reveals that while certain LLMs have developed effective defensive mechanisms, these safeguards often require substantial computational resources. This study contributes to the understanding of LLM security by identifying existing strengths and weaknesses in current safeguarding approaches and proposes practical recommendations for developing more efficient and effective defensive strategies

TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise
Paula Reyero Lobo | Kevin Johnson | Bill Buchanan | Matthew Shardlow | Ashley Williams | Sam Attwood
Proceedings of the First Workshop on Ethical Concerns in Training, Evaluating and Deploying Large Language Models

Many enterprises are increasingly adopting Artificial Intelligence (AI) to make internal processes more competitive and efficient. In response to public concern and new regulations for the ethical and responsible use of AI, implementing AI governance frameworks could help to integrate AI within organisations and mitigate associated risks. However, the rapid technological advances and lack of shared ethical AI infrastructures creates barriers to their practical adoption in businesses. This paper presents a real-world AI application at TVS Supply Chain Solutions, reporting on the experience developing an AI assistant underpinned by large language models and the ethical, regulatory, and sociotechnical challenges in deployment for enterprise use.

Are You Sure You’re Positive? Consolidating Chain-of-Thought Agents with Uncertainty Quantification for Aspect-Category Sentiment Analysis
Filippos Ventirozos | Peter A. Appleby | Matthew Shardlow
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025)

Aspect-category sentiment analysis provides granular insights by identifying specific themes within product reviews that are associated with particular opinions. Supervised learning approaches dominate the field. However, data is scarce and expensive to annotate for new domains. We argue that leveraging large language models in a zero-shot setting is beneficial where the time and resources required for dataset annotation are limited. Furthermore, annotation bias may lead to strong results using supervised methods but transfer poorly to new domains in contexts that lack annotations and demand reproducibility. In our work, we propose novel techniques that combine multiple chain-of-thought agents by leveraging large language models’ token-level uncertainty scores. We experiment with the 3B and 70B+ parameter size variants of Llama and Qwen models, demonstrating how these approaches can fulfil practical needs and opening a discussion on how to gauge accuracy in label-scarce conditions.

Exploring Supervised Approaches to the Detection of Anthropomorphic Language in the Reporting of NLP Venues
Matthew Shardlow | Ashley Williams | Charlie Roadhouse | Filippos Ventirozos | Piotr Przybyła
Findings of the Association for Computational Linguistics: ACL 2025

We investigate the prevalence of anthropomorphic language in the reporting of AI technology, focussed on NLP and LLMs. We undertake a corpus annotation focussing on one year of ACL long-paper abstracts and news articles from the same period. We find that 74% of ACL abstracts and 88% of news articles contain some form of anthropomorphic description of AI technology. Further, we train a regression classifier based on BERT, demonstrating that we can automatically label abstracts for their degree of anthropomorphism based on our corpus. We conclude by applying this labelling process to abstracts available in the entire history of the ACL Anthology and reporting on diachronic and inter-venue findings, showing that the degree of anthropomorphism is increasing at all examined venues over time.

Efficient On-Device Text Simplification for Firefox with Synthetic Data Fine-Tuning
Pablo Romero | Zihao Li | Matthew Shardlow
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

This work presents a system for on-device text simplification that enables users to process sensitive text without relying on cloud-based services. Through the use of quantization techniques and a novel approach to controllable text simplification we reduce model size by up to 75 percent with minimal performance degradation. Our models demonstrate efficient state-of-the-art results using a synthetic dataset of 2909 examples outperforming prior work trained on 300K examples. This efficiency stems from (1) a single control token strategy that precisely targets specific reading levels (2) a contrastive training approach that enriches model understanding through exposure to multiple simplification levels and (3) individual models that dedicate full parameter capacity to specific reading level transformations. Our best models achieve up to 82.18 BLEU at the Advanced level and 46.12 SARI at the Elementary level on standard benchmarks with performance preserved even after aggressive quantization. This work is implemented as a collaboration with the Mozilla AI team to process text entirely locally ensuring sensitive information never leaves the users device. We have a demonstration video https//youtu.be/TzmaxnARMzg and a web demo available at https//pablorom2004.github.io/Simplification-Web-Demo

2024

We report the findings of the 2024 Multilingual Lexical Simplification Pipeline shared task. We released a new dataset comprising 5,927 instances of lexical complexity prediction and lexical simplification on common contexts across 10 languages, split into trial (300) and test (5,627). 10 teams participated across 2 tracks and 10 languages with 233 runs evaluated across all systems. Five teams participated in all languages for the lexical complexity prediction task and 4 teams participated in all languages for the lexical simplification task. Teams employed a range of strategies, making use of open and closed source large language models for lexical simplification, as well as feature-based approaches for lexical complexity prediction. The highest scoring team on the combined multilingual data was able to obtain a Pearson’s correlation of 0.6241 and an ACC@1@Top1 of 0.3772, both demonstrating that there is still room for improvement on two difficult sub-tasks of the lexical simplification pipeline.

We present preliminary findings on the MultiLS dataset, developed in support of the 2024 Multilingual Lexical Simplification Pipeline (MLSP) Shared Task. This dataset currently comprises of 300 instances of lexical complexity prediction and lexical simplification across 10 languages. In this paper, we (1) describe the annotation protocol in support of the contribution of future datasets and (2) present summary statistics on the existing data that we have gathered. Multilingual lexical simplification can be used to support low-ability readers to engage with otherwise difficult texts in their native, often low-resourced, languages.

Multilingual Resources for Lexical Complexity Prediction: A Review
Matthew Shardlow | Kai North | Marcos Zampieri
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024

Lexical complexity prediction is the NLP task aimed at using machine learning to predict the difficulty of a target word in context for a given user or user group. Multiple datasets exist for lexical complexity prediction, many of which have been published recently in diverse languages. In this survey, we discuss nine recent datasets (2018-2024) all of which provide lexical complexity prediction annotations. Particularly, we identified eight languages (French, Spanish, Chinese, German, Russian, Japanese, Turkish and Portuguese) with at least one lexical complexity dataset. We do not consider the English datasets, which have already received significant treatment elsewhere in the literature. To survey these datasets, we use the recommendations of the Complex 2.0 Framework (Shardlow et al., 2022), identifying how the datasets differ along the following dimensions: annotation scale, context, multiple token instances, multiple token annotations, diverse annotators. We conclude with future research challenges arising from our survey of existing lexical complexity prediction datasets.

Overview of the BioLaySumm 2024 Shared Task on the Lay Summarization of Biomedical Research Articles
Tomas Goldsack | Carolina Scarton | Matthew Shardlow | Chenghua Lin
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

This paper presents the setup and results of the second edition of the BioLaySumm shared task on the Lay Summarisation of Biomedical Research Articles, hosted at the BioNLP Workshop at ACL 2024. In this task edition, we aim to build on the first edition’s success by further increasing research interest in this important task and encouraging participants to explore novel approaches that will help advance the state-of-the-art. Encouragingly, we found research interest in the task to be high, with this edition of the task attracting a total of 53 participating teams, a significant increase in engagement from the previous edition. Overall, our results show that a broad range of innovative approaches were adopted by task participants, with a predictable shift towards the use of Large Language Models (LLMs).

MultiLS: An End-to-End Lexical Simplification Framework
Kai North | Tharindu Ranasinghe | Matthew Shardlow | Marcos Zampieri
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)

Lexical Simplification (LS) automatically replaces difficult to read words for easier alternatives while preserving a sentence’s original meaning. Several datasets exist for LS and each of them specialize in one or two sub-tasks within the LS pipeline. However, as of this moment, no single LS dataset has been developed that covers all LS sub-tasks. We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset. We also present MultiLS-PT, the first dataset created using the MultiLS framework. We demonstrate the potential of MultiLS-PT by carrying out all LS sub-tasks of (1) lexical complexity prediction (LCP), (2) substitute generation, and (3) substitute ranking for Portuguese.

Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)
Matthew Shardlow | Horacio Saggion | Fernando Alva-Manchego | Marcos Zampieri | Kai North | Sanja Štajner | Regina Stodden
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)

2023

Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability
Sanja Štajner | Horacio Saggio | Matthew Shardlow | Fernando Alva-Manchego
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability

Document-level Text Simplification with Coherence Evaluation
Laura Vásquez-Rodríguez | Matthew Shardlow | Piotr Przybyła | Sophia Ananiadou
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability

We present a coherence-aware evaluation of document-level Text Simplification (TS), an approach that has not been considered in TS so far. We improve current TS sentence-based models to support a multi-sentence setting and the implementation of a state-of-the-art neural coherence model for simplification quality assessment. We enhanced English sentence simplification neural models for document-level simplification using 136,113 paragraph-level samples from both the general and medical domains to generate multiple sentences. Additionally, we use document-level simplification, readability and coherence metrics for evaluation. Our contributions include the introduction of coherence assessment into simplification evaluation with the automatic evaluation of 34,052 simplifications, a fine-tuned state-of-the-art model for document-level simplification, a coherence-based analysis of our results and a human evaluation of 300 samples that demonstrates the challenges encountered when moving towards document-level simplification.

Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles
Tomas Goldsack | Zheheng Luo | Qianqian Xie | Carolina Scarton | Matthew Shardlow | Sophia Ananiadou | Chenghua Lin
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

This paper presents the results of the shared task on Lay Summarisation of Biomedical Research Articles (BioLaySumm), hosted at the BioNLP Workshop at ACL 2023. The goal of this shared task is to develop abstractive summarisation models capable of generating “lay summaries” (i.e., summaries that are comprehensible to non-technical audiences) in both a controllable and non-controllable setting. There are two subtasks: 1) Lay Summarisation, where the goal is for participants to build models for lay summary generation only, given the full article text and the corresponding abstract as input; and2) Readability-controlled Summarisation, where the goal is for participants to train models to generate both the technical abstract and the lay summary, given an article’s main text as input. In addition to overall results, we report on the setup and insights from the BioLaySumm shared task, which attracted a total of 20 participating teams across both subtasks.

ALEXSIS+: Improving Substitute Generation and Selection for Lexical Simplification with Information Retrieval
Kai North | Alphaeus Dmonte | Tharindu Ranasinghe | Matthew Shardlow | Marcos Zampieri
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

Lexical simplification (LS) automatically replaces words that are deemed difficult to understand for a given target population with simpler alternatives, whilst preserving the meaning of the original sentence. The TSAR-2022 shared task on LS provided participants with a multilingual lexical simplification test set. It contained nearly 1,200 complex words in English, Portuguese, and Spanish and presented multiple candidate substitutions for each complex word. The competition did not make training data available; therefore, teams had to use either off-the-shelf pre-trained large language models (LLMs) or out-domain data to develop their LS systems. As such, participants were unable to fully explore the capabilities of LLMs by re-training and/or fine-tuning them on in-domain data. To address this important limitation, we present ALEXSIS+, a multilingual dataset in the aforementioned three languages, and ALEXSIS++, an English monolingual dataset that together contains more than 50,000 unique sentences retrieved from news corpora and annotated with cosine similarities to the original complex word and sentence. Using these additional contexts, we are able to generate new high-quality candidate substitutions that improve LS performance on the TSAR-2022 test set regardless of the language or model.

BLESS: Benchmarking Large Language Models on Sentence Simplification
Tannon Kew | Alison Chi | Laura Vásquez-Rodríguez | Sweta Agrawal | Dennis Aumiller | Fernando Alva-Manchego | Matthew Shardlow
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We present BLESS, a comprehensive performance benchmark of the most recent state-of-the-art Large Language Models (LLMs) on the task of text simplification (TS). We examine how well off-the-shelf LLMs can solve this challenging task, assessing a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our analysis considers a suite of automatic metrics, as well as a large-scale quantitative investigation into the types of common edit operations performed by the different models. Furthermore, we perform a manual qualitative analysis on a subset of model outputs to better gauge the quality of the generated simplifications. Our evaluation indicates that the best LLMs, despite not being trained on TS perform comparably with state-of-the-art TS baselines. Additionally, we find that certain LLMs demonstrate a greater range and diversity of edit operations. Our performance benchmark will be available as a resource for the development of future TS methods and evaluation metrics.

Comparing Generic and Expert Models for Genre-Specific Text Simplification
Zihao Li | Matthew Shardlow | Fernando Alva-Manchego
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability

We investigate how text genre influences the performance of models for controlled text simplification. Regarding datasets from Wikipedia and PubMed as two different genres, we compare the performance of genre-specific models trained by transfer learning and prompt-only GPT-like large language models. Our experiments showed that: (1) the performance loss of genre-specific models on general tasks can be limited to 2%, (2) transfer learning can improve performance on genre-specific datasets up to 10% in SARI score from the base model without transfer learning, (3) simplifications generated by the smaller but more customized models show similar performance in simplicity and a better meaning reservation capability to the larger generic models in both automatic and human evaluations.

Simplification by Lexical Deletion
Matthew Shardlow | Piotr Przybyła
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability

Lexical simplification traditionally focuses on the replacement of tokens with simpler alternatives. However, in some cases the goal of this task (simplifying the form while preserving the meaning) may be better served by removing a word rather than replacing it. In fact, we show that existing datasets rely heavily on the deletion operation. We propose supervised and unsupervised solutions for lexical deletion based on classification, end-to-end simplification systems and custom language models. We contribute a new silver-standard corpus of lexical deletions (called SimpleDelete), which we mine from simple English Wikipedia edit histories and use to evaluate approaches to detecting superfluous words. The results show that even unsupervised approaches (TerseBERT) can achieve good performance in this new task. Deletion is one part of the wider lexical simplification puzzle, which we show can be isolated and investigated.

2022

Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification
Horacio Saggion | Sanja Štajner | Daniel Ferrés | Kim Cheng Sheang | Matthew Shardlow | Kai North | Marcos Zampieri
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability TSAR-2022 held in conjunction with EMNLP 2022. The task called the Natural Language Processing research community to contribute with methods to advance the state of the art in multilingual lexical simplification for English, Portuguese, and Spanish. A total of 14 teams submitted the results of their lexical simplification systems for the provided test data. Results of the shared task indicate new benchmarks in Lexical Simplification with English lexical simplification quantitative results noticeably higher than those obtained for Spanish and (Brazilian) Portuguese.

Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Sanja Štajner | Horacio Saggion | Daniel Ferrés | Matthew Shardlow | Kim Cheng Sheang | Kai North | Marcos Zampieri | Wei Xu
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Agree to Disagree: Exploring Subjectivity in Lexical Complexity
Matthew Shardlow
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

Subjective factors affect our familiarity with different words. Our education, mother tongue, dialect or social group all contribute to the words we know and understand. When asking people to mark words they understand some words are unanimously agreed to be complex, whereas other annotators universally disagree on the complexity of other words. In this work, we seek to expose this phenomenon and investigate the factors affecting whether a word is likely to be subjective, or not. We investigate two recent word complexity datasets from shared tasks. We demonstrate that subjectivity is present and describable in both datasets. Further we show results of modelling and predicting the subjectivity of the complexity annotations in the most recent dataset, attaining an F1-score of 0.714.

An Evaluation of Binary Comparative Lexical Complexity Models
Kai North | Marcos Zampieri | Matthew Shardlow
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)

Identifying complex words in texts is an important first step in text simplification (TS) systems. In this paper, we investigate the performance of binary comparative Lexical Complexity Prediction (LCP) models applied to a popular benchmark dataset — the CompLex 2.0 dataset used in SemEval-2021 Task 1. With the data from CompLex 2.0, we create a new dataset contain 1,940 sentences referred to as CompLex-BC. Using CompLex-BC, we train multiple models to differentiate which of two target words is more or less complex in the same sentence. A linear SVM model achieved the best performance in our experiments with an F1-score of 0.86.

UoM&MMU at TSAR-2022 Shared Task: Prompt Learning for Lexical Simplification
Laura Vásquez-Rodríguez | Nhung Nguyen | Matthew Shardlow | Sophia Ananiadou
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

We present PromptLS, a method for fine-tuning large pre-trained Language Models (LM) to perform the task of Lexical Simplification. We use a predefined template to attain appropriate replacements for a term, and fine-tune a LM using this template on language specific datasets. We filter candidate lists in post-processing to improve accuracy. We demonstrate that our model can work in a) a zero shot setting (where we only require a pre-trained LM), b) a fine-tuned setting (where language-specific data is required), and c) a multilingual setting (where the model is pre-trained across multiple languages and fine-tuned in an specific language). Experimental results show that, although the zero-shot setting is competitive, its performance is still far from the fine-tuned setting. Also, the multilingual is unsurprisingly worse than the fine-tuned model. Among all TSAR-2022 Shared Task participants, our team was ranked second in Spanish and third in English.

Towards Readability-Controlled Machine Translation of COVID-19 Texts
Fernando Alva-Manchego | Matthew Shardlow
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This project investigates the capabilities of Machine Translation models for generating translations at varying levels of readability, focusing on texts related to COVID-19. Whilst it is possible to automatically translate this information, the resulting text may contain specialised terminology, or may be written in a style that is difficult for lay readers to understand. So far, we have collected a new dataset with manual simplifications for English and Spanish sentences in the TICO-19 dataset, as well as implemented baseline pipelines combining Machine Translation and Text Simplification models.

Using NLP to quantify the environmental cost and diversity benefits of in-person NLP conferences
Piotr Przybyła | Matthew Shardlow
Findings of the Association for Computational Linguistics: ACL 2022

The environmental costs of research are progressively important to the NLP community and their associated challenges are increasingly debated. In this work, we analyse the carbon cost (measured as CO2-equivalent) associated with journeys made by researchers attending in-person NLP conferences. We obtain the necessary data by text-mining all publications from the ACL anthology available at the time of the study (n=60,572) and extracting information about an author’s affiliation, including their address. This allows us to estimate the corresponding carbon cost and compare it to previously known values for training large models. Further, we look at the benefits of in-person conferences by demonstrating that they can increase participation diversity by encouraging attendance from the region surrounding the host country. We show how the trade-off between carbon cost and diversity of an event depends on its location and type. Our aim is to foster further discussion on the best way to address the joint issue of emissions and diversity in the future.

Simple TICO-19: A Dataset for Joint Translation and Simplification of COVID-19 Texts
Matthew Shardlow | Fernando Alva-Manchego
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Specialist high-quality information is typically first available in English, and it is written in a language that may be difficult to understand by most readers. While Machine Translation technologies contribute to mitigate the first issue, the translated content will most likely still contain complex language. In order to investigate and address both problems simultaneously, we introduce Simple TICO-19, a new language resource containing manual simplifications of the English and Spanish portions of the TICO-19 corpus for Machine Translation of COVID-19 literature. We provide an in-depth description of the annotation process, which entailed designing an annotation manual and employing four annotators (two native English speakers and two native Spanish speakers) who simplified over 6,000 sentences from the English and Spanish portions of the TICO-19 corpus. We report several statistics on the new dataset, focusing on analysing the improvements in readability from the original texts to their simplified versions. In addition, we propose baseline methodologies for automatically generating the simplifications, translations and joint translation and simplifications contained in our dataset.

An Investigation into the Effect of Control Tokens on Text Simplification
Zihao Li | Matthew Shardlow | Saeed Hassan
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenisation strategy, which we also explore. In this paper, we (1) reimplemented ACCESS, (2) explored the effects of varying control tokens, (3) tested the influences of different tokenisation strategies, and (4) demonstrated how separate control tokens affect performance. We show variations of performance in the four control tokens separately. We also uncover how the design of control tokens could influence the performance and propose some suggestions for designing control tokens, which also reaches into other controllable text generation tasks.

2021

Investigating Text Simplification Evaluation
Laura Vásquez-Rodríguez | Matthew Shardlow | Piotr Przybyła | Sophia Ananiadou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Manchester Metropolitan at SemEval-2021 Task 1: Convolutional Networks for Complex Word Identification
Robert Flynn | Matthew Shardlow
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

We present two convolutional neural networks for predicting the complexity of words and phrases in context on a continuous scale. Both models utilize word and character embeddings alongside lexical features as inputs. Our system displays reasonable results with a Pearson correlation of 0.7754 on the task as a whole. We highlight the limitations of this method in properly assessing the context of the target text, and explore the effectiveness of both systems across a range of genres. Both models were submitted as part of LCP 2021, which focuses on the identification of complex words and phrases as a context dependent, regression based task.

SemEval-2021 Task 1: Lexical Complexity Prediction
Matthew Shardlow | Richard Evans | Gustavo Henrique Paetzold | Marcos Zampieri
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents the results and main findings of SemEval-2021 Task 1 - Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex Corpus (Shardlow et al. 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated with respect to their complexity using a five point Likert scale. SemEval-2021 Task 1 featured two Sub-tasks: Sub-task 1 focused on single words and Sub-task 2 focused on MWEs. The competition attracted 198 teams in total, of which 54 teams submitted official runs on the test data to Sub-task 1 and 37 to Sub-task 2.

2020

CompLex — A New Corpus for Lexical Complexity Prediction from Likert Scale Data
Matthew Shardlow | Michael Cooper | Marcos Zampieri
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Predicting which words are considered hard to understand for a given target population is a vital step in many NLP applications such astext simplification. This task is commonly referred to as Complex Word Identification (CWI). With a few exceptions, previous studieshave approached the task as a binary classification task in which systems predict a complexity value (complex vs. non-complex) fora set of target words in a text. This choice is motivated by the fact that all CWI datasets compiled so far have been annotated using abinary annotation scheme. Our paper addresses this limitation by presenting the first English dataset for continuous lexical complexityprediction. We use a 5-point Likert scale scheme to annotate complex words in texts from three sources/domains: the Bible, Europarl,and biomedical texts. This resulted in a corpus of 9,476 sentences each annotated by around 7 annotators.

Detecting Multiword Expression Type Helps Lexical Complexity Assessment
Ekaterina Kochmar | Sian Gooding | Matthew Shardlow
Proceedings of the Twelfth Language Resources and Evaluation Conference

Multiword expressions (MWEs) represent lexemes that should be treated as single lexical units due to their idiosyncratic nature. Multiple NLP applications have been shown to benefit from MWE identification, however the research on lexical complexity of MWEs is still an under-explored area. In this work, we re-annotate the Complex Word Identification Shared Task 2018 dataset of Yimam et al. (2017), which provides complexity scores for a range of lexemes, with the types of MWEs. We release the MWE-annotated dataset with this paper, and we believe this dataset represents a valuable resource for the text simplification community. In addition, we investigate which types of expressions are most problematic for native and non-native readers. Finally, we show that a lexical complexity assessment system benefits from the information about MWE types.

Multi-Word Lexical Simplification
Piotr Przybyła | Matthew Shardlow
Proceedings of the 28th International Conference on Computational Linguistics

In this work we propose the task of multi-word lexical simplification, in which a sentence in natural language is made easier to understand by replacing its fragment with a simpler alternative, both of which can consist of many words. In order to explore this new direction, we contribute a corpus (MWLS1), including 1462 sentences in English from various sources with 7059 simplifications provided by human annotators. We also propose an automatic solution (Plainifier) based on a purpose-trained neural language model and evaluate its performance, comparing to human and resource-based baselines.

CombiNMT: An Exploration into Neural Text Simplification Models
Michael Cooper | Matthew Shardlow
Proceedings of the Twelfth Language Resources and Evaluation Conference

This work presents a replication study of Exploring Neural Text Simplification Models (Nisioi et al., 2017). We were able to successfully replicate and extend the methods presented in the original paper. Alongside the replication results, we present our improvements dubbed CombiNMT. By using an updated implementation of OpenNMT, and incorporating the Newsela corpus alongside the original Wikipedia dataset (Hwang et al., 2016), as well as refining both datasets to select high quality training examples. Our work present two new systems, CombiNMT995, which is a result of matched sentences with a cosine similarity of 0.995 or less, and CombiNMT98, which, similarly, runs on a cosine similarity of 0.98 or less. By extending the human evaluation presented within the original paper, increasing both the number of annotators and the number of sentences annotated, with the intention of increasing the quality of the results, CombiNMT998 shows significant improvement over any of the Neural Text Simplification (NTS) systems from the original paper in terms of both the number of changes and the percentage of correct changes made.

2019

Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table
Matthew Shardlow | Raheel Nawaz
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Clinical letters are infamously impenetrable for the lay patient. This work uses neural text simplification methods to automatically improve the understandability of clinical letters for patients. We take existing neural text simplification software and augment it with a new phrase table that links complex medical terminology to simpler vocabulary by mining SNOMED-CT. In an evaluation task using crowdsourcing, we show that the results of our new system are ranked easier to understand (average rank 1.93) than using the original system (2.34) without our phrase table. We also show improvement against baselines including the original text (2.79) and using the phrase table without the neural text simplification software (2.94). Our methods can easily be transferred outside of the clinical domain by using domain-appropriate resources to provide effective neural text simplification for any domain without the need for costly annotation.

2018

Manchester Metropolitan at SemEval-2018 Task 2: Random Forest with an Ensemble of Features for Predicting Emoji in Tweets
Luciano Gerber | Matthew Shardlow
Proceedings of the 12th International Workshop on Semantic Evaluation

We present our submission to the Semeval 2018 task on emoji prediction. We used a random forest, with an ensemble of bag-of-words, sentiment and psycholinguistic features. Although we performed well on the trial dataset (attaining a macro f-score of 63.185 for English and 81.381 for Spanish), our approach did not perform as well on the test data. We describe our features and classi cation protocol, as well as initial experiments, concluding with a discussion of the discrepancy between our trial and test results.

A New Corpus to Support Text Mining for the Curation of Metabolites in the ChEBI Database
Matthew Shardlow | Nhung Nguyen | Gareth Owen | Claire O’Donovan | Andrew Leach | John McNaught | Steve Turner | Sophia Ananiadou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features
Piotr Przybyła | Nhung T. H. Nguyen | Matthew Shardlow | Georgios Kontonatsios | Sophia Ananiadou
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2014

Out in the Open: Finding and Categorising Errors in the Lexical Simplification Pipeline
Matthew Shardlow
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Lexical simplification is the task of automatically reducing the complexity of a text by identifying difficult words and replacing them with simpler alternatives. Whilst this is a valuable application of natural language generation, rudimentary lexical simplification systems suffer from a high error rate which often results in nonsensical, non-simple text. This paper seeks to characterise and quantify the errors which occur in a typical baseline lexical simplification system. We expose 6 distinct categories of error and propose a classification scheme for these. We also quantify these errors for a moderate size corpus, showing the magnitude of each error type. We find that for 183 identified simplification instances, only 19 (10.38%) result in a valid simplification, with the rest causing errors of varying gravity.

2013

The CW Corpus: A New Resource for Evaluating the Identification of Complex Words
Matthew Shardlow
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations

A Comparison of Techniques to Automatically Identify Complex Words.
Matthew Shardlow
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

Co-authors

Horacio Saggion 6

Sanja Štajner 5

Tharindu Ranasinghe 4

Filippos Ventirozos 4

Laura Vásquez-Rodríguez 4

Akio Hayakawa 3

Ashley Williams 3

Riza Theresa Batista-Navarro 2

Saul Calderon-Ramirez 2

Michael Cooper 2

Daniel Ferrés 2

Thomas François 2

Tomas Goldsack 2

Andrea Horbach 2

Anna Hülsing 2

Joseph Marvin Imperial 2

Laura Occhipinti 2

Nishat Raihan 2

Charlie Roadhouse 2

Nelson Peréz Rojas 2

Martin Solis Salazar 2

Carolina Scarton 2

Kim Cheng Sheang 2

Regina Stodden 2

Sweta Agrawal 1

Oluwaseun Ajao 1

Peter Appleby 1

Peter A. Appleby 1

Dennis Aumiller 1

Bill Buchanan 1

Clara Colombatto 1

Alphaeus Dmonte 1

Richard Evans 1

Luciano Gerber 1

Taniya Gidatkar 1

Kevin Johnson 1

Nouran Khallaf 1

Sardar Khan Khayamkhani 1

Ekaterina Kochmar 1

Georgios Kontonatsios 1

John McNaught 1

Nhung T. H. Nguyen 1

Claire O’Donovan 1

Gustavo Paetzold 1

Paula Reyero Lobo 1

Horacio Saggio 1

Venues