David Lillis

2025

pdf bib abs
Benchmarking AI Text Detection: Assessing Detectors Against New Datasets, Evasion Tactics, and Enhanced LLMs
Shushanta Pudasaini | Luis Miralles | David Lillis | Marisa Llorens Salvador
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)

The rapid advancement of Large Language Models (LLMs), such as GPT-4, has sparked concerns regarding academic misconduct, misinformation, and the erosion of originality. Despite the growing number of AI detection tools, their effectiveness is often undermined by sophisticated evasion tactics and the continuous evolution of LLMs. This research benchmarks the performance of leading AI detectors, including OpenAI Detector, RADAR, and ArguGPT, across a variety of text domains, evaded content, and text generated by cutting-edge LLMs. Our experiments reveal that current detection models show considerable unreliability in real-world scenarios, particularly when tested against diverse data domains and novel evasion strategies. The study underscores the need for enhanced robustness in detection systems and provides valuable insights into areas of improvement for these models. Additionally, this work lays the groundwork for future research by offering a comprehensive evaluation of existing detectors under challenging conditions, fostering a deeper understanding of their limitations. The experimental code and datasets are publicly available for further benchmarking on Github.

2022

pdf bib
Experimenting with ensembles of pre-trained language models for classification of custom legal datasets
Tamara Matthews | David Lillis
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

2021

pdf bib abs
Can Domain Pre-training Help Interdisciplinary Researchers from Data Annotation Poverty? A Case Study of Legal Argument Mining with BERT-based Transformers
Gechuan Zhang | David Lillis | Paul Nulty
Proceedings of the Workshop on Natural Language Processing for Digital Humanities

Interdisciplinary Natural Language Processing (NLP) research traditionally suffers from the requirement for costly data annotation. However, transformer frameworks with pre-training have shown their ability on many downstream tasks including digital humanities tasks with limited small datasets. Considering the fact that many digital humanities fields (e.g. law) feature an abundance of non-annotated textual resources, and the recent achievements led by transformer models, we pay special attention to whether domain pre-training will enhance transformer’s performance on interdisciplinary tasks and how. In this work, we use legal argument mining as our case study. This aims to automatically identify text segments with particular linguistic structures (i.e., arguments) from legal documents and to predict the reasoning relations between marked arguments. Our work includes a broad survey of a wide range of BERT variants with different pre-training strategies. Our case study focuses on: the comparison of general pre-training and domain pre-training; the generalisability of different domain pre-trained transformers; and the potential of merging general pre-training with domain pre-training. We also achieve better results than the current transformer baseline in legal argument mining.

2020

pdf bib abs
The UCD-Net System at SemEval-2020 Task 1: Temporal Referencing with Semantic Network Distances
Paul Nulty | David Lillis
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the UCD system entered for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. We propose a novel method based on distance between temporally referenced nodes in a semantic network constructed from a combination of the time specific corpora. We argue for the value of semantic networks as objects for transparent exploratory analysis and visualisation of lexical semantic change, and present an implementation of a web application for the purpose of searching and visualising semantic networks. The results of the change measure used for this task were not among the best performing systems, but further calibration of the distance metric and backoff approaches may improve this method.

pdf bib abs
UCD-CS at W-NUT 2020 Shared Task-3: A Text to Text Approach for COVID-19 Event Extraction on Social Media
Congcong Wang | David Lillis
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

In this paper, we describe our approach in the shared task: COVID-19 event extraction from Twitter. The objective of this task is to extract answers from COVID-related tweets to a set of predefined slot-filling questions. Our approach treats the event extraction task as a question answering task by leveraging the transformer-based T5 text-to-text model. According to the official evaluation scores returned, namely F1, our submitted run achieves competitive performance compared to other participating runs (Top 3). However, we argue that this evaluation may underestimate the actual performance of runs based on text-generation. Although some such runs may answer the slot questions well, they may not be an exact string match for the gold standard answers. To measure the extent of this underestimation, we adopt a simple exact-answer transformation method aiming at converting the well-answered predictions to exactly-matched predictions. The results show that after this transformation our run overall reaches the same level of performance as the best participating run and state-of-the-art F1 scores in three of five COVID-related events. Our code is publicly available to aid reproducibility

Co-authors

Congcong Wang 1

Gechuan Zhang 1

Venues

ws1

Fix author