Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025)
Michael Zock | Kentaro Inui | Zheng Yuan
Chain-of-MetaWriting: Linguistic and Textual Analysis of How Small Language Models Write Young Students Texts
Ioana Buhnila | Georgeta Cislaru | Amalia Todirascu
Large Language Models (LLMs) have been used to generate texts in response to different writing tasks: reports, essays, storytelling. However, language models do not have a metarepresentation of the text writing process, nor inherent communication learning needs comparable to those of young human students. This paper introduces a fine-grained linguistic and textual analysis of multilingual Small Language Models’ (SLMs) writing. With our method, Chain-of-MetaWriting, SLMs can imitate some steps of the human writing process, such as planning and evaluation. We mainly focused on short story and essay writing tasks in French for schoolchildren and undergraduate students, respectively. Our results show that SLMs encounter difficulties in assisting young students on sensitive topics such as violence in the schoolyard, and they sometimes use words that are too complex for the target audience. In particular, the output differs markedly from human-produced texts in terms of text cohesion and coherence regarding temporal connectors, topic progression, and reference.
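A minimal Python sketch of the kind of planning-then-evaluation prompting chain described in this abstract. The prompt wording, the three-step structure shown here, and the generate() backend are illustrative assumptions, not the authors' actual Chain-of-MetaWriting protocol.

# Illustrative sketch of a plan -> draft -> evaluate writing chain.
# The prompts and the generation backend are assumptions for illustration only.
from typing import Callable

def chain_of_metawriting(topic: str, generate: Callable[[str], str]) -> dict:
    """Run a three-step writing chain with a small language model."""
    # Step 1: ask the model to plan the text before writing it.
    plan = generate(
        f"You are helping a young student write a short story about '{topic}'. "
        "List the main ideas and their order, without writing the story yet."
    )
    # Step 2: draft the text following the plan.
    draft = generate(
        "Following this plan, write a short story in simple French suitable "
        f"for schoolchildren:\n{plan}"
    )
    # Step 3: self-evaluate cohesion, coherence and vocabulary level.
    evaluation = generate(
        "Evaluate the following story for a young audience: are the temporal "
        "connectors, topic progression and references coherent? Is any word "
        f"too complex?\n{draft}"
    )
    return {"plan": plan, "draft": draft, "evaluation": evaluation}

if __name__ == "__main__":
    # Any text-generation callable can be plugged in; here a stub for illustration.
    echo = lambda prompt: f"[model output for: {prompt[:40]}...]"
    print(chain_of_metawriting("la cour de récréation", echo))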
Semantic Masking in a Needle-in-a-haystack Test for Evaluating Large Language Model Long-Text Capabilities
Ken Shi | Gerald Penn
In this paper, we introduce the concept of Semantic Masking, where semantically coherent surrounding text (the haystack) interferes with the retrieval and comprehension of specific information (the needle) embedded within it. We propose the Needle-in-a-Haystack-QA Test, an evaluation pipeline that assesses LLMs’ long-text capabilities through question answering, explicitly accounting for the Semantic Masking effect. We conduct experiments to demonstrate that Semantic Masking significantly impacts LLM performance more than text length does. By accounting for Semantic Masking, we provide a more accurate assessment of LLMs’ true proficiency in utilizing extended contexts, paving the way for future research to develop models that are not only capable of handling longer inputs but are also adept at navigating complex semantic landscapes.
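A minimal sketch of how one instance of such a needle-in-a-haystack QA test might be constructed: a needle sentence is inserted at a chosen depth in surrounding text and paired with a question. The needle, question, and scoring function below are illustrative assumptions; the paper's pipeline additionally controls how semantically close the haystack is to the needle (the Semantic Masking condition).

# Sketch of building one Needle-in-a-Haystack-QA instance (illustrative only).

def build_instance(haystack_sentences, needle, depth_ratio=0.5):
    """Insert the needle sentence at a relative depth inside the haystack."""
    position = int(len(haystack_sentences) * depth_ratio)
    context = haystack_sentences[:position] + [needle] + haystack_sentences[position:]
    return " ".join(context)

def score_answer(model_answer: str, gold_answer: str) -> bool:
    """Simple containment check; a stand-in for the QA-based evaluation."""
    return gold_answer.lower() in model_answer.lower()

if __name__ == "__main__":
    haystack = [f"Filler sentence number {i} about city planning." for i in range(20)]
    needle = "The secret ingredient of the recipe is saffron."
    question = "What is the secret ingredient of the recipe?"
    context = build_instance(haystack, needle, depth_ratio=0.3)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    # The prompt would be sent to the LLM under evaluation; here we only print it.
    print(prompt[:200], "...")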
Reading Between the Lines: A dataset and a study on why some texts are tougher than others
Nouran Khallaf | Carlo Eugeni | Serge Sharoff
Our research aims at better understanding what makes a text difficult to read for specific audiences with intellectual disabilities, more specifically, people who have limitations in cognitive functioning, such as reading and understanding skills, an IQ below 70, and challenges in conceptual domains. We introduce a scheme for the annotation of difficulties which is based on empirical research in psychology as well as on research in translation studies. The paper describes the annotated dataset, primarily derived from parallel texts (standard English and Easy-to-Read English translations) made available online. We fine-tuned four different pre-trained transformer models to perform multiclass classification, predicting the strategies required for simplification. We also investigate the possibility of interpreting the decisions of this language model when it is used to predict the difficulty of sentences in this dataset.
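A hedged sketch of the general setup implied here: fine-tuning a pre-trained transformer as a multiclass classifier over simplification strategies. The label set, model checkpoint, toy data, and hyperparameters are illustrative assumptions, not the paper's configuration.

# Sketch: fine-tune a transformer to predict a simplification strategy per sentence.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

LABELS = ["lexical", "syntactic", "explanation", "deletion"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

# Toy examples standing in for the annotated dataset described in the abstract.
data = Dataset.from_dict({
    "text": ["The municipality promulgated an ordinance.",
             "He acquiesced to the proposal despite reservations."],
    "label": [0, 0],
}).map(lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length",
                            max_length=64), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="strategy-clf", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()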
ParaRev: Building a dataset for Scientific Paragraph Revision annotated with revision instruction
Léane Jourdan | Florian Boudin | Richard Dufour | Nicolas Hernandez | Akiko Aizawa
Revision is a crucial step in scientific writing, where authors refine their work to improve clarity, structure, and academic quality. Existing approaches to automated writing assistance often focus on sentence-level revisions, which fail to capture the broader context needed for effective modification. In this paper, we explore the impact of shifting from sentence-level to paragraph-level scope for the task of scientific text revision. The paragraph-level definition of the task allows for more meaningful changes and is guided by detailed revision instructions rather than general ones. To support this task, we introduce ParaRev, the first dataset of revised scientific paragraphs with an evaluation subset manually annotated with revision instructions. Our experiments demonstrate that using detailed instructions significantly improves the quality of automated revisions compared to general approaches, regardless of the model or metric considered.
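A minimal sketch of the paragraph-level revision setting described above: a detailed revision instruction is combined with the full paragraph and passed to a revision model. The instruction text, example paragraph, and prompt format are illustrative assumptions, not drawn from ParaRev.

# Sketch of instruction-guided paragraph revision (illustrative prompt only).

def build_revision_prompt(paragraph: str, instruction: str) -> str:
    """Combine a detailed revision instruction with the paragraph to revise."""
    return (
        "Revise the following scientific paragraph according to the instruction.\n"
        f"Instruction: {instruction}\n\n"
        f"Paragraph:\n{paragraph}\n\n"
        "Revised paragraph:"
    )

if __name__ == "__main__":
    paragraph = ("Our method obtain better results than baselines, "
                 "results are shown in table 2 which is below.")
    instruction = ("Fix grammatical errors, improve sentence flow, and use a "
                   "formal academic register; keep all factual content.")
    prompt = build_revision_prompt(paragraph, instruction)
    print(prompt)  # would be passed to the revision model under evaluation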
Towards an operative definition of creative writing: a preliminary assessment of creativeness in AI and human texts
Chiara Maggi | Andrea Vitaletti
Nowadays, AI is present in all our activities. This pervasive presence is perceived as a threat by many categories of users who fear being replaced by their AI counterparts. While the potential of AI in handling repetitive tasks is clear, its potential for creativeness is still poorly understood. We believe that understanding this aspect of AI can turn a threat into an opportunity. This paper is a first attempt to provide a measurable definition of creativeness. We applied our definition to AI- and human-generated texts, demonstrating the viability of the proposed approach. Our preliminary experiments show that human texts are more creative.
Decoding Semantic Representations in the Brain Under Language Stimuli with Large Language Models
Anna Sato | Ichiro Kobayashi
Brain decoding technology is paving the way for breakthroughs in the interpretation of neural activity to recreate thoughts, emotions, and movements. Tang et al. (2023) introduced a novel approach that uses language models as generative models for brain decoding based on functional magnetic resonance imaging (fMRI) data. Building on their work, this study explored the use of three additional language models along with the GPT model used in previous research to improve decoding accuracy. Furthermore, we added an evaluation metric based on an embedding model, providing a higher-level measure of semantic similarity than BERTScore. By comparing the decoding performance and identifying the factors contributing to good performance, we found that high decoding accuracy does not solely depend on the ability to accurately predict brain activity. Instead, the type of text (e.g., web text, blogs, news articles, and books) that the model tends to generate plays a more significant role in achieving more precise sentence reconstruction.
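A hedged sketch of the embedding-based evaluation idea mentioned in the abstract: scoring a decoded sentence against the reference stimulus by cosine similarity of sentence embeddings. The specific embedding model and example sentences below are assumptions, not the ones used in the paper.

# Sketch: embedding-based semantic similarity between decoded and reference text.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

reference = "She walked along the beach and thought about her childhood."
decoded = "The woman strolled by the sea, remembering when she was young."

# Encode both sentences and compare them in embedding space.
emb = model.encode([reference, decoded], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"semantic similarity: {similarity:.3f}")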