Yuan Ma

2026

Text-to-Text Automatic Story Generation: A Survey
Yuan Ma | Hanna Suominen | Patrik Haslum | Richard Susilo
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Automatic story generation aims to produce coherent, engaging, and contextually consistent narratives with minimal or no human involvement, thereby advancing research in computational creativity and applications in human language technologies. The emergence of large language models has advanced the task, enabling systems to generate multi-thousand-word stories under diverse constraints. Despite these advances, maintaining narrative coherence, character consistency, storyline diversity, and plot controllability in generated stories remains challenging. In this survey, we conduct a systematic review of research published over the past four years to examine the major trends and key limitations in story generation methods, model architectures, datasets, and evaluation methodologies. Based on this analysis of 57 included papers, we propose four actions to support continued progress in automatic story generation: developing new evaluation metrics, creating more suitable datasets, continuing to improve narrative coherence and consistency, and exploring practical applications of story generation.

2022

Improving Text Simplification with Factuality Error Detection
Yuan Ma | Sandaru Seneviratne | Elena Daskalaki
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

In the past few years, the field of text simplification has been dominated by supervised learning approaches thanks to the appearance of large parallel datasets such as Wikilarge and Newsela. However, these datasets contain sentence pairs with factuality errors, which compromise the models’ performance. We therefore proposed a model-independent factuality error detection mechanism, covering both bad simplification and bad alignment, to refine the Wikilarge dataset by reducing the weight of these samples during training. We demonstrated that this approach improved the state-of-the-art text simplification model TST5, reducing FKGL by 0.33 and 0.29 on the TurkCorpus and ASSET test sets, respectively. Our study illustrates the impact of erroneous samples in text simplification (TS) datasets and highlights the need for automatic methods to improve their quality.

2018

Multi-lingual Argumentative Corpora in English, Turkish, Greek, Albanian, Croatian, Serbian, Macedonian, Bulgarian, Romanian and Arabic
Alfred Sliwa | Yuan Ma | Ruishen Liu | Niravkumar Borad | Seyedeh Ziyaei | Mina Ghobadi | Firas Sabbah | Ahmet Aker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

This paper offers a comparative analysis of the performance of different supervised machine learning methods and feature sets on argument mining tasks. Specifically, we address the tasks of extracting argumentative segments from texts and predicting the structure between those segments. Eight classifiers and different combinations of six feature types reported in previous work are evaluated. The results indicate that the overall best-performing features are the structural ones. Although the performance of the classifiers varies depending on the feature combinations and the corpora used for training and testing, Random Forest is consistently among the best-performing classifiers. These results provide a basis for further development of argument mining techniques and can guide the integration of argument mining into applications such as argument-based search.

Co-authors

Sandaru Seneviratne 1

Hanna Suominen 1

Richard Susilo 1
