Daria Seredina


2024

Overview of Long Story Generation Challenge (LSGC) at INLG 2024
Aleksandr Migal | Daria Seredina | Ludmila Telnina | Nikita Nazarov | Anastasia Kolmogorova | Nikolay Mikhaylovskiy
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges

This report describes the setup and results of the shared task on human-like long story generation, the LSG Challenge, which asks participants to generate a consistent, human-like long story (a Harry Potter fanfic in English for a general audience) given a prompt of about 1,000 tokens. We evaluated the submissions using both automated metrics and human evaluation protocols. The automated metrics, including the GAPELMAPER score, assessed the structuredness of the generated texts, while human annotators rated stories on dimensions such as relevance, consistency, fluency, and coherence. Additionally, annotators evaluated the models’ understanding of abstract concepts, causality, and the logical order of events, as well as their avoidance of repeated plot elements. The results highlight the current strengths and limitations of state-of-the-art models in long-form story generation, with key challenges emerging in maintaining coherence over extended narratives and handling complex story dynamics. Our analysis provides insights into future directions for improving long story generation systems.

A Report on LSG 2024: LLM Fine-Tuning for Fictional Stories Generation
Daria Seredina
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges

Our methodology centers on fine-tuning a large language model (LLM), leveraging supervised learning to produce fictional text. Our model was trained on a dataset crafted from a collection of public domain books sourced from Project Gutenberg, which underwent thorough processing. The final fictional text was generated in response to a set of prompts provided in the baseline. Our approach was assessed using a combination of automatic metrics and human evaluation, ensuring a comprehensive view of our model’s performance.