Is Sentence Splitting a Solved Task? Experiments to the Intersection between NLP and Italian Linguistics

Arianna Redaelli, Rachele Sprugnoli


Abstract
Sentence splitting, that is, the segmentation of raw input text into sentences, is a fundamental step in text processing. Although it is considered a solved task for texts such as news articles and Wikipedia pages, system performance can vary greatly depending on the text genre. This paper evaluates eight sentence splitting tools that adopt different approaches (rule-based, supervised, semi-supervised, and unsupervised learning) on Italian 19th-century novels, a genre that has so far received little attention but that can offer an interesting common ground between Natural Language Processing and Digital Humanities.
Anthology ID:
2024.clicit-1.88
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
813–820
URL:
https://aclanthology.org/2024.clicit-1.88/
Cite (ACL):
Arianna Redaelli and Rachele Sprugnoli. 2024. Is Sentence Splitting a Solved Task? Experiments to the Intersection between NLP and Italian Linguistics. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 813–820, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Is Sentence Splitting a Solved Task? Experiments to the Intersection between NLP and Italian Linguistics (Redaelli & Sprugnoli, CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.88.pdf