Sandra Szasz


2024

Lexical Complexity Prediction and Lexical Simplification for Catalan and Spanish: Resource Creation, Quality Assessment, and Ethical Considerations
Horacio Saggion | Stefan Bott | Sandra Szasz | Nelson Pérez | Saúl Calderón | Martín Solís
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)

Automatic lexical simplification is the task of substituting lexical items that may be unfamiliar and difficult to understand with easier and more common words. This paper describes and analyzes two novel datasets for lexical simplification in Spanish and Catalan. The Catalan dataset is the first of its kind, and the Spanish dataset is a substantial addition to the sparse data available for automatic lexical simplification in Spanish. In particular, it is the first Spanish dataset to include scalar ratings of how difficult lexical items are to understand. In addition, we present a detailed analysis aimed at assessing the appropriateness and ethical dimensions of the data for the lexical simplification task.

2012

The CONCISUS Corpus of Event Summaries
Horacio Saggion | Sandra Szasz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Text summarization and information extraction systems require adaptation to new domains and languages. This adaptation usually depends on the availability of language resources such as corpora. In this paper we present the CONCISUS Corpus, a comparable corpus in Spanish and English for the study of cross-lingual information extraction and summarization. It is a rich human-annotated dataset composed of comparable event summaries in Spanish and English covering four domains: aviation accidents, rail accidents, earthquakes, and terrorist attacks. In addition to the monolingual summaries in English and Spanish, we provide automatic translations and "comparable" full reports of the events. The human annotations are concepts marked in the textual sources that represent the key event information associated with each event type. The dataset has also been annotated using text processing pipelines. It is being made freely available to the research community for research purposes.

2011

Multi-domain Cross-lingual Information Extraction from Clean and Noisy Texts
Horacio Saggion | Sandra Szasz
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

2010

Human Language Technology for Text-based Analysis of Psychotherapy Sessions in the Spanish Language
Horacio Saggion | Elena Stein-Sparvieri | David Maldavsky | Sandra Szasz
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

NLP Resources for the Analysis of Patient/Therapist Interviews
Horacio Saggion | Elena Stein-Sparvieri | David Maldavsky | Sandra Szasz
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a set of tools and resources for analyzing patient/therapist interviews in psychotherapy sessions. One of the main components of the work is a dictionary-based text interpretation tool for the Spanish language. The tool is designed to identify a subset of Freudian drives in patient and therapist discourse.