Miriam Schirmer


2024

pdf bib
The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI
Miriam Schirmer | Tobias Leemann | Gjergji Kasneci | Jürgen Pfeffer | David Jurgens
Findings of the Association for Computational Linguistics: EMNLP 2024

Psychological trauma can manifest following various distressing events and is captured in diverse online contexts. However, studies traditionally focus on a single aspect of trauma, often neglecting the transferability of findings across different scenarios. We address this gap by training various language models with progressing complexity on trauma-related datasets, including genocide-related court data, a Reddit dataset on post-traumatic stress disorder (PTSD), counseling conversations, and Incel forum posts. Our results show that the fine-tuned RoBERTa model excels in predicting traumatic events across domains, slightly outperforming large language models like GPT-4. Additionally, SLALOM-feature scores and conceptual explanations effectively differentiate and cluster trauma-related language, highlighting different trauma aspects and identifying sexual abuse and experiences related to death as a common traumatic event across all datasets. This transferability is crucial as it allows for the development of tools to enhance trauma detection and intervention in diverse populations and settings.

pdf bib
GENTRAC: A Tool for Tracing Trauma in Genocide and Mass Atrocity Court Transcripts
Miriam Schirmer | Christian Brechenmacher | Endrit Jashari | Juergen Pfeffer
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper introduces GENTRAC, an open-access web-based tool built to interactively detect and analyze potentially traumatic content in witness statements of genocide and mass atrocity trials. Harnessing recent developments in natural language processing (NLP) to detect trauma, GENTRAC processes and formats court transcripts for NLP analysis through a sophisticated parsing algorithm and detects the likelihood of traumatic content for each speaker segment. The tool visualizes the density of such content throughout a trial day and provides statistics on the overall amount of traumatic content and speaker distribution. Capable of processing transcripts from four prominent international criminal courts, including the International Criminal Court (ICC), GENTRAC’s reach is vast, tailored to handle millions of pages of documents from past and future trials. Detecting potentially re-traumatizing examination methods can enhance the development of trauma-informed legal procedures. GENTRAC also serves as a reliable resource for legal, human rights, and other professionals, aiding their comprehension of mass atrocities’ emotional toll on survivors.

2022

pdf bib
A New Dataset for Topic-Based Paragraph Classification in Genocide-Related Court Transcripts
Miriam Schirmer | Udo Kruschwitz | Gregor Donabauer
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Recent progress in natural language processing has been impressive in many different areas with transformer-based approaches setting new benchmarks for a wide range of applications. This development has also lowered the barriers for people outside the NLP community to tap into the tools and resources applied to a variety of domain-specific applications. The bottleneck however still remains the lack of annotated gold-standard collections as soon as one’s research or professional interest falls outside the scope of what is readily available. One such area is genocide-related research (also including the work of experts who have a professional interest in accessing, exploring and searching large-scale document collections on the topic, such as lawyers). We present GTC (Genocide Transcript Corpus), the first annotated corpus of genocide-related court transcripts which serves three purposes: (1) to provide a first reference corpus for the community, (2) to establish benchmark performances (using state-of-the-art transformer-based approaches) for the new classification task of paragraph identification of violence-related witness statements, (3) to explore first steps towards transfer learning within the domain. We consider our contribution to be addressing in particular this year’s hot topic on Language Technology for All.