Roman Kern


2025

Addressing Hallucination in Causal Q&A: The Efficacy of Fine-tuning over Prompting in LLMs
Georg Niess | Houssam Razouk | Stasa Mandic | Roman Kern
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

This paper presents our approach and findings from participating in the FinCausal 2025 competition, which addresses causal question answering over financial documents, specifically English and Spanish annual reports. We investigate the effectiveness of generative models, such as Llama, in contrast to common extractive methods like BERT-based token classification. While prompt optimization and few-shot learning offered some improvements, the prompted models still suffered from hallucinations and did not consistently outperform the extractive methods on FinCausal. In contrast, fine-tuning the generative models proved essential for minimizing hallucinations and achieving superior performance. Using our fine-tuned multilingual model for both tasks, we outperform our extractive and monolingual approaches, achieving the top result for Spanish and the second-best result for English in the competition. Our findings indicate that fine-tuned large language models are well suited for causal Q&A over complex financial narratives, offering robust multilingual capabilities and effectively mitigating hallucinations.
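As a rough illustration of the fine-tuning direction described in the abstract, the sketch below formats a FinCausal-style context/question/answer triple for supervised training of a causal language model with LoRA adapters, masking the loss so it is computed only on the answer span. The model name, hyperparameters, and prompt template are assumptions for illustration, not the authors' actual pipeline.

```python
# Minimal sketch: supervised fine-tuning of a causal LM for causal Q&A with LoRA adapters.
# Model name, prompt template, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any instruction-capable base model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Wrap the base model with low-rank adapters so only a small set of weights is trained.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

def build_example(context: str, question: str, answer: str) -> dict:
    """Format one training pair; the answer is a verbatim span of the context,
    which biases generation toward copying rather than hallucinating."""
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer: "
    full = prompt + answer + tokenizer.eos_token
    enc = tokenizer(full, return_tensors="pt")
    labels = enc["input_ids"].clone()
    # Mask the prompt tokens so the loss covers only the answer span.
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels[:, :prompt_len] = -100
    return {"input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],
            "labels": labels}

example = build_example(
    context="Revenue declined because raw material costs increased sharply.",
    question="What caused the decline in revenue?",
    answer="raw material costs increased sharply",
)
loss = model(**example).loss   # one supervised step; a real run iterates over the full dataset
loss.backward()
```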

2022

Causal Investigation of Public Opinion during the COVID-19 Pandemic via Social Media Text
Michael Jantscher | Roman Kern
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Understanding the needs and fears of citizens, especially during a pandemic such as COVID-19, is essential for any government or legislative entity. An effective COVID-19 strategy further requires that the public understand and accept the restriction plans imposed by these entities. In this paper, we explore a causal mediation scenario in which we emphasize the use of NLP methods in combination with methods from economics and the social sciences. Based on sentiment analysis of tweets about the COVID-19 situation in the UK and Sweden, we conduct several causal inference experiments and attempt to decouple the effect of government restrictions on mobility behavior from the effect that arises from public perception of a country's COVID-19 strategy. To avoid biased results, we control for valid, country-specific epidemiological and time-varying confounders. Comprehensive experiments show that changes in mobility are caused not only by the policies countries implement but also by individuals' support in the fight against the pandemic. We find that social media texts are an important source for capturing citizens' concerns and trust in policy makers and are suitable for evaluating the success of government policies.
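For readers unfamiliar with causal mediation, the toy sketch below decomposes a simulated policy effect on mobility into a direct part and an indirect part transmitted through public sentiment, using the standard product-of-coefficients approach on synthetic data. The variables, data, and simple linear estimators are illustrative assumptions, not the paper's actual models or confounder set.

```python
# Toy mediation analysis: direct vs. sentiment-mediated effect of policy on mobility.
# Synthetic data and plain OLS; purely illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
cases = rng.normal(size=n)                                   # epidemiological confounder
policy = 0.8 * cases + rng.normal(size=n)                    # restriction stringency
sentiment = -0.5 * policy + 0.3 * cases + rng.normal(size=n) # public perception (mediator)
mobility = -0.6 * policy + 0.4 * sentiment - 0.2 * cases + rng.normal(size=n)

# Mediator model: policy -> sentiment, controlling for the confounder.
X_m = sm.add_constant(np.column_stack([policy, cases]))
a = sm.OLS(sentiment, X_m).fit().params[1]

# Outcome model: policy -> mobility, holding sentiment fixed.
X_y = sm.add_constant(np.column_stack([policy, sentiment, cases]))
fit_y = sm.OLS(mobility, X_y).fit()
direct = fit_y.params[1]
b = fit_y.params[2]

indirect = a * b   # effect transmitted via public perception
print(f"direct: {direct:.2f}, indirect via sentiment: {indirect:.2f}")
```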

Impact of Training Instance Selection on Domain-Specific Entity Extraction using BERT
Eileen Salhofer | Xing Lan Liu | Roman Kern
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop

State-of-the-art performance on entity extraction tasks is achieved by supervised learning, specifically by fine-tuning pretrained language models such as BERT. As a result, annotating application-specific data is the first step in many use cases. However, no practical guidelines are available for annotation requirements. This work supports practitioners by empirically answering two frequently asked questions: (1) how many training samples to annotate? (2) which examples to annotate? We find that BERT achieves up to 80% F1 when fine-tuned on only 70 training examples, especially in the biomedical domain. The key features for guiding the selection of high-performing training instances are identified to be pseudo-perplexity and sentence length. The best training dataset constructed using our proposed selection strategy achieves an F1 score equivalent to that of a random selection with twice the sample size. Requiring only a small amount of training data implies cheaper implementations and opens the door to a wider range of applications.
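As a minimal sketch of the pseudo-perplexity criterion mentioned above, the snippet below masks each token of a candidate sentence in turn, scores it with a masked language model, and ranks candidates by that score together with sentence length. The model choice and the exact ranking rule are assumptions for illustration, not the paper's selection strategy.

```python
# Minimal sketch: rank candidate training sentences by masked-LM pseudo-perplexity.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def pseudo_perplexity(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, len(ids) - 1):            # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id     # mask one token at a time
        logits = model(masked.unsqueeze(0)).logits[0, i]
        nlls.append(-torch.log_softmax(logits, dim=-1)[ids[i]].item())
    return float(torch.tensor(nlls).mean().exp())

candidates = ["The patient was administered 5 mg of dexamethasone.",
              "Stuff happened and then other stuff happened."]
# Assumed rule: prefer fluent (low pseudo-perplexity), longer sentences for annotation.
ranked = sorted(candidates, key=lambda s: (pseudo_perplexity(s), -len(s.split())))
print(ranked)
```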

2019

Know-Center at SemEval-2019 Task 5: Multilingual Hate Speech Detection on Twitter using CNNs
Kevin Winter | Roman Kern
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents the Know-Center system submitted for Task 5 of the SemEval-2019 workshop. Given a Twitter message in either English or Spanish, the task is first to detect whether it contains hateful speech and second, to determine the target and the level of aggression used. For this purpose, our system utilizes word embeddings and a neural network architecture consisting of both dilated and traditional convolution layers. We achieved average F1-scores of 0.57 and 0.74 for English and Spanish, respectively.
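A rough sketch of a classifier in the spirit of the described architecture is shown below: word embeddings fed through both a standard and a dilated 1-D convolution, followed by global max pooling. Layer sizes and the pooling scheme are illustrative assumptions, not the submitted system.

```python
# Sketch: text classifier combining standard and dilated 1-D convolutions over word embeddings.
import torch
import torch.nn as nn

class HateSpeechCNN(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 300, n_classes: int = 2):
        super().__init__()
        # In practice the embedding would be initialized from pretrained word vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)                  # traditional convolution
        self.dilated = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=2, dilation=2)   # dilated: wider receptive field
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids).transpose(1, 2)            # (batch, embed_dim, seq_len)
        features = [torch.relu(self.conv(x)).max(dim=2).values,
                    torch.relu(self.dilated(x)).max(dim=2).values]   # global max pooling
        return self.classifier(torch.cat(features, dim=1))

logits = HateSpeechCNN(vocab_size=20000)(torch.randint(0, 20000, (8, 40)))
print(logits.shape)   # torch.Size([8, 2])
```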

2017

Know-Center at SemEval-2017 Task 10: Sequence Classification with the CODE Annotator
Roman Kern | Stefan Falk | Andi Rexha
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our participation in SemEval-2017 Task 10. We competed in Subtasks 1 and 2, which consist of identifying all key phrases in scientific publications and labelling them with one of three categories: Task, Process, and Material. The publications are drawn from the Computer Science, Material Sciences, and Physics domains. We followed a supervised approach for both subtasks, using a sequential classifier (CRF, Conditional Random Fields). To generate our solution, we used a web-based application implemented in the EU-funded research project CODE. Our system achieved an F1 score of 0.39 for Subtask 1 and 0.28 for Subtask 2.
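The snippet below sketches the kind of linear-chain CRF sequence-labelling setup the abstract refers to, tagging tokens with BIO labels for the Task, Process, and Material categories using sklearn-crfsuite. The feature set and the example sentence are illustrative assumptions, not the CODE annotator's actual features.

```python
# Sketch: token-level features fed to a linear-chain CRF for key-phrase labelling (BIO scheme).
import sklearn_crfsuite

def token_features(tokens, i):
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

sentence = ["We", "anneal", "the", "titanium", "alloy", "at", "high", "temperature"]
labels   = ["O", "B-Process", "O", "B-Material", "I-Material", "O", "O", "O"]

X = [[token_features(sentence, i) for i in range(len(sentence))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)            # a real system trains on the full annotated corpus
print(crf.predict(X)[0])
```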

2016

Know-Center at SemEval-2016 Task 5: Using Word Vectors with Typed Dependencies for Opinion Target Expression Extraction
Stefan Falk | Andi Rexha | Roman Kern
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Identifying Referenced Text in Scientific Publications by Summarisation and Classification Techniques
Stefan Klampfl | Andi Rexha | Roman Kern
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)

2014

A Study of Scientific Writing: Comparing Theoretical Guidelines with Practical Implementation
Mark Kröll | Gunnar Schulze | Roman Kern
Proceedings of the COLING Workshop on Synchronic and Diachronic Approaches to Analyzing Technical Language

2013

KnCe2013-CORE: Semantic Text Similarity by use of Knowledge Bases
Hermann Ziak | Roman Kern
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

Using Factual Density to Measure Informativeness of Web Documents
Christopher Horn | Alisa Zhila | Alexander Gelbukh | Roman Kern | Elisabeth Lex
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2010

KCDC: Word Sense Induction by Using Grammatical Dependencies and Sentence Phrase Structure
Roman Kern | Markus Muhr | Michael Granitzer
Proceedings of the 5th International Workshop on Semantic Evaluation