Erkki Mervaala
2024
Order Up! Micromanaging Inconsistencies in ChatGPT-4o Text Analyses
Erkki Mervaala | Ilona Kousa
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Large language model (LLM) applications have taken the world by storm in the past two years, and the academic sphere has not been an exception. One common, cumbersome task that researchers have attempted to automatise is text annotation and, to an extent, analysis. Popular LLMs such as ChatGPT have been examined as research assistants and as analysis tools, and several discrepancies regarding both transparency and the generated content have been uncovered. Our research approaches the usability and trustworthiness of ChatGPT for text analysis from the point of view of an “out-of-the-box” zero-shot or few-shot setting, focusing on how the context window and mixed text types affect the analyses generated. Results from our testing indicate that both the types of texts and the ordering of different kinds of texts affect the ChatGPT analysis, but also that context-building is less likely to cause analysis deterioration when the texts analysed are similar. Though some of these issues are at the core of how LLMs function, many of these caveats can be addressed by transparent research planning.
2023
Efficient and reliable utilization of automated data collection applied to news on climate change
Erkki Mervaala | Jari Lyytimäki
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages
Automated data collection provides tempting opportunities for social sciences and humanities studies. Abundant data accumulating in various digital archives allows more comprehensive, timely and cost-efficient ways of harvesting and processing information. Automated methods ease or even remove some key problems, such as laborious and time-consuming data collection, potential errors and biases related to subjective coding of materials, and distortions caused by a focus on small samples. They also introduce new risks, such as poor understanding of the contexts of the data or failure to recognize underlying systematic errors or missing information. Results from testing different methods to collect data describing newspaper coverage of climate change in Finland emphasize that fully relying on automatable tools such as media scrapers has its limitations and can yield document acquisition that is extensive but incomplete. Many of these limitations can, however, be addressed, and not all of the remedies rely on manual control.