Gizem Gezici


2024

pdf bib
An Overview of Recent Approaches to Enable Diversity in Large Language Models through Aligning with Human Perspectives
Benedetta Muscato | Chandana Sree Mala | Marta Marchiori Manerba | Gizem Gezici | Fosca Giannotti
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024

The varied backgrounds and experiences of human annotators inject different opinions and potential biases into the data, inevitably leading to disagreements. Yet, traditional aggregation methods fail to capture individual judgments since they rely on the notion of a single ground truth. Our aim is to review prior contributions to pinpoint the shortcomings that might cause stereotypical content generation. As a preliminary study, our purpose is to investigate state-of-the-art approaches, primarily focusing on the following two research directions. First, we investigate how adding subjectivity aspects to LLMs might guarantee diversity. We then look into the alignment between humans and LLMs and discuss how to measure it. Considering existing gaps, our review explores possible methods to mitigate the perpetuation of biases targeting specific communities. However, we recognize the potential risk of disseminating sensitive information due to the utilization of socio-demographic data in the training process. These considerations underscore the inclusion of diverse perspectives while taking into account the critical importance of implementing robust safeguards to protect individuals’ privacy and prevent the inadvertent propagation of sensitive information.

2020

pdf bib
#Turki$hTweets: A Benchmark Dataset for Turkish Text Correction
Asiye Tuba Koksal | Ozge Bozal | Emre Yürekli | Gizem Gezici
Findings of the Association for Computational Linguistics: EMNLP 2020

#Turki$hTweets is a benchmark dataset for the task of correcting the user misspellings, with the purpose of introducing the first public Turkish dataset in this area. #Turki$hTweets provides correct/incorrect word annotations with a detailed misspelling category formulation based on the real user data. We evaluated four state-of-the-art approaches on our dataset to present a preliminary analysis for the sake of reproducibility.

2013

pdf bib
SU-Sentilab : A Classification System for Sentiment Analysis in Twitter
Gizem Gezici | Rahim Dehkharghani | Berrin Yanikoglu | Dilek Tapucu | Yucel Saygin
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)