Haejin Lee

Also published as: HaeJin Lee


2024

Beyond Binary Gender Labels: Revealing Gender Bias in LLMs through Gender-Neutral Name Predictions
Zhiwen You | HaeJin Lee | Shubhanshu Mishra | Sullam Jeoung | Apratim Mishra | Jinseok Kim | Jana Diesner
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Name-based gender prediction has traditionally categorized individuals as either female or male based on their names, using a binary classification system. This binary approach can be problematic, for example for gender-neutral names that do not align with any one gender. Relying solely on binary gender categories without recognizing gender-neutral names can reduce the inclusiveness of gender prediction tasks. We introduce an additional gender category, i.e., “neutral”, to study and address potential gender biases in Large Language Models (LLMs). We evaluate the performance of several foundational and large language models in predicting gender based on first names only. Additionally, we investigate the impact of adding birth years to enhance the accuracy of gender prediction, accounting for shifting associations between names and genders over time. Our findings indicate that most LLMs identify male and female names with high accuracy (over 80%) but struggle with gender-neutral names (under 40%), and that the accuracy of gender prediction is higher for English-based first names than for non-English names. The experimental results show that incorporating the birth year does not improve the overall accuracy of gender prediction, especially for names with evolving gender associations. We recommend caution when applying LLMs for gender identification in downstream tasks, particularly when dealing with non-binary gender labels.
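As a rough illustration of the prediction setup described above, the sketch below queries an LLM for a three-way (female/male/neutral) label from a first name, with an optional birth year appended for temporal context. This is a minimal sketch, not the paper's actual prompts or models: the model name, prompt wording, and fallback label are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code) of three-way name-based gender
# prediction with an LLM, optionally conditioning on a birth year.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

LABELS = {"female", "male", "neutral"}

def predict_gender(first_name: str, birth_year: int | None = None) -> str:
    """Ask the model to classify a first name as female, male, or neutral."""
    query = f"First name: {first_name}"
    if birth_year is not None:
        query += f" (born {birth_year})"  # optional temporal context
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; the paper evaluates several LLMs
        messages=[
            {"role": "system",
             "content": "Classify the first name as 'female', 'male', or "
                        "'neutral' (gender-neutral). Answer with one word."},
            {"role": "user", "content": query},
        ],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer if answer in LABELS else "neutral"  # fall back on unparsable output

print(predict_gender("Taylor"))        # name only
print(predict_gender("Taylor", 1950))  # with birth year
```

Comparing the two calls against ground-truth labels for names whose gender associations have shifted over time is one way to probe the birth-year effect the abstract reports.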

Detecting Impact Relevant Sections in Scientific Research
Maria Becker | Kanyao Han | Haejin Lee | Antonina Werthmann | Rezvaneh Rezapour | Jana Diesner | Andreas Witt
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Impact assessment is an evolving area of research that aims to measure and predict the potential effects of projects or programs. Measuring the impact of scientific research is a vibrant subdomain, closely intertwined with impact assessment. A recurring obstacle is the absence of an efficient framework that facilitates the analysis of lengthy reports and text labeling. To address this issue, we propose a framework for automatically assessing the impact of scientific research projects by identifying pertinent sections in project reports that indicate potential impacts. We leverage a mixed-method approach, combining manual annotations with supervised machine learning, to extract these passages from project reports. We experiment with different machine learning algorithms, including traditional statistical models as well as pre-trained transformer language models. Our experiments show that the proposed method achieves accuracy scores of up to 0.81 and that it generalizes to scientific research from different domains and in different languages.
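To make the supervised setup concrete, the sketch below shows what a traditional statistical baseline of the kind the abstract mentions might look like: a TF-IDF plus logistic-regression classifier that flags passages as impact-relevant or not. This is a minimal sketch under assumed data, not the paper's pipeline; the toy passages and labels stand in for the manually annotated project reports.

```python
# Minimal sketch (assumptions, not the paper's pipeline) of a traditional
# statistical baseline for detecting impact-relevant passages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder passages and annotations; the paper uses annotated project reports.
passages = [
    "The project results were adopted by two regional hospitals.",
    "Meetings were held monthly to coordinate the work packages.",
    "Our findings informed a revision of national curriculum guidelines.",
    "The consortium consisted of five partner universities.",
]
labels = [1, 0, 1, 0]  # 1 = impact-relevant passage, 0 = other

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # word and bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(passages, labels)

# Score an unseen passage for impact relevance.
print(model.predict(["The software is now used by three city administrations."]))
```

A pre-trained transformer variant would swap the TF-IDF features and linear model for a fine-tuned sequence classifier over the same annotated passages; the abstract compares both families of models.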