Judith Jeyafreeda Andrew


2024

JudithJeyafreeda_StressIdent_LT-EDI@EACL2024: GPT for stress identification
Judith Jeyafreeda Andrew
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

Stress detection from social media texts has proven to play an important role in mental health assessment. People tend to express their stress more easily on social media. Analysing and classifying these texts can improve the development of recommender systems and automated mental health assessments. In this paper, a GPT model is used to classify social media texts into two classes, stressed and not stressed. The texts used for classification are in two Dravidian languages, Tamil and Telugu. Although the results are not very strong, they point to a promising direction of research in using GPT models for classification.

Evaluating LLMs for Temporal Entity Extraction from Pediatric Clinical Text in Rare Diseases Context
Judith Jeyafreeda Andrew | Marc Vincent | Anita Burgun | Nicolas Garcelon
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

The aim of this work is to extract temporal entities from the EHRs of patients at a pediatric hospital specialising in rare diseases, making it possible to create a patient timeline relative to diagnosis. We evaluate NLP tools and Large Language Models (LLMs) to test their application in clinical studies, where data are limited and sensitive. We present a short annotation guideline for temporal entity identification. We then use the tool EDS-NLP, the language model CamemBERT-with-Dates and the LLM Vicuna to extract temporal entities. We perform experiments with three different prompting techniques on Vicuna to evaluate the model thoroughly. Our experiments use a small dataset of 50 EHRs describing the evolution of rare diseases in patients. We show that, among the different methods of prompting an LLM, a decomposed prompting structure on Vicuna produces the best results for temporal entity recognition. The LLM learns from the examples in the prompt, and decomposing one prompt into several prompts allows the model to avoid confusion between the different entity types. Identifying the temporal entities in EHRs helps to build a patient's timeline and to learn the evolution of a disease. This is especially important for rare diseases, because only limited examples are available. In this paper, we show that this can be achieved with language models and LLMs in a secure environment, thus preserving patient privacy.
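To illustrate the difference between one combined prompt and the decomposed prompting described in the abstract, here is a minimal sketch; it is not the authors' code. The entity types (DATE, AGE, DURATION), the example sentences, the semicolon-separated answer format and the helper names build_decomposed_prompts and parse_answer are all hypothetical. No model is called: a client for an LLM such as Vicuna would have to be plugged in separately, sending each prompt on its own.

```python
from typing import Dict, List, Tuple

# Hypothetical few-shot examples per temporal entity type (not the paper's guideline).
ENTITY_EXAMPLES: Dict[str, List[Tuple[str, str]]] = {
    "DATE": [("Seen in clinic on 12/03/2015.", "12/03/2015")],
    "AGE": [("First symptoms appeared at 3 years of age.", "3 years")],
    "DURATION": [("The treatment lasted 6 months.", "6 months")],
}


def build_decomposed_prompts(text: str) -> Dict[str, str]:
    """Build one few-shot prompt per entity type instead of a single combined prompt."""
    prompts: Dict[str, str] = {}
    for entity_type, examples in ENTITY_EXAMPLES.items():
        # Each prompt only shows examples of its own type, so types are never mixed.
        shots = "\n".join(
            f"Text: {sentence}\n{entity_type}: {answer}" for sentence, answer in examples
        )
        prompts[entity_type] = (
            f"Extract every {entity_type} expression from the text. "
            "Answer with a semicolon-separated list, or NONE.\n"
            f"{shots}\nText: {text}\n{entity_type}:"
        )
    return prompts


def parse_answer(answer: str) -> List[str]:
    """Turn a model answer such as '12/03/2015; 2016' into a list of spans."""
    answer = answer.strip()
    if not answer or answer.upper() == "NONE":
        return []
    return [span.strip() for span in answer.split(";") if span.strip()]


if __name__ == "__main__":
    note = "Diagnosis confirmed on 04/05/2018, at 7 years of age."
    for entity_type, prompt in build_decomposed_prompts(note).items():
        print(f"--- {entity_type} prompt ---\n{prompt}\n")
```

The design choice mirrors the stated intuition: because each prompt carries examples of only one entity type, the model is never asked to tell DATE from AGE or DURATION within a single answer.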

Team NLPeers at Chemotimelines 2024: Evaluation of two timeline extraction methods, can generative LLM do it all or is smaller model fine-tuning still relevant ?
Nesrine Bannour | Judith Jeyafreeda Andrew | Marc Vincent
Proceedings of the 6th Clinical Natural Language Processing Workshop

This paper presents our two deep learning-based approaches to subtask 1 of the Chemotimelines 2024 Shared Task. The first fine-tunes a relatively small general-domain Masked Language Model (MLM), with additional normalization steps obtained using a simple Large Language Model (LLM) prompting technique. The second is an LLM-based approach combining advanced automated prompt search with few-shot in-context learning using the DSPy framework. Our results confirm the continued relevance of the smaller fine-tuned MLM. They also suggest that the automated few-shot LLM approach can perform close to the fine-tuning-based method without extra LLM normalization, and can be advantageous when data access is scarce. Finally, we point to a trade-off between the two methods: one requires fewer training examples, the other fewer computing resources.

2023

JudithJeyafreeda at SemEval-2023 Task 10: Machine Learning for Explainable Detection of Online Sexism
Judith Jeyafreeda Andrew
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The rise of the internet and social media platforms has brought about significant changes in how people interact with one another. For many people, the internet has also become the only source of news and information about the world. With this increased accessibility of information, online sexism has also increased. Efforts should be made to make the internet a safe space for everyone, irrespective of gender, both through broader social norms and through legal or technical regulations that help alleviate online gender-based violence. As part of this effort, this paper explores simple methods that can be easily deployed to automatically detect online sexism in textual statements.

JudithJeyafreeda@LT-EDI-2023: Using GPT model for recognition of Homophobia/Transphobia detection from social media
Judith Jeyafreeda Andrew
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Homophobia and transphobia are defined as hatred of or discomfort towards gay, lesbian, transgender or bisexual people. With the growth of social media, communication has become free and easy. This also means that people can express hatred and discomfort towards others. Studies have shown that such comments can cause mental health issues. Thus, detecting and masking or removing these comments from social media platforms can help with understanding and improving the mental health of LGBTQ+ people. In this paper, GPT-2 is used to detect homophobic and/or transphobic social media comments. The comments used in this paper are in five languages: English, Spanish, Tamil, Malayalam and Hindi. The results show that detecting such comments is easier in English than in the other languages.

2022

JudithJeyafreedaAndrew@TamilNLP-ACL2022:CNN for Emotion Analysis in Tamil
Judith Jeyafreeda Andrew
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

Using technology to analyse human emotion is a relatively nascent research area. Emotion recognition can be applied to several types of data, such as text, images, audio and video. This paper focuses on emotion recognition in text data, which can be performed on both written comments and conversations; the dataset used here is a list of comments. While extensive research is being performed in this area, the language of the text plays a very important role. This work focuses on the Dravidian language Tamil, whose language and script demand extensive pre-processing. The paper contributes to this by adapting various pre-processing methods to Tamil. A CNN method has been adopted for the task at hand, and the proposed method achieves comparable results.

2021

JudithJeyafreedaAndrew@DravidianLangTech-EACL2021:Offensive language detection for Dravidian Code-mixed YouTube comments
Judith Jeyafreeda Andrew
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Messaging online has become one of the major ways of communication, and with it come cases of online/digital bullying, including rants, taunts and offensive phrases. The identification of offensive language on the internet is therefore an essential task. In this paper, offensive language detection on YouTube comments in the Dravidian languages Tamil, Malayalam and Kannada is treated as a multiclass classification problem. After language-specific pre-processing, several Machine Learning algorithms have been trained for the task at hand. The paper presents the accuracy results on the development datasets for all Machine Learning models used, and finally presents the weighted average scores on the test set for the best-performing Machine Learning model.
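Weighted average scores weight each class's score by its number of gold examples (its support), so for F1 the score is the sum over classes c of (n_c / N) × F1_c. As a minimal sketch, assuming weighted-average F1 is the score of interest (the exact metric variants reported are not stated in the abstract), the computation can be written in plain Python. The function name weighted_f1 and the label strings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, averaged with weights proportional to each class's gold support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_gold in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_gold - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Weight this class's F1 by its share of the gold labels.
        score += (n_gold / total) * f1
    return score


if __name__ == "__main__":
    gold = ["not_offensive", "not_offensive", "offensive", "offensive"]
    pred = ["not_offensive", "offensive", "offensive", "offensive"]
    print(round(weighted_f1(gold, pred), 4))
```

For the toy labels this prints 0.7333: F1 is 2/3 for not_offensive and 4/5 for offensive, each weighted by 2/4. Labels that appear only in the predictions have zero gold support and so contribute nothing, which should agree with scikit-learn's f1_score with average="weighted".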

2018

Automatic Extraction of Entities and Relation from Legal Documents
Judith Jeyafreeda Andrew
Proceedings of the Seventh Named Entities Workshop

In recent years, journalists and computer scientists have been working together to identify technologies that help extract useful information. This is called “computational journalism”. In this paper, we present a method that enables journalists to automatically identify and annotate entities such as the names of people and organizations, and the roles and functions of people, in legal documents; the relationships between these entities are also explored. The system combines statistical and rule-based techniques: the statistical method is Conditional Random Fields, and the rule-based technique uses document- and language-specific regular expressions.
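One way to combine a statistical tagger with rule-based regular expressions is sketched below. This is a hypothetical illustration, not the paper's system: the role vocabulary, the regular expression, the (start, end, label) span format and the merge policy (rule matches are kept only where they do not overlap a CRF prediction) are all assumptions. The CRF output is represented by a hard-coded list rather than a trained model.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical, language-specific rule: a role title followed by a capitalized name.
ROLE_PERSON = re.compile(
    r"\b(?P<role>Judge|Prosecutor|Attorney|Defendant|Plaintiff)\s+"
    r"(?P<name>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)


def rule_spans(text: str) -> Tuple[List[Span], List[Tuple[str, str]]]:
    """Return ROLE/PERSON spans and (role, person) relation pairs found by the rule."""
    spans: List[Span] = []
    relations: List[Tuple[str, str]] = []
    for m in ROLE_PERSON.finditer(text):
        spans.append((m.start("role"), m.end("role"), "ROLE"))
        spans.append((m.start("name"), m.end("name"), "PERSON"))
        # The rule itself links the role to the person it precedes.
        relations.append((m.group("role"), m.group("name")))
    return spans, relations


def merge_spans(crf_spans: List[Span], rule_based: List[Span]) -> List[Span]:
    """Keep all CRF spans and add rule spans that do not overlap any of them."""
    merged = list(crf_spans)
    for start, end, label in rule_based:
        if all(end <= s or start >= e for s, e, _ in crf_spans):
            merged.append((start, end, label))
    return sorted(merged)


if __name__ == "__main__":
    text = "Judge Mary Smith questioned Defendant John Doe."
    crf_output: List[Span] = [(6, 16, "PERSON")]  # stand-in for a trained CRF's prediction
    spans, relations = rule_spans(text)
    print(merge_spans(crf_output, spans))
    print(relations)
```

Running it prints the merged spans [(0, 5, 'ROLE'), (6, 16, 'PERSON'), (28, 37, 'ROLE'), (38, 46, 'PERSON')] followed by the relation pairs [('Judge', 'Mary Smith'), ('Defendant', 'John Doe')]. Giving the statistical model priority on overlaps is only one possible policy; rules could equally override the model for high-precision patterns.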