Damon Woodard

2025

PsyTEx: A Knowledge-Guided Approach to Refining Text for Psychological Analysis
Avanti Bhandarkar | Ronald Wilson | Anushka Swarup | Gregory Webster | Damon Woodard
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

LLMs are increasingly applied for tasks requiring deep interpretive abilities and psychological insights, such as identity profiling, mental health diagnostics, personalized content curation, and human resource management. However, their performance in these tasks remains inconsistent, as these characteristics are not explicitly perceptible in the text. To address this challenge, this paper introduces a novel protocol called the “Psychological Text Extraction and Refinement Framework (PsyTEx)” that leverages LLMs to isolate and amplify psychologically informative segments and evaluate LLM proficiency in interpreting complex psychological constructs from text. Using personality recognition as a case study, our extensive evaluation of five SOTA LLMs across two personality models (Big Five and Dark Triad) and two assessment levels (detection and prediction) highlights significant limitations in LLM’s ability to accurately interpret psychological traits. However, our findings show that LLMs, when used within the PsyTEx protocol, can effectively extract relevant information that closely aligns with psychological expectations, offering a structured approach to support future advancements in modeling, taxonomy construction, and text-based psychological evaluations.

pdf bib abs

AAIG at GenAI Detection Task 1: Exploring Syntactically-Aware, Resource-Efficient Small Autoregressive Decoders for AI Content Detection
Avanti Bhandarkar | Ronald Wilson | Damon Woodard
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)

This paper presents a lightweight and efficient approach to AI-generated content detection using small autoregressive fine-tuned decoders (AFDs) for secure, on-device deployment. Motivated by resource-efficiency, syntactic awareness, and bias mitigation, our model employs small language models (SLMs) with autoregressive pre-training and loss fusion to accurately distinguish between human and AI-generated content while significantly reducing computational demands. The system achieved highest macro-F1 score of 0.8186, with the submitted model scoring 0.7874—both significantly outperforming the task baseline while reducing model parameters by ~60%. Notably, our approach mitigates biases, improving recall for human-authored text by over 60%. Ranking 8th out of 36 participants, these results confirm the feasibility and competitiveness of small AFDs in challenging, adversarial settings, making them ideal for privacy-preserving, on-device deployment suitable for real-world applications.

pdf bib abs

LLM4RE: A Data-centric Feasibility Study for Relation Extraction
Anushka Swarup | Tianyu Pan | Ronald Wilson | Avanti Bhandarkar | Damon Woodard
Proceedings of the 31st International Conference on Computational Linguistics

Relation Extraction (RE) is a multi-task process that is a crucial part of all information extraction pipelines. With the introduction of the generative language models, Large Language Models (LLMs) have showcased significant performance boosts for complex natural language processing and understanding tasks. Recent research in RE has also started incorporating these advanced machines in their pipelines. However, the full extent of the LLM’s potential for extracting relations remains unknown. Consequently, this study aims to conduct the first feasibility analysis to explore the viability of LLMs for RE by investigating their robustness to various complex RE scenarios stemming from data-specific characteristics. By conducting an exhaustive analysis of five state-of-the-art LLMs backed by more than 2100 experiments, this study posits that LLMs are not robust enough to tackle complex data characteristics for RE, and additional research efforts focusing on investigating their behaviors at extracting relationships are needed. The source code for the evaluation pipeline can be found at https://aaig.ece.ufl.edu/projects/relation-extraction .

pdf bib abs

From Syntax to Semantics: Evaluating the Impact of Linguistic Structures on LLM-Based Information Extraction
Anushka Swarup | Avanti Bhandarkar | Ronald Wilson | Tianyu Pan | Damon Woodard
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)

Large Language Models (LLMs) have brought significant breakthroughs across all areas of Natural Language Processing (NLP), including Information Extraction (IE). However, knowledge gaps remain regarding their effectiveness in extracting entity-relation triplets, i.e. Joint Relation Extraction (JRE). JRE has been a key operation in creating knowledge bases that can be used to enhance Retrieval Augmented Generation (RAG) systems. Prior work highlights low-quality triplets generated by LLMs. Thus, this work investigates the impact of incorporating linguistic structures, such as constituency and dependency trees and semantic role labeling, to enhance the quality of the extracted triplets. The findings suggest that incorporating specific structural information enhances the uniqueness and topical relevance of the triplets, particularly in scenarios where multiple relationships are present.

2024

pdf bib abs

Emulating Author Style: A Feasibility Study of Prompt-enabled Text Stylization with Off-the-Shelf LLMs
Avanti Bhandarkar | Ronald Wilson | Anushka Swarup | Damon Woodard
Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024)

User-centric personalization of text opens many avenues of applications from stylized email composition to machine translation. Existing approaches in this domain often encounter limitations in data and resource requirements. Drawing inspiration from the success of resource-efficient prompt-enabled stylization in related fields, this work conducts the first feasibility into testing 12 pre-trained SOTA LLMs for author style emulation. Although promising, the results suggest that current off-the-shelf LLMs fall short of achieving effective author style emulation. This work provides valuable insights through which off-the-shelf LLMs could be potentially utilized for user-centric personalization easily and at scale.

2018

pdf bib abs

What represents “style” in authorship attribution?
Kalaivani Sundararajan | Damon Woodard
Proceedings of the 27th International Conference on Computational Linguistics

Authorship attribution typically uses all information representing both content and style whereas attribution based only on stylistic aspects may be robust in cross-domain settings. This paper analyzes different linguistic aspects that may help represent style. Specifically, we study the role of syntax and lexical words (nouns, verbs, adjectives and adverbs) in representing style. We use a purely syntactic language model to study the significance of sentence structures in both single-domain and cross-domain attribution, i.e. cross-topic and cross-genre attribution. We show that syntax may be helpful for cross-genre attribution while cross-topic attribution and single-domain may benefit from additional lexical information. Further, pure syntactic models may not be effective by themselves and need to be used in combination with other robust models. To study the role of word choice, we perform attribution by masking all words or specific topic words corresponding to nouns, verbs, adjectives and adverbs. Using a single-domain dataset, IMDB1M reviews, we demonstrate the heavy influence of common nouns and proper nouns in attribution, thereby highlighting topic interference. Using cross-domain Guardian10 dataset, we show that some common nouns, verbs, adjectives and adverbs may help with stylometric attribution as demonstrated by masking topic words corresponding to these parts-of-speech. As expected, it was observed that proper nouns are heavily influenced by content and cross-domain attribution will benefit from completely masking them.

Co-authors

Gregory Webster 1

Venues

XLLM1

Fix author