Hanh Thi Hong Tran

2025

L3i++ at GenAI Detection Task 1: Can Label-Supervised LLaMA Detect Machine-Generated Text?
Hanh Thi Hong Tran | Nguyen Tien Nam
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)

The widespread use of large language models (LLMs) affects social media and educational contexts through an overwhelming volume of generated text with a certain degree of coherence. To mitigate potential misuse, this paper explores the feasibility of fine-tuning LLaMA with label supervision (named LS-LLaMA) in unidirectional and bidirectional settings to distinguish machine-generated from human-written texts in monolingual and multilingual corpora. Our findings show that unidirectional LS-LLaMA outperformed the sequence language models used as benchmarks by a large margin. Our code is publicly available at https://github.com/honghanhh/llama-as-a-judge.

2024

DeBERTa Beats Behemoths: A Comparative Analysis of Fine-Tuning, Prompting, and PEFT Approaches on LegalLensNER
Hanh Thi Hong Tran | Nishan Chatterjee | Senja Pollak | Antoine Doucet
Proceedings of the Natural Legal Language Processing Workshop 2024

This paper summarizes the participation of our team (Flawless Lawgic) in the legal named entity recognition (L-NER) task at LegalLens 2024: Detecting Legal Violations. Given possibly unstructured texts (e.g., online media texts), we aim to identify legal violations by extracting legal entities such as “violation”, “violation by”, “violation on”, and “law”. This system-description paper discusses our approaches to the task, empirically comparing the performance of fine-tuned models from the Transformers family (e.g., RoBERTa and DeBERTa) against open-source LLMs (e.g., Llama, Mistral) under different tuning settings (e.g., LoRA, Supervised Fine-Tuning (SFT), and prompting strategies). Our best results, with a weighted F1 of 0.705 on the test set, show a 30-percentage-point increase in F1 over the baseline and rank second on the leaderboard, only 0.4 percentage points below the top solution. Our solutions are available at github.com/honghanhh/lner.

L3i++ at SemEval-2024 Task 8: Can Fine-tuned Large Language Model Detect Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text?
Hanh Thi Hong Tran | Tien Nam Nguyen | Antoine Doucet | Senja Pollak
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper summarizes our participation in SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection. In this task, we address two of the three subtasks: (1) Monolingual and Multilingual Binary Human-Written vs. Machine-Generated Text Classification; and (2) Multi-Way Machine-Generated Text Classification. We conducted a comprehensive comparative study across three methodological groups: five metric-based models (Log-Likelihood, Rank, Log-Rank, Entropy, and MFDMetric); two fine-tuned sequence-labeling language models (RoBERTa and XLM-R); and a fine-tuned large-scale language model (LS-LLaMA). Our findings suggest that our LLM outperformed both the traditional sequence-labeling LM benchmarks and the metric-based approaches. Furthermore, our fine-tuned classifier excelled at detecting machine-generated multilingual texts and at accurately classifying machine-generated texts within specific categories (e.g., ChatGPT, bloomz, dolly). However, it struggles to detect texts in other categories (e.g., cohere and davinci). This is due to potential overlap in the distribution of the metric among various LLMs. Overall, we ranked 6th on the leaderboard in both Multilingual Binary Human-Written vs. Machine-Generated Text Classification and Multi-Way Machine-Generated Text Classification.

