HLU: Human Vs LLM Generated Text Detection Dataset for Urdu at Multiple Granularities

Iqra Ali, Jesse Atuhurra, Hidetaka Kamigaito, Taro Watanabe


Abstract
The rise of large language models (LLMs) that generate human-like text has raised concerns about misuse, especially in low-resource languages such as Urdu. To address this gap, we introduce the HLU dataset, which comprises three datasets at the document, paragraph, and sentence levels. The document-level dataset contains 1,014 instances of human-written and LLM-generated articles across 13 domains, while the paragraph-level and sentence-level datasets each contain 667 instances. We conducted both human and automatic evaluations. In the human evaluation, the average accuracy at the document level was 35%, while at the paragraph and sentence levels, accuracies were 75.68% and 88.45%, respectively. For automatic evaluation, we fine-tuned the XLM-RoBERTa model in both monolingual and multilingual settings, achieving consistent results in both. Additionally, we assessed the performance of GPT-4 and Claude 3 Opus using zero-shot prompting. Our experiments and evaluations indicate that distinguishing between human-written and machine-generated text is challenging for both humans and LLMs; the dataset marks a significant step toward addressing this issue in Urdu.
Anthology ID:
2025.coling-main.235
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3495–3510
URL:
https://aclanthology.org/2025.coling-main.235/
Cite (ACL):
Iqra Ali, Jesse Atuhurra, Hidetaka Kamigaito, and Taro Watanabe. 2025. HLU: Human Vs LLM Generated Text Detection Dataset for Urdu at Multiple Granularities. In Proceedings of the 31st International Conference on Computational Linguistics, pages 3495–3510, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
HLU: Human Vs LLM Generated Text Detection Dataset for Urdu at Multiple Granularities (Ali et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.235.pdf