Haodi Zhang

Also published as: 昊迪


2025

pdf bib
Efficient Data Labeling by Hierarchical Crowdsourcing with Large Language Models
Haodi Zhang | Junyu Yang | Jinyin Nie | Peirou Liang | Kaishun Wu | Defu Lian | Rui Mao | Yuanfeng Song
Proceedings of the 31st International Conference on Computational Linguistics

Large language models (LLMs) have received lots of attention for their impressive performance in in-context dialogues and their potential to revolutionize service industries with a new business model, Model-as-a-Service (MaaS). Automated data labeling is a natural and promising service. However, labeling data with LLMs faces two main challenges: 1) the labels from LLMs may contain uncertainty, and 2) using LLMs for data labeling tasks can be prohibitively expensive, as the scales of datasets are usually tremendous. In this paper, we propose a hierarchical framework named LMCrowd that leverages multiple LLMs for efficient data labeling under budget constraints. The proposed LMCrowd framework first aggregates labels from multiple freely available LLMs, and then employs a large, paid MaaS LLM for relabeling selected instances. Furthermore, we formalize the core process as an optimization problem, aiming to select the optimal set of instances for relabeling by the MaaS LLM, given the current belief state. Extensive experimental evaluations across various real-world datasets demonstrate that our framework outperforms human labelers and GPT-4 in terms of both accuracy and efficiency.

2021

pdf bib
近十年来澳门的词汇增长(Macau’s Vocabulary Growth in the Recent Ten Year)
Shan Wang (王珊) | Zhao Chen (陈钊) | Haodi Zhang (张昊迪)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

词汇增长模型可以通过拟合词种(types)与词例(tokens)之间的数量关系,反映某一领域词汇的历时演化。澳门作为多语言多文化融合之地,词汇的使用情况能够反映社会的关注焦点,但目前尚无对澳门历时词汇演变的研究。本文首次构建澳门汉语历时语料库,利用三大词汇增长模型拟合语料库的词汇变化,并选取效果最好的 Heaps 模型进一步分析词汇演变与报刊内容的关系,结果反映出澳门词汇的变化趋势与热点新闻、澳门施政方针和民生密切相关。本研究还采用去除文本时序信息后的乱序文本,验证了方法的有效性。本文是首项基于大规模历时语料库考察澳门词汇演变的研究,对深入了解澳门语言生活的发展具有重要意义。