Huidong Du
2025
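Analysis of the Contemporary Contextual Performance of Traditional-Values Idioms: A Quantitative Study Based on the BCC Corpus (传统价值观成语当代语境表现分析：基于BCC语料库的计量研究)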
孙浩 | 刘洋洋 | Huidong Du | Pengyuan Liu | Dong Yu | Chen Kang
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
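"China's fine traditional culture is an important source for enhancing the country's cultural soft power in the new era, and combining traditional values with idioms helps to inherit and carry forward this heritage. This paper proposes a research framework for the contemporary contextual performance of traditional-values idioms. Based on the BCC corpus, it conducts a quantitative study of the distribution of corpus counts for traditional-values idioms and the distribution of idioms' traditional-value preferences, of their sentiment tendencies and high-frequency word distributions in contemporary contexts, and of their social topics and moral characteristics. It also proposes a contemporary social-topic and moral adaptability index for traditional-values idioms, so that their contemporary contextual performance can be studied systematically. The paper offers a new perspective for the contemporary quantitative study of traditional culture and a reference for related research in the digital humanities, with the aim of strengthening the influence of China's fine traditional culture in the new era and contributing to the inheritance and innovation of Chinese civilization."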
2024
Generate-then-Revise: An Effective Synthetic Training Data Generation Framework For Event Detection Retrieval
Huidong Du | Hao Sun | Pengyuan Liu | Dong Yu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
"Large language models (LLMs) struggle with event detection (ED) due to the structured and variable number of events in the output. Existing supervised approaches rely on a large amount of manually annotated corpora, facing challenges in practice when event types are diverse and the annotated data is scarce. We propose Generate-then-Revise (GtR), a framework that leverages LLMs in the opposite direction to address these challenges in ED. GtR utilizes an LLM to generate high-quality training data in three stages, including a novel data revision step to minimize noise in the synthetic data. The generated data is then used to train a smaller model for evaluation. Our approach demonstrates significant improvements on the low-resource ED. We further analyze the generated data, highlighting the potential of synthetic data generation for enhancing ED performance."