Zhao Tiejun

Also published as: 赵铁军


2024

基于思维链的跨语言多文档摘要生成技术研究(Cross-lingual Multi-document Summarization Based on Chain-of-Thought)
Qi Tian (祁天) | Yang Jianan (杨建安) | Zhao Tiejun (赵铁军) | Yang Muyun (杨沐昀)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“With the accelerating pace of globalization, the efficient transfer and understanding of cross-lingual information has become increasingly important. Traditional multi-document summarization techniques can improve the efficiency of information acquisition, yet they often overlook the particular challenges of cross-lingual scenarios. To alleviate this problem, this paper proposes the task of cross-lingual multi-document summarization. We first construct a comprehensive cross-lingual multi-document summarization test set as an evaluation benchmark, then propose a chain-of-thought-based method for cross-lingual multi-document summarization and validate it experimentally. In our experiments, we apply several representative large language models and verify our method through both human and automatic evaluation. The results show that the proposed chain-of-thought-based method achieves significant performance gains on the cross-lingual multi-document summarization task, offering an effective solution to the problem of information access across language barriers.”
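The general idea of chain-of-thought prompting for this task can be sketched as follows. This is a minimal illustration only: the prompt template, step wording, and the `build_cot_prompt` helper are assumptions for exposition, not the paper's actual prompts or pipeline.

```python
# Hypothetical sketch of a chain-of-thought prompt for cross-lingual
# multi-document summarization. The template and step wording are
# illustrative assumptions, not the prompts used in the paper.

def build_cot_prompt(documents, source_lang="Chinese", target_lang="English"):
    """Assemble a prompt that asks an LLM to reason step by step over
    several source-language documents before writing a target-language
    summary."""
    doc_block = "\n\n".join(
        f"Document {i + 1}:\n{doc}" for i, doc in enumerate(documents)
    )
    steps = (
        f"Step 1: Identify the key facts in each {source_lang} document.\n"
        "Step 2: Merge overlapping facts and resolve contradictions.\n"
        f"Step 3: Write a concise {target_lang} summary of the merged facts."
    )
    return f"{doc_block}\n\nPlease reason step by step.\n{steps}\nSummary:"

prompt = build_cot_prompt(["文档一内容", "文档二内容"])
```

The resulting prompt string would then be passed to a large language model; the intermediate steps encourage the model to extract and merge facts before crossing the language boundary.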

Mitigating the Bias of Large Language Model Evaluation
Zhou Hongli | Huang Hui | Long Yunfei | Xu Bing | Zhu Conghui | Cao Hailong | Yang Muyun | Zhao Tiejun
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“Recently, there has been a trend of evaluating the Large Language Model (LLM) quality in the flavor of LLM-as-a-Judge, namely leveraging another LLM to evaluate the current output quality. However, existing judges are proven to be biased, namely they would favor answers which present better superficial quality (such as verbosity, fluency) while ignoring the instruction following ability. In this work, we propose systematic research about the bias of LLM-as-a-Judge. Specifically, for closed-source judge models, we apply calibration to mitigate the significance of superficial quality, both on probability level and prompt level. For open-source judge models, we propose to mitigate the bias by contrastive training, with curated negative samples that deviate from instruction but present better superficial quality. We apply our methods on the bias evaluation benchmark, and experiment results show our methods mitigate the bias by a large margin while maintaining a satisfactory evaluation accuracy.”
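One way to picture probability-level calibration of a judge model is to compare the judge's verdict probability with and without the instruction, so that surface qualities shared by both conditions cancel out. The sketch below is a loose illustration under that assumption; the `judge_logprob` interface and the exact subtraction are hypothetical, not the paper's formulation.

```python
# Hypothetical sketch of probability-level calibration for an LLM judge.
# `judge_logprob(prompt, target)` is an assumed interface returning the
# log-probability the judge assigns to `target` given `prompt`; it is
# not a real library API.

def calibrated_score(judge_logprob, instruction, answer, verdict="good"):
    """Return a calibrated preference score for `answer`.

    conditioned = log P(verdict | instruction + answer)
    prior       = log P(verdict | answer alone)

    The prior term captures how much the verdict is driven by surface
    qualities (verbosity, fluency) alone; subtracting it emphasizes
    instruction following instead.
    """
    conditioned = judge_logprob(prompt=f"{instruction}\n{answer}", target=verdict)
    prior = judge_logprob(prompt=answer, target=verdict)
    return conditioned - prior
```

In use, answers would be ranked by this calibrated score rather than by the raw judge probability, reducing the advantage of long, fluent but off-instruction responses.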

2004

Auto word alignment based Chinese-English EBMT
Yang Muyun | Zhao Tiejun | Liu Haijie | Shi Xiasheng | Jiang Hongfei
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign