He Zhou

Also published as: 贺周

2026

Large Language Models Put to the Test on Chinese Noun Compounds: Experiments on Natural Language Inference and Compound Semantics
Le Qiu | Emmanuele Chersoni | He Zhou | Yu-Yin Hsu
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)

Noun compounds are generally considered an open challenge for NLP systems, given the difficulty of interpreting the implicit semantic relation between modifier and head, although the advent of Large Language Models (LLMs) has recently led to remarkable performance leaps. However, most evaluations have been carried out on English benchmarks. In our work, we test LLMs on compound semantics understanding in Chinese, adopting two different evaluation scenarios: an extrinsic evaluation in a Natural Language Inference task, and an intrinsic evaluation in which models are directly asked to predict the semantic relation linking the two constituents. Our results show that the bigger and more recent LLMs are able to surpass supervised baselines in the inference task, especially when tested under the few-shot setting. In the more challenging task of selecting the correct interpretation of the compounds out of a fine-grained typology of semantic relations between head and modifier, the best Chinese LLM (Qwen-plus) manages to select the correct option in about one third of the cases.

2025

Branching Out: Exploration of Chinese Dependency Parsing with Fine-tuned Large Language Models
He Zhou | Emmanuele Chersoni | Yu-Yin Hsu
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

In this paper, we investigate the effectiveness of large language models (LLMs) for Chinese dependency parsing through fine-tuning. We explore how different dependency representations impact parsing performance when fine-tuning the Chinese Llama-3 model. Our results demonstrate that while the Stanford typed dependency tuple representation yields the highest number of valid dependency trees, converting dependency structure into a lexical centered tree produces parses of significantly higher quality despite generating fewer valid structures. The results further show that fine-tuning enhances LLMs’ capability to handle longer dependencies to some extent, though challenges remain. Additionally, we evaluate the effectiveness of DeepSeek in correcting LLM-generated dependency structures, finding that it is effective for fixing index errors and cyclicity issues but still suffers from tokenization mismatches. Our analysis across dependency distances and relations reveals that fine-tuned LLMs outperform traditional parsers in specific syntactic structures while struggling with others. These findings contribute to the research on leveraging LLMs for syntactic analysis tasks.

2024

基于通用依存句法的锡伯语句法树库构建研究(A Dependency Treebank for Xibe based on Universal Dependencies)
He Zhou (周贺)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

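China is a multi-ethnic, multilingual country with rich ethnic language resources. However, languages with small speaker populations and limited cultural influence commonly face endangerment, and documenting and preserving these languages is of great significance for linguistics, ethnology, and anthropology. In this study, we take Xibe, a Manchu-Tungusic language still in active use in China, as our target language. We collected 1,200 sentences from Xibe grammar books, the Xibe-language newspaper Cabcal News (《察布查尔报》), and Xibe-language Yuwen (《语文》) textbooks, and used them as a corpus to build a treebank containing lexical, morphological, and dependency syntactic information. This paper describes the construction of the treebank in detail, discusses in depth the linguistic phenomena that were difficult to resolve during annotation, and presents our annotation strategies. Through annotation, we find that, as a result of deep contact between Chinese and Xibe, Xibe has not only absorbed a large number of Chinese loanwords into its vocabulary, but its sentence structure has also been influenced to some degree. Based on the annotated Xibe treebank, we conducted automatic syntactic parsing experiments for Xibe, examining how word, part-of-speech, and character features, as well as CINO, a pre-trained model for Chinese minority languages, affect parsing performance.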

2021

Delexicalized Cross-lingual Dependency Parsing for Xibe
He Zhou | Sandra Kübler
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Manually annotating a treebank is time-consuming and labor-intensive. We conduct delexicalized cross-lingual dependency parsing experiments, where we train the parser on one language and test on our target language. As our test case, we use Xibe, a severely under-resourced Tungusic language. We assume that choosing a closely related language as the source language will provide better results than more distant relatives. However, it is not clear how to determine those closely related languages. We investigate three different methods: choosing the typologically closest language, using LangRank, and choosing the most similar language based on perplexity. We train parsing models on the selected languages using UDify and test on different genres of Xibe data. The results show that languages selected based on typology and perplexity scores outperform those predicted by LangRank; Japanese is the optimal source language. In determining the source language, proximity to the target language is more important than large training sizes. Parsing is also influenced by genre differences, but they have little influence as long as the training data is at least as complex as the target.

2020

The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks. These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). The problem, however, is that most such benchmarks are limited to English, which has made it difficult to replicate many of the successes in English NLU for other languages. To help remedy this issue, we introduce the first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark. CLUE is an open-ended, community-driven project that brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text. To establish results on these tasks, we report scores using an exhaustive set of current state-of-the-art pre-trained Chinese models (9 in total). We also introduce a number of supplementary datasets and additional tools to help facilitate further progress on Chinese NLU. Our benchmark is released at https://www.cluebenchmarks.com

Universal Dependency Treebank for Xibe
He Zhou | Juyeon Chung | Sandra Kübler | Francis Tyers
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

We present our work on constructing the first treebank for the Xibe language following the Universal Dependencies (UD) annotation scheme. Xibe is a low-resourced and severely endangered Tungusic language spoken by the Xibe minority living in the Xinjiang Uygur Autonomous Region of China. We have collected 810 sentences so far, including 544 sentences from a grammar book on written Xibe and 266 sentences from Cabcal News. We annotated those sentences manually from scratch. In this paper, we report the procedure of building this treebank and analyze several important annotation issues of our treebank. Finally, we propose our plans for future work.

2019

This paper describes the IUCL system at the VarDial 2019 evaluation campaign for the task of discriminating between Mainland and Taiwan varieties of Mandarin Chinese. We first build several base classifiers, including a Naive Bayes classifier with word n-grams as features, SVMs with both character and syntactic features, and neural networks with pre-trained character/word embeddings. Then we adopt ensemble methods to combine output from base classifiers to make final predictions. Our ensemble models achieve the highest F1 score (0.893) in the simplified Chinese track and the second highest (0.901) in the traditional Chinese track. Our results demonstrate the effectiveness and robustness of the ensemble methods.