Yingjie Han
Also published as: 英杰 韩
2025
CmEAA: Cross-modal Enhancement and Alignment Adapter for Radiology Report Generation
Xiyang Huang | Yingjie Han | Yaoxu Li | Runzhi Li | Pengcheng Wu | Kunli Zhang
Proceedings of the 31st International Conference on Computational Linguistics
Automatic radiology report generation is pivotal in reducing the workload of radiologists, while simultaneously improving diagnostic accuracy and operational efficiency. Current methods face significant challenges, including the effective alignment of medical visual features with textual features and the mitigation of data bias. In this paper, we propose a method for radiology report generation that utilizes a Cross-modal Enhancement and Alignment Adapter (CmEAA) to connect a vision encoder with a frozen large language model. Specifically, we introduce two novel modules within CmEAA: Cross-modal Feature Enhancement (CFE) and Neural Mutual Information Aligner (NMIA). CFE extracts observation-related contextual features to enhance the visual features of lesions and abnormal regions in radiology images through a cross-modal enhancement transformer. NMIA maximizes neural mutual information between visual and textual representations within a low-dimensional alignment embedding space during training and provides potential global alignment visual representations during inference. Additionally, a weight generator is designed to enable the dynamic adaptation of cross-modal enhanced features and vanilla visual features. Experimental results on two widely used datasets, IU X-Ray and MIMIC-CXR, demonstrate that the proposed model outperforms previous state-of-the-art methods.
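The abstract does not specify which neural mutual-information estimator NMIA uses; one common lower bound that fits the description (maximized between projected visual and textual representations during training) is InfoNCE. A minimal NumPy sketch, with the choice of estimator being our assumption:

```python
import numpy as np

def infonce_lower_bound(vis, txt, temperature=0.07):
    """InfoNCE-style lower bound on mutual information between paired
    visual and textual embeddings. vis and txt are (batch, dim) arrays
    whose i-th rows are a matched image-report pair; the bound rises as
    matched pairs score higher than mismatched ones."""
    vis = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    scores = vis @ txt.T / temperature            # pairwise cosine similarities
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # diagonal entries are the matched pairs; bound is capped at log(batch)
    return log_softmax.diagonal().mean() + np.log(vis.shape[0])
```

Maximizing this quantity over the projection heads pulls matched image-report pairs together in the shared low-dimensional space.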
JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling
Jinwang Song | Hongying Zan | Kunli Zhang | Lingling Mu | Yingjie Han | Haobo Hua | Min Peng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Text-to-SQL, which maps natural language to SQL queries, has benefited greatly from recent advances in Large Language Models (LLMs). While LLMs offer various paradigms for this task, including prompting and supervised fine-tuning (SFT), SFT approaches still face challenges such as complex multi-stage pipelines and poor robustness to noisy schema information. To address these limitations, we present JOLT-SQL, a streamlined single-stage SFT framework that jointly optimizes schema linking and SQL generation via a unified loss. JOLT-SQL employs discriminative schema linking, enhanced by local bidirectional attention, alongside a confusion-aware noisy schema sampling strategy with selective attention to improve robustness under noisy schema conditions. Experiments on the Spider and BIRD benchmarks demonstrate that JOLT-SQL achieves state-of-the-art execution accuracy among comparable-size open-source models, while significantly improving both training and inference efficiency.
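The "unified loss" that jointly optimizes schema linking and SQL generation can be illustrated as follows. This is our simplified assumption: a weighted sum of a discriminative (binary) linking loss and a token-level generation cross-entropy; the paper's bidirectional/selective attention and noisy schema sampling are not reproduced here.

```python
import numpy as np

def joint_loss(link_logits, link_labels, gen_logits, gen_labels, alpha=0.5):
    """Single-scalar joint objective: a binary schema-linking loss (is each
    table/column relevant to the question?) plus a token-level SQL
    generation cross-entropy, optimized together rather than in stages.
    link_logits/link_labels: (n_schema,); gen_logits: (n_tokens, vocab);
    gen_labels: (n_tokens,) gold token ids; alpha balances the two terms."""
    p = 1.0 / (1.0 + np.exp(-link_logits))                       # sigmoid
    bce = -(link_labels * np.log(p) + (1 - link_labels) * np.log(1 - p)).mean()
    e = np.exp(gen_logits - gen_logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)                    # softmax
    ce = -np.log(probs[np.arange(len(gen_labels)), gen_labels]).mean()
    return alpha * bce + (1 - alpha) * ce
```

Because both terms share one backward pass, the linking head and the generator are trained in a single stage rather than in a multi-stage pipeline.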
CCL25-Eval Task 1 System Report: Spatial Semantic Understanding Based on Three-Stage Collaborative Enhancement of Data, Training, and Inference
Zhongtian Hua | Yi Luo | Mengyuan Wang | Yumeijia Yumeijia | Yingjie Han
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
SpaCE2025 centers on spatial semantic understanding, focusing on highly challenging tasks in this area, and aims to evaluate large language models (LLMs) on two fronts: spatial language ability and spatial reasoning ability. Facing challenges such as complex spatial semantics, scarce training data, and model parameter limits, this paper proposes a model optimization framework based on three-stage collaborative enhancement of data, training, and inference, with two different optimization schemes designed for the two subtasks. For the spatial language ability task, we use DeepSeek-R1 together with a spatial vocabulary to augment the training set, apply LoRA fine-tuning to Qwen-series LLMs, and use test-time augmentation during inference to further refine the results. For the spatial reasoning ability task, we additionally include the spatial language ability dataset in the training set, fine-tune the DeepSeek-R1-Distill-Qwen-7B model, and apply cumulative voting ensembling to the model's predictions. Our method ranked sixth overall with an accuracy of 58.54%. We also report several other approaches that were tried but did not improve model performance.
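The vote-based ensembling over multiple prediction runs described in the abstract can be sketched with a simple per-item majority vote (a stand-in of our own; the submission's exact cumulative weighting is not reproduced):

```python
from collections import Counter

def majority_vote(runs):
    """Aggregate per-item predictions from several inference runs by
    majority vote. runs: list of equal-length prediction lists, one per
    run; returns one aggregated prediction per item."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*runs)]
```

For example, three runs predicting `["A", "B"]`, `["A", "C"]`, and `["D", "B"]` aggregate to `["A", "B"]`.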
2023
Learnable Conjunction Enhanced Model for Chinese Sentiment Analysis
Bingfei Zhao | Hongying Zan | Jiajia Wang | Yingjie Han
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
Sentiment analysis is a crucial text classification task that aims to extract, process, and analyze opinions, sentiments, and subjectivity within texts. In current research on Chinese text, sentence- and aspect-based sentiment analysis is mainly tackled through well-designed models. However, despite the importance of word order and function words as essential means of semantic expression in Chinese, they are often underutilized. This paper presents a new Chinese sentiment analysis method that utilizes a Learnable Conjunctions Enhanced Model (LCEM). The LCEM adjusts the general structure of the pre-trained language model and incorporates conjunction location information into the model's fine-tuning process. Additionally, we discuss a variant structure of residual connections to construct a residual structure that can learn critical information in the text and optimize it during training. We perform experiments on public datasets and demonstrate that our approach enhances performance on both sentence- and aspect-based sentiment analysis datasets compared to the baseline pre-trained language models. These results confirm the effectiveness of our proposed method.
2022
期货领域知识图谱构建(Construction of Knowledge Graph in Futures Field)
Wenxin Li (李雯昕) | Hongying Zan (昝红英) | Tongfeng Guan (关同峰) | Yingjie Han (韩英杰)
Proceedings of the 21st Chinese National Conference on Computational Linguistics
The futures domain is one of the most data-rich domains. Using research reports on commodity futures as the data source, this paper constructs a Commodity Futures Knowledge Graph (CFKG). Centered on futures products, a concept classification system and a relation description system are established, forming the concept layer of the graph. Building on the MHS-BIA and GPN models, 2.42 million characters of research-report text were annotated and proofread under the guidance of domain experts to form the CFKG data layer, and a visual query system was designed. The resulting CFKG contains 17,003 agricultural-futures relation triples and 13,703 non-agricultural-futures relation triples, providing knowledge support for applications in the futures domain such as text analysis, public opinion monitoring, and reasoning-based decision making.
2020
Chinese Grammatical Error Diagnosis Based on RoBERTa-BiLSTM-CRF Model
Yingjie Han | Yingjie Yan | Yangchao Han | Rui Chao | Hongying Zan
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications
Chinese Grammatical Error Diagnosis (CGED) is a shared task at the NLPTEA workshop. The goal of the task is to automatically diagnose grammatical errors in Chinese sentences written by L2 learners. This paper proposes a RoBERTa-BiLSTM-CRF model to detect grammatical errors in sentences. First, the RoBERTa model is used to obtain word vectors. Second, the word vectors are fed into a BiLSTM layer to learn contextual features. Finally, a CRF layer, which requires no hand-crafted features, processes the BiLSTM output; the globally optimal label sequences are obtained from the CRF's state transition matrix and the adjacent labels in the training data. In the experiments, the results of the RoBERTa-CRF and ERNIE-BiLSTM-CRF models are compared, and the impacts of model parameters and of the test datasets are analyzed. In the official evaluation, the recall score of our RoBERTa-BiLSTM-CRF model ranks fourth at the detection level.
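The globally optimal sequences that the CRF layer recovers from its state transition matrix are found by Viterbi decoding. A minimal NumPy sketch (the tag inventory and scores are illustrative, not the paper's):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Viterbi decoding for a CRF output layer. emissions (seq_len, n_tags)
    holds per-token tag scores (e.g. from a BiLSTM over RoBERTa vectors);
    transitions (n_tags, n_tags) is the learned tag-to-tag transition
    matrix. Returns the globally optimal tag sequence."""
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # score of ending at tag j (column) having come from tag i (row)
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    tags = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):   # follow backpointers
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]
```

Unlike greedy per-token argmax, the transition matrix lets the decoder reject label sequences that are locally attractive but globally implausible.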
Chinese Grammatical Errors Diagnosis System Based on BERT at NLPTEA-2020 CGED Shared Task
Hongying Zan | Yangchao Han | Haotian Huang | Yingjie Yan | Yuke Wang | Yingjie Han
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications
In the process of learning Chinese, second language learners may make various grammatical errors due to negative transfer from their native languages. This paper describes our submission to the NLPTEA 2020 shared task on CGED. We present a hybrid system with both detection and correction stages. The detection stage is a sequence labelling model based on BiLSTM-CRF with BERT contextual word representations. The correction stage is a hybrid model based on n-grams and Seq2Seq. Without additional features or external data, the BERT contextual word representations effectively improve the performance of Chinese grammatical error detection and correction.
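The n-gram half of such a correction stage can be sketched as a smoothed language model that ranks candidate corrections of a sentence (our own minimal illustration; the counts, the smoothing scheme, and the Seq2Seq component are assumptions or omissions):

```python
import math

def bigram_score(sentence, bigram_counts, unigram_counts, vocab_size):
    """Add-one-smoothed bigram log-probability of a sentence, the kind of
    n-gram score a correction stage can use to rank candidate corrections.
    bigram_counts / unigram_counts are dicts of character-pair and
    character frequencies gathered from training text."""
    tokens = ["<s>"] + list(sentence)
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        num = bigram_counts.get((prev, cur), 0) + 1    # add-one smoothing
        den = unigram_counts.get(prev, 0) + vocab_size
        logp += math.log(num / den)
    return logp
```

A candidate whose character sequence matches patterns seen in training scores higher, so the system can prefer it over the learner's erroneous original.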
2016
Automatic Grammatical Error Detection for Chinese based on Conditional Random Field
Yajun Liu | Yingjie Han | Liyan Zhuo | Hongying Zan
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)
In the process of learning and using Chinese, foreigners may make grammatical errors due to negative transfer from their native languages. Computer-oriented automatic detection of such grammatical errors is not yet mature. Based on the CGED2016 evaluation task, we select and analyze a classification model and design a feature extraction method to automatically detect grammatical errors of four types: Missing (M), Disorder (W), Selection (S), and Redundant (R). Experimental results on the dynamic HSK corpus show that the Chinese grammatical error detection method, which uses a CRF as the classification model and n-grams for feature extraction, is simple and efficient. It has a positive effect on research into automatic Chinese grammatical error detection and also plays a supporting and guiding role in teaching Chinese as a foreign language.
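The n-gram features fed to a CRF sequence labeller typically take the form of string templates over a small window around each token. A minimal sketch with illustrative unigram/bigram templates (the paper's exact feature set is not given in the abstract):

```python
def ngram_features(tokens, i):
    """Unigram and bigram feature templates for position i, in the string
    form consumed by CRF toolkits: the current token, its neighbours, and
    adjacent-token bigrams."""
    feats = [f"w0={tokens[i]}"]
    if i > 0:
        feats += [f"w-1={tokens[i-1]}", f"w-1w0={tokens[i-1]}|{tokens[i]}"]
    if i + 1 < len(tokens):
        feats += [f"w+1={tokens[i+1]}", f"w0w+1={tokens[i]}|{tokens[i+1]}"]
    return feats
```

Each string becomes a binary indicator feature whose weight the CRF learns jointly with the tag transition scores.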