Yan Zhuang

2025

Accurately assessing internal human states is key to understanding preferences, offering personalized services, and identifying challenges in real-world applications. Originating from psychometrics, adaptive testing has become the mainstream method for human measurement and has now been widely applied in education, healthcare, sports, and sociology. It customizes assessments by selecting the fewest test questions. However, current adaptive testing methods face several challenges. The mechanized nature of most algorithms leads to guessing behavior and difficulties with open-ended questions. Additionally, subjective assessments suffer from noisy response data and coarse-grained test outputs, further limiting their effectiveness. To move closer to an ideal adaptive testing process, we propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement. This is the first application of LLMs in adaptive testing. TestAgent supports personalized question selection, captures test-takers’ responses and anomalies, and provides precise outcomes through dynamic, conversational interactions. Experiments on psychological, educational, and lifestyle assessments show our approach achieves more accurate results with 20% fewer questions than state-of-the-art baselines, and testers preferred it in speed, smoothness, and other dimensions.

The International Classification of Diseases (ICD) provides a standardized framework for encoding diagnoses, serving critical roles in clinical scenarios. Automatic ICD coding aims to assign formalized diagnostic codes to medical records for documentation and analysis, which is challenged by an extremely large and imbalanced label space, noisy and heterogeneous clinical text, and the need for interpretability. In this paper, we propose a structured multi-class classification framework that partitions diseases into clinically coherent groups, enabling group-specific data augmentation and supervision. Our method combines input compression with generative and discriminative fine-tuning strategies tailored to primary and secondary diagnoses, respectively. On the CCL2025-Eval Task 8 benchmark for Chinese electronic medical records, our approach ranked first in the final evaluation.

2024

As intelligent education evolves, it will provide students with multiple personalized learning services based on their individual abilities. Computerized adaptive testing (CAT) is designed to accurately measure a student’s ability using the fewest questions, providing an efficient and personalized testing method. However, existing methods mainly focus on minimizing the number of questions required to assess ability, often lacking clear and reliable explanations for the question selection process. Educators and students can hardly trust and accept CAT systems without understanding the rationale behind the question selection process. To address this issue, we introduce LLM-Agent-Based CAT (LACAT), a novel agent powered by large language models to enhance CAT with human-like interpretability and explanation capabilities. LACAT consists of three key modules: the Summarizer, which generates interpretable student profiles; the Reasoner, which personalizes questions and provides human-readable explanations; and the Critic, which learns from past choices to optimize future question selection. We conducted extensive experiments on three real-world educational datasets. The results demonstrate that LACAT performs comparably to or better than traditional CAT methods in accuracy and significantly improves the transparency and acceptability of the testing process. Human evaluations further confirm that LACAT can generate high-quality, understandable explanations, thereby enhancing student trust and satisfaction.
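For context on the question selection step that the abstract above refers to, the sketch below shows one classic rule from traditional CAT: choose the unanswered question with maximum Fisher information under a two-parameter logistic (2PL) item response model. This is not LACAT's method and is not taken from the paper; the function names, the `ITEM_BANK` contents, and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def prob_correct(theta: float, a: float, b: float) -> float:
    """2PL model: probability that a student with ability theta answers
    an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta: float, item_bank: dict, administered: set) -> str:
    """Return the id of the not-yet-administered item that is most
    informative at the current ability estimate."""
    candidates = [item for item in item_bank if item not in administered]
    return max(candidates, key=lambda item: fisher_information(theta, *item_bank[item]))

# Hypothetical item bank: item id -> (discrimination a, difficulty b).
ITEM_BANK = {
    "q1": (1.0, -1.5),
    "q2": (1.2, 0.0),
    "q3": (0.8, 1.0),
    "q4": (1.5, 2.0),
}

if __name__ == "__main__":
    print(select_next_item(0.0, ITEM_BANK, administered=set()))
    print(select_next_item(0.0, ITEM_BANK, administered={"q2"}))
```

A rule like this is efficient but gives no human-readable reason for its choice, which is the gap the abstract says LACAT aims to address.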

2022

Yet@SMM4H’22: Improved BERT-based classification models with Rdrop and PolyLoss
Yan Zhuang | Yanru Zhang
Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes our approach for 11 classification tasks (Task1a, Task2a, Task2b, Task3a, Task3b, Task4, Task5, Task6, Task7, Task8 and Task9) from the Social Media Mining for Health (SMM4H) 2022 Shared Tasks. We developed a classification model that incorporated Rdrop to augment data and avoid overfitting, Poly Loss and Focal Loss to alleviate sample imbalance, and pseudo labels to improve model performance. The results of our submissions are at or above the median scores in almost all tasks. In addition, our model achieved the highest score in Task4, and F1-scores 7.8% and 5.3% higher than the median scores in Task2b and Task3a, respectively.
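As a rough illustration of the two imbalance-oriented losses named in the abstract above, the following sketch computes focal loss and the Poly-1 form of PolyLoss for a single example, given the probability `p_t` that a model assigns to the true class. It is a minimal sketch, not the authors' implementation: the function names, the default `gamma` and `epsilon` values, and the sample probabilities are assumptions, and the paper's actual settings are not stated here.

```python
import math

def cross_entropy(p_t: float) -> float:
    """Standard cross-entropy for one example: -log(p_t)."""
    return -math.log(p_t)

def focal_loss(p_t: float, gamma: float = 2.0) -> float:
    """Focal loss: down-weights well-classified examples by (1 - p_t)^gamma.
    gamma = 2.0 is an assumed default, not a value from the paper."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

def poly1_loss(p_t: float, epsilon: float = 1.0) -> float:
    """Poly-1 loss: cross-entropy plus an extra first-order term epsilon * (1 - p_t).
    epsilon = 1.0 is an assumed default, not a value from the paper."""
    return cross_entropy(p_t) + epsilon * (1.0 - p_t)

if __name__ == "__main__":
    # Hypothetical true-class probabilities: easy, ambiguous, and hard examples.
    for p_t in (0.9, 0.5, 0.1):
        print(p_t, cross_entropy(p_t), focal_loss(p_t), poly1_loss(p_t))
```

With `gamma = 0` focal loss reduces to cross-entropy, and with `epsilon = 0` so does Poly-1, which makes both easy to compare against a plain cross-entropy baseline.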

Yet at the FinNLP-2022 ERAI Task: Modified models for evaluating the Rationales of Amateur Investors
Yan Zhuang | Fuji Ren
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Financial reports usually reveal a company’s recent developments and often cause volatility in its share price. Opinions that lead to higher maximal potential profit and lower maximal loss can help amateur investors choose rational strategies. The FinNLP-2022 ERAI task aims to quantify the potential of opinions to lead to higher maximal potential profit and lower maximal loss. In this paper, different strategies were applied to solve the ERAI tasks. Vanilla ‘RoBERTa-wwm’ showed excellent performance and helped us rank second in the ‘MPP’ label prediction task. After integrating some tricks, the modified ‘RoBERTa-wwm’ outperformed all other models in the ‘ML’ ranking task.

2020

Will_go at SemEval-2020 Task 9: An Accurate Approach for Sentiment Analysis on Hindi-English Tweets Based on Bert and Pesudo Label Strategy
Wei Bao | Weilong Chen | Wei Bai | Yan Zhuang | Mingyuan Cheng | Xiangyu Ma
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Code-mixed language is widely used in social media, especially in multilingual societies like India. Detecting the emotions contained in such text is of great significance to the development of society and political trends. In this paper, we propose an ensemble of a pseudo-label-based BERT model and a TF-IDF-based SGDClassifier model to identify the sentiments of Hindi-English (Hi-En) code-mixed data. The ensemble model combines the strengths of rich semantic information from the BERT model and word frequency information from the probabilistic n-gram model to predict the sentiment of a given code-mixed tweet. Finally, our team achieved an average F1 score of 0.731 on the final leaderboard; our CodaLab username is will_go.

The main purpose of this article is to describe the effect of using different methods and models for counterfactual determination and detection of causal knowledge. Nowadays, counterfactual reasoning is widely used in various fields. In the realm of natural language processing (NLP), counterfactual reasoning has huge potential to improve the correctness of a sentence. In SemEval-2020 shared Task 5 on detecting counterfactuals, we pre-process the officially provided dataset with case conversion, stemming, and abbreviation replacement. We use last-5 bidirectional encoder representations from transformers (BERT) and a term frequency–inverse document frequency (TF-IDF) vectorizer for counterfactual detection. Meanwhile, multi-sample dropout and cross-validation are used to improve versatility and prevent problems such as poor generalization caused by overfitting. Finally, our team Ferryman ranked 8th in sub-task 1 of this competition.