Yixiao Wang


2025

Automatic Scoring of an Open-Response Measure of Advanced Mind-Reading Using Large Language Models
Yixiao Wang | Russel Dsouza | Robert Lee | Ian Apperly | Rory Devine | Sanne van der Kleij | Mark Lee
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)

A rigorous psychometric approach is crucial for the accurate measurement of mind-reading abilities. Traditional scoring methods for such tests, which involve lengthy free-text responses, require considerable time and human effort. This study investigates the use of large language models (LLMs) to automate the scoring of psychometric tests. Data were collected from participants aged 13 to 30 years and scored by trained human coders to establish a benchmark. We evaluated multiple LLMs against human assessments, exploring various prompting strategies to optimize performance and fine-tuning the models using a subset of the collected data to enhance accuracy. Our results demonstrate that LLMs can assess advanced mind-reading abilities with over 90% accuracy on average. Notably, in most test items, the LLMs achieved higher Kappa agreement with the lead coder than two trained human coders, highlighting their potential to reliably score open-response psychometric tests.
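
The Kappa agreement mentioned in the abstract above can be illustrated with a minimal sketch. It assumes unweighted Cohen's kappa between two raters; the abstract does not say which variant was used. The function name `cohens_kappa` and the example scores are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical item scores (0, 1, or 2) from a lead coder and a model.
lead_coder = [0, 1, 2, 2, 1, 0, 2, 1]
model = [0, 1, 2, 2, 1, 0, 1, 1]
kappa = cohens_kappa(lead_coder, model)
print(f"kappa = {kappa:.3f}")
```

Unlike raw accuracy, kappa discounts the agreement two raters would reach by chance, which is why agreement with a lead coder is often reported this way.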

MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation
Zichen Zhu | Hao Tang | Yansi Li | Dingye Liu | Hongshen Xu | Kunyao Lan | Danyang Zhang | Yixuan Jiang | Hao Zhou | Chenrun Wang | Situo Zhang | Liangtai Sun | Yixiao Wang | Yuheng Sun | Lu Chen | Kai Yu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)

Existing Multimodal Large Language Model (MLLM)-based agents face significant challenges in handling complex GUI (Graphical User Interface) interactions on devices. These challenges arise from the dynamic and structured nature of GUI environments, which integrate text, images, and spatial relationships, as well as the variability in action spaces across different pages and tasks. To address these limitations, we propose MobA, a novel MLLM-based mobile assistant system. MobA introduces an adaptive planning module that incorporates a reflection mechanism for error recovery and dynamically adjusts plans to align with the real environment contexts and action module’s execution capacity. Additionally, a multifaceted memory module provides comprehensive memory support to enhance adaptability and efficiency. We also present MobBench, a dataset designed for complex mobile interactions. Experimental results on MobBench and AndroidArena demonstrate MobA’s ability to handle dynamic GUI environments and perform complex mobile tasks.

2024

Investigating the Personality Consistency in Quantized Role-Playing Dialogue Agents
Yixiao Wang | Homa Fashandi | Kevin Ferreira
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

This study explores the consistency of personality traits in quantized large language models (LLMs) for edge device role-playing scenarios. Using the Big Five personality traits model, we evaluate how stable assigned personalities are for Quantized Role-Playing Dialog Agents (QRPDA) during multi-turn interactions. We evaluate multiple LLMs with various quantization levels, combining binary indexing of personality traits, explicit self-assessments, and linguistic analysis of narratives. To address personality inconsistency, we propose a non-parametric method called Think2. Our multi-faceted evaluation framework demonstrates Think2’s effectiveness in maintaining consistent personality traits for QRPDA. Moreover, we offer insights to help select the optimal model for QRPDA, improving its stability and reliability in real-world applications.

Personal Large Language Model Agents: A Case Study on Tailored Travel Planning
Harmanpreet Singh | Nikhil Verma | Yixiao Wang | Manasa Bharadwaj | Homa Fashandi | Kevin Ferreira | Chul Lee
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large Language Models (LLMs) have made significant progress, becoming more autonomous and capable of handling real-world tasks through their access to tools, various planning strategies, and memory, referred to as LLM agents. One emerging area of focus is customizing these models to cater to individual user preferences, thereby shaping them into personal LLM agents. This work investigates how the user model, which encapsulates user-related information, preferences, and personal concepts, influences an LLM agent’s planning and reasoning capabilities. We introduce a personalized version of TravelPlanner, called TravelPlanner+, and establish baselines for personal LLM agents. Our evaluation strategy contains an LLM-as-a-Judge component, which provides further in-depth insights into the decision-making process of a personal LLM agent by comparing generic and personal plans. Our findings reveal that while generic plans perform robustly, personal plans show marked improvement in relevance and suitability, with preference rates up to 74.4% on validation and 87.3% on the test set. These results highlight the potential of personal LLM agents to significantly enhance user satisfaction.

2022

Sentence Selection Strategies for Distilling Word Embeddings from BERT
Yixiao Wang | Zied Bouraoui | Luis Espinosa Anke | Steven Schockaert
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Many applications crucially rely on the availability of high-quality word vectors. To learn such representations, several strategies based on language models have been proposed in recent years. While effective, these methods typically rely on a large number of contextualised vectors for each word, which makes them impractical. In this paper, we investigate whether similar results can be obtained when only a few contextualised representations of each word can be used. To this end, we analyse a range of strategies for selecting the most informative sentences. Our results show that with a careful selection strategy, high-quality word vectors can be learned from as few as 5 to 10 sentences.

2021

Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection
Yixiao Wang | Zied Bouraoui | Luis Espinosa Anke | Steven Schockaert
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

One of the long-standing challenges in lexical semantics consists in learning representations of words which reflect their semantic properties. The remarkable success of word embeddings for this purpose suggests that high-quality representations can be obtained by summarizing the sentence contexts of word mentions. In this paper, we propose a method for learning word representations that follows this basic strategy, but differs from standard word embeddings in two important ways. First, we take advantage of contextualized language models (CLMs) rather than bags of word vectors to encode contexts. Second, rather than learning a word vector directly, we use a topic model to partition the contexts in which words appear, and then learn different topic-specific vectors for each word. Finally, we use a task-specific supervision signal to make a soft selection of the resulting vectors. We show that this simple strategy leads to high-quality word vectors, which are more predictive of semantic properties than word embeddings and existing CLM-based strategies.