Yunfeng Wang
2026
PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents
Minjia Wang | Yunfeng Wang | Xiao Ma | Dexin Lv | Qifan Guo | Lynn Zheng | Benliang Wang | Lei Wang | Jiannan Li | Yongwei Xing | Junzhe Xu | Zheng Sun
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Digital footprints—records of individuals’ interactions with digital systems—are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse and accessible data. To address this limitation, we propose a novel method for synthesizing realistic digital footprints using large language model (LLM) agents. Starting from a structured user profile, our approach generates diverse and plausible sequences of user events, ultimately producing corresponding digital artifacts such as emails, messages, calendar entries, and reminders. Intrinsic evaluation results demonstrate that the generated dataset is more diverse and realistic than existing baselines. Moreover, models fine-tuned on our synthetic data outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.
2025
MuKA: Multimodal Knowledge Augmented Visual Information-Seeking
Lianghao Deng | Yuchong Sun | Shizhe Chen | Ning Yang | Yunfeng Wang | Ruihua Song
Proceedings of the 31st International Conference on Computational Linguistics
The visual information-seeking task aims to answer visual questions that require external knowledge, such as “On what date did this building officially open?”. Existing methods using a retrieval-augmented generation framework primarily rely on textual knowledge bases to assist multimodal large language models (MLLMs) in answering questions. However, text-only knowledge can impair information retrieval for the multimodal query of image and question, and can also confuse MLLMs when selecting the most relevant information during generation. In this work, we propose a novel framework, MuKA, which leverages a multimodal knowledge base to address these limitations. Specifically, we construct a multimodal knowledge base by automatically pairing images with text passages in existing datasets. We then design a fine-grained multimodal interaction to effectively retrieve multimodal documents and enrich MLLMs with both retrieved texts and images. MuKA outperforms state-of-the-art methods by 38.7% and 15.9% on the InfoSeek and E-VQA benchmarks, respectively, demonstrating the importance of multimodal knowledge in enhancing both retrieval and answer generation.