Jifang Wang

2025

Conditional image generation has gained significant attention for its ability to personalize content. However, the field faces challenges in developing task-agnostic, reliable, and explainable evaluation metrics. This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks. CIGEval utilizes large multimodal models (LMMs) as its core, integrating a multi-functional toolbox and establishing a fine-grained evaluation framework. Additionally, we synthesize evaluation trajectories for fine-tuning, empowering smaller LMMs to autonomously select appropriate tools and conduct nuanced analyses based on tool outputs. Experiments across seven prominent conditional image generation tasks demonstrate that CIGEval (GPT-4o version) achieves a high correlation of 0.4625 with human assessments, closely matching the inter-annotator correlation of 0.47. Notably, when implemented with 7B open-source LMMs using only 2.3K training trajectories, CIGEval surpasses the previous GPT-4o-based state-of-the-art method. These findings indicate that CIGEval holds great potential for automating evaluation of image generation tasks while maintaining human-level reliability.

pdf bib abs
MeKB-Sim: Personal Knowledge Base-Powered Multi-Agent Simulation
Zhenran Xu | Jifang Wang | Baotian Hu | Longyue Wang | Min Zhang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)

Language agents have demonstrated remarkable emergent social behaviors within simulated sandbox environments. However, the characterization of these agents has been constrained by static prompts that outline their profiles, highlighting a gap in achieving simulations that closely mimic real-life interactions. To close this gap, we introduce MeKB-Sim, a multi-agent simulation platform based on a dynamic personal knowledge base, termed MeKB. Each agent’s MeKB contains both fixed and variable attributes—such as linguistic style, personality, and memory—crucial for theory-of-mind modeling. These attributes are updated when necessary, in response to events that the agent experiences. Comparisons with human annotators show that the LLM-based attribute updates are reliable. Based on the dynamic nature of MeKB, experiments and case study show that MeKB-Sim enables agents to adapt their planned activities and interactions with other agents effectively. Our platform includes a Unity WebGL game interface for visualization and an interactive monitoring panel that presents the agents’ planning, actions, and evolving MeKBs over time. For more information, including open-source code, a live demo website, and videos, please visit our project page at https://mekb-sim.github.io/.

2024

As we all know, hallucinations prevail in Large Language Models (LLMs), where the generated content is coherent but factually incorrect, which inflicts a heavy blow on the widespread application of LLMs. Previous studies have shown that LLMs could confidently state non-existent facts rather than answering “I don’t know”. Therefore, it is necessary to resort to external knowledge to detect and correct the hallucinated content. Since manual detection and correction of factual errors is labor-intensive, developing an automatic end-to-end hallucination-checking approach is indeed a needful thing. To this end, we present Medico, a Multi-source evidence fusion enhanced hallucination detection and correction framework. It fuses diverse evidence from multiple sources, detects whether the generated content contains factual errors, provides the rationale behind the judgment, and iteratively revises the hallucinated content. Experimental results on evidence retrieval (0.964 HR@5, 0.908 MRR@5), hallucination detection (0.927-0.951 F1), and hallucination correction (0.973-0.979 approval rate) manifest the great potential of Medico. A video demo of Medico can be found at https://youtu.be/RtsO6CSesBI.

Co-authors

Venues

Fix author