Publications

A collection of my research work. The full list is available on my Google Scholar profile.

AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems

Boxuan Zhang^*, Jianing Zhu^*, Zeru Shi, Dongfang Liu, Ruixiang Tang

arXiv preprint arXiv:2605.08715 2026

We reframe agentic failure analysis from post-hoc attribution on completed trajectories to online auditing on unfolding prefixes, where an auditor commits a continue-or-alarm verdict at every step.

Paper Code Project

Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts

Boxuan Zhang, Jianing Zhu, Qifan Wang, Jiang Liu, Ruixiang Tang

arXiv preprint arXiv:2605.09296 2026

We propose Micro-Defects expose Macro-Fakes (MDMF), a local distribution-aware detection framework that amplifies micro-scale statistical irregularities into macro-level distributional discrepancies for AI-generated image detection.

Paper Code Project

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

Minghao Guo^*, Qingyue Jiao^*, Zeru Shi^*, Yihao Quan, Boxuan Zhang, Danrui Li, Liwei Che, Wujiang Xu, Shilong Liu, Zirui Liu, Mubbasir Kapadia, Vladimir Pavlovic, Jiang Liu, Mengdi Wang, Yiyu Shi, Dimitris N. Metaxas, Ruixiang Tang

arXiv preprint arXiv:2605.15128 2026

MemEye is a vision-centric long-term memory benchmark that evaluates agents' ability to remember, update, and reason over visual information across long-running, multi-session image-grounded interactions.

Paper Code Project

Shifting Uncertainty to Critical Moments: Towards Reliable Uncertainty Quantification for VLA Models

Yanchuan Tang^*, Taowen Wang^*, Yuefei Chen, Boxuan Zhang, Qiang Guan, Ruixiang Tang

IEEE International Conference on Multimedia and Expo (ICME) 2026 🏆 Best Paper Award Candidate

We shift uncertainty estimation toward critical decision moments to deliver reliable uncertainty quantification for Vision-Language-Action (VLA) models.

Paper

Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents

Boxuan Zhang^*, Yi Yu^*, Jiaxuan Guo, Jing Shao

arXiv preprint arXiv:2509.25302 2025

We present a comprehensive evaluation framework for quantifying self-replication risks. Our framework establishes authentic production environments and realistic tasks to enable scenario-driven assessment of agent behaviors.

Paper

CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought

Boxuan Zhang, Ruqi Zhang

Findings of the Association for Computational Linguistics (ACL) 2025

We quantify response-wise uncertainty by integrating LLMs' inherent reasoning capabilities, elicited through Chain-of-Thought (CoT), into the uncertainty quantification process.

Paper Code

What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models

Zicong He^*, Boxuan Zhang^*, Weihao Liu^*, Ruixiang Tang, Lu Cheng

arXiv preprint arXiv:2510.04009 2025

C2-Eval is a holistic benchmark for unified assessment of creativity in foundation models, distinguishing convergent (constrained-solution) and divergent (open-ended) creativity.

Paper

What If the Input is Expanded in OOD Detection?

Boxuan Zhang^*, Jianing Zhu^*, Zengmao Wang, Tongliang Liu, Bo Du, Bo Han

Advances in Neural Information Processing Systems (NeurIPS) 2024

We propose a novel perspective: applying different common corruptions in the input space to expand the representation dimension for out-of-distribution (OOD) detection.

Paper Code Project

Boosting Semi-Supervised Object Detection in Remote Sensing Images With Active Teaching

Boxuan Zhang, Zengmao Wang, Bo Du

IEEE Geoscience and Remote Sensing Letters (GRSL) 2024

We propose boosting semi-supervised object detection with active teaching (SSOD-AT) in remote sensing images, alleviating the dependency on limited labeled images.

Paper Code