Publications
A collection of my research work. The full list is available on my Google Scholar profile.

AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems
Boxuan Zhang*, Jianing Zhu*, Zeru Shi, Dongfang Liu, Ruixiang Tang
arXiv preprint arXiv:2605.08715 2026
We reframe agentic failure analysis from post-hoc attribution on completed trajectories to online auditing on unfolding prefixes, where an auditor commits a continue-or-alarm verdict at every step.

Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts
Boxuan Zhang, Jianing Zhu, Qifan Wang, Jiang Liu, Ruixiang Tang
arXiv preprint arXiv:2605.09296 2026
We propose Micro-Defects expose Macro-Fakes (MDMF), a local distribution-aware detection framework that amplifies micro-scale statistical irregularities into macro-level distributional discrepancies for AI-generated image detection.

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
Minghao Guo*, Qingyue Jiao*, Zeru Shi*, Yihao Quan, Boxuan Zhang, Danrui Li, Liwei Che, Wujiang Xu, Shilong Liu, Zirui Liu, Mubbasir Kapadia, Vladimir Pavlovic, Jiang Liu, Mengdi Wang, Yiyu Shi, Dimitris N. Metaxas, Ruixiang Tang
arXiv preprint arXiv:2605.15128 2026
MemEye is a vision-centric long-term memory benchmark that evaluates agents' ability to remember, update, and reason over visual information across long-running, multi-session image-grounded interactions.

Shifting Uncertainty to Critical Moments: Towards Reliable Uncertainty Quantification for VLA Models
Yanchuan Tang*, Taowen Wang*, Yuefei Chen, Boxuan Zhang, Qiang Guan, Ruixiang Tang
IEEE International Conference on Multimedia and Expo (ICME) 2026 🏆 Best Paper Award Candidate
We shift uncertainty estimation toward critical decision moments to deliver reliable uncertainty quantification for Vision-Language-Action (VLA) models.

Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents
Boxuan Zhang*, Yi Yu*, Jiaxuan Guo, Jing Shao
arXiv preprint arXiv:2509.25302 2025
We present a comprehensive evaluation framework for quantifying self-replication risks. Our framework establishes authentic production environments and realistic tasks to enable scenario-driven assessment of agent behaviors.

CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought
Boxuan Zhang, Ruqi Zhang
Findings of the Association for Computational Linguistics: ACL 2025
We quantify response-wise uncertainty by integrating LLMs' inherent reasoning capabilities, via Chain-of-Thought (CoT), into the uncertainty quantification process.

What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models
Zicong He*, Boxuan Zhang*, Weihao Liu*, Ruixiang Tang, Lu Cheng
arXiv preprint arXiv:2510.04009 2025
C2-Eval is a holistic benchmark for unified assessment of creativity in foundation models, distinguishing convergent (constrained-solution) and divergent (open-ended) creativity.

What If the Input is Expanded in OOD Detection?
Boxuan Zhang*, Jianing Zhu*, Zengmao Wang, Tongliang Liu, Bo Du, Bo Han
Advances in Neural Information Processing Systems 2024
We propose a novel perspective that applies different common corruptions in the input space to expand the representation dimension for out-of-distribution (OOD) detection.

Boosting Semi-Supervised Object Detection in Remote Sensing Images With Active Teaching
Boxuan Zhang, Zengmao Wang, Bo Du
IEEE Geoscience and Remote Sensing Letters 2024
We propose SSOD-AT, which boosts semi-supervised object detection in remote sensing images with active teaching, alleviating the dependency on limited labeled images in remote sensing scenarios.