I am a Master’s student in Applied Math at Fudan University and an incoming CS Ph.D. student at Yale University, advised by Prof. Rex Ying. My research focuses on the reliability and verification of multimodal AI, aiming to build systems that integrate explicit structures with dynamic reasoning capabilities to ensure safe, verifiable, and logically sound outputs. Currently, I am exploring verifiable multimodal code generation and principled human-in-the-loop (HITL) mechanisms for error-aware autonomous agents. Feel free to reach out if you’d like to learn more about my work, chat, or explore potential collaborations. You can find my publications on my google scholar.

(Note: You might notice some early Medical AI publications on my Google Scholar. These stem from my first-year lab rotation exploring probabilistic modeling and uncertainty estimation. While I maintain close ties with my brilliant former colleagues, my research is now entirely focused on domain-agnostic multimodal systems.)

🔥 News

2026.03: Two papers (MacTok and GIFT) accepted to CVPR 2026.
2026.02: Joined Microsoft Research Asia (MSRA) as a Research Intern.
2026.02: Two papers (ARTDECO and IVQ) accepted to ICLR 2026.
2025.09: Two papers (GOOD and OrderMind) accepted to NeurIPS 2025.
2025.09: Loong accepted to NeurIPS 2025 Workshop.
2025.06: Dark-ISP accepted to ICCV 2025.
2025.05: Joined Shanghai AILab as a Research Intern, working with Jie Fu.
2025.04: Welcomed two cats into my life: Fellow and Putao 🐱🍇
2025.01: MVP accepted to ICLR 2025.
2024.12: Joined as a Research Intern, working with Prof. Chenyang Si and Prof. Ziwei Liu.

📝 Selected Publications

My research primarily focus on two directions: Robust Representation in Complex Environments and Safety and Verification in AI Systems.

Robust Representation in Complex Environments

ICLR 2025

MVP: Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs

Xin Gao, Jian Pu

[Project page]

MVP introduces a novel approach to incomplete multi-view representation learning by leveraging latent space correspondences in Variational Auto-Encoders, enabling the inference of missing views and enhancing the consistency of multi-view data even with irregularly missing information.

CVPR 2026

MacTok: Robust Continuous Tokenization for Image Generation

Hengyu Zeng#, Xin Gao#, Guanghao Li, Yuxiang Yan, Jiaoyang Ruan, Junpeng Ma, Haoyu Albert Wang, Jian Pu

MacTok identifies and mitigates the severe “posterior collapse” failure in continuous image tokenizers. We formulate tokenization as a rigorous information preservation task by introducing DINO-guided semantic masking and multi-level representation alignment, forcing the model to infer robust semantics from incomplete visual evidence. This prevents structural degradation and yields state-of-the-art generation fidelity (gFID 1.52 at 512x512) with up to 64x fewer tokens.

ICLR 2026

Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization

Shujian Gao, Yuan Wang, Chenglong Ma, Xin Gao, Jiangtao Yan, Junzhi Ning, Cheng Tang, Changkai Ji, Huihui Xu, Wei Li, Ziyan Huang et al.

IVQ addresses representational collapse in Concept Bottleneck Models (CBMs), where vision-concept alignment degrades under low-rank feature traps. Rather than imposing a hard, lossy bottleneck, IVQ acts as a structural regularizer that anchors patch features to learned semantic prototypes without quantizing the forward pass, preserving high-rank and structured representations for robust, interpretable concept learning.

NeurIPS 2025

Learning Spatial-Aware Manipulation Ordering

Yuxiang Yan, Zhiyuan Zhou, Xin Gao, Guanghao Li, Shenglin Li, Jiaqi Chen, Qunyan Pu, Jian Pu

This paper introduces OrderMind, a spatial-aware manipulation ordering framework that learns object priorities from local geometry via a kNN spatial graph and a lightweight temporal module, supervised by VLM-distilled spatial priors. It also presents the Manipulation Ordering Benchmark (163k+ samples) for cluttered scenes.

ICLR 2026

ARTDECO: Toward High-Fidelity On-the-Fly Reconstruction with Hierarchical Gaussian Structure and Feed-Forward Guidance

Guanghao Li, Kerui Ren, Linning Xu, Zhewen Zheng, Changjian Jiang, Xin Gao, Bo Dai, Jian Pu, Mulin Yu, Jiangmiao Pang

[Project page]

ARTDECO unifies feed-forward 3D foundation priors with SLAM-style global optimization, and introduces a hierarchical 3D Gaussian scene representation with LoD-aware rendering to achieve efficient, robust, and high-fidelity on-the-fly monocular 3D reconstruction.

Safety and Verification in AI Systems

NeurIPS 2025

GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection

Xin Gao, Jiyao Liu, Guanghao Li, Yueming Lyu, Jianxiong Gao, Weichen Yu, Ningsheng Xu, Liang Wang, Caifeng Shan, Ziwei Liu, Chenyang Si

GOOD is a training-free diffusion guidance framework that shapes a robust OOD/ID decision boundary. It steers sampling with two gradients—image-level toward low-density regions and feature-level toward sparse zones—to generate diverse, controllable OOD examples.

ICCV 2025

Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection

Jiasheng Guo#, Xin Gao#, Yuxiang Yan, Guanghao Li, Jian Pu

Dark-ISP is a lightweight, self-adaptive ISP plugin that enhances low-light object detection by processing Bayer RAW images. It breaks down traditional ISP pipelines into optimized linear and nonlinear sub-modules, using physics-informed priors for automatic RAW-to-RGB conversion.

NeurIPS 2025 Workshop

Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Xingyue Huang, Rishabh, Gregor Franke, Ziyi Yang, Jiamu Bai, Weijie Bai, Jinhe Bi, Zifeng Ding, Yiqun Duan, Chengyu Fan, Wendong Fan, Xin Gao et al.

[Project page]

Loong is an open-source framework for scalable, verifiable reasoning data. It pairs LoongBench (8,729 human-vetted, code-checkable examples across 12 domains) with LoongEnv, a modular generator forming an agent–environment loop for RL with verifiable rewards and broad-domain CoT training.

🎖 Honors and Awards

2023.09 Fudan University Zhicheng Freshman Second Prize Scholarship (Top 5%)
2023.06 Outstanding Graduate of Shanghai
2022.11 Second Prize in the Chinese Mathematics Competitions (Category A)
2021.12 National Scholarship, China
2021.09 National Second Prize in the China Undergraduate Mathematical Contest in Modeling
2020.12 Shanghai Scholarship

👨‍💼 Academic Service

Conference Reviewer: Neurips 2025, ICCV 2025, ICLR 2026, ICML 2026

📖 Educations

2023.09 - 2026.06 (now), Master of Applied Mathematics, Fudan University, Shanghai, China.
2019.09 - 2023.06, Bachelor of Mathematics, Donghua Univeristy, Shanghai, China.

💻 Internships

2026.02 - present, Microsoft Research Asia, Beijing, China.
2025.05 - 2025.12, Shanghai AIlab, Shanghai, China.

📚 Learning Materials

😁 If you want the following material without watermarks, please contact me using the email address and specify your intended use.

Material 1: Frontiers in Diffusion Model Technologies (1)

This document provides an overview of key concepts related to diffusion models, particularly focusing on the theoretical foundations, development timeline, and recent advancements in the field. The content includes detailed discussions on VAE, DDPM, DDIM, SDE, and ODE, as well as conditional guidance. It also covers the evolution of stable diffusion, including topics like Latent Diffusion, VQ-VAE, and DiT. Lastly, the document highlights the latest methodology, IC-Light, set to be presented at ICLR 2025.

Material 2: Tutorial of Information Geometry and t3-VAE

This document introduces the t3-Variational Autoencoder (ICLR 2024), which uses Student’s t-distributions to model heavy-tailed data distributions and improve latent variable representations. It also explores the framework of Information Geometry, focusing on how generative models can be understood through statistical manifolds, divergences, and Riemannian metrics, providing a deeper understanding of probability distributions and their applications in machine learning, signal processing, and neuroscience.

Material 3: EM Algorithm and X-metric

This document introduces the X-metric framework (PAMI 2023), an N-dimensional information-theoretic approach designed for groupwise registration and deep combined computing, with applications in advanced machine learning tasks. It also covers the theoretical foundations, including entropy, mutual information, and the MLE algorithm, alongside the framework’s modifications for deep computing and network training.