Zhiwei Jia

"A year spent in artificial intelligence is enough to make one believe in God."

Alan Perlis

Hi there! I am Zhiwei Jia (pronounced Juh-Way Jee-A). You can also call me Sean. My current research interest lies in multi-modal generative models. I obtained my Ph.D. at UC San Diego, where I focused on Embodied AI (i.e., multimodal AI agents) and worked with Prof. Hao Su and Prof. Zhuowen Tu.

Selected Work Experience

Applied Researcher @ Character AI | 2025/3~
- Accelerating large audio-driven video diffusion models (10B+).
Applied Researcher @ Zoom | 2023/11~2025/3
- Image and video generation with accelerated diffusion models.
Research Intern @ Google | 2022/6~9
- VLM fine-tuning for image ad understanding.
Research Intern @ Amazon | 2021/6~9
- Indoor scene and human instruction understanding with multimodal Transformers.
Research Intern @ Google X | 2020/6~9
- Image generation via GANs with applications to sim-to-real domain adaptation.
ML Engineer Intern @ Quora | 2019/6~9
- LLM fine-tuning for fine-grained text understanding.

Selected Publications (full list here)

Visual Understanding & Generation

Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward (CVPR 2025) [page]
Z. Jia, Y. Nan, H. Zhao, G. Liu
MetaCLUE: Towards Comprehensive Visual Metaphors Research (CVPR 2023) [page]
A. Akula, B. Driscoll, P. Narayana, S. Changpinyo, Z. Jia, S. Damle, G. Pruthi, S. Basu, L. Guibas, W. Freeman, Y. Li, V. Jampani
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models (ACL 2023) [arXiv]
Z. Jia, P. Narayana, A. Akula, G. Pruthi, H. Su, S. Basu, V. Jampani
Semantically Robust Unpaired Image Translation for Data with Unmatched Semantics Statistics (ICCV 2021) [arXiv]
Z. Jia, B. Yuan, K. Wang, H. Wu, D. Clifford, Z. Yuan, H. Su

AI Agent & Sequential Decision-Making

Chain-of-Thought Predictive Control (ICML 2024) [page]
Z. Jia, V. Thumuluri, F. Liu, L. Chen, Z. Huang, H. Su
Learning to Act with Affordance-Aware Multimodal Neural SLAM (IROS 2022) [arXiv]
Z. Jia, K. Lin, Y. Zhao, Q. Gao, G. Thattai, G. Sukhatme
Improving Policy Optimization with Generalist-Specialist Learning (ICML 2022) [arXiv]
Z. Jia, X. Li, Z. Ling, S. Liu, Y. Wu, H. Su
Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals (NeurIPS 2020) [page]
T. Mu, J. Gu, Z. Jia, H. Tang, H. Su

Contacts

Email: sean.jia.z.w 📞 gmail.com (replace 📞 with "@")
LinkedIn / Google Scholar / GitHub / X (Twitter)
