I am Bowen Zhang (张博文), a final-year Ph.D. candidate at the University of Science and Technology of China (USTC), enrolled in the USTC-Microsoft Research Asia (MSRA) Joint-PhD Program. I have been fortunate to be co-advised by Dr. Baining Guo and Prof. Feng Zhao. Before that, I received my Bachelor's Degree from the School of the Gifted Young, USTC, in 2021.
My research focuses on the exciting challenges of generative modeling, especially 3D/4D generation. I recently joined the Meta FAIR Perception team as a Research Scientist Intern. Prior to this, I spent over four years as a research intern in the Visual Computing Group at Microsoft Research Asia, where I had the opportunity to learn from and collaborate with many incredible researchers, including Bo Zhang, Jianmin Bao, Dong Chen and Jiaolong Yang.
I am looking for a full-time research role where I can continue to explore and contribute to the field of generative AI, starting in Fall 2026. For a detailed overview of my work, please see my CV. If my background aligns with your interests, please don't hesitate to reach out.
We present a novel video-to-4D framework that performs diffusion modeling on a compressed representation of Gaussian Splat animations, enabling high-quality generation from a single-view video input.
We introduce a unified Structured LATent (SLAT) representation for versatile and high-quality 3D asset creation, which is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model, comprehensively capturing both geometry and appearance information while maintaining flexibility during decoding.
We present a structured and explicit representation for 3D generative modeling by structuring Gaussian Splatting using Optimal Transport, achieving state-of-the-art generation quality.
We introduce RodinHD, which can generate high-fidelity 3D avatars from a frontal view portrait image.
We propose an ID-preserving talking head generation framework that achieves fast personalized adaptation through the proposed meta-learning approach.
A transformer-based GAN crafted for high-resolution image synthesis. It achieves SOTA generation quality at a resolution of 1024 and, for the first time, approaches the best-performing ConvNets.
I have broad interests in both Computer Science and Artificial Intelligence. I find it really exciting to track down the bottlenecks in programs and do my best to optimize them. I am also interested in NLP (Natural Language Processing), and I try to use these amazing techniques (such as sentiment analysis and question answering) to help machines better understand humans. I also enjoy analyzing massive data and finding the underlying patterns.
I love playing with Rubik's Cubes in my spare time, and I have taken part in world-level competitions many times. Making decisions about what to do next in just seconds is really exciting, and you can check out my personal best record at the World Cube Association website.
I also love music and singing. I have built a home studio and covered some of my favorite songs; you can check out these covers on my radio in NetEase Cloud Music. I'm also now trying to write my own songs.