Short Bio
I am a first-year Master’s student at the School of Electronic and Computer Engineering, Peking University, advised by Prof. Li Yuan and Prof. Yonghong Tian. I am also fortunate to work closely with Prof. Daquan Zhou. I received my B.E. degree from the University of Chinese Academy of Sciences (UCAS) in 2025, where I was named an Outstanding Graduate.
My research interests primarily lie in video generation for real-world applications.
News
- 2025.05: 🎉🎉 I was selected as an Outstanding Graduate of Beijing, and my undergraduate thesis was named an Outstanding Graduation Paper of UCAS.
- 2024.11: 🎉🎉 I won the China National Scholarship and the First-Level Scholarship of UCAS!
- 2024.07: 🎉🎉 One paper was accepted to ECCV 2024.
Publications

Omni-Grid: Taming Image-based Unified Multimodal Models to Video
Juncheng Ma, Yuelin Li, Yufan Deng, Zhenyu Tang, Dongdong Yu, Li Yuan, Changhu Wang, Daquan Zhou†, Yonghong Tian†
Under Review
We present a cost-efficient approach to adapt image-based UMMs to the video domain while preserving their original strengths. Our model surpasses previous video-based UMMs on VBench, while also exhibiting zero-shot multimodal interleaved abilities in video, including CoT reasoning and text rendering.

FastAvatar: Accelerating Portrait Animation via Memory-Adaptive Caching
Juncheng Ma, Yuxuan Du, Yanan Sun, Zhening Xing, Changlin Li, Zhenyu Tang, Bo Li, Peng-Tao Jiang, Li Yuan, Daquan Zhou†, Yonghong Tian†
Under Review
We propose SyncCache, a training-free, audio-aware, and memory-adaptive caching strategy tailored for audio-driven portrait animation. It delivers a 4.12× acceleration on Hunyuan-Avatar and 3.75× on Wan-S2V with negligible degradation in visual quality or audio alignment.

Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma, Peiwen Sun, Yaoting Wang, Di Hu
ECCV 2024 | arXiv | Code | Project | PDF | Supplementary
We rethink audio-visual semantic segmentation from a new perspective and propose a progressive two-stage training strategy to enhance audio-visual alignment.
Experiences

Research Intern, Shanghai AI Laboratory, Shanghai, China, 2024.7-2024.11
Research on video generation, especially audio-driven portrait animation.
Supervised by Dr. Yanan Sun and Dr. Yanhong Zeng

Visiting Student, Renmin University of China, Beijing, China, 2023.10-2024.3
Research on multimodal learning, especially audio-visual segmentation, which aims to segment sound sources within a video according to its corresponding audio.
One paper, Stepping Stones, was accepted to ECCV 2024.
Supervised by Prof. Di Hu
Honors and Awards
- 2025, Outstanding Graduation Paper of UCAS.
- 2025, Outstanding Graduate of Beijing.
- 2024, China National Scholarship.
- 2024, First-Level Scholarship of UCAS.
Education
- 2025.09 - present, Master, Peking University.
- 2021.09 - 2025.06, Undergraduate, University of Chinese Academy of Sciences.