Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model

[CVPR 2024]
1 Computer School, Beijing Information Science and Technology University, China
2 Zhongguancun Laboratory, China
3 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China
4 College of Computer Science and Technology, China University of Petroleum (East China), China
5 Research Unit of Virtual Human and Virtual Surgery (2019RU004), Chinese Academy of Medical Sciences, China
6 Department of Computer Science, Stony Brook University (SUNY at Stony Brook), Stony Brook, New York 11794-2424, USA
* Corresponding Author

Demo

Abstract

Computer animation's quest to bridge content and style has historically been a challenging venture, with previous efforts often leaning toward one at the expense of the other. This paper tackles the inherent challenge of content-style duality, ensuring a harmonious fusion in which the core narrative of the content is both preserved and elevated through stylistic enhancements. We propose a novel Multi-condition Motion Latent Diffusion Model (MCM-LDM) for Arbitrary Motion Style Transfer (AMST). MCM-LDM places significant emphasis on preserving trajectories, recognizing their fundamental role in defining the essence and fluidity of motion content. Its cornerstone is the ability to first disentangle and then intricately weave together motion's tripartite components: motion trajectory, motion content, and motion style. The critical insight of MCM-LDM is to embed multiple conditions with distinct priorities. The content channel serves as the primary flow, guiding the overall structure and movement, while the trajectory and style channels act as auxiliary components that synchronize with the primary one dynamically. This mechanism ensures that the multiple conditions integrate seamlessly into the main flow, enhancing the overall animation without overshadowing the core content. Empirical evaluations underscore the model's proficiency in achieving fluid and authentic motion style transfers, setting a new benchmark in the realm of computer animation.

Method


Our approach achieves Arbitrary Motion Style Transfer (AMST) by using motion content, style, and trajectory as guiding conditions in the denoising process of MCM-LDM. The method first extracts and encodes these three conditions with our Multi-condition Extraction module. A motion latent diffusion model optimized for multi-condition guidance then generates stylized motion under the joint guidance of the content, trajectory, and style conditions.
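The prioritized-conditioning idea above can be illustrated with a toy denoising update: the content embedding drives the step as the primary flow, while style and trajectory embeddings are blended in as weighted auxiliary signals. This is a minimal sketch with illustrative names and weights, not the paper's actual network or training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, dim=8):
    """Toy encoder: project a raw condition vector into a shared latent space.
    (Stands in for the paper's Multi-condition Extraction module.)"""
    W = rng.standard_normal((x.shape[-1], dim)) / np.sqrt(x.shape[-1])
    return x @ W

def guided_denoise_step(z_t, content, style, trajectory, alpha=0.3, beta=0.3):
    """One toy denoising step with prioritized conditions.

    The content embedding carries unit weight (primary flow); the style and
    trajectory embeddings enter with smaller weights alpha/beta (auxiliary),
    so they modulate rather than overshadow the content.
    """
    aux = alpha * style + beta * trajectory
    cond = content + aux                 # content dominates, aux modulates
    noise_pred = z_t - cond              # stand-in for a learned denoiser
    return z_t - 0.5 * noise_pred        # move the latent toward the target

# Usage: iterating the step drives a random latent toward the conditioned target.
z = rng.standard_normal(8)
content = embed(rng.standard_normal(16))
style = embed(rng.standard_normal(16))
trajectory = embed(rng.standard_normal(16))
for _ in range(50):
    z = guided_denoise_step(z, content, style, trajectory)
```

In this sketch the fixed weights make the priority explicit; in MCM-LDM the auxiliary channels instead synchronize with the primary flow dynamically inside the network.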

Acknowledgements

This paper is supported by Beijing Natural Science Foundation (L232102, 4222024), National Natural Science Foundation of China (62102036, 62272021, 62172246), R&D Program of Beijing Municipal Education Commission (KM202211232003), Beijing Science and Technology Plan Project Z231100005923039, National Key R&D Program of China (No. 2023YFF1203803), the Youth Innovation and Technology Support Plan of Colleges and Universities in Shandong Province (2021KJ062), USA NSF IIS-1715985 and USA NSF IIS-1812606 (awarded to Hong QIN).

BibTeX


@inproceedings{song2024arbitrary,
  title={Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model},
  author={Song, Wenfeng and Jin, Xingliang and Li, Shuai and Chen, Chenglizhao and Hao, Aimin and Hou, Xia and Li, Ning and Qin, Hong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={},
  year={2024}
}