Tutorials

How to run Genmo Mochi 1 video generation on a cloud GPU

Deepak Manoor
Posted: November 12, 2024

    Video generation is the next frontier for generative AI. It is harder than generating images or text because it needs more compute, its training datasets are less accessible, and it involves more variables, such as smooth motion, temporal coherence, and frame aesthetics.

    Models like Llama, Pixtral, Flux, and many others have demonstrated how open-source AI drives faster and more widespread innovation across the field. That’s why Genmo’s announcement of the Mochi 1 model is a key step forward in advancing generative AI. 

    Here’s a snapshot of Mochi specs:

    Mochi 1 Specifications

    Model Architecture: Diffusion model built on the Asymmetric Diffusion Transformer (AsymmDiT) architecture
    Parameters: 10B
    Context: Context window of 44,520 video tokens
    Resolution: 480p
    Frames: Up to 30 frames per second, with clips up to 5.4 seconds long
    Licensing: Apache 2.0 (personal and commercial use)

    Genmo AI's benchmark results showcase state-of-the-art (SOTA) performance in both prompt adherence and Elo scores, metrics that indicate how closely the output matches the user's prompt and how fluid the motion is.

    Mochi 1 prompt adherence benchmark (Source: Genmo AI)


    Deploy Genmo Mochi Video with ComfyUI on an Ori GPU instance

    ComfyUI is an open-source AI tool developed and maintained by Comfy Org for running image and video generation models. Check out their GitHub repository.

    Prerequisites:

    Create a GPU virtual machine (VM) on Ori Global Cloud. We chose the NVIDIA H100 SXM with 80 GB VRAM for this demo, but the optimized ComfyUI version also runs on GPUs with less memory. More VRAM lets the variational autoencoder (VAE) decode at full quality; if memory is limited, ComfyUI will switch to tiled VAE decoding automatically.
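
    Once the VM is up, it's worth confirming that the GPU is visible before installing anything. This assumes the NVIDIA drivers are preinstalled on your image (they ship with nvidia-smi); if not, install them first:

    Bash/Shell
    # Verify that the GPU and driver are detected
    nvidia-smi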

    Install ComfyUI:

    Step 1: Update packages and install Git

    Bash/Shell
    sudo apt update
    sudo apt install git

    Step 2: Download the ComfyUI files

    Bash/Shell
    git clone https://github.com/comfyanonymous/ComfyUI.git
    cd ComfyUI

    Step 3: Install PyTorch, if you didn't add the init script for it when creating the virtual machine

    Bash/Shell
    pip install torch torchvision torchaudio
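
    To confirm that PyTorch can see the GPU, this one-liner should print True on a correctly configured CUDA machine:

    Bash/Shell
    # Prints "True" if PyTorch detects a CUDA-capable GPU
    python3 -c "import torch; print(torch.cuda.is_available())"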

    Step 4: Install dependencies

    Bash/Shell
    pip install -r requirements.txt

    Step 5: Install ComfyUI Manager, which helps you manage your custom nodes and instance.

    Bash/Shell
    cd custom_nodes
    git clone https://github.com/ltdrdata/ComfyUI-Manager.git
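
    With everything installed, launch the ComfyUI server from the repository root. The --listen 0.0.0.0 flag exposes the web UI on all interfaces (ComfyUI serves on port 8188 by default); if your VM is open to the internet, consider an SSH tunnel instead:

    Bash/Shell
    # Return to the ComfyUI root and start the server (web UI on port 8188 by default)
    cd ..
    python3 main.py --listen 0.0.0.0

    Then open http://<your-vm-ip>:8188 in a browser to load the ComfyUI workflow editor.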

    How good is Genmo Mochi 1?

    Mochi 1 Preview demonstrated strong prompt adherence and impressive video dynamics. Reducing the iteration count and frame count can significantly shorten video generation time, which could otherwise take 45 minutes for a 5-second clip at 200 iterations and 30 fps.

    We recommend testing the model several times to find a frame rate and iteration count that balance output quality against generation time. We observed that detailed prompts, with specifics on camera angle, motion type, lighting, and environment, yielded better results; for instance, a prompt like 'a slow dolly shot of a lighthouse at dusk, warm golden light, gentle waves rolling in' gives the model far more to work with than 'a lighthouse'.

    With the right prompts, Mochi 1 showed remarkable flexibility in frame aesthetics. While the motion occasionally appeared glitchy, it was generally smooth and fluid. The Genmo Mochi AI model also excelled in executing close-up shots with impressive clarity.

    The model struggles with text insertion, a challenge that has also affected image generation models in the past. However, recent improvements in image models show promise, and we expect Genmo to enhance this capability in future updates. Each model iteration takes about 14 seconds, which means generating a 5-second clip with 200 iterations takes more than 45 minutes (14 × 200 = 2,800 seconds), whereas closed-source video generation models are typically much faster.
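
    As a rough planning aid, total generation time is simply the per-iteration time multiplied by the iteration count. The sketch below uses the ~14 seconds per iteration we observed on the H100; treat the figures as illustrative rather than fixed:

    Bash/Shell
    # Back-of-the-envelope generation time: seconds per iteration x iterations
    SECS_PER_ITER=14   # observed on an H100 SXM in this demo
    ITERATIONS=200
    echo "~$(( SECS_PER_ITER * ITERATIONS / 60 )) minutes"   # prints "~46 minutes"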

    Though Genmo currently limits the model to 480p resolution and 5-second videos, we’re excited about their upcoming 720p version of the text-to-video model and the future of open source video generation. 

    Imagine another AI reality. Build it on Ori.

    Ori Global Cloud is the first AI infrastructure provider with the native expertise, comprehensive capabilities and end-to-endless flexibility to support any model, team, or scale.
