Advancing the state-of-the-art
Genmo, an AI company focused on video generation, has announced a research preview of Mochi 1, an open-source model for generating high-quality videos from text prompts. The company claims performance comparable to, or exceeding, that of leading proprietary rivals such as Runway’s Gen-3 Alpha, Luma AI’s Dream Machine, Kuaishou’s Kling, and Minimax’s Hailuo, among others.
Available under the permissive Apache 2.0 license, Mochi 1 offers users free access to cutting-edge video generation capabilities, whereas rival models typically offer limited free tiers and charge as much as $94.99 per month (for the Hailuo Unlimited tier).
In addition to the model release, Genmo is also making available a hosted playground, allowing users to experiment with Mochi 1’s features firsthand.
The 480p model is available for use today, and a higher-definition version, Mochi 1 HD, is expected to launch later this year.
Initial videos shared with VentureBeat show impressively realistic scenery and motion, particularly with human subjects, including a clip of an elderly woman.
Series A funding to the tune of $28.4M
In tandem with the Mochi 1 preview, Genmo also announced it has raised a $28.4 million Series A funding round, led by NEA, with additional participation from The House Fund, Gold House Ventures, WndrCo, Eastlink Capital Partners, and Essence VC. Several angel investors, including Abhay Parasnis (CEO of Typeface) and Amjad Masad (CEO of Replit), are also backing the company’s vision for advanced video generation.
Genmo co-founder and CEO Paras Jain’s perspective on the role of video in AI goes beyond entertainment or content creation. “Video is the ultimate form of communication—30 to 50% of our brain’s cortex is devoted to visual signal processing. It’s how humans operate,” he said.
Open for collaboration — but training data is still close to the vest
Mochi 1 is built on Genmo’s novel Asymmetric Diffusion Transformer (AsymmDiT) architecture.
At 10 billion parameters, it’s the largest open-source video generation model ever released. The architecture focuses on visual reasoning, with roughly four times as many parameters dedicated to processing video data as to text.
Efficiency is a key aspect of the model’s design. Mochi 1 uses a video VAE (variational autoencoder) that compresses video data to a fraction of its original size, reducing memory requirements on end-user devices. This makes the model more accessible to the developer community, which can download the weights from Hugging Face or integrate them via API.
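For developers looking to get started, the sketch below shows one way to pull the released weights with the huggingface_hub Python client. The repo id is an assumption based on the preview’s naming and is not confirmed by this article; check Genmo’s Hugging Face model card for the actual path.

```python
# A minimal sketch, assuming the weights live at "genmo/mochi-1-preview"
# on Hugging Face -- verify the repo id on Genmo's model card first.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="genmo/mochi-1-preview",  # assumed repo id, not confirmed by the article
    local_dir="mochi-1-weights",      # where to place the downloaded files
)
print(f"Mochi 1 weights downloaded to {local_dir}")
```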
Jain believes that the open-source nature of Mochi 1 is key to driving innovation. “Open models are like crude oil. They need to be refined and fine-tuned. That’s what we want to enable for the community—so they can build incredible new things on top of it,” he said.
Limitations and roadmap
As a preview, Mochi 1 still has some limitations. The current version supports only 480p resolution, and minor visual distortions can occur in edge cases involving complex motion. Additionally, while the model excels in photorealistic styles, it struggles with animated content.
However, Genmo plans to release Mochi 1 HD later this year, which will support 720p resolution and offer even greater motion fidelity.
“The only uninteresting video is one that doesn’t move—motion is the heart of video. That’s why we’ve invested heavily in motion quality compared to other models,” said Jain.
Looking ahead, Genmo is developing image-to-video synthesis capabilities and plans to improve model controllability, giving users even more precise control over video outputs.
Expanding use cases via open-source video AI
Mochi 1’s release opens up possibilities for various industries. Researchers can push the boundaries of video generation technologies, while developers and product teams may find new applications in entertainment, advertising, and education.
Mochi 1 can also be used to generate synthetic data for training AI models in robotics and autonomous systems.
Reflecting on the potential impact of democratizing this technology, Jain said, “In five years, I see a world where a poor kid in Mumbai can pull out their phone, have a great idea, and win an Academy Award—that’s the kind of democratization we’re aiming for.”
Genmo invites users to try the preview version of Mochi 1 via its hosted playground at genmo.ai/play, where the model can be tested with personalized prompts — though at the time of this article’s posting, the URL was not loading the correct page for VentureBeat.
A call for talent
As it continues to push the frontier of open-source AI, Genmo is actively hiring researchers and engineers to join its team. “We’re a research lab working to build frontier models for video generation. This is an insanely exciting area—the next phase for AI—unlocking the right brain of artificial intelligence,” Jain said. The company is focused on advancing the state of video generation and further developing its vision for the future of artificial general intelligence.