DeepMind’s Genie 2 can create interactive worlds that look like video games

DeepMind, Google’s AI research organization, has unveiled a model that can create an “endless” variety of playable 3D worlds.

The model, called Genie 2, is the successor to DeepMind’s Genie, released earlier this year. It can generate an interactive, real-time scene from a single image and a text description (e.g. “A cute humanoid robot in the forest”). In this respect, it is similar to models being developed by Fei-Fei Li’s company World Labs and the Israeli company Decart.

DeepMind claims that Genie 2 can create a “wide variety of rich 3D worlds,” including worlds where users can perform actions such as jumping and swimming using the mouse or keyboard. The model, trained on videos, is capable of simulating object interactions, animations, lighting, physics, reflections and “NPC” behavior.

DeepMind Genie 2
Photo credit: DeepMind

Many of Genie 2’s simulations look like AAA video games, and the reason may well be that the model’s training data includes playthroughs of popular games. But DeepMind, like many AI labs, declined to reveal many details about how it sourced its data, likely for competitive reasons.

This raises questions about intellectual property. As a subsidiary of Google, DeepMind has full access to YouTube, and Google has previously indicated that its terms of service give it permission to use YouTube videos for model training. But does Genie 2 essentially create unauthorized copies of the games it has “seen”? The courts will probably have to decide that.

Genie 2 can produce consistent worlds with varying perspectives such as first-person and isometric views for up to a minute, with the majority lasting 10 to 20 seconds.

“Genie 2 intelligently responds to actions performed by pressing keys on a keyboard, identifying the character and moving it correctly,” DeepMind explained in a blog post. “For example, our model (can) figure out that arrow keys should move a robot, not trees or clouds.”

DeepMind Genie 2
Photo credit: DeepMind

Most models like Genie 2 (world models, so to speak) can simulate games and 3D environments, but they suffer from visual artifacts, inconsistency and hallucinations. Decart’s Minecraft simulator Oasis, for example, has a low resolution and quickly “forgets” the layout of its levels.

However, Genie 2 can remember parts of a simulated scene that are out of view and render them accurately when they become visible again, DeepMind claims. (World Labs’ models can do this too.)

Games made with Genie 2 wouldn’t actually be much fun, though. Having your progress erased every minute would frustrate anyone. So DeepMind is positioning the model more as a research and creative tool, one for prototyping “interactive experiences” and evaluating AI agents.

DeepMind Genie 2
Photo credit: DeepMind

“Thanks to Genie 2’s out-of-distribution generalization capabilities, concept art and drawings can be transformed into fully interactive environments,” DeepMind wrote. “And by using Genie 2 to quickly create rich and diverse environments for AI agents, our researchers can generate assessment tasks that agents have not yet seen during training.”

DeepMind says that while Genie 2 is still in its early stages, the lab expects it to be a key component in the development of AI agents of the future.

Google has poured more and more resources into world models, which promise to be the next big thing in AI. In October, DeepMind hired Tim Brooks, who led the development of OpenAI’s Sora video generator, to work on video generation technologies and world simulators.
