Google DeepMind has introduced Genie 3, an improved version of the Genie 2 model announced at the end of last year. The new model can create real-time, interactive simulations at 24 frames per second and 720p resolution from nothing more than a text prompt or a single image.
Genie 3 Model Unveiled
Users can steer these generated worlds with keyboard commands. The model is still in development and is currently available only to a limited number of researchers and experts.
Genie 3 is positioned as a tool that goes beyond entertainment and game production. DeepMind sees the model as a significant milestone in artificial general intelligence (AGI) research. As real-world data becomes insufficient for training AI systems, models like Genie 3 can generate a practically unlimited supply of controlled, repeatable synthetic data, letting researchers train agents in complex scenarios that resemble the real world.
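To make the idea of controlled, repeatable synthetic data concrete, the sketch below shows what training an agent inside a generated world might look like. DeepMind has not published a Genie 3 API, so every name here (GeneratedWorld, RandomPolicy, train_agent) is a hypothetical stand-in; the point is only that a fixed prompt and seed let a scenario be replayed exactly.

```python
# Purely illustrative: none of these classes correspond to a published
# Genie 3 API. The sketch only shows what "controlled, repeatable
# synthetic data" could mean for agent training.
import random

class GeneratedWorld:
    """Stand-in for a world model that turns a text prompt into an
    interactive environment and advances it one frame per action."""
    def __init__(self, prompt: str, seed: int):
        self.prompt = prompt
        self.rng = random.Random(seed)  # fixed seed => replayable scenario
        self.frame = 0

    def step(self, action: str):
        # A real world model would render the next 720p frame here;
        # we return a placeholder observation and a toy reward.
        self.frame += 1
        observation = f"{self.prompt} / frame {self.frame} after '{action}'"
        reward = self.rng.random()
        return observation, reward

class RandomPolicy:
    """Toy policy standing in for the agent being trained."""
    ACTIONS = ["forward", "back", "left", "right"]

    def __init__(self):
        self.rng = random.Random(0)
        self.returns = []

    def act(self, observation: str) -> str:
        return self.rng.choice(self.ACTIONS)

    def update(self, episode_return: float) -> None:
        self.returns.append(episode_return)  # placeholder learning signal

def train_agent(policy, prompts, episodes_per_prompt=10, horizon=240):
    """Collect synthetic experience: each prompt yields a fresh world,
    and reusing a seed replays the exact same scenario."""
    for prompt in prompts:
        for episode in range(episodes_per_prompt):
            world = GeneratedWorld(prompt, seed=episode)
            observation, episode_return = prompt, 0.0
            for _ in range(horizon):  # 240 frames = 10 s at 24 fps
                action = policy.act(observation)
                observation, reward = world.step(action)
                episode_return += reward
            policy.update(episode_return)

train_agent(RandomPolicy(), ["a rainy mountain trail"],
            episodes_per_prompt=2, horizon=48)
```

Because each prompt-plus-seed pair deterministically reproduces a scenario, researchers could compare agents on identical conditions, which is what makes synthetic data from such models "controlled and repeatable."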
One of the most striking improvements in the new model is its expanded memory. While Genie 2’s visual memory lasted only a few seconds, Genie 3 extends it to minutes. DeepMind describes this capability as “long-horizon memory.”
The model maintains physical consistency by remembering how objects in its generated scenes have moved over time. For example, it can accurately predict the direction of an object’s movement or recognize when an object is about to topple. These inferences emerge from the model’s own training rather than from a hand-coded physics engine.
Users can add new characters, objects, or weather changes to the generated environment on the fly, a capability DeepMind calls “promptable world events.” Being able to reshape the environment dynamically makes Genie 3 a powerful tool for interactive scenarios and AI training.
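As an illustration of how such prompt-driven events might fit into an interactive session, here is a minimal sketch; the InteractiveWorld class and its methods are invented for this example, since no public Genie 3 interface exists. The key idea is that an event is injected mid-rollout and conditions every frame that follows.

```python
# Hypothetical interface: "InteractiveWorld", "step", and "inject_event"
# are assumptions made for illustration, not a published Genie 3 API.
class InteractiveWorld:
    def __init__(self, prompt: str):
        self.prompt = prompt
        self.events = []  # text events injected so far
        self.frame = 0

    def step(self, action: str) -> str:
        # A real model would condition the next frame on the whole
        # history of actions and injected events.
        self.frame += 1
        return f"frame {self.frame}: {self.prompt}, events={self.events}"

    def inject_event(self, event: str) -> None:
        # A text command that reshapes the world mid-rollout.
        self.events.append(event)

world = InteractiveWorld("a quiet harbor at dusk")
for _ in range(24):                      # one second at 24 fps
    world.step("walk forward")
world.inject_event("a storm rolls in")   # new weather, added on the fly
print(world.step("look up"))             # later frames reflect the storm
```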
However, Genie 3 still has significant limitations. It cannot yet produce detailed, consistent simulations of real-world locations, and inconsistencies occasionally appear in generated scenes: human figures sometimes move unrealistically, and rendered text can come out garbled.
Furthermore, the AI agents placed inside a simulation can only navigate it; they lack the reasoning ability to carry out tasks or modify the environment. Changes within the simulation are made directly by Genie 3, not by the agents themselves.
Another limitation is the simulation length: the model supports only a few minutes of uninterrupted interaction, while complex AI training requires far longer sessions. DeepMind is also working on scenarios in which multiple AI agents interact with one another, but this work is not yet complete.
No date has been announced for Genie 3’s commercial availability. Given the model’s heavy compute requirements, DeepMind expects cost and scalability hurdles to make any broader rollout a slow process. Even so, the company says the model, so far tested only under limited access, has the potential to profoundly change how AI is trained and used.