Google just made its AI video creation tool a lot more powerful. With the latest update to the Gemini app, users can now upload multiple reference images when using the Veo 3.1 model, giving them finer creative control over how AI-generated videos look and feel.
Gemini app now supports multi-image prompts for Veo 3.1

Announced via Google’s official Gemini account on X, the new feature lets users pair several reference images with a text prompt. Each image can serve a different purpose, such as defining a character’s design, the background environment, or the overall aesthetic, while the written prompt dictates how they come together in motion.
This kind of layered guidance allows for more coherent and stylistically accurate results, addressing one of the most common frustrations users have faced with AI video generation: lack of control.
Feature expands beyond Flow and Vertex AI
If this sounds familiar to some developers, that’s because the “Ingredients to Video” feature has existed since October, but only inside tools like Google Flow and Vertex AI. Now, non-developers and everyday creatives can use it directly within the standard Gemini app on both mobile and desktop.
This broader rollout means more users can experiment with structured video generation without needing access to Google’s enterprise-level tools.
Veo 3.1 continues to compete with OpenAI’s Sora
The move comes as Google continues to refine Veo 3.1, its answer to OpenAI’s Sora (which still hasn’t launched in Europe). While Sora captured attention with its cinematic quality and scene transitions, Veo has quietly improved its own tools, focusing on precision and flexibility over sheer spectacle.
Gemini update adds structured tools for creative video control
Google has confirmed that the new feature is now rolling out, though availability may vary by region and device. Once live, users will see the option to upload multiple images directly from the Gemini interface.
This update may not be flashy, but it’s a meaningful step. It turns Gemini from a creative toy into a more structured tool, ideal for artists, animators, and anyone who wants their AI videos to follow a vision instead of guessing at one.

