Google has announced Veo 3.1, the latest version of its AI video generation model, Veo. The update significantly improves how well the model understands and follows text prompts, so the videos it produces are closer to what the user asked for. For AI-powered content creators, this means more precise control over the final result.
Like its competitor, Sora, Veo 3.1 supports audio generation
One of the most notable additions in the new version is audio generation for image-to-video conversion. This feature, which was not available in the previous version, Veo 3, lets users generate both an animated clip and matching audio from a single uploaded image. The capability is also available in Flow, Google's video editing tool powered by Veo 3.1.

Flow also gains several new features with the Veo 3.1 update. With a new tool called “Frame to Video,” users can specify the first and last frames of a clip and have the AI generate the scene in between, with audio produced at the same time. Flow also adds “Scene Extend,” which lengthens existing videos, along with advanced controls such as adjusting the lighting and shadows in a scene.
According to the company, Flow users will soon be able to remove any object from their videos entirely. The AI will reconstruct the background behind the removed object so that it looks as if the object was never there. Veo 3.1 builds on Veo 3, which was introduced at Google I/O 2025, and is now available to developers through the Gemini API as well as to users in the Gemini app.
So, what do you think about the future of AI-powered video creation? Would you consider using tools like these in your professional or personal projects? Share your thoughts with us.