Google announced significant advancements in artificial intelligence at I/O 2024. The company is shaping the future of AI with new additions to the Gemini model family, faster and more efficient models, generative media tools, innovative search experiences and Trillium, the 6th generation of Google Cloud TPU.
Gemini is now faster and smarter!
Google DeepMind CEO Demis Hassabis announced updates to the Gemini model family. Gemini 1.0, the first natively multimodal model, launched in December 2023 in three sizes (Ultra, Pro and Nano) and was soon followed by 1.5 Pro, which improved performance and introduced a context window of 1 million tokens.
Developers and enterprise customers have found 1.5 Pro’s long context window, multimodal reasoning capabilities and impressive overall performance very useful. In response to feedback that some applications need lower latency and a lower cost to serve, Google has added a new member to the Gemini family: 1.5 Flash.
Gemini 1.5 Flash
Optimized for speed and efficiency, this lightweight model is a cost-effective choice for high-volume, high-frequency tasks. With a 1 million token context window, 1.5 Flash excels at tasks such as summarization, chat applications, image and video captioning, and data extraction from long documents and tables. It was trained through “distillation”, in which the core knowledge and skills of the larger 1.5 Pro model are transferred to a smaller, more efficient model.
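The “distillation” mentioned above is a standard machine-learning technique: a smaller “student” model is trained to match the softened output distribution of a larger “teacher” model. The sketch below illustrates the core idea only; the logits, temperature and loss are toy values, not anything from Google’s actual training setup.

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature "softens" the distribution, exposing more of
    # the teacher's relative preferences between classes.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student's distribution q is from the
    # teacher's distribution p; this is the quantity distillation minimizes.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 1.0, 0.1]   # illustrative teacher outputs
student_logits = [1.8, 1.1, 0.2]   # illustrative student outputs
T = 2.0

teacher_probs = softmax(teacher_logits, T)
student_probs = softmax(student_logits, T)
loss = kl_divergence(teacher_probs, student_probs)
print(loss)
```

During training, the student's weights would be updated to drive this loss toward zero, so it mimics the teacher's behavior at a fraction of the size.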
Gemini 1.5 Pro
Google has also significantly improved 1.5 Pro, its best model for overall performance. The context window has been expanded to 2 million tokens. Data and algorithmic improvements have enhanced code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding.
1.5 Pro can now follow increasingly complex and nuanced instructions, including those that specify product-level behavior such as role, format and style. Control over the model’s responses has been improved for specific use cases, such as crafting the persona and response style of a chat agent or automating workflows through multiple function calls. Users can steer model behavior by setting system instructions.
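To make the system-instructions idea concrete, here is a minimal sketch of what a request to the Gemini API might look like with a system instruction attached. The field names follow Google’s publicly documented REST request shape, but the instruction text and the helper function are invented for illustration; consult the current API docs before relying on this.

```python
import json

def build_request(system_text: str, user_text: str) -> dict:
    # Assemble a request body: the system instruction sets persona and
    # response style, while "contents" carries the user's actual prompt.
    return {
        "system_instruction": {"parts": [{"text": system_text}]},
        "contents": [{"role": "user", "parts": [{"text": user_text}]}],
    }

payload = build_request(
    "You are a concise support agent. Answer in at most two sentences.",
    "How do I reset my password?",
)
print(json.dumps(payload, indent=2))
```

The system instruction persists across the conversation, so every response stays in character without repeating the guidance in each user message.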
Gemini Nano
Moving beyond text-only input, Gemini Nano can now also process images. Starting with Pixel phones, applications that use Gemini Nano with Multimodality will be able to understand the world the way people do: not just through text, but also through sight, sound and spoken language.
The future of AI assistants: Project Astra
As part of its mission to responsibly develop AI to benefit humanity, Google DeepMind announced Project Astra, with the goal of developing universal AI agents that can help in everyday life. Astra aims to develop AI agents that can understand and act on context in the same way that humans understand and react to the complex world.
These agents will serve as proactive, approachable and personalized assistants. Users will be able to talk to these agents naturally and without delay. Astra is designed to process and remember video and speech input.
Built on the Gemini model and other mission-specific models, these agents process information faster by continuously encoding video frames, combining video and speech input into an event timeline, and caching that information for efficient recall. Some of Astra’s features will be integrated into Google products such as the Gemini app later this year.
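The caching idea described above, interleaving encoded video frames and speech into a single time-ordered record with bounded memory, can be sketched in a few lines. Everything here is hypothetical: the class, field names and capacity are invented to illustrate the concept, not Astra’s actual implementation.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float
    kind: str      # "video" or "speech"
    payload: str   # stands in for an encoded representation

class EventTimeline:
    """Bounded, time-ordered cache of multimodal events for fast recall."""

    def __init__(self, capacity: int = 1000):
        # deque with maxlen evicts the oldest event automatically.
        self.events: deque = deque(maxlen=capacity)

    def add(self, event: Event) -> None:
        self.events.append(event)

    def recall(self, since: float) -> list:
        # Retrieve everything observed at or after a given timestamp.
        return [e for e in self.events if e.timestamp >= since]

timeline = EventTimeline(capacity=3)
timeline.add(Event(0.0, "video", "frame_0"))
timeline.add(Event(0.5, "speech", "hello"))
timeline.add(Event(1.0, "video", "frame_1"))
timeline.add(Event(1.5, "video", "frame_2"))  # evicts frame_0

recent = timeline.recall(since=0.5)
print([e.payload for e in recent])
```

The bounded buffer captures the trade-off the paragraph hints at: continuous input streams are unbounded, so an agent must decide what to keep in order to recall recent context efficiently.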
New generative media models and tools
Google also introduced new generative media models and tools for creative work:
Veo
Veo is Google’s most capable video creation model to date, capable of creating high-quality 1080p videos that can exceed one minute in length. Supporting a variety of cinematic and visual styles, Veo understands natural language and visual semantics to create videos that reflect the user’s creative vision. The model also understands cinematic terms such as ‘timelapse’ or ‘aerial shot of a landscape’, providing unprecedented creative control.
It creates consistent and coherent shots, with people, animals and objects moving realistically throughout the footage. Google is inviting a range of filmmakers and content creators to test the model to discover how Veo can best support the storyteller’s creative process.
Imagen 3
Imagen 3, Google’s highest-quality text-to-image model to date, produces incredibly detailed, photorealistic images. By better understanding natural language and the intent behind prompts, Imagen 3 can incorporate small details from long prompts and render text within an image.
Music AI Sandbox
Music AI Sandbox, a suite of tools that helps musicians create new instrumental pieces from scratch, transform sounds and support their creative process, aims to open up new possibilities for creativity.
Developing AI responsibly
Google is committed not only to advancing technology, but also to doing so responsibly. That’s why measures are being taken to address the challenges posed by generative technologies and help people work responsibly with AI-generated content.
These measures include collaborating with the creative community and other stakeholders, gathering insights for safe and responsible development and deployment of technologies, listening to feedback, and giving creators a voice. Google believes that AI technologies should be used to benefit humanity and is working to ensure that they are developed ethically, responsibly and fairly.