With the rapid development of generative AI technologies, Google is making significant strides in hardware to meet the high processing power, memory and communication capacity these technologies require. At Google I/O 2024, the company announced Trillium, its 6th generation TPU, the most powerful and energy efficient Tensor Processing Unit (TPU) to date.
Trillium: A breakthrough in performance and efficiency
Trillium TPUs deliver a 4.7x increase in peak processing performance per chip compared to TPU v5e. Google has doubled High Bandwidth Memory (HBM) capacity and bandwidth, and also doubled inter-chip interconnect (ICI) bandwidth over TPU v5e.
Trillium comes with third-generation SparseCore, a specialized accelerator for processing the ultra-large embeddings common in advanced ranking and recommendation workloads.
Trillium TPUs make it possible to train the next wave of AI models faster and deliver them at lower latency and lower cost. The 6th generation of TPU is also the most sustainable TPU: Trillium TPUs are more than 67% more energy efficient than TPU v5e.
Trillium scales up to 256 TPUs in a single high-bandwidth, low-latency pod. Beyond this pod-level scalability, with multislice technology and Titanium Intelligence Processing Units (IPUs), Trillium TPUs scale to hundreds of pods, connecting tens of thousands of chips over a multi-petabit-per-second data center network to create a building-scale supercomputer.
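Taken together, the figures above imply enormous aggregate compute. The back-of-envelope arithmetic below is only a sketch using numbers from this announcement (4.7x per-chip peak over v5e, 256 chips per pod); the pod count of 100 is an assumption standing in for "hundreds of pods", not an official configuration.

```python
# Back-of-envelope Trillium scaling, normalizing TPU v5e per-chip peak to 1.
V5E_PEAK = 1.0
TRILLIUM_PEAK = 4.7 * V5E_PEAK   # 4.7x peak performance per chip vs. v5e

CHIPS_PER_POD = 256              # one high-bandwidth, low-latency pod
pod_peak = TRILLIUM_PEAK * CHIPS_PER_POD

PODS = 100                       # assumed stand-in for "hundreds of pods"
cluster_chips = PODS * CHIPS_PER_POD   # 25,600 chips: "tens of thousands"
cluster_peak = PODS * pod_peak

print(cluster_chips)             # 25600
print(round(cluster_peak))       # ~120,320 v5e-chip-equivalents of peak compute
```

Even at this conservative pod count, a multislice deployment lands squarely in the "tens of thousands of chips" range the announcement describes.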
Pioneering AI-driven hardware
For more than a decade, Google has been pushing the boundaries of scale and efficiency by developing specialized hardware for AI. In 2013, Google began developing TPU v1, the world’s first purpose-built AI accelerator, and followed it up with the first Cloud TPU in 2017.
Without TPUs, Google’s most popular services, such as real-time voice search, photo object recognition and interactive language translation, and cutting-edge foundation models such as Gemini, Imagen and Gemma, would not be possible. The scale and efficiency of TPUs enabled the fundamental work on Transformers that forms the algorithmic foundation of modern generative AI.
Trillium and AI Hypercomputer
Trillium TPUs are part of AI Hypercomputer, Google Cloud’s breakthrough supercomputing architecture designed specifically for cutting-edge AI workloads. AI Hypercomputer integrates performance-optimized infrastructure (including Trillium TPUs), open source software frameworks, and flexible consumption models.
Trillium empowers developers with support for open source libraries such as JAX, PyTorch/XLA and Keras 3, and Google has also partnered with Hugging Face on Optimum-TPU to make model training and serving easier.
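One practical consequence of this framework support is that TPU code looks like ordinary accelerator-agnostic code. The sketch below uses standard JAX APIs (`jax.devices`, `jax.jit`): on a Cloud TPU VM the same program is XLA-compiled for the TPU, while elsewhere it falls back to CPU or GPU; the `predict` function and its shapes are illustrative, not from the announcement.

```python
import jax
import jax.numpy as jnp

# On a Cloud TPU VM this lists TpuDevice entries; on a laptop it
# falls back to CPU. The model code itself does not change.
print(jax.devices())

@jax.jit  # compile once with XLA, then run on whatever backend is available
def predict(w, x):
    # Illustrative toy layer: a matmul followed by a nonlinearity.
    return jnp.tanh(x @ w)

w = jnp.ones((4, 2))
x = jnp.ones((3, 4))
print(predict(w, x).shape)  # (3, 2)
```

Because JAX dispatches through XLA, the same jitted function benefits from TPU hardware features without device-specific code.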
AI Hypercomputer also offers the flexible consumption models that AI and machine learning workloads need. Dynamic Workload Scheduler (DWS) manages access to AI and machine learning resources and helps customers optimize their spend.
Flex start mode can schedule all the needed accelerators at once, regardless of the entry point, improving the experience of bursty workloads such as training, fine-tuning or batch jobs.
Powering the next wave of AI innovation
Trillium TPUs will power the next wave of AI models and agents, and Google is excited to bring these advanced capabilities to its customers. Autonomous vehicle company Nuro is committed to creating a better everyday life through robotics by training its models with Cloud TPUs.
Deep Genomics is powering the future of drug discovery with AI and looks forward to how its next foundation model, powered by Trillium, will change patients’ lives. As Google Cloud’s Partner of the Year for AI, Deloitte will deliver Trillium to transform businesses with the power of generative AI.
Trillium TPUs are the result of over a decade of research and innovation and will be available later this year. With Trillium, Google aims to usher in a new era of AI innovation.