NVIDIA has released its most comprehensive CUDA update yet!

NVIDIA has released what is being described as the biggest and most comprehensive update to its CUDA platform, which underpins much of today's artificial intelligence (AI) software, since the platform launched two decades ago. The newly released NVIDIA CUDA 13.1 introduces a new programming model called “CUDA Tile,” which is expected to change how developers approach GPU programming for AI.

CUDA Tile Unveiled

Initially, CUDA Tile is limited to current Blackwell-generation GPU hardware. The company says support will expand to more architectures in the future.

Shiftdelete.net

With CUDA Tile programming, developers move their code to a higher layer of abstraction built around “tiles,” which are blocks of data. From there, the compiler and runtime automatically determine the most efficient way to distribute the specified workload across individual threads and onto specialized hardware such as Tensor Cores.

The new tile-based programming eliminates the need to define each thread’s execution path in detail. This feature allows developers to write high-performance code across different GPU architectures with less effort.
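The difference between the two styles can be illustrated with a small conceptual sketch. It runs on the CPU in plain Python and is not CUDA or cuTile code; the function names `simt_style_add` and `tile_style_add` and their parameters are hypothetical, chosen only for this illustration. The first function mimics the thread-level view, where each simulated thread computes its own index and handles one element. The second mimics the tile-level view, where the programmer only states what happens to a whole tile.

```python
# Conceptual CPU sketch only: this is NOT CUDA or cuTile code.
# All names here are hypothetical and exist only for illustration.

def simt_style_add(a, b, block_dim):
    """Thread-level view: each simulated thread derives its index and adds one element."""
    n = len(a)
    out = [0.0] * n
    num_blocks = (n + block_dim - 1) // block_dim  # round up to cover every element
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # per-thread index arithmetic
            if i < n:                               # per-thread bounds check
                out[i] = a[i] + b[i]
    return out


def tile_style_add(a, b, tile_size):
    """Tile-level view: load a whole tile, apply one operation to it, store the result."""
    out = []
    for start in range(0, len(a), tile_size):
        a_tile = a[start:start + tile_size]  # "load" a tile of each input
        b_tile = b[start:start + tile_size]
        out.extend(x + y for x, y in zip(a_tile, b_tile))  # one whole-tile operation
    return out
```

Both functions return the same result. In the tile-style version the index arithmetic and bounds handling no longer appear in the per-element logic, which is the kind of detail the article says the compiler and runtime take over.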

For tensors, the primary data type for AI workloads, NVIDIA developed specialized hardware such as Tensor Cores (TCs) and Tensor Memory Accelerators (TMAs). As hardware complexity increased, more advanced software was needed to utilize these capabilities.

CUDA Tile abstracts Tensor Cores and their programming models, so that code written with it is meant to remain compatible with current and future Tensor Core architectures.

Developers program their algorithms by defining data chunks (tiles) and specifying the operations to be performed on them. This eliminates the need to fine-tune the algorithm’s element-by-element execution; the compiler and runtime handle this task.
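As a rough illustration of "define tiles, then specify operations on them," here is a minimal plain-Python sketch of a tiled matrix multiplication. It is a CPU-only teaching example under stated assumptions, not the cuTile API: the names `tile_multiply_accumulate` and `tiled_matmul` and the `tile` parameter are hypothetical. The outer loops only decide which tiles to combine, while the helper stands in for the per-tile operation that, on a GPU, the compiler and runtime would map onto threads and Tensor Cores.

```python
# Conceptual CPU sketch only: this is NOT the cuTile API.
# Function names and parameters are hypothetical, for illustration.

def tile_multiply_accumulate(c, a, b, i0, j0, k0, tile):
    """Accumulate the product of one tile of `a` and one tile of `b` into a tile of `c`."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(i0, min(i0 + tile, m)):          # rows of the output tile
        for j in range(j0, min(j0 + tile, n)):      # columns of the output tile
            acc = c[i][j]
            for kk in range(k0, min(k0 + tile, k)):  # shared dimension within the tile
                acc += a[i][kk] * b[kk][j]
            c[i][j] = acc


def tiled_matmul(a, b, tile):
    """Multiply matrices (lists of rows) by iterating over tiles, not single elements."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # pick an output tile row
        for j0 in range(0, n, tile):      # pick an output tile column
            for k0 in range(0, k, tile):  # walk tiles along the shared dimension
                tile_multiply_accumulate(c, a, b, i0, j0, k0, tile)
    return c
```

For example, `tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], tile=2)` gives the same result as an ordinary matrix product. The tile size changes how the work is grouped, not the answer.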

This high-level coding is supported by CUDA Tile IR, a virtual instruction set for tile operations. NVIDIA states that this system doesn’t replace the traditional SIMT (single instruction, multiple threads) hardware and programming model; instead, the two coexist. The company has also released NVIDIA cuTile Python, which brings CUDA Tile programming to Python, a language widely used for AI development.
