Alibaba’s Qwen research team has expanded its open-source large language model line with Qwen3-Next. After announcing a series of models over the summer, the team has now launched a hybrid architecture that combines performance with efficiency. Although the model has 80 billion parameters in total, only about 3 billion are active per token, which makes it especially efficient in long-context workloads.
Qwen3-Next Officially Announced
Alibaba has released Qwen3-Next as a free, openly licensed model. It comes in two variants, Instruct and Thinking; both are distributed under the Apache 2.0 license and are available on Hugging Face, ModelScope, Kaggle, and Alibaba Cloud.
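As a rough illustration of what using the Hugging Face release might look like, here is a minimal sketch with the `transformers` library. The repository id follows Qwen's usual naming pattern and is an assumption, not a confirmed identifier from the announcement.

```python
# Minimal sketch: loading the Instruct variant from Hugging Face.
# The repository id below is an assumption based on Qwen's naming conventions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Qwen3-Next announcement."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```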

It can also be used directly on the Qwen Chat platform. The new model combines the Gated DeltaNet and Gated Attention approaches: Gated DeltaNet layers process long sequences quickly, while Gated Attention layers provide fine-grained, precise recall. The hybrid design brings the speed and accuracy advantages of both into a single model.
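The announcement does not include reference code, but the idea of interleaving the two layer types can be sketched roughly as follows. The layer classes are trivial stand-ins, and the 3:1 interleaving ratio between DeltaNet-style and attention layers is an illustrative assumption, not Qwen's published implementation.

```python
# Illustrative sketch of a hybrid layer stack, NOT Qwen's actual implementation.
import torch
import torch.nn as nn


class GatedDeltaNetLayer(nn.Module):
    """Placeholder for a linear-time Gated DeltaNet mixing layer."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.gate = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        # Gated residual update standing in for the real DeltaNet recurrence.
        return x + torch.sigmoid(self.gate(x)) * self.proj(x)


class GatedAttentionLayer(nn.Module):
    """Placeholder for a full attention layer with an output gate."""
    def __init__(self, hidden_size: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.gate = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        return x + torch.sigmoid(self.gate(x)) * attn_out


class HybridStack(nn.Module):
    """Interleaves fast DeltaNet-style layers with periodic gated attention."""
    def __init__(self, num_layers: int, hidden_size: int):
        super().__init__()
        self.layers = nn.ModuleList([
            GatedAttentionLayer(hidden_size) if (i + 1) % 4 == 0
            else GatedDeltaNetLayer(hidden_size)
            for i in range(num_layers)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


# Tiny usage example with toy dimensions.
stack = HybridStack(num_layers=8, hidden_size=64)
y = stack(torch.randn(2, 16, 64))
```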
One of the most striking technical features is that the model runs with only 3 billion active parameters per token. As a result, the model, which was trained on 15 trillion tokens, can be trained and served at significantly lower hardware cost than its predecessor, Qwen3-32B.
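The gap between 80 billion total and roughly 3 billion active parameters comes from sparse mixture-of-experts routing, where only a handful of experts run for each token. A back-of-the-envelope sketch makes the arithmetic concrete; the expert counts and sizes below are invented purely for illustration and are not Qwen's published figures.

```python
# Back-of-the-envelope illustration of sparse MoE parameter counts.
# All numbers below are hypothetical, chosen only to show the mechanism.
total_experts = 512
active_experts_per_token = 10
params_per_expert = 150e6      # hypothetical expert size
shared_params = 1.5e9          # hypothetical attention/embedding/shared weights

total_params = shared_params + total_experts * params_per_expert
active_params = shared_params + active_experts_per_token * params_per_expert

print(f"total:  {total_params / 1e9:.1f}B parameters")        # ~78B
print(f"active: {active_params / 1e9:.1f}B parameters/token")  # ~3B
```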
In long-context tests, it delivers up to 10x higher throughput at context lengths of 32,000 tokens and above. Qwen3-Next natively supports a 256,000-token context window and has been validated out to 1 million tokens using RoPE scaling methods.
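Earlier Qwen releases expose this kind of context extension through a YaRN-style `rope_scaling` entry in the model config. The sketch below shows how a roughly 1-million-token window might be requested, assuming Qwen3-Next follows the same convention; the repo id, the exact field names, and the native window value used here are assumptions.

```python
# Hedged sketch: requesting an extended context via YaRN-style RoPE scaling,
# following the convention of earlier Qwen releases. Repo id and config
# fields are assumptions for Qwen3-Next.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Next-80B-A3B-Instruct",  # hypothetical repo id
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,  # assumed native window * 4 ~= 1M tokens
        "original_max_position_embeddings": 262144,
    },
    max_position_embeddings=1048576,
)
```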
In performance tests, its results exceeded those of Qwen3-32B. The reasoning-focused Thinking variant outperformed closed models such as Gemini-2.5-Flash-Thinking, while the Instruct variant delivered long-context performance comparable to Qwen3’s 235-billion-parameter flagship.
The Qwen team emphasized that Qwen3-Next offers a scalable and cost-effective solution, and announced that work is already underway on Qwen3.5, the next step in the series.

