Meta has introduced new, much more efficient versions of its Llama 3.2 model, which has already made a name for itself in artificial intelligence. Announced in October, the new versions consume less power and can run on mobile devices. Here are the details…
Meta is here with quantized 1B and 3B versions of the Llama 3.2 model!
First of all, we should mention that Meta used model quantization to develop the new versions. As a result, the quantized 1B and 3B models offer a significant increase in speed and a significant decrease in energy consumption.
In fact, the new models reduce memory usage by 41% and model size by 56% on average. In other words, we get AI models that run faster and consume less energy. So how did Meta achieve this?
The secret lies in two different techniques: Quantization-Aware Training (QAT) and SpinQuant. QAT comes into play during the model’s training process, so accuracy stays high even as the weights are reduced to lower precision. SpinQuant, by contrast, is applied after training and focuses on making the quantized model more portable.
SpinQuant makes the model ideal for running on lighter platforms. This allows Meta’s AI models to run effectively not only on large servers, but also on lighter systems such as mobile devices.
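To make the size savings concrete, here is a minimal sketch of symmetric low-bit weight quantization, the basic idea behind these techniques. This is illustrative only: Meta’s actual scheme (groupwise 4-bit weights, QAT, SpinQuant rotations) is considerably more involved, and all names below are our own.

```python
import numpy as np

def quantize_symmetric(weights, num_bits=4):
    """Map float weights to signed integers with one shared scale.
    Illustrative sketch, not Meta's actual quantization scheme."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 7 for 4-bit
    scale = np.max(np.abs(weights)) / qmax  # one scale per tensor
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for computation
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)

q, scale = quantize_symmetric(w, num_bits=4)
w_hat = dequantize(q, scale)

# Going from 16-bit floats to 4-bit integers cuts weight storage 4x
orig_bits, quant_bits = w.size * 16, w.size * 4
print(f"storage: {orig_bits} -> {quant_bits} bits "
      f"({orig_bits // quant_bits}x smaller)")
print(f"mean abs rounding error: {np.abs(w - w_hat).mean():.6f}")
```

The rounding introduces a small error per weight, which is why QAT trains the model with this error present, so the network learns to tolerate it.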
Meanwhile, Meta has tested its new, lighter models on the OnePlus 12, Samsung Galaxy S24+ and S22, and some Apple iOS devices. The results are very promising: compared to the original BF16 version of Llama, the new models show very little difference in quality.
However, these new models have a context length of 8,000 tokens, far below the original Llama 3.2’s 128,000. That may seem limiting, but Meta’s test results suggest the new models are an ideal solution, especially for mobile devices. Furthermore, future enhancements with NPU (Neural Processing Unit) support are planned to improve their performance even further.