Google Research has introduced a new model focused on privacy in AI. The company announced VaultGemma, a powerful large language model trained from scratch using differential privacy. The model, with 1 billion parameters, has been publicly released on Hugging Face and Kaggle, allowing both researchers and companies to download and use it.
Google Announces VaultGemma Technology
VaultGemma was trained using a method called differential privacy. This method adds controlled noise during training to prevent direct recall of user data. This approach presents challenges during the training process and requires much larger datasets and high computational power for the model to learn stably.
Google, through its collaboration with DeepMind, defined new scaling laws to manage this process, thus achieving a balance between privacy and performance, and developed VaultGemma.
The model’s key technical feature is that it operates with only 3 billion active parameters. This structure provides efficiency without sacrificing performance. It can also process texts up to 256,000 tokens long at a time, and with RoPE methods, this limit can be increased to 1 million tokens. This figure is on par with today’s most advanced commercial AI models.
One of VaultGemma’s most striking features is the privacy guarantee it provides. The model was trained with a strict privacy guarantee of ε ≤ 2.0 and δ ≤ 1.1e-10. This guarantee mathematically prevents a single training example from significantly impacting the model’s output.
Google’s tests confirmed that VaultGemma does not memorize or reproduce the data it sees during training. In terms of performance, VaultGemma performs on par with models like GPT-2 from five years ago.
While this demonstrates that the computational cost of privacy remains high, it demonstrates that modern differential privacy methods have reached a point where they are now practically usable. With this work, Google has provided the community with both an open-source model and a reliable roadmap for future privacy-focused AI development.
{{user}} {{datetime}}
{{text}}