Prompt Injection Attack Uses Images to Trick AI Models

Ana sayfa / AI

Prompt injection just found a new disguise: images. Researchers at Trail of Bits have discovered a stealthy method that hides malicious prompts inside high-resolution pictures. When AI systems downscale these images, the model suddenly sees and acts on hidden instructions without the user ever noticing.

When someone uploads an image to a multimodal AI, the platform often resizes it automatically. That’s where the problem starts. Attackers can design images so that key text appears only after the system downscales them using interpolation methods such as bilinear or bicubic resampling.

As a result, the AI model ends up reading a prompt it was never supposed to see. The user, meanwhile, sees nothing but a clean image.
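You can see the mechanism for yourself by reproducing the downscaling step. The sketch below is a minimal Python example using Pillow; the 768x768 target size, the bicubic filter, and the file names are assumptions for illustration, since each platform runs its own (often undocumented) resize pipeline. It simply renders the version of an uploaded image that the model would actually receive, which is where hidden text can surface.

```python
# Sketch: preview what a multimodal model "sees" after platform downscaling.
# The target size, resampling filter, and file names are illustrative
# assumptions; real services use their own resize pipelines.
from PIL import Image

def downscaled_view(path: str, target=(768, 768)) -> Image.Image:
    """Return the image as it would appear after bicubic downscaling."""
    original = Image.open(path).convert("RGB")
    return original.resize(target, resample=Image.Resampling.BICUBIC)

if __name__ == "__main__":
    downscaled_view("uploaded_image.png").save("what_the_model_sees.png")
    # Inspect the output: text invisible at full resolution can emerge here.
```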

Trail of Bits showed how one image could quietly direct Gemini CLI to leak Google Calendar data. The demo used a Zapier integration configured with trust=True, which lets the agent execute tool calls without asking the user first. Because the malicious text appeared only after rescaling, no one saw it coming.

From the AI’s perspective, it simply followed the full prompt, blending visible and hidden parts into one request.
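The trust=True detail matters because it removes the human-in-the-loop step that would otherwise catch a suspicious request. The sketch below is not Gemini CLI or Zapier code; it is a generic Python illustration, with assumed names like ToolCall, confirm, and run_tool, of how an agent loop that skips confirmation for "trusted" tool servers will execute whatever the blended prompt asks for.

```python
# Generic sketch of why a "trusted" tool server is risky: the confirmation
# step that would let a user veto a calendar-leaking request never runs.
# All names here (ToolCall, confirm, run_tool) are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ToolCall:
    server: str
    name: str
    args: dict

def run_tool(call: ToolCall) -> None:
    # Stand-in for actually forwarding the request to the tool server.
    print(f"executing {call.server}.{call.name} with {call.args}")

def confirm(call: ToolCall) -> bool:
    answer = input(f"Allow {call.server}.{call.name}({call.args})? [y/N] ")
    return answer.strip().lower() == "y"

def dispatch(call: ToolCall, trusted_servers: set[str]) -> None:
    # With trust enabled, the user never gets a chance to reject the call
    # that a hidden, downscaled prompt injected into the conversation.
    if call.server in trusted_servers or confirm(call):
        run_tool(call)
```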

The attack worked across multiple systems using Google’s Gemini models:

- Gemini CLI
- Vertex AI Studio (with a Gemini backend)
- Gemini’s web interface
- Gemini’s API
- Google Assistant on Android
- Genspark

Even though the researchers only tested these platforms, others that rely on automatic image downscaling could face similar risks.

To prove the concept, Trail of Bits released a tool called Anamorpher. It generates images that exploit different rescaling algorithms, effectively turning them into prompt injection payloads. With this, security teams (and unfortunately, attackers) can simulate the same behavior across various systems.
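Targeting matters here because each interpolation method samples and weights source pixels differently, so a payload crafted for one algorithm may stay invisible under another. The short Python sketch below is not Anamorpher itself; the file name and 64x64 target size are assumptions. It just downscales the same image with several Pillow filters and measures how much the results diverge.

```python
# Sketch: downscale one image with different resampling filters and measure
# how much the outputs differ, illustrating why payloads must be tuned to
# the target platform's algorithm. Not Anamorpher's actual code.
import numpy as np
from PIL import Image

FILTERS = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
}

img = Image.open("suspect_image.png").convert("L")  # assumed input file
small = {name: np.asarray(img.resize((64, 64), f), dtype=np.int16)
         for name, f in FILTERS.items()}

for a in small:
    for b in small:
        if a < b:
            diff = np.abs(small[a] - small[b]).mean()
            print(f"{a} vs {b}: mean pixel difference {diff:.1f}")
```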

According to the researchers, here’s how developers can start locking things down:

- Avoid downscaling where possible by limiting the dimensions of uploaded images.
- When downscaling is unavoidable, show users a preview of the exact image the model will receive.
- Require explicit user confirmation for sensitive tool calls, especially when text is detected inside an image.
- Treat this as part of a systematic, defense-in-depth approach to prompt injection rather than a one-off patch.
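The first two points are straightforward to prototype. The sketch below is a minimal Python example with Pillow; the 1024px cap, the 768x768 model input size, and the function and file names are assumptions rather than any platform’s real API. It rejects oversized uploads and saves the exact downscaled image so the user can approve what the model will see.

```python
# Sketch of two mitigations from the list above: cap upload dimensions and
# surface the exact downscaled image for user review before it reaches the
# model. Limits and names are illustrative assumptions.
from PIL import Image

MAX_DIM = 1024            # assumed cap; align with your model's input size
MODEL_INPUT = (768, 768)  # assumed model-side resolution

def prepare_upload(path: str) -> Image.Image:
    img = Image.open(path).convert("RGB")
    if max(img.size) > MAX_DIM:
        raise ValueError(f"image {img.size} exceeds the {MAX_DIM}px limit")
    # Downscale with the same filter the backend uses, and show this exact
    # preview to the user for approval before the model ever sees it.
    return img.resize(MODEL_INPUT, Image.Resampling.BICUBIC)

if __name__ == "__main__":
    prepare_upload("uploaded_image.png").save("preview_for_user.png")
```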

Clearly, prompt injection has evolved. It’s no longer just a text-based problem; it’s visual now, too. If developers want safe AI, they’ll need to secure every input, not just the obvious ones.