- Malicious prompts remain invisible until image downscaling reveals hidden instructions
- The attack works by exploiting how AI resamples uploaded images
- Bicubic interpolation can expose black text from specially crafted images
As AI tools become more integrated into daily work, the security risks attached to them are evolving as well.
Researchers at Trail of Bits have demonstrated a method where malicious prompts are hidden inside images and then revealed during processing by large language models.
The technique takes advantage of how AI platforms downscale images for efficiency, exposing patterns that are invisible in their original form but legible to the algorithm once resized.
Hidden instructions in downscaled images
The idea builds on a 2020 paper from TU Braunschweig in Germany, which suggested that image scaling could be used as an attack surface for machine learning.
Trail of Bits showed how crafted images could manipulate systems, including Gemini CLI, Vertex AI Studio, Google Assistant on Android, and Gemini’s web interface.
In one case, Google Calendar data was siphoned to an external email address without user approval, highlighting the real-world potential of the threat.
The attack leverages interpolation methods like nearest neighbor, bilinear, or bicubic resampling.
When an image is intentionally prepared, downscaling introduces aliasing artifacts that reveal concealed text.
In a demonstration, dark areas shifted during bicubic resampling to display hidden black text, which the LLM then interpreted as user input.
From the user’s perspective, nothing unusual appears to happen. Yet behind the scenes, the model follows the embedded instructions along with legitimate prompts.
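The core trick can be illustrated with a toy model of nearest-neighbour downscaling, the simplest of the resampling methods mentioned above. This is a simplified sketch for illustration only, not Trail of Bits' actual tooling: the payload is placed only on the pixels a naive nearest-neighbour downscaler will sample, so the full-resolution image looks almost uniformly white while the downscaled version comes out entirely black.

```python
FACTOR = 4   # downscale factor (assumed for illustration)
SIZE = 64    # high-resolution image is SIZE x SIZE, grayscale 0-255

# Mostly-white high-res "image" with dark payload pixels placed
# exactly on the grid the nearest-neighbour downscale will sample.
hi = [[255] * SIZE for _ in range(SIZE)]
for y in range(0, SIZE, FACTOR):
    for x in range(0, SIZE, FACTOR):
        hi[y][x] = 0

# Naive nearest-neighbour downscale: keep every FACTOR-th pixel.
low = [row[::FACTOR] for row in hi[::FACTOR]]

hi_mean = sum(map(sum, hi)) / SIZE ** 2                # ~239: near-white
low_mean = sum(map(sum, low)) / (SIZE // FACTOR) ** 2  # 0: fully black
print(hi_mean, low_mean)
```

Real attacks against bilinear or bicubic resampling are subtler, tuning pixel values so the interpolation weights blend them into legible text, but the principle is the same: what the resampler reconstructs differs from what a human sees at full resolution.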
To illustrate the risk, Trail of Bits created “Anamorpher,” an open-source tool that generates such images for different scaling methods.
This shows that while the approach is specialized, it could be reproduced by others if defenses are lacking.
The attack raises questions about trust in multimodal AI systems: many platforms now rely on them for routine work, and a simple image upload could trigger unintended data access.
If private or sensitive information is exfiltrated this way, the danger extends to identity theft.
Because these models often link with calendars, communications platforms, or workflow tools, the risk extends into broader contexts.
To mitigate this, platforms should restrict input dimensions, show users a preview of the downscaled result, and require explicit confirmation for sensitive tool calls.
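A minimal pre-upload guard along those lines might look like the following sketch; the size cap and downscale-ratio threshold are assumed values chosen for illustration, not figures from the researchers:

```python
MAX_INPUT_DIM = 1024   # assumed policy: cap on uploaded image dimensions
MAX_SCALE_RATIO = 2.0  # assumed policy: flag aggressive downscales

def needs_review(src_w: int, src_h: int, dst_w: int, dst_h: int) -> bool:
    """Return True if an upload should be rejected or shown to the user
    as a downscaled preview before any tool call is allowed.

    Large inputs and large downscale ratios leave the most room to hide
    a payload that only becomes legible after resampling."""
    if src_w > MAX_INPUT_DIM or src_h > MAX_INPUT_DIM:
        return True
    return src_w / dst_w > MAX_SCALE_RATIO or src_h / dst_h > MAX_SCALE_RATIO

print(needs_review(2048, 2048, 512, 512))  # oversized and heavily downscaled
print(needs_review(512, 512, 512, 512))    # passed through unchanged
```

Previewing exactly what the model will see after resampling closes the gap this attack exploits: the hidden text is then as visible to the user as it is to the model.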
Traditional defenses like firewalls are not built to identify this form of manipulation, leaving a gap that attackers may eventually exploit.
The researchers stress that only layered defenses and stronger design patterns can reliably limit such risks.
“The strongest defense, however, is to implement secure design patterns and systematic defenses that mitigate impactful prompt injection beyond multimodal prompt injection,” the researchers said.