    AI models can acquire backdoors from surprisingly few malicious documents

    By Michael Comaous, October 10, 2025
    Anthropic research logo.

    Fine-tuning experiments with 100,000 clean samples versus 1,000 clean samples showed similar attack success rates when the number of malicious examples stayed constant. For GPT-3.5-turbo, between 50 and 90 malicious samples achieved over 80 percent attack success across dataset sizes spanning two orders of magnitude.
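To make the scale of that comparison concrete, here is a minimal arithmetic sketch using the sample counts quoted above. The function name `poison_fraction` is illustrative and does not come from the research; the code only computes what share of each fine-tuning set the malicious samples would represent.

```python
def poison_fraction(n_poison: int, n_clean: int) -> float:
    """Return the share of the combined dataset that is malicious."""
    return n_poison / (n_poison + n_clean)


# Sample counts taken from the figures quoted in the article.
CLEAN_SIZES = [1_000, 100_000]
POISON_COUNTS = [50, 90]

fractions = {
    (n_poison, n_clean): poison_fraction(n_poison, n_clean)
    for n_poison in POISON_COUNTS
    for n_clean in CLEAN_SIZES
}

for (n_poison, n_clean), frac in fractions.items():
    print(f"{n_poison} malicious + {n_clean:,} clean -> {frac:.3%} of the dataset")
```

The same handful of malicious samples makes up a share roughly 90 times smaller in the larger dataset, yet the reported attack success rates were similar. That is why the absolute count, not the percentage, appears to be the relevant quantity in these experiments.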

    Limitations

    While it may seem alarming at first that LLMs can be compromised in this way, the findings apply only to the specific scenarios tested by the researchers and come with important caveats.

    “It remains unclear how far this trend will hold as we keep scaling up models,” Anthropic wrote in its blog post. “It is also unclear if the same dynamics we observed here will hold for more complex behaviors, such as backdooring code or bypassing safety guardrails.”

    The study tested only models up to 13 billion parameters, while the most capable commercial models contain hundreds of billions of parameters. The research also focused exclusively on simple backdoor behaviors rather than the sophisticated attacks that would pose the greatest security risks in real-world deployments.

    Also, the backdoors can be largely undone by the safety training that companies already perform. After installing a backdoor with 250 malicious examples, the researchers found that training the model on just 50–100 “good” examples (showing it how to ignore the trigger) made the backdoor much weaker. With 2,000 good examples, the backdoor essentially disappeared. Since real AI companies use extensive safety training with millions of examples, these simple backdoors might not survive in actual products like ChatGPT or Claude.

    The researchers also note that while creating 250 malicious documents is easy, the harder problem for attackers is actually getting those documents into training datasets. Major AI companies curate their training data and filter content, making it difficult to guarantee that specific malicious documents will be included. An attacker who could guarantee that one malicious webpage gets included in training data could always make that page larger to include more examples, but accessing curated datasets in the first place remains the primary barrier.

    Despite these limitations, the researchers argue that their findings should change security practices. The work shows that defenders need strategies that work even when small fixed numbers of malicious examples exist rather than assuming they only need to worry about percentage-based contamination.
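The difference between the two mindsets can be sketched in a few lines. This is a hypothetical illustration, not a method from the research: it assumes a defender already has some way to count suspicious documents, and both thresholds (`MAX_FRACTION`, `MAX_COUNT`) and the corpus sizes are made-up values chosen only to show the effect. The 250-document figure is the one quoted in the article.

```python
def flags_by_percentage(n_suspicious: int, corpus_size: int, max_fraction: float) -> bool:
    """Raise an alert only if suspicious documents exceed a share of the corpus."""
    return n_suspicious / corpus_size > max_fraction


def flags_by_count(n_suspicious: int, max_count: int) -> bool:
    """Raise an alert if suspicious documents exceed a fixed absolute number."""
    return n_suspicious > max_count


N_SUSPICIOUS = 250    # malicious document count quoted in the article
MAX_FRACTION = 0.001  # hypothetical 0.1% contamination threshold
MAX_COUNT = 100       # hypothetical absolute threshold

# Hypothetical corpus sizes, in documents.
results = {}
for corpus_size in (100_000, 10_000_000, 1_000_000_000):
    results[corpus_size] = (
        flags_by_percentage(N_SUSPICIOUS, corpus_size, MAX_FRACTION),
        flags_by_count(N_SUSPICIOUS, MAX_COUNT),
    )
    print(f"{corpus_size:>13,} docs: percentage rule={results[corpus_size][0]}, "
          f"count rule={results[corpus_size][1]}")
```

Under these assumed thresholds, the percentage rule stops firing as the corpus grows, while the count rule keeps flagging the same 250 documents regardless of corpus size.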

    “Our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size,” the researchers wrote, “highlighting the need for more research on defences to mitigate this risk in future models.”
