GeekBlog

    This AI Agent Is Designed to Not Go Rogue

    By Michael Comaous | February 26, 2026 | 4 Mins Read

    AI agents like OpenClaw have recently exploded in popularity precisely because they can take the reins of your digital life. Whether you want a personalized morning news digest, a proxy that can fight with your cable company’s customer service, or a to-do list auditor that will do some tasks for you and prod you to resolve the rest, agentic assistants are built to access your digital accounts and carry out your commands. This is helpful—but has also caused a lot of chaos. The bots are out there mass-deleting emails they’ve been instructed to preserve, writing hit pieces over perceived snubs, and launching phishing attacks against their owners.

    Watching the pandemonium unfold in recent weeks, longtime security engineer and researcher Niels Provos decided to try something new. Today he is launching an open source, secure AI assistant called IronCurtain designed to add a critical layer of control. Instead of the agent directly interacting with the user’s systems and accounts, it runs in an isolated virtual machine. And its ability to take any action is mediated by a policy (you could even think of it as a constitution) that the owner writes to govern the system. Crucially, IronCurtain is also designed to receive these overarching policies in plain English and then run them through a multistep process that uses a large language model (LLM) to convert the natural language into an enforceable security policy.

    “Services like OpenClaw are at peak hype right now, but my hope is that there’s an opportunity to say, ‘Well, this is probably not how we want to do it,’” Provos says. “Instead, let’s develop something that still gives you very high utility, but is not going to go into these completely uncharted, sometimes destructive, paths.”

    IronCurtain’s ability to take intuitive, straightforward statements and turn them into enforceable, deterministic—or predictable—red lines is vital, Provos says, because LLMs are famously “stochastic” and probabilistic. In other words, they don’t necessarily always generate the same content or give the same information in response to the same prompt. This creates challenges for AI guardrails, because AI systems can evolve over time such that they revise how they interpret a control or constraint mechanism, which can result in rogue activity.

    An IronCurtain policy, Provos says, could be as simple as: “The agent may read all my email. It may send email to people in my contacts without asking. For anyone else, ask me first. Never delete anything permanently.”
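
    To make the idea of a deterministic red line concrete, here is a minimal sketch of what those four sentences might look like once they have been turned into fixed rules. It is not IronCurtain's actual policy format, which the article does not describe. The names `Decision`, `Request`, and `decide` are hypothetical, and escalating anything the policy does not mention is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # escalate to the human owner
    DENY = "deny"


@dataclass(frozen=True)
class Request:
    """A hypothetical, simplified description of what the agent wants to do."""
    action: str               # e.g. "read_email", "send_email", "delete_email"
    recipient: str = ""       # only meaningful for send_email
    permanent: bool = False   # only meaningful for delete_email


def decide(request: Request, contacts: frozenset[str]) -> Decision:
    """Hypothetical compiled form of the four-sentence example policy."""
    if request.action == "read_email":
        # "The agent may read all my email."
        return Decision.ALLOW
    if request.action == "send_email":
        # Contacts go through without asking; anyone else is escalated.
        return Decision.ALLOW if request.recipient in contacts else Decision.ASK
    if request.action == "delete_email" and request.permanent:
        # "Never delete anything permanently."
        return Decision.DENY
    # Assumption: the policy is silent on everything else, so ask the owner.
    return Decision.ASK
```

    Because `decide` is an ordinary function with no model call inside it, the same request and contact list always produce the same decision, which is the property Provos contrasts with the probabilistic behavior of LLMs.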

    IronCurtain takes these instructions, turns them into an enforceable policy, and then mediates between the assistant agent in the virtual machine and what’s known as the Model Context Protocol (MCP) server, which gives LLMs access to data and other digital services to carry out tasks. Being able to constrain an agent this way adds an important component of access control that web platforms like email providers don’t currently offer, because they weren’t built for the scenario where a human owner and AI agent bots are all using one account.
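
    The mediation step described above can be pictured as a small gatekeeper that every tool call has to pass through. The sketch below is illustrative only and does not reflect IronCurtain's code. `Mediator`, `PolicyViolation`, and `example_policy` are hypothetical names, and the tool server is a plain Python callable standing in for a real server, not an actual protocol client.

```python
from typing import Any, Callable

ToolCall = dict[str, Any]


class PolicyViolation(Exception):
    """Raised when the mediator refuses to forward a tool call."""


class Mediator:
    """Sits between the agent and the tool server (all names are hypothetical)."""

    def __init__(
        self,
        policy: Callable[[ToolCall], str],
        tool_server: Callable[[ToolCall], Any],
        ask_owner: Callable[[ToolCall], bool],
    ) -> None:
        self._policy = policy
        self._tool_server = tool_server
        self._ask_owner = ask_owner

    def call(self, tool_call: ToolCall) -> Any:
        """Forward the call only if the policy, or the owner, permits it."""
        verdict = self._policy(tool_call)
        if verdict == "allow":
            return self._tool_server(tool_call)
        if verdict == "ask" and self._ask_owner(tool_call):
            return self._tool_server(tool_call)
        raise PolicyViolation(f"blocked: {tool_call['tool']} ({verdict})")


def example_policy(tool_call: ToolCall) -> str:
    """Toy policy: reading is allowed, deleting is denied, the rest is escalated."""
    if tool_call["tool"] == "read_email":
        return "allow"
    if tool_call["tool"] == "delete_email":
        return "deny"
    return "ask"
```

    The design point is that the agent never holds a direct reference to the tool server. It can only reach it through `Mediator.call`, so a denied request is never forwarded regardless of what the model generates.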

    Provos notes that IronCurtain is designed to refine and improve each user’s “constitution” over time as the system encounters edge cases and asks for human input about how to proceed. The system, which is model-independent and can be used with any LLM, is also designed to maintain an audit log of all policy decisions over time.
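
    An audit log of policy decisions could take many forms, and the article does not say which one IronCurtain uses. As one possible illustration, the sketch below appends each decision as a single JSON line to a text stream. `AuditLog`, `read_entries`, and the field names are assumptions made for this example.

```python
import io
import json
from datetime import datetime, timezone
from typing import Any, TextIO


class AuditLog:
    """Append-only record of policy decisions, one JSON object per line."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def record(self, tool_call: dict[str, Any], decision: str, reason: str) -> None:
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "tool_call": tool_call,
            "decision": decision,
            "reason": reason,
        }
        # One line per entry keeps the log easy to append to and to scan later.
        self._stream.write(json.dumps(entry, sort_keys=True) + "\n")
        self._stream.flush()


def read_entries(text: str) -> list[dict[str, Any]]:
    """Parse the log text back into a list of entries, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    buffer = io.StringIO()  # a real log would more likely be a file opened in append mode
    log = AuditLog(buffer)
    log.record({"tool": "send_email", "to": "stranger@example.com"}, "ask",
               "recipient is not in contacts")
    for entry in read_entries(buffer.getvalue()):
        print(entry["time"], entry["decision"], entry["reason"])
```

    Entries where the decision was escalated to the owner are also the natural place to look for the edge cases Provos mentions, since they mark situations the written policy did not settle on its own.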

    IronCurtain is a research prototype, not a consumer product, and Provos hopes that people will contribute to the project to explore and help it evolve. Dino Dai Zovi, a well-known cybersecurity researcher who has been experimenting with early versions of IronCurtain, says that the conceptual approach the project takes aligns with his own intuition about how agentic AI needs to be constrained.

    Source: www.wired.com
