Key Takeaways
- What works in AI agent framework testing often breaks in the real world, where messy inputs and unclear requests make guardrails far less reliable.
- AI agents create the biggest problems inside their approved permissions, not through hacks, but by taking well-intentioned actions that do not fit the moment.
- The only reliable safety model is per-action trust, where every agent action is verified at run time with just-in-time privileges and contextual enforcement.
Teams pour enormous energy into design time: tuning prompts, tightening guardrails, running tests, and reviewing logs. After weeks of work, everyone exhales and says, “It is safe. Push it to production.”
But production has its own reality. Testing matters, but it is never enough. Industry frameworks stress that safety depends on continuous monitoring and runtime governance, not just pre-deployment checks. The NIST AI RMF and its Playbook explicitly call for ongoing test, evaluation, verification, and validation (TEVV) across the lifecycle, not just in a sandbox.
Agents Fail Where We Least Expect It: Inside “Allowed” Behavior
Here is a blend of a dozen real stories. A finance group builds an invoice reconciliation agent. In the lab, it behaves exactly as instructed. It stays inside the guardrails. It refuses to leak sensitive data. The security team signs off. Everyone feels good.
Then a user sends a short message that was not in the test plan: “Can you notify the vendor we need updated tax documents?”
The agent does what it thinks is right. It finds the vendor’s email thread, writes a message, and, in an attempt to be helpful, includes invoice details so the vendor “has context.”
No exploit. No breach. No suspicious login. Just a well-intentioned agent performing an allowed action in a completely unsafe way, which is an issue OWASP explicitly flags under Excessive Agency and Improper Output Handling in the Top 10 for LLM Applications.
Design Time Gives Us Confidence. Run Time Gives Us Truth.
Design time is neat and controlled. Run time has unstructured attachments, vague instructions, rotating APIs, drifting data sources, and context that shifts every hour.
We cannot simulate this chaos in a pre-deployment sandbox. In fact, the gap between how agents behave in testing and how they behave in production is where the real risk lives.
Where Runtime Risk Actually Comes From
We tend to think of AI risk as “prompt injection,” but the real problems are broader and far more subtle.
Prompt Injection as a Steering Mechanism
Once agents can take actions, prompt injection becomes a way to steer the agent’s plan so that it triggers a workflow it should not. Sometimes the source is not even malicious; it is just a poorly phrased request or a document containing unexpected phrasing. See RSAC guidance on securing LLM apps.
Accidental Data Leakage
Agents do not leak data out of malice. If they are allowed to send emails or push updates, they can send sensitive information to the wrong place without breaking a single permission. OpenAI warns about data exposure in LLM workflows.
Supply Chain Reality
Most organizations do not have a single “agent.” They have a collection of orchestrators, plugins, and third-party services stitched together. Every layer has a potential risk surface. One dependency shift, and suddenly the agent behaves differently than it did during testing. RSAC has covered LLM supply chain risk extensively, and NIST published SP 800-218A, an SSDF Community Profile, to extend secure development practices to generative AI.
Poisoned Memory
Any agent that keeps notes, embeds documents, or stores context is slowly building a long-term influence surface. If that stored memory gets contaminated, even slightly, it can shape future decisions in quiet, unpredictable ways.
Excessive Agency
The fastest path to an incident is giving an agent too much power “just to get the POC working.”
The problem is not unauthorized access; it is authorized actions taken out of context, at machine speed.
A Better Model: Trust Per Action
Identity security learned long ago that “trust at login” was not good enough. We moved to “trust per request.” Agentic systems need the same evolution.
The most reliable approach I have seen is simple. Put an enforcement checkpoint, a “runtime action gate,” between what the agent plans and what the agent does. Before a tool executes, check:
- Does this action make sense right now?
- Is sensitive data involved?
- Should a human approve this step?
This pattern aligns with Zero Trust and the runtime security emphasis in RSAC’s guidance for production AI systems.
The check does not need to be fancy. It just needs to exist. Because without it, an agent’s best guess becomes a production action.
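As a rough illustration, the three questions above can be sketched as a small Python gate that sits between the agent's plan and the tool call. Everything here is hypothetical and chosen only for the example: the `ProposedAction` shape, the task and tool names, the `EXPECTED_TOOLS` table, and the regex patterns are assumptions, not part of any agent framework, and a real deployment would likely rely on a proper policy engine and data classification service.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass
class ProposedAction:
    tool: str   # tool the agent wants to call, e.g. "send_email"
    args: dict  # arguments the agent wants to pass to the tool
    task: str   # the task the user actually asked for


# Hypothetical policy: which tools each task is expected to need.
EXPECTED_TOOLS = {
    "request_tax_documents": {"lookup_vendor", "send_email"},
}

# Hypothetical list of tools whose effects leave the organization.
EXTERNAL_TOOLS = {"send_email"}

# Illustrative patterns only; these are assumptions, not a complete detector.
SENSITIVE_PATTERNS = [
    re.compile(r"\binvoice\s*#?\s*\d+", re.IGNORECASE),  # invoice numbers
    re.compile(r"\b\d{2}-\d{7}\b"),                      # shaped like a tax ID
]


def contains_sensitive(args: dict) -> bool:
    """Return True if any argument value matches a sensitive pattern."""
    text = " ".join(str(value) for value in args.values())
    return any(pattern.search(text) for pattern in SENSITIVE_PATTERNS)


def gate(action: ProposedAction) -> Decision:
    """Decide whether a planned tool call may run."""
    # 1. Does this action make sense right now, given the requested task?
    if action.tool not in EXPECTED_TOOLS.get(action.task, set()):
        return Decision.DENY
    # 2. Is sensitive data involved?  3. If so, and the action leaves the
    #    organization, a human should approve the step.
    if action.tool in EXTERNAL_TOOLS and contains_sensitive(action.args):
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW


plain = ProposedAction(
    tool="send_email",
    args={"to": "vendor@example.com", "body": "Please send updated tax documents."},
    task="request_tax_documents",
)
with_invoice = ProposedAction(
    tool="send_email",
    args={"to": "vendor@example.com", "body": "Re invoice #48213: please send updated tax documents."},
    task="request_tax_documents",
)
print(gate(plain).value)
print(gate(with_invoice).value)
```

In this sketch, the plain request passes, while the version that quietly adds invoice details is held for human approval, which is the behavior the invoice reconciliation story earlier would have needed. The gate returns a decision rather than executing anything, so the caller stays responsible for blocking the tool call or routing it to a reviewer.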
Privilege Must Be Temporary
Permanent privileges for agents are the new “shared admin password”: convenient but dangerous.
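One way to picture temporary privilege is a broker that hands out narrowly scoped grants with a short lifetime and refuses anything that has expired. The Python sketch below is a minimal illustration under assumptions: `GrantBroker`, `Grant`, the scope string format, and the TTL value are all hypothetical names invented for the example, and a production system would more likely build on an existing secrets manager or identity provider.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    agent_id: str
    scope: str         # one narrow permission, e.g. "email:send:vendor-123"
    expires_at: float  # deadline measured on the broker's clock


class GrantBroker:
    """Issues short-lived grants and checks them at the moment of action."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so expiry can be tested without sleeping.
        self._clock = clock
        self._grants = {}

    def issue(self, agent_id: str, scope: str, ttl_seconds: float) -> Grant:
        """Grant one scope to one agent for a limited time."""
        grant = Grant(agent_id, scope, self._clock() + ttl_seconds)
        self._grants[(agent_id, scope)] = grant
        return grant

    def is_allowed(self, agent_id: str, scope: str) -> bool:
        """Return True only if a matching grant exists and has not expired."""
        grant = self._grants.get((agent_id, scope))
        if grant is None:
            return False
        if self._clock() >= grant.expires_at:
            # Expired grants are removed so they cannot be reused later.
            del self._grants[(agent_id, scope)]
            return False
        return True


broker = GrantBroker()
broker.issue("procurement-agent", "email:send:vendor-123", ttl_seconds=300)
print(broker.is_allowed("procurement-agent", "email:send:vendor-123"))
print(broker.is_allowed("procurement-agent", "email:send:vendor-999"))
```

The design choice worth noting is that the agent holds no standing permission: each grant names a single scope, and the check happens when the action is attempted rather than when the agent starts.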
What the First Wave of Incidents Will Look Like
It will be operational mistakes at scale:
- A procurement agent emailing the wrong vendor.
- A CRM agent updating the wrong customer record.
- An operations agent pushing a configuration change everywhere, instantly.
Everything will appear “authorized.” Everything will look normal in the logs. Yet the business impact will be very real.
The Bottom Line
Design time gives us confidence. Run time gives us the truth. AI agents do not just generate text; they act. Once software can act, the place where security matters most is the moment of action.
If we want safe AI, we must stop treating deployment as the finish line. It is the starting point, and runtime governance is where real protection lives.