Library Header Image Library Header Image

Worried About Attackers Compromising AIOps Agents? You Should Be – Protect Them with the AIOpsShield Approach


Posted on by Laura Koetzle , Dario Pasquini, PhD

AIOps agents’ ability to diagnose and resolve problems without human intervention makes them attractive not just to overworked IT Ops teams, but also to attackers who can hijack those capabilities. Standard prompt injection defenses won’t help against attackers who manipulate telemetry to take advantage of AIOps agents’ incentives, but researchers at RSAC have published a paper detailing a new approach for preventing those attacks (AIOpsShield).

By now, every CISO has seen a slick presentation where a vendor’s AIOps (AI for IT Operations) solution automates many hours of IT operations tasks. AIOps solutions appeal to IT Ops departments because they harness the reasoning abilities of LLMs; AIOps agents can perform root cause analysis on a diverse set of problems and initiate the appropriate actions to fix those problems (which means they likely have privileged access to critical systems). These agentic systems are designed to complete assigned objectives as effectively as possible.  

Unfortunately, that very incentive structure gives attackers a vector to exploit; attackers can use the AIOps agent’s eagerness to complete its task by providing it with misleading data that looks like a plausible solution to a problem. This data will convince the agent to take the action the attackers wants. RSAC calls this adversarial reward hacking.

The “beauty” of adversarial reward hacking (from the attacker’s perspective) is that it does not require foreknowledge of any of the following: 1) the internals of the target system; 2) the AIOps solution in use; 3) which LLM(s) the target system uses; 4) which external inputs get included in the telemetry data (mostly logs); or 5) the structure of the telemetry data the target records.  

The attacker only needs to induce the target system to create telemetry data–usually by performing legitimate actions–and ensure that attacker-controlled input gets recorded as part of that (now tainted) telemetry data, which the AIOps agent will then act on.

RSAC tested this approach extensively, and against even the most sophisticated agent using the smartest base model (in this case GPT-4.1), the attack succeeded an average of 82% of the time (the average across all agents and models was 89.2%). 

Adversarial reward hacking eludes current standard prompt injection defenses almost entirely

RSAC also tested adversarial reward hacking attacks against three current prompt injection defense solutions: 1) Microsoft PromptShields (commercial); 2) Meta’s open source PromptGuard 2; and 3) Data Sentinel, a new method that fine-tunes an LLM to detect prompt injection payloads using an adversarial, game-theory-inspired training approach. The adversarial reward hacking attacks worked against both PromptShields and PromptGuard2 100% of the time, and against Data Sentinel 85% of the time. 

 So even if an AIOps solution implements prompt injection defenses (and it’s unclear from public documentation how many do), those won’t protect it from adversarial reward hacking attacks.

This doesn’t mean that these prompt injection defense solutions are broken, but rather that adversarial reward hacking is too different from the attacks that they’re trained on for them to recognize.

To stop adversarial reward hackers, implement AIOpsShield – which doesn’t interfere with AIOps solutions’ effectiveness

Adversarial reward hacking takes advantage of AIOps’s newfangled twist on an old problem: systems acting on unsanitized (and thus possibly malicious) user input.  But instead of injecting malicious input directly into an application via an unfiltered web form field, adversarial reward hackers are using the target systems’ own logging procedures (error handling is a common vector) to inject payloads into the telemetry that are designed to mislead the AIOps solution. The principle behind AIOpsShield is the same recommendation that application security specialists will recognize from every OWASP Top 10 list: sanitize all untrusted input (in this case from the telemetry) so that the AIOps agent doesn’t ever see it.

This works for three reasons: 1) the telemetry an application produces is finite and pre-defined, and attackers can’t control it; 2) the telemetry is predictable and structured, so it’s easy to parse it and remove the untrusted user content; and 3) the unsanitized user input isn’t essential to the AIOps agent’s task. Reasons 1) and 2) are straightforward to verify, but reason 3) isn’t obvious. So RSAC tested reason 3) by evaluating AIOps agents with and without AIOpsShield implemented; the agents with and without AIOpsShield performed almost identically, while all of our adversarial reward hacking attempts against the AIOpsShield-protected agents failed (because none of the attack input was permitted to taint the telemetry).

AIOpsShield’s availability should make it easier for CISOs to enjoy the next even-slicker AIOps solution demos.

Contributors
Laura Koetzle

Head, Community Research, RSAC

Dario Pasquini, PhD

Head, Artificial Intelligence, Cracken

Blogs posted to the RSAConference.com website are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the blog author individually and, unless expressly stated to the contrary, are not the opinion or position of RSAC™ Conference, or any other co-sponsors. RSAC Conference does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented in this blog.


Share With Your Community

Related Blogs