Human Experts Can’t Beat Most LLMs at a Cybersecurity Knowledge Competition, But They Can Use LLMs to Avoid That “I Used to Know This!” Frustration

Posted on May 21, 2026 by Omer Akgul, PhD, Petros Efstathopoulos, Chris Gates, PhD, Laura Koetzle, Dan Marino, PhD

Key takeaways

RSAC’s research team ran the largest-scale knowledge-based benchmarking study in the cybersecurity domain that pitted humans against LLMs, and the LLMs won:

LLMs outperformed human experts in all 21 topics
Just three of the 39 LLMs we tested had higher failure rates than the human baseline rate
But this doesn’t mean LLMs can replace human cybersecurity experts; instead, experienced professionals can use them to quickly re-learn things they used to know but can no longer recall

At RSAC 2025 Conference, we ran an experiment: we challenged cybersecurity professionals to pit their domain knowledge against a battery of 39 different LLMs in a game we called “AI Showdown.” In all, 279 attendees answered the call, submitting a total of 2,439 answers to our difficulty-calibrated questions. The results were sobering: the LLMs outperformed the humans across all 21 topical categories.ⁱ Nearly half of RSAC Conference attendees boast 10 or more years of cybersecurity experience, so they’re hardly novices.ⁱⁱ Nonetheless, the human experts’ best average performance (a failure rate of just 19% in the “Law” category) couldn’t match the LLMs’ worst effort (a 17% failure rate for “Open Source Tools”).

Of the 39 LLMs that participated in our evaluation, just three had a failure rate higher than the baseline human failure rate of nearly 33%. So if you’re going to compete against an LLM in a cybersecurity multiple-choice question competition, make sure your opponent is one of those three (qwen2-0.5b-instruct, qwen1.5-0.5b-chat, and llama-7b). Prior to this research, we would have broadly expected the smallest models to perform the worst, but llama-7b had the highest failure rate of the 39 models we tested, despite being in our “medium-sized” group (between four and 15 billion parameters).ⁱⁱⁱ
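To make the comparison concrete, here is a minimal Python sketch of how a failure rate (wrong answers divided by total answers) could be computed and checked against a human baseline. The records, the participant labels `model-a` and `model-b`, and the function names are hypothetical and invented for illustration; they are not the study's data or tooling.

```python
from collections import defaultdict

# Hypothetical graded answers: (participant, topic, answered_correctly).
# These records are invented for illustration; they are NOT study data.
ANSWERS = [
    ("human", "Law", True),
    ("human", "Law", False),
    ("human", "Open Source Tools", False),
    ("human", "Open Source Tools", True),
    ("model-a", "Law", True),
    ("model-a", "Law", True),
    ("model-a", "Open Source Tools", True),
    ("model-a", "Open Source Tools", False),
    ("model-b", "Law", False),
    ("model-b", "Law", False),
    ("model-b", "Open Source Tools", False),
    ("model-b", "Open Source Tools", True),
]


def failure_rates(answers):
    """Return {participant: fraction of that participant's answers that were wrong}."""
    wrong = defaultdict(int)
    total = defaultdict(int)
    for participant, _topic, correct in answers:
        total[participant] += 1
        if not correct:
            wrong[participant] += 1
    return {p: wrong[p] / total[p] for p in total}


def worse_than_baseline(rates, baseline="human"):
    """List participants whose failure rate is strictly higher than the baseline's."""
    return sorted(
        p for p, r in rates.items() if p != baseline and r > rates[baseline]
    )


rates = failure_rates(ANSWERS)
print(rates)
print(worse_than_baseline(rates))
```

With real graded answers in place of the toy records, the same two steps would yield a per-participant failure rate and the list of models that fall short of the human baseline.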

But humans shouldn’t hang up their cybersecurity tools in despair just yet. The “AI Showdown” game is a set of multiple-choice questions, which test recognition of the single correct answer among four options rather than the ability to recall the answer from memory. And pattern recognition (honed by training on vast volumes of human-generated knowledge) is exactly what LLMs excel at. When one poses the same questions in open-ended format, LLM accuracy drops substantially. And the real-world problems that cybersecurity practitioners must solve are seldom as clear as well-designed multiple-choice questions.

In practice, LLMs offer a superior solution to a perennial problem for experienced cybersecurity professionals: vanishingly few of them have perfect recall. LLMs provide an extremely good approximation of perfect recall, so instead of searching the internet for details they used to know but have forgotten, cybersecurity experts can just ask their favorite (search-capable) LLM!ⁱᵛ

_______________

ⁱWith 625 questions across 21 subtopics, ours is the largest-scale knowledge-based benchmarking study in the cybersecurity domain, and the 279 participants make this the largest human baseline in this domain.

ⁱⁱ49% of RSAC Conference attendees report having 10 or more years of industry experience. Source: RSAC 2026 Conference. Note also that our 279 study participants are a convenience sample of RSAC 2025 Conference attendees who voluntarily chose to play the game; they’re not a random sample of cybersecurity practitioners.

ⁱⁱⁱNote that, despite the name, llama-7b is the first-generation Llama model.

ⁱᵛWith the caveat that if more “wrong” than “correct” answers exist in the LLM’s training data, it’ll provide you with the wrong answer.

Contributors

Omer Akgul, PhD

Principal Researcher, RSAC

Petros Efstathopoulos

Vice President, Research, RSAC

Chris Gates, PhD

Senior Director, Research, RSAC

Laura Koetzle

Head, Community Research, RSAC

Dan Marino, PhD

Research Director, RSAC
