Inside the AI Showdown Challenge by RSAC Labs: Benchmarking Human and LLM Cybersecurity

Posted on July 17, 2025 by Tatyana Sanchez

What is the AI Showdown Game?

AI and LLMs have become more sophisticated and capable, but how smart are they, really? RSAC™ Labs created the AI Showdown challenge, which pits humans against LLMs. Players sign in with their Google credentials (users without a Google account cannot access the game at this time) or through RSAC™ Membership, then pick from 21 categories (e.g., anti-fraud, cloud security) to start the game. They then answer questions within a limited amount of time. At the end, players see whether they outsmarted the LLMs and can check the leaderboard to see where they placed against other players.


What is the Goal of the AI Showdown Challenge?

Ever wondered if LLMs are smarter than humans, or if they can recall answers correctly and derive complex solutions? AI Showdown was created to explore these questions, specifically by comparing how well LLMs answer cybersecurity questions against human experts in the field.

The procedure, overview, purpose, and further details of the study are available in the AI Showdown study details.

Want to find out more about the RSAC Labs team's findings? Stay tuned, as we’ll be releasing a research paper soon! For now, dive into our latest Community Research Report by Laura Koetzle, Head of Community Research at RSAC.

How Was the First LLM Cybersecurity Benchmark Created?

Before the RSAC Labs team could compare LLMs against human experts, they needed to establish the initial LLM cybersecurity benchmark.

This was accomplished by synthetically generating questions through a multi-step pipeline. LLMs acted as both question creators and judges, with human experts providing a crucial final curation step to eliminate any hallucinated questions. This process allowed the RSAC Labs team to categorize questions by difficulty and complexity, transforming the benchmark into a game format.
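The shape of such a pipeline can be sketched in a few lines of Python. This is only an illustration of the generate, judge, and curate flow described above, not the RSAC Labs implementation: the Question type, the build_benchmark function, and the three callables are hypothetical names, and the toy stubs stand in for real LLM calls and real expert review.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    text: str
    category: str


def build_benchmark(
    categories: List[str],
    generate: Callable[[str], List[Question]],   # assumed stand-in for an LLM question writer
    judge: Callable[[Question], bool],           # assumed stand-in for an LLM judge
    human_approves: Callable[[Question], bool],  # assumed stand-in for expert curation
) -> List[Question]:
    """Generate candidates, filter them with the judge, keep only human-approved ones."""
    benchmark = []
    for category in categories:
        for candidate in generate(category):
            if not judge(candidate):
                continue  # rejected by the automated judge
            if not human_approves(candidate):
                continue  # rejected in final curation (e.g. a hallucinated question)
            benchmark.append(candidate)
    return benchmark


# Toy stubs so the sketch runs end to end; none of this is real study data.
def toy_generate(category: str) -> List[Question]:
    return [Question(f"{category} question {i}", category) for i in range(3)]


def toy_judge(question: Question) -> bool:
    return not question.text.endswith("0")


def toy_human(question: Question) -> bool:
    return not question.text.endswith("2")


result = build_benchmark(["cloud security"], toy_generate, toy_judge, toy_human)
print([q.text for q in result])
```

Keeping the human check as the last filter mirrors the description above, where expert curation is the final step before a question enters the benchmark.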

To learn more about the insights from AI Showdown and how the LLM cybersecurity benchmark was derived, tune in to the RSAC™ 2025 Conference presentation by Armin Buescher, Technical Director at RSAC, and Athanasios Theocharis, Principal Researcher at RSAC. They presented the analysis of the first LLM cybersecurity benchmark, as well as broader insights from AI Showdown.

What is the Purpose of Benchmarking LLMs and Human Experts?

There are many reasons to compare LLMs and human experts in cybersecurity, but the main focus of this benchmark is to assess humans' general knowledge of the past and present state of cybersecurity. While the questions cannot be directly translated into actions that LLMs can take, they do provide insight into how LLMs can assist humans in evaluating cybersecurity situations.

AI Showdown's Time Penalties: What's the Goal?

The AI Showdown questions have a time limit; players lose more points the longer they take to answer. This penalty, while a gamification feature to foster competition among humans, also highlights that LLMs provide instant responses. The RSAC Labs team aimed to test whether humans could use instinct to answer faster in a "production" setting, such as by detecting patterns or keywords. This proved true: for example, players often selected Multi-Factor Authentication (MFA) when it was offered as a choice, because they are familiar with the term and know that it’s a good security practice.
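To make the idea of a time penalty concrete, here is a minimal sketch of one way such scoring could work. The actual AI Showdown formula is not given in this post, so the linear decay, the 30-second limit, and the point values below are all assumptions chosen for illustration, and question_score is a hypothetical name.

```python
def question_score(
    correct: bool,
    elapsed_s: float,
    time_limit_s: float = 30.0,  # assumed limit, not the game's real value
    max_points: int = 100,       # assumed points for an instant correct answer
    min_points: int = 50,        # assumed points for a correct answer at the deadline
) -> int:
    """Score one answer: correct answers lose points linearly as time elapses."""
    if not correct or elapsed_s > time_limit_s:
        return 0  # wrong or late answers earn nothing in this sketch
    fraction_used = max(elapsed_s, 0.0) / time_limit_s
    return round(max_points - (max_points - min_points) * fraction_used)


print(question_score(True, 0.0))    # instant correct answer
print(question_score(True, 15.0))   # correct answer halfway through the limit
print(question_score(False, 5.0))   # fast but wrong answer
```

Under these assumptions, a player who recognizes a familiar keyword and answers immediately keeps nearly all the points, which is the kind of instinct-driven speed the penalty is meant to reward.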

Human vs. LLM: What Distinct Answer Patterns Emerged?

By comparing humans against LLMs across 21 categories, the RSAC Labs team identified which areas humans struggled with and where they excelled.

For instance, humans answered Mobile & IoT Security questions incorrectly nearly 50% of the time, yet achieved over 80% accuracy in the Privacy category.

Notably, LLMs answered almost all questions correctly, while humans averaged 70% correct responses. This highlights how LLMs, with access to vast cybersecurity knowledge, can provide comprehensive input on current issues, complementing human experts who often specialize in niche fields.
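A per-category comparison like the one above comes down to grouping answers by category and dividing correct answers by total answers. The sketch below shows that calculation on a handful of made-up records. The accuracy_by_category function and the toy data are hypothetical and do not reproduce the study's data or results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_category(answers: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of correct answers for each category."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for category, is_correct in answers:
        totals[category] += 1
        if is_correct:
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}


# Made-up records of (category, answered correctly?) for illustration only.
toy_answers = [
    ("Mobile & IoT Security", True),
    ("Mobile & IoT Security", False),
    ("Privacy", True),
    ("Privacy", True),
    ("Privacy", False),
    ("Privacy", True),
]

toy_accuracy = accuracy_by_category(toy_answers)
for category, accuracy in toy_accuracy.items():
    print(f"{category}: {accuracy:.0%}")
```

Running the same function over human answers and LLM answers separately would give two tables that can be compared category by category.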

Ready to Take Your Expertise to the Next Level?

Once you’ve tried your hand at AI Showdown, we encourage you to explore the RSAC Membership Platform, where the game was originally launched. Now free* for a limited time, RSAC Membership connects you with a global network of cybersecurity professionals, researchers, and innovators. It’s a space to dive deeper into topics like large language models, AI ethics, threat detection, and more. Whether you're just starting out or looking to stay sharp in an evolving field, RSAC Membership is a gateway to learning, collaboration, and staying at the forefront of emerging tech.

*RSAC reserves the right to modify or discontinue the Membership, charge, modify or waive any fees in connection with the Membership or offer opportunities to some or all users of the Membership. The Membership Terms of Service are available here.

Contributors

Tatyana Sanchez

Senior Coordinator, Content & Programming, RSAC


Blogs posted to the RSAConference.com website are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the blog author individually and, unless expressly stated to the contrary, are not the opinion or position of RSAC™ Conference, or any other co-sponsors. RSAC Conference does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented in this blog.
