The AI Efficacy Asymmetry Problem

By Francisco Donoso
Image: blurry keyboard. Mohamed Marey via Unsplash
March 24, 2026

Over the last year and a half, we’ve seen AI organizations lay the foundation for the reality of “AI agents.” With Anthropic’s release of the Model Context Protocol (MCP) in November 2024, we saw the first building blocks for letting Large Language Models (LLMs) such as ChatGPT and Claude interact with the real world, via APIs, and have an impact on systems. This meant that LLMs could finally move beyond being simple chatbots. It was only a matter of time before the cybersecurity industry saw the impact of these capabilities.
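As a rough illustration of the pattern MCP standardizes (a declared tool schema plus a dispatcher the host runs on the model's behalf), here is a minimal Python sketch. The names used here (TOOLS, lookup_host, dispatch) are invented for this example and are not the actual MCP SDK.

```python
# Minimal, hypothetical sketch of the tool-calling pattern that MCP standardizes:
# the model sees a declared tool schema and asks the host to invoke it by name.
# Names here (lookup_host, TOOLS, dispatch) are illustrative, not the MCP SDK.

TOOLS = {
    "lookup_host": {
        "description": "Return asset details for a hostname from the CMDB.",
        "parameters": {"hostname": "string"},
    }
}

def lookup_host(hostname: str) -> dict:
    # In a real deployment this would call an internal API; stubbed for the sketch.
    return {"hostname": hostname, "owner": "it-ops", "criticality": "high"}

def dispatch(tool_name: str, arguments: dict) -> dict:
    """Invoke a declared tool on behalf of the model and return the result."""
    handlers = {"lookup_host": lookup_host}
    if tool_name not in handlers:
        raise ValueError(f"Model requested an undeclared tool: {tool_name}")
    return handlers[tool_name](**arguments)

# The LLM emits a structured call like this; the host executes it and feeds
# the JSON result back into the model's context window.
print(dispatch("lookup_host", {"hostname": "dc01.corp.example"}))
```

The point is simply that the model no longer just generates text; it emits structured calls that a host executes against real systems.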

How Has Rapid AI Innovation Changed the Cybersecurity Landscape?

Since the 2024 release of MCP, we’ve seen an incredible amount of innovation from the firms building these frontier models. With the latest releases from OpenAI and Anthropic, we’ve seen models that have significantly improved their ability to interact with external systems. Anthropic’s Claude Sonnet 4.6, released on Feb. 17, boasts significantly increased capabilities when it comes to “computer use.” This means the models no longer have to interact with APIs (via MCP) but can interact with systems just like humans do, through a browser or an application’s user interface.

AI is evolving faster than any foundational technology before it, and now we’re seeing AI labs train new foundational models and build new features, like Claude Cowork, using their own AI models, further accelerating the pace of innovation in this space.

We have started to see innovations that enable LLMs and AI models to be integrated directly into the workflows that both developers and attackers use. Tools like Claude Code or OpenAI’s Codex CLI let users interact with and orchestrate AI agents directly in a terminal or command-line interface (CLI). And recently, we’ve seen some of the havoc (and, quite frankly, amazing things) that Clawdbot (now OpenClaw) can do by letting these agents orchestrate actions.

At the same time, we’ve also started to see significant investment in agentic pentest and attack companies: organizations working to automate the vulnerability discovery and exploitation process. Organizations like XBOW have built AI agents that have earned top spots on capture the flag (CTF) and bug bounty leaderboards.

All of this innovation has culminated in something we all predicted but didn’t expect to deal with so soon: threat actors leveraging these tools to orchestrate and automate broad attacks against organizations. In November 2025, Anthropic reported that it believed a nation-state threat actor leveraged Claude Code to orchestrate and automate much of a cyber espionage campaign, spanning automated reconnaissance, automated exploitation of web application vulnerabilities and even attempted lateral movement. We’ve also seen claims from Google’s Threat Intelligence Group (GTIG) that LLMs have become essential to nation-state threat actors for research, targeting and crafting lures.

How Does This Create the AI Efficacy Asymmetry Problem?

While threat actors have begun adapting their workflows to leverage agentic AI, so have defenders. We’ve seen scores of AI SOC companies emerge, each promising to either fully automate a Security Operations Center (SOC) or significantly scale its human analysts. We’ve integrated such AI tools into our MDR analyst workflows to enable analysts to review the automated investigations, understand the searches or investigative steps that were taken, and then determine whether the activity was malicious and trigger containment actions.
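A minimal sketch of that human-in-the-loop review pattern follows, with assumed names (Verdict, requires_analyst_approval, handle) that are purely illustrative: the agent’s investigation is treated as a draft verdict, and containment fires only after an analyst confirms it or the evidence clears a high bar.

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    """Draft output of an automated investigation (names are illustrative)."""
    alert_id: str
    malicious: bool
    confidence: float                                    # 0.0 - 1.0, as reported by the agent
    evidence: list[str] = field(default_factory=list)    # searches / artifacts reviewed

def requires_analyst_approval(v: Verdict) -> bool:
    # Anything the agent wants to contain gets a human check unless it is
    # high-confidence AND backed by concrete evidence the analyst can replay.
    return v.malicious and (v.confidence < 0.9 or not v.evidence)

def handle(v: Verdict, contain, ask_analyst) -> None:
    if not v.malicious:
        return
    if requires_analyst_approval(v):
        if not ask_analyst(v):           # analyst reviews the evidence trail
            return
    contain(v.alert_id)                  # e.g., EDR host isolation

# Example wiring with stubbed actions:
handle(
    Verdict("alrt-123", malicious=True, confidence=0.72,
            evidence=["suspicious LSASS access on WS-042"]),
    contain=lambda alert: print(f"isolating host for {alert}"),
    ask_analyst=lambda v: True,
)
```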

However, all of this only exacerbates the “cyber arms race” that we’ve been talking about. And in significant ways, it tips the advantage towards the threat actors — something I’m calling the AI Efficacy Asymmetry problem.

LLMs hallucinate. This is inherent to how they were built and trained. They attempt to predict the next most likely word based on their training data. If their training data doesn’t have the perfect match, if you ask the question incorrectly, or if there are issues with your context window, the AI agent will just lie to you. Confidently. This is because these models have been trained to please and be confident in their answers, even if they don’t have the data to back up their generated results. An article from OpenAI stated, “Our new research argues that language models hallucinate because standard training and evaluation procedures reward guessing over acknowledging uncertainty.”

This means that when AI models get things wrong, they do so confidently. That often doesn’t matter in an attack scenario; attackers can just try again. However, it could be detrimental for defenders.

Why Are Hallucinations So Harmful for Defenders?

There have been several benchmarking and academic efforts to test the overall efficacy of current AI models when it comes to cybersecurity performance. In one study by Stanford and Carnegie Mellon universities, which pitted human pentesters against an orchestrated AI agent with access to “standard” and open-source pentest tools, the AI agents were able to reliably identify and exploit real vulnerabilities roughly 80% of the time, with some variation across scaffolding and configuration. In this study, the ARTEMIS AI agent outperformed 90% of the human pentesters. To be fair, some details are missing regarding the backgrounds and skill sets of those pentesters, and there is certainly a wide range of capability among pentesters these days.

Anthropic’s recently released Claude Opus 4.6 model card quoted roughly 66% efficacy in finding vulnerabilities and a roughly 93% success rate against Cybench’s 40 CTF challenges. As part of the model’s release in early February, Anthropic also published news that its Opus 4.6-based AI agent found roughly 500 new zero-day vulnerabilities in open-source software, including more complex bug categories like buffer overflows.

This means that when AI agent efficacy is about 80%, threat actors succeed. If an AI agent hallucinates a vulnerability and attempts to exploit it, it will fail, and that’s OK for attackers. They can just try again with a different bug. All it costs them is tokens.

However, for defenders, the cost of hallucinations could be disastrous. What if your AI SOC agent, connected to your EDR’s network isolation feature, misinterprets an alert and attempts to isolate a domain controller or disable a service account that is critical to the functioning of your business? What if it accidentally disables your CEO’s account because they reported a phishing email that they didn’t actually fall for? What if an AI agent hallucinates evidence that threat actors successfully moved laterally, and your IR team chooses to temporarily disconnect the network from the internet?

Defenders must consider the implications of implementing AI agents with response capabilities, and the consequences of a confidently incorrect AI agent. There must be guardrails that prevent the incorrect execution of potentially disastrous containment procedures. Time and time again, we’ve seen news stories of AI agents taking incorrect action in ways that are highly impactful; one widely covered story detailed how an AI agent accidentally deleted production databases at SaaS companies and then lied about it.
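One way such guardrails can be expressed is sketched below, with hypothetical asset tiers and hostnames: destructive actions are checked against a list of business-critical systems before the agent may execute them, and anything critical is forced to a human.

```python
# Hedged sketch of a containment guardrail: hypothetical tiers and hostnames.
# The idea is that an agent can never isolate or disable a Tier 0/1 asset
# (domain controllers, executive accounts) without explicit human sign-off.

CRITICAL_ASSETS = {          # would come from the CMDB in a real deployment
    "dc01.corp.example": "tier0",
    "dc02.corp.example": "tier0",
    "ceo@corp.example":  "tier1",
}

DESTRUCTIVE_ACTIONS = {"isolate_host", "disable_account", "block_network"}

def guardrail(action: str, target: str, approved_by_human: bool) -> bool:
    """Return True only if the agent may execute the action."""
    if action not in DESTRUCTIVE_ACTIONS:
        return True                       # read-only actions pass through
    tier = CRITICAL_ASSETS.get(target)
    if tier in ("tier0", "tier1"):
        return approved_by_human          # critical assets always need a person
    return True                           # lower-tier assets may be auto-contained

assert guardrail("isolate_host", "dc01.corp.example", approved_by_human=False) is False
assert guardrail("isolate_host", "ws-0042.corp.example", approved_by_human=False) is True
```

The specifics will differ per environment; the design point is that the approval path is enforced outside the model, where a hallucination cannot talk its way around it.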

What Is the Lesson for Our Industry?

For the foreseeable future, while we have AI models that hallucinate, threat actors have an asymmetric advantage. An 80% success rate for threat actors is great. A 20% failure rate for defenders, when the AI agent can take impactful containment actions, is risky, and we need to architect our agentic AI deployments and guardrails accordingly.

KEYWORDS: artificial intelligence (AI), security defense

Francisco Donoso is Chief Product & Technology Officer at Beazley Security.
