Red teaming has increasingly become a critical tool for enterprise security leaders to test security and identify gaps in defenses. Artificial intelligence (AI) red teaming is on the minds of many CISOs these days, but is it right for all organizations?

Here, we talk to Steve Benton, Vice President of Threat Research and General Manager Belfast at Anomali.

Security magazine: What is your title and background?

Benton: As Vice President of Threat Research and General Manager Belfast at Anomali, my job is to deliver guidance to global organizations on investment, strategy and making security a business enabler that rapidly addresses the evolving global threats of today and the future. My experience in cybersecurity spans three decades, including 18 years at BT, one of the world’s leading communications companies, where I served as Deputy CISO and CSO. An industry security expert, I’m also a contributing member of the Cyber Defenders Council, a Fellow of the Chartered Institute of Information Security and an advisor to the i4 C-level community.

Security magazine: Is AI red teaming realistic for enterprises today? 

Benton: The short answer is “no.” Well, a qualified “no.” Yes, you can red team the platform itself and look at how resilient it is to known attack vectors against the technologies and infrastructures on which it is built. My “no” is to do with the “subversion” style of attacks, which look to manipulate how the AI and machine learning (ML) behaves so that it gives wrong answers, or the answers the attackers want it to produce under specific stimuli. Now that already sounds really complicated, right? That’s because it is! You therefore need a deep AI/ML expert who understands the models, logic and heuristic engines involved in order to craft these tests. Do these AI/ML hackers exist? I doubt it. Perhaps a new form of red team is needed, one that combines traditional hacking expertise with the deep AI/ML expertise of the people who built these systems.

Security magazine: When will it be? 

Benton: Given the growth in AI and ML expected over just the next five years (a 36% CAGR is predicted), the answers are “now” and “because of the value and enforced trust these systems will have,” coupled with the time it will take to build the expertise.

Remember, these systems are designed for, and have found their place in the world solving, issues and problem spaces that are literally too huge for humans alone. Even for humans to gainsay the output from AI/ML is challenging, which is why I say we have created an “enforced trust” in these systems. They are, in a way, “too big to fail,” and as such we need to be sure they can operate as intended and are resilient to attempts to take them offline or, worse, “poison” their “brains.”

All AI and ML systems are technically “built” (i.e. the platform/infrastructure they run on), then they are in a sense “born,” and then they go through a period of “training/learning” to acquire the knowledge they need to operate in the problem space they were designed for once they’ve “grown up.” That training takes significant effort, and the training data itself acquires huge value, as it is the means by which any clone or replacement would also need to be trained. If this training data is compromised, the AI/ML can be poisoned so that it can no longer be trusted to produce the right answers.
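To make that concrete, here is a minimal sketch of one kind of integrity check (an illustrative assumption, not a specific product or the process any particular organization uses): fingerprint the training corpus with content hashes so that tampering is detectable before the data is used to train or retrain the model.

```python
# Hypothetical sketch: detect tampering with a training corpus by comparing
# content hashes against a previously recorded manifest. The directory layout
# and file names are illustrative assumptions, not a real pipeline.
import hashlib
import json
from pathlib import Path

def build_manifest(data_dir: str) -> dict:
    """Record a SHA-256 hash for every file in the training data directory."""
    manifest = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            manifest[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest

def verify_manifest(data_dir: str, manifest_path: str) -> list:
    """Return the files whose contents no longer match the recorded hashes."""
    recorded = json.loads(Path(manifest_path).read_text())
    current = build_manifest(data_dir)
    return [p for p, h in recorded.items() if current.get(p) != h]

if __name__ == "__main__":
    # First run: snapshot the corpus. Later runs: flag anything that changed.
    Path("train_manifest.json").write_text(json.dumps(build_manifest("training_data")))
    tampered = verify_manifest("training_data", "train_manifest.json")
    print("Modified or missing files:", tampered or "none")
```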

In addition, the engines (the brain, as it were) can also be manipulated if they are understood well enough. This is where the attacker has worked out how to make the AI/ML produce a predictable result (to their advantage) by providing certain types of input. It’s almost another type of poisoning.
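A toy sketch of the kind of probing a testing team might do here (the model, the threshold and the random search loop are illustrative assumptions, not a recipe for a real attack): nudge an input until the classifier flips to the answer the attacker wants.

```python
# Toy illustration: probe a trained classifier with small input perturbations
# until it produces a chosen prediction. Everything below is a simplified
# assumption for demonstration, not a production technique.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

def find_flip(x, target, step=0.05, tries=2000, seed=0):
    """Randomly nudge features of x until the model predicts `target`."""
    rng = np.random.default_rng(seed)
    x_adv = x.copy()
    for _ in range(tries):
        candidate = x_adv + rng.normal(scale=step, size=x_adv.shape)
        if model.predict(candidate.reshape(1, -1))[0] == target:
            return candidate
        x_adv = candidate  # keep walking away from the original input
    return None

original = X[0]
wanted = 1 - y[0]  # the answer the "attacker" wants the model to give
adv = find_flip(original, wanted)
if adv is not None:
    print("Prediction flipped with total perturbation:", np.abs(adv - original).sum())
```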

Many of these systems also learn “on the job”: as they are exposed to real-world data and scenarios, they learn through experience. That is even harder to restore in the event of a failure, so systems literally need to be fully snapshotted regularly to give a predictable recovery point. But how do you know that the last snapshot isn’t the poisoned one? When in your history of operation was the compromise (the poison) introduced?
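One partial answer to that question is to keep a trusted, held-out “canary” evaluation set offline and replay it against each saved snapshot, walking back until the model behaves as expected. A minimal sketch follows; the snapshot layout, accuracy threshold and pickle format are assumptions for illustration only.

```python
# Hypothetical sketch: find the most recent model snapshot that still passes
# a trusted "canary" evaluation set, walking back from newest to oldest.
import glob
import pickle
from sklearn.metrics import accuracy_score

def last_known_good(snapshot_glob, X_canary, y_canary, threshold=0.95):
    """Return the newest snapshot whose canary accuracy meets the threshold."""
    snapshots = sorted(glob.glob(snapshot_glob))  # oldest -> newest by name
    for path in reversed(snapshots):
        with open(path, "rb") as f:
            model = pickle.load(f)
        acc = accuracy_score(y_canary, model.predict(X_canary))
        print(f"{path}: canary accuracy {acc:.3f}")
        if acc >= threshold:
            return path
    return None  # every snapshot fails the canary check: assume a deeper compromise
```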

Then there is the theft of the answers. For example, if the AI/ML is being used to analyze markets or scenarios that will lead to output strategies and decisions, a competitor or adversary now understands how their opponent may behave or what they might pitch, and in so doing they can outgame that competitor.

And finally, there is the continuity/resilience issue. With all this investment at stake, these systems cannot go down and must be recoverable to a known state of “thought” and trust/integrity. These systems need business continuity and disaster recovery plans on steroids! And these plans need to be fully tested and rehearsed given the stakes.

Security magazine: What resources will they need to threat model and test their AI systems? 

Benton: As I said, you need to retain the AI/ML experts who designed and built the system. You need to couple them with your technical hackers and your security operations team, and game out all the scenarios I described above. So the answer here is purple teaming, not just red teaming. And remembering that some of this testing could involve “poison,” you need a reference model set up with the ability to restore and retest scenario by scenario.
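Structurally, that restore-and-retest loop can be as simple as the sketch below (the scenario names and the restore mechanism are illustrative assumptions, not a specific framework): every attack scenario runs against a fresh copy of the known-good reference model.

```python
# Minimal sketch of a "restore and retest" harness for purple teaming.
# The scenario registry and deep-copy restore are simplifying assumptions.
import copy

def run_scenarios(reference_model, scenarios):
    """Run each attack scenario against a fresh copy of the reference model."""
    results = {}
    for name, attack in scenarios.items():
        model_under_test = copy.deepcopy(reference_model)  # restore known-good state
        results[name] = attack(model_under_test)           # each attack returns its findings
    return results

# Example shape of a scenario registry (names are hypothetical):
# scenarios = {
#     "training_data_poisoning": poison_and_evaluate,
#     "adversarial_inputs": probe_for_flips,
#     "output_theft": simulate_answer_exfiltration,
# }
```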

Security magazine: What are the initial steps that enterprises can take to understand the risk environment surrounding their AI/ML-based systems?

Benton: First off, approach the system as a full-on safety-critical system that cannot have its operation compromised. Has it been deployed in a resilient way, with backup and recovery? Are all sites and rooms sufficiently physically separated, secured and monitored? Does it have the security 101 of protection, detection and monitoring in play, both as a technology platform and as a physical footprint?

Then properly assess the business value of this system and the impact of interruption, answer theft, manipulation or poisoning: what’s at stake here, who would be interested, how would you know something is wrong, how do you maintain trust in its operation, and what will you do if you cannot trust the system or lose it from operation?

And, most critically, ensure you can sustain this understanding by putting in place specific threat intelligence to monitor and assess threats, and make sure you can operationalize that intelligence into your security ecosystem, both to hunt for threats and compromise and to prioritize security posture improvements against the evolving threat landscape.