Moving to a State of Resiliency: Why War Games Are the Key
Resiliency, here defined as the ability to deliver under pressure, is essential to any successful cybersecurity program. Almost every security article or analyst study today echoes this thought. As we’ve been saying for a while, “it’s not a matter of if, but when a company gets attacked.” And so, with that as a baseline, the goal is to coordinate an effective response plan that (ideally) deflects the attack and minimizes the damage done by the bad guys.
But you can’t simply learn resiliency; there isn’t a book or set of checkmarks you can apply to a list that means you are resilient. Instead, it’s real-world training like War Gaming that delivers the closest “I’ve been there” experience and creates the muscle memory needed to respond effectively when an incident – and all the fear, confusion and paralysis it can bring – occurs.
The Effectiveness of War Gaming is Based in Learning and Behavioral Science
Have you heard this before?
People generally remember:
- 10% of what they read
- 20% of what they hear
- 30% of what they see
- 50% of what they see and hear
Often misrepresented as Edgar Dale’s “Cone of Learning,” the numbers themselves originated in 1967 when a Mobil oil company employee published a non-scholarly article assigning numerical value to Dale’s original “Cone of Experience,” which, itself, was intended to be used as an intuitive model of the concreteness of various kinds of audio-visual media.
Despite originally being a model for the concreteness of various kinds of audio-visual media, the order intuitively makes sense – as we engage more of our senses, retention increases.
And it’s something that HR and learning coordinators across enterprises large and small have latched onto – if we want employees to understand policies and procedures, we need to do more than hand them a book.
In order to achieve the highest concreteness or “stickiness” – Dale’s model suggests we “simulate, model or experience a lesson” or “Do the Real Thing” (for what it’s worth, 90% of that should stick). For cybersecurity, this means simulating a real incident and progressing through the situation to its natural conclusion – also known as War Gaming.
What’s interesting, then, is that according to a McKinsey study, a scant 3% of digital business practices have conducted cyberwar games to help ensure they are ready to respond to a cyberattack (“Doing the Real Thing”).
Instead, what we’ve seen are businesses severely impacted by mismanaging their response to a cyber breach.
Could Yahoo have merited an additional $350M from Verizon if it had an incident response (IR) plan in place and hadn’t taken years to reveal that all 3 billion of its user accounts were exposed?
Could Equifax have avoided sending people to a phishing site – fortunately a white hat site intended to test the company’s security response – if it had enacted a war games framework?
The obvious answer is that we’ll never know. But at Rackspace, we adhere to the adage that those who don’t learn from history are doomed to repeat it. As our Director of Security Operations Christina Galligan puts it: “I like to see how our CSOC (customer security operations center) manager, how our PR team will respond when they’ve been up for 36 hours handling a crisis.”
Our approach to war gaming stems from our perspective on the authenticity of training. As mentioned above, more and more organizations have made significant strides in looking to engage all the senses of an employee undergoing training; but, at the same time, what they’ve not come to grips with is that while this might increase rote motor – or even brain – function, these trainings are happening in a controlled environment with limited outcomes.
Consider this article from the ACRL (Association of College & Research Libraries) that notes: it’s vital to move from learning styles (like the pyramid above) to thinking about teaching modes. “The different modes (visual, auditory, kinesthetic, etc.) need to change as your subject matter does. If you are teaching someone how to drive a car, we really, really hope you are giving your student a kinesthetic experience and not simply verbally explaining how to drive.”
The same is absolutely true in the world of cybersecurity. The chaos of reality is, in a word, real. No plan survives contact with the enemy. So simply performing a training where there are 3 neat, clean options or outcomes isn’t going to get the job done. In the same way that no one will really know how to drive a car until they’ve sat behind the wheel, started the car, applied the brake and moved in reverse or forward, a CSOC manager isn’t going to know how to truly handle incident response until they’ve been put through the paces.
“When the balloon goes up,” if organizations haven’t prepared, all manner of chaos can break loose, especially during the first 24 hours after a breach. It’s during this period that people aren’t always thinking straight and things can happen that can lead to litigation. You must try to prepare for every eventuality, so when you’re reacting, it’s like a drill, it’s muscle memory, as opposed to an emotional response, which is what happens most of the time.
And it’s muscle memory that must be built across the entire organization, not just in the security organization. To risk using a cliché, but an apt one, an organization is only as strong as its weakest link. What happens if the cyber incident response team has been through the training, but the legal team doesn’t have their processes in place or the communications team doesn’t have a buttoned-up plan for effectively communicating the right information at the right time?
Additionally, when you have an adversary in your environment, you don’t know if email or other means of communication are also compromised, so it’s important that – from top to bottom – each employee understands how to handle the stress.
So, given all this, what does a War Game program look like? What are the goals and how can organizations begin to enact such approaches?
From inception of our security practice, we have developed three main goals or objectives that we continuously work towards:
- Operational maturity
- Operational effectiveness
- Operational resiliency
- Operational Maturity – the nexus of people, process and technology that forms the backbone of your security operations.
We measure operational maturity by:
- Evaluating the team we’ve put in place – what certifications they possess, what education and skills they’ve been gaining;
- Measuring the processes we have in place – how defined or automated they are; and
- Quantifying the technology we leverage – what capabilities it possesses and what benefits it drives.
This gives us the foundation on which we build our overall security.
- Operational Effectiveness – the ability to deliver the potential that exists within our operational maturity.
For operational effectiveness, we look at how we pull everything together to deliver the operation. Here we examine things like how many investigations we can perform a week, how accurately we are able to triage those investigations and how quickly we can shut down an attack when we find one.
This helps us establish our capabilities moving forward, building a more realistic picture of what our organization can truly accomplish.
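As a rough illustration of the three measures named above (investigations per week, triage accuracy and time to shut down an attack), here is a minimal Python sketch. The `Investigation` record, its field names and the sample data are all hypothetical and chosen for illustration; they are not drawn from any actual Rackspace tooling.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Investigation:
    # All fields are hypothetical, for illustration only.
    week: int                                 # week in which the investigation was opened
    triaged_as_attack: bool                   # analyst's initial triage verdict
    was_attack: bool                          # verdict once the investigation closed
    hours_to_contain: Optional[float] = None  # recorded only for confirmed attacks


def effectiveness_summary(investigations):
    """Summarize throughput, triage accuracy and mean containment time."""
    if not investigations:
        raise ValueError("need at least one investigation")

    # Throughput: investigations divided by the number of distinct weeks seen.
    weeks = {inv.week for inv in investigations}
    per_week = len(investigations) / len(weeks)

    # Triage accuracy: share of cases where the first verdict matched the outcome.
    correct = sum(inv.triaged_as_attack == inv.was_attack for inv in investigations)
    accuracy = correct / len(investigations)

    # Containment: average hours to shut down confirmed attacks, if any were recorded.
    times = [
        inv.hours_to_contain
        for inv in investigations
        if inv.was_attack and inv.hours_to_contain is not None
    ]
    mean_contain = mean(times) if times else None

    return {
        "investigations_per_week": per_week,
        "triage_accuracy": accuracy,
        "mean_hours_to_contain": mean_contain,
    }


# Made-up sample data covering two weeks.
sample = [
    Investigation(week=1, triaged_as_attack=True, was_attack=True, hours_to_contain=4.0),
    Investigation(week=1, triaged_as_attack=False, was_attack=False),
    Investigation(week=2, triaged_as_attack=True, was_attack=False),
    Investigation(week=2, triaged_as_attack=True, was_attack=True, hours_to_contain=2.0),
]
summary = effectiveness_summary(sample)
print(summary)
```

Tracked over time, figures like these give a baseline for what the operation can deliver in normal conditions, before any pressure is applied.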
- Operational Resiliency – the ability to deliver under pressure.
Operational resiliency is the toughest to measure and quantify, which is precisely why organizations need to enact simulation training such as a War Gaming program. To revisit the car analogy above, resiliency is not just driving the car, but driving it during 5 p.m. rush hour traffic in extreme weather conditions like rain, sleet or snow.
Measuring resiliency involves quantifying how an organization can come together to deliver effectively while under extreme duress. Or, how does a business execute a unified incident response plan when an “Armageddon” scenario is playing out?
Resiliency is also an important topic industry-wide, as thought leaders like Bank of America’s Chief Security Scientist Sounil Yu posit that the 2020s will be the “Age of Recovery (or Resiliency).” In the same way that systems and functions are moving toward more resilient architectures and designs that expect failure (containers, immutable infrastructure, serverless architectures), so should our organizational models and processes expect failure and plan for how to recover – and how to recover gracefully.
A War Games program can be defined in three key phases to help achieve the goal of operational resiliency. What’s also important is that these three phases are “escalating,” and by that we mean that each phase builds on the one before it.
Phase 1: Extended Tabletop
Overview: In Phase 1, we tabletop a hypothetical breach that poses risk to the enterprise (both security organization and the business writ large).
Stakeholders: The key stakeholders are a set of cross-functional leaders from the CSOC, ISOC (internal security operations center), Legal, PR, Engine Room and Customer Experience.
Validation Objective: In Phase 1, we look to identify and assess communication process documentation for coordination between business stakeholders (like the CSOC and legal or ISOC and PR) to respond to the threat.
Phase 2: Collaborative Hands-On Exercise
Overview: Moving to Phase 2, we utilize a “red-team”, or an independent team, to step through attack scenarios, broken down by phases of the attack lifecycle, targeting the enterprise.
Stakeholders: The stakeholders are similar to Phase 1 – CSOC, ISOC, Legal, PR, Engine Room, Customer Experience – but we also add in the IMOC (incident management operations center).
Validation Objective: Here we start evaluating the actual “gaming” portions of War Games as we assess a live-practice of the documented processes between business stakeholders as they work to respond to the threat.
Phase 3: Integrated Sustained Global Simulation
Overview: The culmination of the escalating activities, in Phase 3, we again leverage a red-team, but this time to simulate a pervasive cyber-attack with global business impact.
Stakeholders: Again, the stakeholders are similar to Phases 1 & 2, but also building in scope. Here we add in the global SOC and global legal teams to the existing group of CSOC, ISOC, Legal, PR, Engine Room, Customer Experience and IMOC teams.
Validation Objective: In Phase 3, we assess Rackspace’s adoption of learning from previous exercises to respond, in real-time, to a simulated cyber-attack on the network.
Throughout all stages, we engage in one core business value – full transparency. Mistakes will be made – in fact we want them made here, so they don’t happen in real life – and when they are, we are open about them and address them in an up-front and honest manner.
Then we aggregate these results, which gives us the ability to compare year over year to see how we’re doing, where we’re weak and where we can improve.
We believe that the principles behind models like a “cone of learning” – even if the numbers are debated – are important and, indeed, necessary for achieving operational resiliency.
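To make the year-over-year comparison concrete, here is a minimal Python sketch under stated assumptions: each exercise is scored per area on a 0 to 100 scale, and the area names and scores below are invented for illustration. The article does not describe how results are actually scored or stored.

```python
def year_over_year(previous, current):
    """Return the score change for every area scored in both years."""
    return {
        area: current[area] - previous[area]
        for area in current
        if area in previous
    }


# Hypothetical aggregated scores (0-100) from two annual war game cycles.
results_last_year = {"communication": 60, "containment": 70, "legal_coordination": 50}
results_this_year = {"communication": 75, "containment": 68, "legal_coordination": 65}

changes = year_over_year(results_last_year, results_this_year)

# Where we're weak: the lowest-scoring area in the most recent cycle.
weakest = min(results_this_year, key=results_this_year.get)

# Where we slipped: any area whose score fell compared with last year.
regressed = sorted(area for area, delta in changes.items() if delta < 0)

print(changes)
print("weakest area:", weakest)
print("regressed areas:", regressed)
```

Separating the weakest area from the areas that regressed keeps two questions apart: where the organization is furthest from its goal, and where earlier lessons have not stuck.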
Ultimately our goal is to deliver trust and confidence in the cloud. And when you achieve demonstrable resiliency, you can show your customers just that.
This article originally ran in Today’s Cybersecurity Leader, a monthly cybersecurity-focused eNewsletter for security end users, brought to you by Security magazine.