
Here's What the Wells Fargo Outage Can Teach Us About Risk Management

Within the enterprise security and risk management community, there’s no debate about the financial impact of business downtime — a single hour of downtime can mean over $100,000 in losses for the overwhelming majority of businesses. But the consequences of downtime aren’t just monetary; they can be reputational as well.

Sometimes, it takes a high-profile event like Wells Fargo’s early February outage to starkly illustrate how damaging the impact of downtime can be on a company’s public image. The incident — prompted by an automatic power shutdown at one of its data centers — led to widespread service interruptions, undermined customer trust and raised broader concerns about the efficacy of Wells Fargo’s data management methods. For risk management leaders, Wells Fargo’s mistakes provide an important learning opportunity.

Where Wells Fargo went wrong

The Wells Fargo outage is a case study in data mismanagement. Before, during and after the incident, the bank’s leadership made missteps that exacerbated their problems, fractured user trust and fueled negative headlines.

Before: Despite being the fourth-largest bank in America, Wells Fargo did not have the infrastructure to withstand a downtime incident. The bank lacked the data redundancy needed to absorb damage or disruption at a single physical location, and it was not suitably backing up mission-critical workflows. As a result, when the detection of smoke at its primary data center on February 7 triggered an automatic shutdown, the transition to backup servers was slow and clumsy, and problems surfaced immediately.
During: Once the automatic shutdown happened, Wells Fargo’s IT leadership did not have an effective plan to contain the damage and mitigate the fallout. Instead, the bank scrambled to reroute critical data, leaving thousands of customers unable to use the bank’s website, mobile app and ATMs without errors. Rather than quickly containing the issue, Wells Fargo grappled with downtime for almost an entire business day.
After: In the immediate wake of the incident, the banking giant had an opportunity to clearly communicate the source of the problem and a timeline for achieving full operability. Instead, its external communications were vague about the source of the problem and nonspecific about a resolution timeline — which only further damaged customer confidence.

How the security community can learn from Wells Fargo’s mistakes

For those of us in the enterprise security and data management community, Wells Fargo’s errors at every stage of the incident present a learning opportunity. Looking at how Wells Fargo fell short on incident response and disaster recovery, here’s how we can do better:

Prioritize IT resilience: The greatest lesson businesses can learn from the Wells Fargo incident is to build a more resilient infrastructure. That starts with a disaster recovery (DR) environment that prioritizes immediate recoverability of critical assets. To achieve this resilience in a cost- and time-efficient way, enterprises should consider solution providers that automate key elements of DR and cloud backup. When evaluating providers, it’s important to look for one that offers dynamic provisioning rather than pre-provisioning: recovery resources are spun up only when they are needed instead of sitting idle, which helps keep costs to a minimum while maintaining timely recoverability.
Don’t just have a plan, test it: Wells Fargo’s IT leadership didn’t realize the gravity of their situation until they were in it, which suggests they had not put their DR plan to the test beforehand. Many other enterprises are guilty of this as well: They get a plan down on paper, but fail to run it through a plausible scenario. The result, as Wells Fargo showed, is protracted downtime. To avoid a similar situation, enterprises should make regular testing a key part of their DR plan. Tools with automated DR testing can help drive efficiency and limit the manual time spent on testing.
Communicate transparently: Wells Fargo did itself a disservice by under-communicating about the incident. Enterprises can avoid a similar mistake by making clear and transparent communication a key component of their incident response plan.
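To make the testing recommendation more concrete, here is a minimal sketch of what an automated DR drill might check. It is illustrative only and not tied to any particular vendor or to Wells Fargo’s systems. The names run_drill, DrillResult, failover, health_check and rto_seconds are hypothetical. The sketch assumes the organization has defined a recovery time objective (RTO) for each service and can trigger a failover and a health check programmatically.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    service: str
    recovered: bool      # did the backup environment pass its health check?
    seconds: float       # time from failover start to end of health check
    within_rto: bool     # recovered AND finished inside the recovery time objective


def run_drill(
    service: str,
    failover: Callable[[], None],       # hypothetical hook: switch traffic to backup
    health_check: Callable[[], bool],   # hypothetical hook: True if backup is serving
    rto_seconds: float,                 # assumed recovery time objective for this service
    clock: Callable[[], float] = time.monotonic,
) -> DrillResult:
    """Trigger a failover, verify the backup, and compare elapsed time to the RTO."""
    start = clock()
    failover()
    recovered = health_check()
    elapsed = clock() - start
    return DrillResult(
        service=service,
        recovered=recovered,
        seconds=elapsed,
        within_rto=recovered and elapsed <= rto_seconds,
    )


if __name__ == "__main__":
    # Stand-in hooks for demonstration; a real drill would call actual tooling.
    result = run_drill(
        service="online-banking",
        failover=lambda: None,
        health_check=lambda: True,
        rto_seconds=900,
    )
    print(result)
```

Run on a schedule, a check like this turns a paper plan into a measured one: a drill that fails its health check or exceeds the RTO flags a gap before a real outage does.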

By taking concrete steps to level up their data management infrastructure and disaster recovery solutions, enterprises can prevent a localized incident from becoming a headline-grabbing business debacle.