ASIS International coined the phrase in 2009, but it was really describing what turned out to be business continuity (more on that later). The World Economic Forum formalized resilience in 2012:
“… the ability of systems and organizations to withstand cyber events.”
Since then, other thought leaders have refined it. Former US President Barack Obama signed a presidential policy directive dictating resilience for the country’s critical infrastructure in 2013. But the definition I like best comes from two Stockholm University researchers, Janis Stirna and Jelena Zdravkovic, in 2015:
“… the ability to continuously deliver the intended outcome despite adverse cyber events.”
And finally, the International Organization for Standardization (ISO) defined it as this in 2017:
“… the ability of an organization to absorb and adapt in a changing environment to enable it to deliver its objectives and to survive and prosper.”
In other words, assume that the bad guys will successfully negotiate the intrusion kill chain, or find a chink in your Zero Trust armor, or, more generally, that there will be a massive IT failure sometime in the future. Then devise a strategy that ensures your organization’s essential services will still function.
Monkeys in Chaos
My favorite example of a practical implementation of resilience is what the people at Netflix call chaos engineering. In 2011, as they moved their support infrastructure from on-prem to the cloud, the Netflix engineers built their first module, called “Chaos Monkey.” This is what their website says about it:
“Chaos Monkey is a tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact.”
Let me say that another way. Netflix routinely runs an app that randomly destroys pieces of their customer-facing infrastructure, on purpose, so that their network architects understand resilience engineering down deep in their core. In my typical world, disasters are things that might happen sometime in the future but probably never will. At least I hope they don’t. I have plans written on paper that discuss what we might do if a disaster happens, but that’s usually as far as it goes. In the Netflix world, planned disasters happen every day, and I still get to keep watching episodes of The Witcher uninterrupted as if nothing happened. Since they deployed the original Chaos Monkey module, the Netflix team has built an entire series of chaos tools designed to increase their confidence that they will not only survive a catastrophic event, but that their customers will not even notice that the Netflix infrastructure is going through one.
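To make the idea concrete, here is a minimal sketch of a chaos experiment in Python. It is not Netflix's code and does not use any real Chaos Monkey API. The Service class, the chaos_round and run_experiment functions, and the instance names are all hypothetical, and the "infrastructure" is just an in-memory pool. The point is only to show the loop: kill something at random, check that customers can still be served, then heal.

```python
import random


class Service:
    """A toy service backed by a pool of redundant instances (hypothetical model)."""

    def __init__(self, instance_ids):
        # Map each instance id to a "healthy" flag.
        self.instances = {iid: True for iid in instance_ids}

    def handle_request(self):
        """Serve a request from any healthy instance; fail only if none remain."""
        healthy = [iid for iid, up in self.instances.items() if up]
        if not healthy:
            raise RuntimeError("outage: no healthy instances")
        return f"served by {healthy[0]}"

    def replace_failed(self):
        """Stand-in for auto-healing: bring every failed instance back up."""
        for iid in self.instances:
            self.instances[iid] = True


def chaos_round(service, rng):
    """Disable one randomly chosen healthy instance and return its id."""
    healthy = [iid for iid, up in service.instances.items() if up]
    victim = rng.choice(healthy)
    service.instances[victim] = False
    return victim


def run_experiment(service, rounds, seed=0):
    """Run several chaos rounds; return how many requests were served."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    served = 0
    for _ in range(rounds):
        chaos_round(service, rng)
        service.handle_request()  # raises if the kill caused an outage
        served += 1
        service.replace_failed()
    return served
```

With a pool such as Service(["a", "b", "c"]), every round survives the random kill. With a single-instance pool, the first round raises an outage error, which is exactly the kind of weakness an experiment like this is meant to surface before a real failure does.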
There are some network defenders and IT professionals who would categorize what Netflix does as impressive, aspirational even. But I believe that the bulk of us would categorize what Netflix does as stark raving bonkers. We’re not going to bring down our customer-facing infrastructure for a test. It’s hard enough to keep the thing up and running without destroying it ourselves. We would be wrong, but that’s the current thinking in our community. Netflix has embraced resilience in its IT infrastructure. The bulk of the rest of us just wave our hands at it.
Resilience vs. Business Continuity
So what is the difference between resilience and business continuity? It turns out that business continuity got its start in the 1970s, and that community is quite large. Many in it are upset that the newfangled marketing term “resilience” is getting all of the attention; they see it as the industry’s latest buzzword and consider the two phrases interchangeable. That’s not quite true, but I don’t want to go down the rabbit hole of that particular Internet debate here. For simplicity’s sake, think of resilience as the strategy and business continuity as the set of tactics organizations use to achieve some of that strategy. From my perspective, though, the business continuity people have stayed mostly in the physical world, concentrating on keeping the business running in the face of natural disasters, power outages, an infestation of White Walkers, etc. You don’t traditionally see a lot of business continuity people advocating for a Netflix Chaos Monkey approach.
Resilience allows our organization to continue to function during a catastrophic cyber event like an OPM-level breach or an Edward Snowden-type insider threat event. It’s one more lever to pull in our pursuit of reducing the probability of a material impact to our organization due to a cyber event. Look to the Netflixes of the world to get inspired about how to do it. Team up with the business continuity teams and bring them along for the ride. They have a lot of practical how-to knowledge that will be useful. If you can get all of this done, maybe your Castle Winterfell won’t get overwhelmed by the hacker White Walkers.