The Journey to The Self-Driving SOC


Twenty years ago, few believed self-driving cars could happen, yet they’re here. Will the same principles pave the way towards self-driving security? Nir Zuk explores what an autonomous SOC looks like, why it's needed and how getting there requires a revolution in innovation. Nir will also detail potential pitfalls along the way.


Video Transcript

>> NIR ZUK: Wow! Hi. So good to see you all in person. It's been a while. Anyone here working in a SOC, a security operations center, you know, the hardest job in IT? Yes. Okay. I'm here to help.

You know, here is an interesting observation. Every time, or almost every time, there is an incident, God forbid a data breach, eventually, after a few hours, a few days, sometimes longer than that, information security, or specifically the SOC, comes back and says: we know what happened. They went in through here and they exploited something. Or they phished and got hold of these endpoints, and they jumped here and there, and they got to the data, and they used this to exfiltrate it. And somehow you get this puzzle figured out and you know exactly what happened.

And my question is: if you have all of the information to figure out what happened almost every time there is an issue, why didn't you figure it out in real time? Why did you wait so long, until after the data breach, to tell us what happened? And, of course, the answer is that there is just too much information.

It's very hard to do that, right. When it happens, you need to go through all of those different systems that hold different data from different sources. Your SIEM is probably the first one you would go to, and then, if you have EDR, you go to your EDR database, and Cloud logs, and server logs, and application logs, and all of these different places. And you will get this picture figured out and, again, know exactly what happened. And if you wanted to do it constantly, for each and every event that's coming in, you would have to do the same thing, right. You would have to take every event that comes in, put it into a timeline, so this happened, and this happened, and this happened, and then investigate and figure out whether that's an attack or not. And that, of course, is something that humans cannot do. There is just too much information. You could train all of the people in the world to be cybersecurity experts and you still wouldn't be able to do it. We need machines for that.
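
To make the "put it into a timeline" step concrete, here is a minimal Python sketch that merges already-sorted event streams from several sources into one chronological timeline. The source names (`siem`, `edr`, `cloud`), the `Event` fields and the sample events are hypothetical placeholders, not any product's schema.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class Event:
    # Only the timestamp takes part in ordering; the other fields are payload.
    ts: datetime
    source: str = field(compare=False)
    host: str = field(compare=False)
    action: str = field(compare=False)


def build_timeline(*streams):
    """Merge per-source event lists (each already sorted by ts) into one timeline."""
    return list(heapq.merge(*streams))


# Hypothetical example events; names and actions are illustrative only.
siem = [Event(datetime(2024, 5, 1, 9, 0), "siem", "web01", "login from new country")]
edr = [Event(datetime(2024, 5, 1, 9, 5), "edr", "web01", "powershell spawned by office app"),
       Event(datetime(2024, 5, 1, 9, 20), "edr", "db02", "credential dump tool")]
cloud = [Event(datetime(2024, 5, 1, 9, 40), "cloud", "bucket-7", "large object download")]

for e in build_timeline(siem, edr, cloud):
    print(e.ts.isoformat(), e.source, e.host, e.action)
```

`heapq.merge` assumes each input is already sorted; a real pipeline would also have to reconcile clocks and time zones across sources before the ordering means anything.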

And then the next question is this. We have been building as an industry, and you have been buying as our customers, automation tools for cybersecurity operations, more and more of these tools. There is SOAR, security orchestration, automation and response, which is a fancy name that cybersecurity people like me gave to RPA, robotic process automation. Then there is XDR and other technologies, which automate some of the hunting that we do, but more so the investigation and things like that. There is attack surface management, and I'm sure that many of you are writing scripts, dozens, hundreds of scripts, to automate different things in the SOC. But it's not there yet.

If you think about it, most of the automation that the industry is providing, and that individual security operations centers are building, is really geared toward one thing: being able to handle all of the alerts that are coming in. We have infrastructure generating alerts from the network, from endpoints, from applications, from servers and so on. When all of these alerts come in, the traditional approach has always been to use correlation rules to reduce the number of alerts we need to handle to the exact number that we can handle. The goal of these automation tools is to change that and say: no, we are going to handle all alerts. Every alert that comes in, whether a single alert or a group of alerts forming an incident, is going to be investigated and dealt with. Whatever the machines can do, they will do, and whatever people have to do, people will do.
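
The traditional "reduce the alerts to what we can handle" approach can be sketched as a correlation rule that groups alerts per host within a fixed time window and then keeps only as many groups as the team has capacity for. The window size, the `capacity` parameter, the scoring rule and the `Alert` fields are all hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alert:
    host: str
    ts: int        # seconds since some epoch
    severity: int  # higher is worse


def correlate(alerts, window=300):
    """Group alerts on the same host whose timestamps fall in the same fixed window."""
    groups = defaultdict(list)
    for a in alerts:
        groups[(a.host, a.ts // window)].append(a)
    return list(groups.values())


def triage_to_capacity(alerts, capacity, window=300):
    """Keep only the `capacity` highest-scoring groups; everything else is dropped."""
    groups = correlate(alerts, window)
    # Hypothetical scoring: worst alert in the group first, then group size.
    groups.sort(key=lambda g: (max(a.severity for a in g), len(g)), reverse=True)
    return groups[:capacity]


alerts = [Alert("web01", 10, 3), Alert("web01", 50, 8), Alert("db02", 20, 5),
          Alert("lap33", 400, 2), Alert("lap33", 420, 2)]
kept = triage_to_capacity(alerts, capacity=2)
for group in kept:
    print(group[0].host, [a.severity for a in group])
```

Whatever falls below the cut, here the two low-severity alerts on `lap33`, is never investigated, which is the gap that handling every alert is meant to close.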

And it works, but it still only automates handling the alerts. It doesn't really do what I said before: we have the data to figure out that we are under attack, we are just not using it. We use it after the fact, after we know that we have been attacked. We need a different approach. The current approach is a little bit like what the car industry has been doing to automate cars. Every few years, car manufacturers come out with new automation technology. It started with cruise control many years ago, and then adaptive cruise control, lane-keeping assist, automatic braking, backward, forward and from the side, and all kinds of other technologies.

But, still, the human is at the center. The human is still driving. And we can't work that way in the SOC if we want to investigate each and every data point that's coming in to figure out whether we are under attack or not. We need to bring in the robots. We need to bring in automation to make the SOC automatic, autonomous, to make it run by itself. We will bring the people in later. We can't really do it without people, of course. We need the people. So, to explain how this can be done, let's start with the basics: why is the security operations center, the SOC, even in a position to do this, compared with whatever we have deployed in the infrastructure?

So in the infrastructure we have tools like network security and endpoint security, identity and access management, different Cloud security tools, application security, server security, workload security and so on. And all of these tools face what are really two related challenges. The first is that they only look at local data. If you are the network security tool or the endpoint security tool, and so on, you see what you see where you see it. If you are a network security tool, you don't even see what other network security tools deployed in different locations see. You just have whatever you have there to make your decision. The same is true for endpoints, and the same is true for access control, identity and access management, and so on. So these are very localized decisions. In the security operations center, by contrast, we have data coming from all over the infrastructure, and we can make global decisions, not just local ones, which means that we can find more issues.
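
The local-versus-global point can be illustrated with a toy example: each sensor sees only its own failed logins for an account and stays under its local threshold, while a central view that sums the same counts crosses it. The sensor names, the threshold and the counts are made up for illustration.

```python
from collections import Counter

THRESHOLD = 10  # hypothetical: failed logins per account before raising an alert

# Hypothetical failed-login counts per account, as seen by three separate sensors.
site_views = {
    "network-fw-east": Counter({"alice": 4, "bob": 1}),
    "network-fw-west": Counter({"alice": 4}),
    "idp": Counter({"alice": 3, "carol": 2}),
}


def local_alerts(views, threshold=THRESHOLD):
    """Each sensor decides on its own data only."""
    return {(site, acct) for site, c in views.items() for acct, n in c.items() if n >= threshold}


def global_alerts(views, threshold=THRESHOLD):
    """The SOC sums the same data across all sensors before deciding."""
    total = sum(views.values(), Counter())
    return {acct for acct, n in total.items() if n >= threshold}


print(local_alerts(site_views))   # set(): no single sensor crosses the threshold
print(global_alerts(site_views))  # {'alice'}: 4 + 4 + 3 = 11 failures overall
```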

The second issue, which is related, is that in the infrastructure, in most cases, we don't have a lot of time to make a decision. In the network, we often need to make a decision in less than a millisecond. Sometimes we can afford 10 or 20 milliseconds if it's a DNS query or something like that. The same is true for the endpoint, and for identity and access management. When I identify myself, when I go through single-factor or two-factor authentication, I don't want to wait five minutes for machine learning to work its magic and decide whether I am who I say I am. It has to be quick.
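
The time pressure on inline controls can be sketched with `asyncio.wait_for`: the enforcement point asks a cloud verdict service, but if the answer does not arrive within its budget it has to decide anyway, here by allowing the request and recording it for later review. The 10 ms budget, the `cloud_verdict` stand-in and the fail-open policy are illustrative assumptions; real products choose their own defaults.

```python
import asyncio

BUDGET_S = 0.010  # hypothetical 10 ms decision budget
deferred_for_soc = []  # requests allowed without a verdict, kept for later re-evaluation


async def cloud_verdict(request: str, delay: float) -> str:
    """Hypothetical stand-in for a remote analysis service; `delay` simulates its latency."""
    await asyncio.sleep(delay)
    return "block" if "malware" in request else "allow"


async def inline_decision(request: str, delay: float) -> str:
    try:
        return await asyncio.wait_for(cloud_verdict(request, delay), timeout=BUDGET_S)
    except asyncio.TimeoutError:
        # Out of time: fail open and hand the request to the SOC, which has more time.
        deferred_for_soc.append(request)
        return "allow"


async def main():
    fast = await inline_decision("GET /malware.exe", delay=0.0)
    slow = await inline_decision("GET /report.pdf", delay=0.050)
    return fast, slow


print(asyncio.run(main()))  # ('block', 'allow'); the slow request lands in deferred_for_soc
```

Failing open keeps users moving at the cost of letting some traffic through unchecked; the deferred list is what a SOC, with more time and more data, would pick up later.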

So we don't have a lot of time to make a decision in the infrastructure, which limits what we can do. Yes, it's getting better: we use the Cloud, we send information to the Cloud and receive an answer within ten milliseconds, but it's still limited. In the SOC, we have more time. How much time? Ideally as little as possible, of course, because by the time we figure out in the SOC that something happened, it's already too late. The attack is already underway and we have to go back and undo it. It's better if you can do it within a few minutes, but sometimes it takes longer. Sometimes you collect information over a week, and only at the end of the week do you get the missing piece that tells you, oh, that was an attack. Then you have to go and undo a week's worth of adversary actions. But that's okay. That's better than nothing.

So the SOC is certainly in a position to make decisions that the infrastructure couldn't have made. If the infrastructure decided to trust something, the SOC can reevaluate it using more data and more time and figure out that something went wrong. To do that, a few things need to happen.
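
The case where the missing piece only arrives at the end of the week amounts to retroactive matching: keep the history of events the infrastructure already allowed, and when a new indicator shows up, search that history so the earlier activity can be found and undone. The `AllowedEvent` fields, the seven-day lookback and the sample domains are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AllowedEvent:
    ts: datetime
    host: str
    dest_domain: str


def reevaluate(history, bad_domain, now, lookback=timedelta(days=7)):
    """Return earlier allowed events that contacted a domain only now known to be bad."""
    start = now - lookback
    return sorted(
        (e for e in history if e.dest_domain == bad_domain and e.ts >= start),
        key=lambda e: e.ts,
    )


# Hypothetical history of traffic the infrastructure already allowed.
now = datetime(2024, 5, 8, 12, 0)
history = [
    AllowedEvent(datetime(2024, 5, 1, 13, 0), "lap12", "updates.example.net"),
    AllowedEvent(datetime(2024, 5, 3, 9, 30), "lap12", "cdn.example.org"),
    AllowedEvent(datetime(2024, 5, 6, 22, 15), "srv04", "updates.example.net"),
    AllowedEvent(datetime(2024, 4, 20, 8, 0), "lap99", "updates.example.net"),  # outside window
]
for e in reevaluate(history, "updates.example.net", now):
    print(e.ts.isoformat(), e.host)
```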

The first thing is we need data. If we want to bring in the robots, what we call AI, artificial intelligence, or by the more technical term ML, machine learning (we will call it AI/ML so that both the marketing and the technical people are happy), the first thing we need is data. And the way AI/ML works, you need very diverse data, and you need a lot of data.

When I say diverse, I mean you need features. Let me explain what that means in the context of AI/ML. Let's say that you want to sell products to people, and you want to figure out what they are going to buy. Who is most likely to buy your product? You can do it based on just people's age and some product they bought in the past, and, yeah, you will have a rough idea of who is going to buy your product. If you can get their age and their zip code, and maybe a few products they bought in the past, you can make better decisions, because you have more features, that's the technical term, to base your decision on. If you can get, I don't know, their income level and many more past buying patterns, and so on, it gets easier still: the more data you have, the easier it is to reach a decision. And the same is true for machine learning.
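
The "more features" idea from the sales example can be made concrete by turning each customer record into a numeric feature vector. This minimal sketch uses only the standard library; the attribute names, the sample customers and the one-hot encoding of zip codes and past purchases are illustrative choices, and a real pipeline would more likely use a library encoder.

```python
def encode(customers, use_zip=False, use_history=False):
    """Turn customer dicts into feature vectors; each option adds columns (features)."""
    zips = sorted({c["zip"] for c in customers}) if use_zip else []
    products = sorted({p for c in customers for p in c["bought"]}) if use_history else []
    columns = ["age"] + [f"zip={z}" for z in zips] + [f"bought={p}" for p in products]
    rows = []
    for c in customers:
        row = [c["age"]]
        row += [1 if c["zip"] == z else 0 for z in zips]        # one-hot zip code
        row += [1 if p in c["bought"] else 0 for p in products]  # past purchases
        rows.append(row)
    return columns, rows


# Hypothetical customers, for illustration only.
customers = [
    {"age": 34, "zip": "94301", "bought": {"tent"}},
    {"age": 51, "zip": "10001", "bought": {"stove", "tent"}},
]
print(encode(customers)[0])  # ['age']
print(encode(customers, use_zip=True, use_history=True)[0])
```

Each additional attribute widens every row, which is exactly what "more features" means to a model.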

Machine learning needs very wide data. The more features you have, the better. So you need data from the network, very deep data from the network: all HTTP headers in both directions, all DNS traffic, all DCP traffic, database access, and many other things. You need as much data from endpoints as possible. We collect about 200 megabytes per day from a user endpoint and many terabytes from server endpoints or workloads. You need data from identity and access management systems, from Cloud logs for PaaS services, VPCs and flow logs, from application logs, from SaaS, from public Cloud, and so on. You need as much data as possible.
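
In security terms, "wide data" means that each entity, such as a host, ends up with one row that combines features from every source. A minimal sketch, assuming hypothetical per-source feature dictionaries keyed by host name; missing sources are filled with zeros so that every row has the same columns.

```python
def widen(sources):
    """Join {source: {host: {feature: value}}} into one wide row per host."""
    hosts = sorted({h for per_host in sources.values() for h in per_host})
    columns = sorted({f"{src}.{feat}"
                      for src, per_host in sources.items()
                      for feats in per_host.values()
                      for feat in feats})
    table = {}
    for h in hosts:
        row = {}
        for src, per_host in sources.items():
            for feat, value in per_host.get(h, {}).items():
                row[f"{src}.{feat}"] = value
        # A host that a source never reported on gets 0 for that source's features.
        table[h] = [row.get(col, 0) for col in columns]
    return columns, table


# Hypothetical per-source features; the feature names are illustrative only.
sources = {
    "network": {"web01": {"dns_queries": 1200, "http_hosts": 35}},
    "endpoint": {"web01": {"new_processes": 14}, "lap33": {"new_processes": 3}},
    "identity": {"lap33": {"failed_logins": 6}},
}
columns, table = widen(sources)
print(columns)
print(table)
```

Zero-filling is a simplification: "no data from this source" and "a measured zero" are different things, and a production feature store would usually keep them apart.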

So the data has to be very, very diverse. That's the first thing you need. And which data you collect matters: it has to match the AI/ML data models and algorithms that you use. By the way, as much as I would like this to be a multivendor world, where we can collect data from everyone and I don't care what you have in the network, what you have on your endpoints, what you installed on your Cloud workloads, or which CASB you use, and so on, it's probably not going to happen. We are trying, we are doing our best, but there has to be a very tight connection between the two. Again, we will do our best with anything, but it's going to work much better when the vendor controls both the data sources and the processing of the data.

So that's one important thing: we need very diverse data. And we need a lot of data, and a lot of data is usually not what you collect in the SOC today. Today, the center of the SOC is a SIEM, and there, most of the time, you have to consider the cost and really figure out what you are going to collect and what you are not. You have a budget for, I don't know, three terabytes a day, and you figure out which three terabytes a day you are going to collect. AI, machine learning, doesn't work like that. You are not going to decide for the machines which data they use. They are going to decide which data they use. Which means you need to collect much more data. We have customers collecting almost 100 terabytes per day, and even more than that. That's about a gigabyte per second.
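
The "about a gigabyte per second" figure checks out. A quick calculation, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes):

```python
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_to_gb_per_s(tb_per_day: float) -> float:
    """Convert a daily ingest volume in TB to a sustained rate in GB/s (decimal units)."""
    return tb_per_day * TB / SECONDS_PER_DAY / 10**9


print(round(daily_to_gb_per_s(100), 2))  # 1.16
print(round(daily_to_gb_per_s(3), 3))    # 0.035, the 3 TB/day SIEM budget from the talk
```

This is an average; real ingest is bursty, so peak rates will be higher than the sustained figure.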

So you need to collect a lot of data, and a lot of diverse data. The second thing is we need to bring it into one place. Yes, that's usually a loaded conversation when I have it with customers. When I tell them I need the data in one place, they say, no, we are already paying for storing that data here and that data there, we have this data lake and that data lake and this SIEM, and so on; why don't you use the data sources we have today and connect to them? I could, but it's going to cost you ten times more in compute and in network bandwidth, and it's going to take ten times longer, because sometimes the data will be in the wrong format. Like, what am I going to do with SQL databases when it comes to machine learning? So you will need to bring the data into one place. Sorry, that might mean duplicating the data. Just look at the overall cost and see if it's worth it or not, but probably all of the data will need to come into one place. Once you do that, you unleash the machines. You take the machine learning, the AI, and just let it at it. Well, that's not enough.
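
Bringing data into one place usually also means putting it into one shape. This sketch normalizes two hypothetical log formats, with different field names and timestamp conventions, into a single common record before storage. Both input formats and the target schema are invented for illustration.

```python
from datetime import datetime, timezone


def from_firewall(rec):
    # Hypothetical format: epoch seconds and "src"/"dst" field names.
    return {
        "ts": datetime.fromtimestamp(rec["epoch"], tz=timezone.utc),
        "src_ip": rec["src"],
        "dst_ip": rec["dst"],
        "source": "firewall",
    }


def from_cloud_flow(rec):
    # Hypothetical format: ISO-8601 strings and camelCase field names.
    return {
        "ts": datetime.fromisoformat(rec["eventTime"]),
        "src_ip": rec["srcAddr"],
        "dst_ip": rec["dstAddr"],
        "source": "cloud_flow",
    }


NORMALIZERS = {"firewall": from_firewall, "cloud_flow": from_cloud_flow}


def normalize(kind, rec):
    """Dispatch a raw record to the normalizer for its source type."""
    return NORMALIZERS[kind](rec)


store = [
    normalize("firewall", {"epoch": 1714550400, "src": "10.0.0.5", "dst": "203.0.113.9"}),
    normalize("cloud_flow", {"eventTime": "2024-05-01T08:00:05+00:00",
                             "srcAddr": "10.0.0.5", "dstAddr": "203.0.113.9"}),
]
for r in store:
    print(r["ts"].isoformat(), r["source"], r["src_ip"], "->", r["dst_ip"])
```

Once both records share field names and timezone-aware timestamps, they can be compared and joined directly, which is much harder when each source stays in its own store and format.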

It's not enough for several reasons. It's not enough because that machine learning, that AI, is going to be programmed by experts who work for the vendor. In our case, for example, I have many dozens of security experts; most of them are ex-military attackers who decided to move to the good side and translate their knowledge of how to do some of the top attack work in the world into defense. They program the AI, the machine learning, to do that, and so on. But they don't know your business. Right. They know how to detect general attacks, but each business has its own data and its own specific needs, and requires its own specific algorithms.

So the machines aren't enough. We need people. We need to bring in the people to do what the machines cannot do, but we also need the machines to learn from the people over time. And we need the machines, over time, to learn the business and become better and better. I know it sounds like science fiction, and you may think it's going to happen in 20 years. It's happening right now.
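
One simple way for machines to learn from the people is to record analyst verdicts and adapt to the business: in this sketch, once analysts have marked the same detection on the same asset as benign a few times, the system stops escalating it for that customer. This is a deliberately simple, hypothetical stand-in for the feedback loop described in the talk, not a machine-learning model; the rule and asset names and the threshold of three verdicts are arbitrary.

```python
from collections import Counter


class FeedbackLoop:
    """Learns customer-specific suppressions from analyst verdicts (hypothetical sketch)."""

    def __init__(self, benign_needed=3):
        self.benign_needed = benign_needed
        self.benign_votes = Counter()

    def record_verdict(self, rule, asset, verdict):
        key = (rule, asset)
        if verdict == "benign":
            self.benign_votes[key] += 1
        else:
            # A malicious verdict resets what was learned for this rule/asset pair.
            self.benign_votes.pop(key, None)

    def should_escalate(self, rule, asset):
        return self.benign_votes[(rule, asset)] < self.benign_needed


loop = FeedbackLoop()
for _ in range(3):
    loop.record_verdict("unusual-admin-tool", "backup-srv", "benign")
print(loop.should_escalate("unusual-admin-tool", "backup-srv"))  # False: learned as normal here
print(loop.should_escalate("unusual-admin-tool", "hr-laptop"))   # True: still escalated elsewhere
```

The suppression is scoped to one rule on one asset, so what is normal for one part of the business does not blind the system everywhere else.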

This technology is happening right now. All it requires, like I said before, is to switch our mindset away from one where the human is at the center of the SOC, sitting in front of a SIEM, and we add a bunch of automation tools here and there, like SOAR and XDR and others. Those are important tools, and we still need them, but they help with dealing with alerts. They don't really help with detecting the attacks that we need to detect. We need to do it the way the self-driving car companies, the autonomous car companies, have been doing it. They have been designing cars without a steering wheel, assuming there are no humans in the car. Yes, there still are humans involved: those cars get stuck every now and then and don't know what to do, and someone has to go and release them, and the cars learn from that so that it doesn't happen the next time.

The same is true for the SOC: we need to design it without humans in mind. Assume there are no humans, no SOC analysts in the SOC, design it, make sure it works, and then bring in the people whenever we need them. Is it going to be based on current SOC technologies? Certainly not on the SIEM, and not on robotic process automation, or again, SOAR. It's going to be based on data and machine learning, which is what XDR does, but it has to be done in a very, very different way. We have to start with a clean slate and design it in such a way that, again, we assume the SOC has no people in it, that all security operations are run by machines, and then we bring in the people.
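
Designing without humans in mind and bringing people in only when needed can be sketched as a routing rule: the machine acts on its own when it is confident either way, and only the uncertain cases go to an analyst queue, much like the self-driving car that calls for help when it gets stuck. The score source, the action names and the two thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AutonomousTriage:
    block_above: float = 0.9   # hypothetical: confident the case is malicious
    close_below: float = 0.1   # hypothetical: confident the case is benign
    human_queue: list = field(default_factory=list)

    def handle(self, case_id: str, score: float) -> str:
        """`score` is a model's estimated probability that the case is malicious."""
        if score >= self.block_above:
            return "auto-contain"
        if score <= self.close_below:
            return "auto-close"
        # The machine is "stuck": hand the case to a person.
        self.human_queue.append(case_id)
        return "escalate"


triage = AutonomousTriage()
results = {cid: triage.handle(cid, s) for cid, s in [("c1", 0.97), ("c2", 0.02), ("c3", 0.55)]}
print(results)             # {'c1': 'auto-contain', 'c2': 'auto-close', 'c3': 'escalate'}
print(triage.human_queue)  # ['c3']
```

Widening or narrowing the band between the two thresholds is the knob that decides how much work reaches people.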

Now, as I said, this is happening right now. I have seen it; I mean, we are building it, we are selling it. I have seen it running for customers, doing amazingly well, detecting things that humans cannot detect and doing things that humans just can't do. It's not a question of whether people know how to do it or not. You can find people who know how to do it; you just never find enough of them. And it's not about training tens of thousands or hundreds of thousands of cybersecurity experts and adding them to the SOC. The numbers are so big that it's just not going to happen.

It has to be based on machines.

And the last question I have, which I will answer: is doing this a necessity, or is it optional? Is it just a luxury, you know, it would be great if we could do it? In my opinion, we have to do it. I just don't think we have a choice. I don't think we can continue working in a mode where, whenever there is an issue, an attack, a successful attack, an incident, a data breach or whatever, we just come back after a few days or a few weeks and say, oh, we know what happened. It's just unacceptable. It's a cybersecurity atrocity. We have all of the data needed to figure out what happened, and using it in real time, to know that an attack is happening and to stop it, is not optional. And continuing to run the SOC with people, a SIEM and some automation tools is just not going to get us there.

We have to do it in a completely different way, and this is where security operations centers are going. When we are back here in a few years, you are going to see more than one vendor on the floor talking about it, showing it and running with it. In five years you are not going to recognize your security operations center. It's going to be completely different. It's going to run completely differently, with a completely different set of tools, and, you know, for those of you who don't believe that, we will see who is right in five years.

Okay. Cool. Thank you very much!


Participants
Nir Zuk

Speaker

Founder and CTO, Palo Alto Networks

Security Strategy & Architecture

big data analytics security operations SIEM orchestration & automation

