Description: We held a session to discuss the unique challenges of conducting incident response in the public cloud.
We had a great mix of attendees at the P2P session on Incident Response in the public cloud, including practitioners from cloud-native companies as well as those from mature organizations just starting to move out of the datacenter. We started the conversation with some war stories on incidents the audience had worked, and the discussion highlighted several common needs: managing multiple accounts, conducting forensics in an ephemeral, hosted environment, and, of course, logging.
Multiple accounts and even multiple cloud services are a challenge to manage, and while companies are emerging in this space (HashiCorp with Terraform, for example), the tooling is fairly immature. Many of the larger companies (Netflix included) have homespun solutions that are moving toward infrastructure as code. The major cloud providers are also helping here; Amazon Web Services (AWS) Organizations was one such effort we discussed.
Another common challenge was reconstructing events in an ephemeral, hosted environment. The classic example is mapping historical IPs to instances - for example, you receive an abuse complaint against an IP that is a few days old, and that IP has been reassigned multiple times since. Cloud providers do not natively provide a retrospective lookup of IP-to-instance mappings, so you as the customer have to create that record yourself, and, at larger scales, also implement a caching layer between your services and commonly (ab)used APIs like describe-instances. In terms of forensics on the instances themselves, disk capture is much easier than in the datacenter: you can use the snapshot API in AWS or mount volumes read-only in Google Cloud. Memory capture, however, is still a manual process and requires interacting with the instance under inspection (for example, SSHing or SSMing into the instance and running LiME). The preference would be for cloud providers to expose memory capture through an API - something like LibVMI.
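Since the provider won't do the retrospective lookup for you, one approach is to record assignment events yourself (for example, from a periodic describe-instances sweep) and search them at investigation time. Here is a minimal sketch of that record; the class and method names are hypothetical, not any AWS API:

```python
import bisect
from typing import Optional

class IpHistory:
    """Append-only record of IP-to-instance assignment events.

    The cloud provider keeps no history of past assignments, so each
    event (say, observed by a periodic describe-instances sweep) has
    to be recorded by you as it happens.
    """

    def __init__(self):
        # ip -> list of (timestamp, instance_id), kept sorted by timestamp
        self._events = {}

    def record(self, ip: str, timestamp: float, instance_id: str) -> None:
        bisect.insort(self._events.setdefault(ip, []), (timestamp, instance_id))

    def lookup(self, ip: str, timestamp: float) -> Optional[str]:
        """Return the instance that held `ip` at `timestamp`, if known."""
        events = self._events.get(ip, [])
        # Most recent assignment at or before the query time.
        idx = bisect.bisect_right([t for t, _ in events], timestamp)
        return events[idx - 1][1] if idx else None
```

At investigation time, a call like `lookup("10.0.0.5", complaint_time)` returns whichever instance held the address then, or `None` if your sweep never observed it - which is itself a useful signal about gaps in your collection.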
Finally, logging. Logging has always been a challenge, whether in the datacenter or the public cloud. There are some excellent native logging sources in public cloud services: for example, if you use an AWS Elastic Load Balancer, you can have request logs delivered straight to S3. You also have logs of what the cloud provider APIs are doing - in the AWS case, this is CloudTrail. The challenge comes in moving all this log data around. In a hybrid deployment, you might want to ship it all back to your on-prem SIEM. We keep it all in the cloud at Netflix and build our detection and analysis tooling there - we are datacenter free. There are also several commercial offerings that are cloud native and will take your logging data directly from your cloud account into theirs and offer tools on top of it. These are great for small to medium-sized enterprises, but the pricing models are most often based on data volume and so become prohibitively expensive for larger organizations. There are some exciting options coming online in AWS, like Athena, which allows Presto queries to be run directly against text logs in S3, or, in the future, AWS Glue, a managed ETL pipeline, in combination with AWS EMR.
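Even before Athena is in the picture, a little local tooling over those S3 log files goes a long way during an incident. The sketch below pulls client IPs out of classic ELB access log lines; the field list follows the classic ELB format as I understand it, so verify it against the AWS documentation for your load balancer type (ALB logs have additional fields):

```python
import shlex

# Field names for the classic ELB access log format (an assumption to
# verify against current AWS docs; ALB/NLB formats differ).
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]

def parse_elb_line(line: str) -> dict:
    """Parse one classic ELB access log line into a dict of named fields."""
    # shlex honors the quoted request and user-agent fields.
    return dict(zip(FIELDS, shlex.split(line)))

def client_ips(lines):
    """Yield the client IP (minus the ephemeral port) for each log line."""
    for line in lines:
        yield parse_elb_line(line)["client"].rsplit(":", 1)[0]
```

Feeding an abuse complaint's time window through something like this, then into the IP-to-instance record above, is the kind of glue code the discussion kept coming back to.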
Thanks to all the attendees for an excellent discussion and to the RSAC organizers for hosting the event!