Securing Reliability through Collaboration: The Power of SRE and Security Interlock


Posted by Smitha Sriharsha

Site Reliability Engineering (SRE) is a discipline that combines software engineering and systems administration to ensure high availability and reliability of cloud-based services. Incorporating security into SRE practices, through close collaboration between the two functions, helps businesses ensure that their services are secure, reliable, and accessible to users.

The SRE Landscape

SRE is the function that connects the infrastructure to the software service teams. An effective SRE team ensures that services are available and meet the defined service level objectives (SLOs), responds to production incidents, and leads the resolution process. The team must be available to handle critical incidents as part of an on-call rotation, and it must develop and maintain monitoring systems that detect issues and alert the appropriate teams. Unlike traditional product and application teams, SRE teams may or may not own the individual apps and services.
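To make the idea of meeting an SLO concrete, here is a minimal sketch in Python. It assumes a request-based availability SLO and uses the common SRE notion of an "error budget" (the share of requests allowed to fail under the target). The function name, parameters, and numbers are hypothetical illustrations, not part of any specific monitoring product.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget.

    slo_target is the availability objective as a fraction (e.g. 0.999).
    A result of 1.0 means no budget has been used; a negative result
    means the SLO has been missed for the measured window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if failed_requests < 0:
        raise ValueError("failed_requests must not be negative")

    # Number of failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

A monitoring job could call a function like this on each evaluation window and alert the on-call engineer when the remaining budget drops below a threshold the team has agreed on.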

An SRE ecosystem brings with it elements of risk, challenge, and complexity. Most SRE deliverables involve executing runbooks, playbooks, and production deployments that follow step-by-step policies and procedures. This work may also involve handling customer data, changing customer or enterprise infrastructure, and deploying proprietary or third-party code, including SRE tooling, scripts, automation code, and production-ready application code from other application teams.

Five Elements That Increase SRE and Security Efficiency

Here are five best practices that can strengthen the security of the SRE ecosystem:

1. Security Policies and Compliance Requirements

One of the core strengths of SRE is its processes and procedures. Playbooks and runbooks, which are the bread and butter of the SRE function, may be the best integration point for the security function. These procedures give the security team a holistic understanding of the workflows, roles and responsibilities, asset landscape, and customer base. They also help determine the relevant security policies and compliance requirements, that is, which security controls apply to the specific ecosystem and which do not. Scoping controls this way is an effective strategy for reducing costs. Emphasizing this early aids in identifying risks and developing risk mitigation strategies.

2. Security Training

The human component of security is the most undervalued factor. Most businesses fail to educate their workers on the value of security despite investing in dedicated security training programs and budgets. Even in organizations with a strong security training program, a cookie-cutter approach to training does not work for SRE functions because of the complex skill set the SRE role demands. In these situations, the best approach is to review the current training curricula and brainstorm with the business to define the foundational security training modules that are must-haves. Because it is important to measure the ROI of this training, plan to address the training gaps based on the findings and on feedback from stakeholders.

3. Cloud Compliance and Security Control Implementation

Together, SRE and security aim to ensure the confidentiality, integrity, and availability of data and systems, as well as the availability and reliability of cloud or hosted services. Integrating security and SRE can be powerful and effective when implementing common procedures and control objectives, including:

  • Incident Response
  • Disaster Recovery
  • Access Control
  • Monitoring and Alerting
  • Implementation of Security Controls

4. Secure Platforms and Technology

The SRE team’s expertise, along with lessons learned during outages, can be a boon to the security team. This knowledge speeds up the discovery of security control deficiencies. In turn, SRE gets direction from the security team on security control gaps, priorities, and execution, which leads to the selection of secure platforms and technologies appropriate for the specific ecosystem. This process also ensures that the selections align with the culture of the organization.

5. Security Automation and SRE

SRE reduces difficult manual work by automating repetitive tasks and scaling operational effort with code instead of human effort. For security automation driven by SRE, consider scheduling periodic runs and scans, automatically ticketing the security findings from those runs and scans, and automating software bill of materials (SBOM) creation and validation. Automating access control requirements, such as key rotation and credential management for non-human accounts (service or bot accounts, billing accounts), will also reduce manual work, as will integrating the security and compliance features of monitoring platforms (such as Splunk and Datadog) into the existing SRE monitoring framework.
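As an illustration of the key rotation and automated ticketing ideas above, here is a minimal Python sketch. It assumes the team already has an inventory of non-human account keys with creation timestamps. The ServiceKey type, the 90-day limit used in the usage note, and the ticket fields are all hypothetical; a real setup would pull the inventory from the cloud provider and send the payload to whatever ticketing system the team uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ServiceKey:
    """Hypothetical inventory record for a non-human account key."""
    account: str
    key_id: str
    created_at: datetime


def find_stale_keys(keys: list[ServiceKey], max_age: timedelta,
                    now: datetime) -> list[ServiceKey]:
    """Return the keys whose age exceeds the rotation policy."""
    return [key for key in keys if now - key.created_at > max_age]


def to_ticket(key: ServiceKey, now: datetime) -> dict:
    """Build a ticket payload (hypothetical fields) for one stale key."""
    age_days = (now - key.created_at).days
    return {
        "title": f"Rotate key {key.key_id} for {key.account}",
        "body": f"Key is {age_days} days old and exceeds the rotation policy.",
        "labels": ["security", "key-rotation"],
    }
```

Run on a schedule, a job like this could call find_stale_keys with a policy such as timedelta(days=90) and file one ticket per result, so that overdue rotations surface without anyone checking key ages by hand.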

Conclusion

For cloud deployments, security and service availability are essential. The five elements above can help improve the security posture of a cloud-based or hosted environment, powered by SRE and security working together.

Contributors
Smitha Sriharsha

Sr Manager Platform Security Engineering, F5 Networks

Security Strategy & Architecture

application security, DevSecOps, operational technology (OT Security), security architecture, security operations, cloud security

Blogs posted to the RSAConference.com website are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the blog author individually and, unless expressly stated to the contrary, are not the opinion or position of RSA Conference™, or any other co-sponsors. RSA Conference does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented in this blog.

