Application Programming Interfaces (APIs) are the backbone of modern systems, carrying sensitive logic and data between services. In fact, a Forrester Research report notes that 83% of web traffic today is attributed to APIs, making them the largest attack surface for applications. Yet many security teams still assume internal or high-performance APIs are “secure by default.” Reality has proved otherwise: a Salt Security report published in 2023 found that 94% of organizations admitted to security issues with their production APIs. The following three case studies (each fictional but grounded in real-world patterns) show how overlooked API vulnerabilities can trigger major outages. For each, we examine the root causes, impacts, and recovery steps, then highlight the security lesson learned. Throughout, we emphasize concrete best practices (aligned with the OWASP API Top Ten risks) that every team should adopt.
Case Study: Broken Authentication in a Payments API
A fintech company’s public payment API began misbehaving under heavy load. Over a weekend, automated monitoring detected thousands of dollars of unauthorized fund transfers. Engineers traced the problem to a recently deployed authentication service: the JSON Web Tokens (JWTs) issued by the identity service were not being properly validated at the payment endpoint. In effect, attackers could craft tokens to impersonate any user.
Root cause: A configuration change disabled strict signature checking on the API gateway (an accident during a library upgrade). When the new JWT signing key was rotated in, the gateway failed to reject expired or forged tokens. As a result, malicious clients could reuse or guess tokens at will, a classic broken authentication vulnerability: “compromising a system’s ability to identify the client/user compromises API security overall”. In this case, API calls lacked mutual TLS and did not enforce token freshness, so attackers brute-forced admin-level tokens over several hours before being caught.
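To make the fix concrete, here is a minimal sketch (in Python, using the PyJWT library) of the strict validation the gateway should have been performing; the key file path, algorithm, and required claims are illustrative assumptions rather than the company’s actual configuration.

```python
# Minimal sketch: strict JWT validation at the gateway, assuming RS256-signed tokens.
# The key path and required claims are illustrative assumptions.
import jwt  # PyJWT

with open("jwt_public.pem") as f:
    CURRENT_PUBLIC_KEY = f.read()  # rotated alongside the identity service's signing key

def validate_token(token: str) -> dict:
    """Reject forged, expired, or algorithm-stripped tokens before any business logic runs."""
    return jwt.decode(
        token,
        CURRENT_PUBLIC_KEY,
        algorithms=["RS256"],                 # never accept "none" or a client-chosen algorithm
        options={"require": ["exp", "sub"]},  # enforce token freshness and a subject claim
    )
```

Any request whose token fails this check (PyJWT raises a subclass of jwt.InvalidTokenError) is rejected at the gateway, before it ever reaches the payment service.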
Impact: The company temporarily shut down the payments API to stop the fraud. Hundreds of pending transactions failed, triggering customer complaints. Internal systems (ledgers and account balances) required manual reconciliation once service was restored. The downtime cost tens of thousands of dollars and severely eroded trust with enterprise clients.
Resolution and Lessons: The team immediately rolled back the faulty deployment and revoked all issued tokens. A security patch was applied to ensure every incoming request was validated against the current public key, and multi-factor authentication was added for high-value transactions. Importantly, the incident prompted a design review: services were reconfigured to use OAuth2 with short-lived JWTs and automated key rotation, so that any compromised token would quickly expire. Developers also added strict object-level authorization checks on every transfer API call.
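As a sketch of what such an object-level check might look like in Python, the snippet below verifies that the authenticated subject actually owns the source account before a transfer proceeds; the in-memory ACCOUNTS store and claim names are illustrative assumptions, not the company’s data model.

```python
# Minimal, framework-agnostic sketch of an object-level authorization check on a transfer.
# The in-memory ACCOUNTS store and the "sub" claim layout are illustrative assumptions.
ACCOUNTS = {
    "acct-001": {"owner_id": "user-42", "balance": 1_500},
}

def authorize_transfer(claims: dict, source_account_id: str) -> dict:
    """Allow a transfer only if the authenticated subject owns the source account."""
    account = ACCOUNTS.get(source_account_id)
    if account is None or account["owner_id"] != claims.get("sub"):
        # Authenticated but not authorized for this object: fail closed.
        raise PermissionError("caller does not own the source account")
    return account

# Example usage: claims come from the validated JWT, never from the request body.
# authorize_transfer({"sub": "user-42"}, "acct-001")   -> returns the account
# authorize_transfer({"sub": "user-99"}, "acct-001")   -> raises PermissionError
```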
Key takeaways: Always enforce strong, validated authentication at the API gateway level rather than relying on downstream checks. Use HTTPS (TLS) everywhere and verify token signatures to prevent “none”-algorithm or forged JWT attacks. For critical actions (money transfers, account changes), apply additional controls such as rate limiting or transaction limits and require secondary confirmation. Finally, implement centralized logging and alerts so that unusual authentication patterns (such as repeated use of admin-level tokens) trigger immediate investigation.
Case Study: Unthrottled API Causes Outage Under Load
A popular video-streaming service experienced a severe outage when a new video recommendation API was deployed. A bug in the service code caused the recommendation endpoint to enter an infinite loop under certain conditions. Shortly after launch, thousands of automated requests from client apps began to overwhelm the service. Within minutes, Central Processing Unit (CPU) and memory utilization spiked to 100%, and the entire cluster started to crash. Customers saw “service unavailable” errors for the recommendation feature, and cascading failures affected related services (due to overloaded shared database connections).
Root Cause: The incident was fundamentally one of unrestricted resource consumption. The recommendation API had no rate limiting or quotas, and the bug meant each query spawned additional recursive lookups. An attacker (or even a benign but runaway client) could easily trigger a denial-of-service condition at this endpoint. According to OWASP, API designs often lack limits on bandwidth, CPU, or other resources, which can be exploited to cause downtime. In this case, the team’s load tests had not caught the infinite-loop edge case, and because the traffic was authenticated and looked legitimate, standard monitoring did not flag the abnormal recursion until it was too late.
Impact: The service was down for six hours while engineers fought the runaway processes. Video recommendations were offline, degrading the user experience site-wide. The DevOps team had to manually restart dozens of instances and revert to an older version that lacked the new faulty logic. Meanwhile, cloud costs ballooned due to autoscaling servers that could not shut down mid-stream.
Resolution and Lessons: The immediate fix was to stop the bad code and deploy a patched version with a conditional break in the loop. The team then enforced strict API quotas and rate limits on the recommendation endpoint. They introduced circuit-breaker logic: if requests per minute exceed a safe threshold, the API now rejects further calls or returns a cached “limit exceeded” response, preventing resource exhaustion. In addition, developers instrumented the service with latency and error alarms (triggering if CPU > 80% or error rates spike) to catch similar issues early.
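A minimal Python sketch of that kind of threshold-based breaker is shown below; the 1,000-requests-per-minute budget and the in-process counter are illustrative assumptions, since a production deployment would enforce the limit at the gateway or service mesh with shared state.

```python
# Minimal sketch of a per-minute request budget with a circuit-breaker style cutoff.
# The 1,000-request threshold and in-process counter are illustrative assumptions.
import time

class MinuteCircuitBreaker:
    def __init__(self, max_requests_per_minute: int = 1_000):
        self.limit = max_requests_per_minute
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        """Return True if the call may proceed; False once the window's budget is spent."""
        now = time.monotonic()
        if now - self.window_start >= 60:
            self.window_start, self.count = now, 0   # start a fresh one-minute window
        if self.count >= self.limit:
            return False                             # reject: serve a cached "limit exceeded" response
        self.count += 1
        return True

breaker = MinuteCircuitBreaker()

def compute_recommendations(user_id: str) -> dict:
    return {"user": user_id, "items": []}            # stand-in for the real recommendation logic

def handle_recommendation_request(user_id: str) -> dict:
    if not breaker.allow():
        return {"error": "rate limit exceeded", "retry_after_seconds": 30}
    return compute_recommendations(user_id)
```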
Key Takeaways: Always design APIs with consumption limits and throttling in mind. Put rate limiting at the gateway or service mesh level, so a single buggy client or malicious bot cannot take an organization’s service offline. Apply circuit breakers and timeouts on every external call (see the sketch below). Conduct thorough stress testing, including simulating attack scenarios and faulty inputs, to uncover hidden loops or resource leaks. Monitoring must include not just traffic volume but also system metrics and error patterns, so unusual recursive behaviors are detected before full collapse.
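For the outbound side, here is a minimal Python sketch combining per-call timeouts with a simple failure-count circuit breaker, assuming the third-party requests library; the URL, timeouts, and threshold are illustrative assumptions rather than the streaming service’s actual configuration.

```python
# Minimal sketch: timeouts plus a failure-count circuit breaker on an outbound dependency call.
# The URL, timeouts, and threshold are illustrative assumptions.
import requests

FAILURE_THRESHOLD = 5
consecutive_failures = 0

def fetch_video_metadata(video_id: str):
    global consecutive_failures
    if consecutive_failures >= FAILURE_THRESHOLD:
        return None                # circuit open: skip the call instead of piling onto a failing dependency
    try:
        resp = requests.get(
            f"https://metadata.internal.example/videos/{video_id}",
            timeout=(1.0, 2.0),    # connect and read timeouts so the call can never hang indefinitely
        )
        resp.raise_for_status()
        consecutive_failures = 0   # success closes the circuit again
        return resp.json()
    except requests.RequestException:
        consecutive_failures += 1
        return None
```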
Case Study: Forgotten Legacy API Exposes Sensitive Data
A healthcare software provider suffered a critical breach through an unprotected API endpoint that was later flagged by a security research lab. The endpoint, originally built for an older version of the patient portal, had been left active in production with default credentials and no encryption. Attackers first found this “zombie” API via open-source documentation and quickly accessed patient records without hitting any logging or rate-limit barriers. The company realized the breach only when a patient’s data appeared in dark-web archives.
Root Cause: The breach stemmed from improper inventory and configuration management. OWASP warns that modern deployments often expose more API endpoints than expected, and without proper documentation or retirement strategies, stale services become backdoors. In this case, the legacy endpoint ran on a separate subdomain, using an old framework with hard-coded admin credentials. A developer had forgotten to disable or migrate it, so it never inherited the main portal’s strict access controls. Since no one actively tracked this endpoint, it operated without Transport Layer Security (TLS) and returned sensitive data as unencrypted JSON.
Impact: The data breach forced an immediate system-wide audit. The production database was cut off from the network until all legacy endpoints were secured. Remediation included revoking and reissuing credentials for thousands of patient accounts, notifying regulators, and revising internal policies. The company’s reputation suffered greatly, and it faced fines for non-compliance with data protection standards.
Resolution and Lessons: The first step was shutting down the orphaned endpoint. Next, the team established an “API inventory” process: they conducted a complete discovery of all running APIs (using both live traffic analysis and static code scans), ensuring no undocumented services remained exposed. All remaining APIs were audited for encryption (TLS) and authentication hygiene. Going forward, any deprecated API version must be formally retired, with its configuration disabled and old servers decommissioned. The incident also led to the creation of an API catalog (with allowed endpoints) tied into the CI/CD pipeline, so that any new API requires registration and a security review.
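One way to wire that catalog into the pipeline is sketched below in Python: a CI step compares the routes discovered in the codebase against a reviewed catalog file and fails the build on anything unregistered. The file names, JSON format, and discovery step are illustrative assumptions, not the provider’s actual tooling.

```python
# Minimal sketch of a CI gate: every discovered API route must appear in a reviewed catalog.
# File names and JSON format are illustrative assumptions.
import json
import sys

def load_routes(path: str) -> set:
    with open(path) as f:
        return set(json.load(f))

def main() -> int:
    catalog = load_routes("api_catalog.json")            # registered, security-reviewed endpoints
    discovered = load_routes("discovered_routes.json")   # output of an earlier route-scanning step
    unregistered = discovered - catalog
    if unregistered:
        print("Unregistered API endpoints found (security review required):")
        for route in sorted(unregistered):
            print(f"  {route}")
        return 1    # non-zero exit fails the pipeline until endpoints are registered or removed
    return 0

if __name__ == "__main__":
    sys.exit(main())
```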
Key Takeaways: Maintain a living inventory of every API endpoint, including internal, legacy, and third-party APIs. Remove or lock down unused endpoints immediately. Treat every API as if it could be discovered by attackers: enforce TLS on all interfaces, require tokens even on internal routes, and avoid default or shared credentials. Automate periodic scans (e.g., using API discovery tools or internal pentests) to find hidden or misconfigured APIs. In short, adopt a “Zero Trust” mindset: assume any exposed endpoint will be found and probed, and treat it as untrusted until it is properly secured and monitored.
Lessons and Best Practices
These case studies illustrate that API outages can arise from security flaws as much as from infrastructure failures. The good news is that the lessons are clear and actionable. In every outage above, the same themes emerged: strong authentication and authorization, resource controls, and inventory management. Companies should continuously discover and inventory all APIs, enforce proactive scanning and monitoring, and practice sound incident response. DevOps and SecOps teams should partner closely so that security is embedded from design through deployment.