DevSecOps is (appropriately) emerging as the de facto pattern for deploying applications and managing infrastructure. Security controls, deployments, and virtually every other aspect of enterprise systems should be integrated and automated from the beginning. However, we should be deliberate about the trust we place in automation and recognize that the blind spots in DevSecOps are always shifting. Organizations rely on code and decisions made by humans, who are prone to human mistakes. When security incidents occur, they tend to happen at the seams and cracks of an organization, where automation is incomplete, observability is not omniscient, and humans are still in the loop.

I was recently involved in an incident that is unfortunately all too familiar to readers: API key exposure. Some time ago, we received an email from AWS informing us that an IAM keypair had been exposed in a public git repository. As any security team would, we immediately took action to mitigate and remediate. Over the course of the investigation, however, some of the details didn't make sense, particularly given the controls we had in place. We had automatic and mandatory secrets scanning on all of our repositories. Our SIEM was in place and had caught similar things in the past, but it was quiet this time. We had also deprecated our IAM users and were removing them in favor of SSO and short-lived API keys. These three controls, combined, should have ensured this never occurred in the first place. So, what actually happened?
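It helps to be concrete about what "secrets scanning" actually keys on. A minimal sketch of the core detection (the regex and helper are mine, not any particular scanner's internals): AWS documents that access key IDs are 20 characters, a 4-character prefix ("AKIA" for long-term IAM user keys, "ASIA" for temporary STS keys) followed by 16 uppercase alphanumeric characters.

```python
import re

# AWS access key IDs: 4-character prefix ("AKIA" long-term, "ASIA" temporary)
# followed by 16 uppercase letters or digits. This signature is what most
# secret scanners match before attempting validation.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def find_key_ids(text: str) -> list[str]:
    """Return candidate AWS access key IDs found in a blob of text."""
    return [m.group(0) for m in AWS_KEY_ID.finditer(text)]
```

Real scanners layer entropy checks and live validation on top of this, but the pattern match is the seam: it only protects repositories the scanner actually sees.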

The investigation confirmed what we already knew: the keys were real, legitimate, and belonged to a moderately high-privilege user. The origin of the exposure was what confused us, though. It came from a repository in one employee's personal namespace on GitHub, not our company namespace. Additionally, no developers appeared to be aware of its existence, including the person who owned it. To add a further wrinkle, the commit originated 26 days before the exposure was noticed, yet the logs showed a startling lack of abuse of those keys. In fact, they clearly showed when the keys were discovered by an automated scanning tool, validated, and then reported to Amazon, which automatically suspended them, all in the space of a few minutes.

It quickly became apparent that the issue lay at the seams and gaps of our organization's detection and automation. While the automation around secret scanning and IAM key management was in place, its effectiveness and correct function were assumed rather than verified. Had it not been for the fortuitous accident that caused the exposure, it might have been much longer before this gap was detected.

How the key ended up in a public repository continued to baffle us until, on a hunch, one responder asked the developer who owned the repository whether they used VS Code with any plugins that required authentication to GitHub. As it turned out, the final step leading to exposure came when VS Code helpfully attempted to create a repository on the user's behalf after determining that the user lacked write access to push a commit back to the original repository. Instead, it created a new public repository with a default name in that user's personal namespace. Shortly afterward, the now-exposed credentials were detected by an automated scanning service that runs against all public GitHub repositories, and the keys were suspended by Amazon. Case closed? Well, nearly.

We observed that the secret scanning we believed to be functional and ubiquitous inside our own organization was, in some cases, failing silently at the seams. Remember how I said we had deprecated all IAM users and were removing them? We had just one AWS account remaining to fix, and the key happened to originate from that very one. The SIEM, however, performed exactly as expected and turned out to be the critical piece in completing the investigation. There was simply almost nothing for it to detect.
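That last seam is straightforward to audit for. A short sketch, using boto3, of the kind of check that would have surfaced the forgotten account (the function name and structure are mine, not our actual tooling): list IAM users that still hold active long-term access keys.

```python
def users_with_active_keys(iam_client) -> list[str]:
    """Return IAM user names that still have at least one active access key.

    `iam_client` is a boto3 IAM client (or anything with the same shape).
    """
    flagged = []
    for page in iam_client.get_paginator("list_users").paginate():
        for user in page["Users"]:
            resp = iam_client.list_access_keys(UserName=user["UserName"])
            if any(k["Status"] == "Active" for k in resp["AccessKeyMetadata"]):
                flagged.append(user["UserName"])
    return flagged

if __name__ == "__main__":
    import boto3  # requires credentials for the account being audited

    # Run this against every account in the organization, not just the
    # primary one -- the one forgotten account is exactly where our key lived.
    print(users_with_active_keys(boto3.client("iam")))
```

An empty result across all accounts is the assurance we had assumed but never actually measured.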

What can we take away from an incident that initially appeared to be a straightforward key exposure?

●    Complex systems fail in complex ways. Nobody would have predicted the compound failure required to expose a key in this fashion.

●    Even with robust controls and tooling, extremely fallible humans remain in the loop at nearly every stage of the development process. Each control must therefore be given care and attention, and continuously tested, to provide assurance that it remains effective.

●    All of the prevention in the world is no substitute for rapid response. The inverse is also true.
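One way to act on the second point is to treat detective controls like production code and test them on a schedule. A sketch of the canary approach, under stated assumptions: `scan_for_secrets` is a hypothetical stand-in for whatever scanner your pipeline runs, and the canary key is syntactically valid but was never provisioned.

```python
# A deliberately fake but syntactically valid AWS access key ID. It is
# never provisioned, so the only thing it can ever trigger is the scanner.
CANARY = "AKIAFAKECANARY00TEST"

def control_is_working(scan_for_secrets) -> bool:
    """Plant the canary and confirm the scanner actually flags it.

    `scan_for_secrets` is a hypothetical hook standing in for whatever
    scanner the pipeline runs; it takes text and returns its findings.
    """
    findings = scan_for_secrets(f'aws_secret = "{CANARY}"')
    return CANARY in findings
```

Wired into a scheduled job that pages when it returns False, this turns a silently failing scanner into a loud alert instead of a 26-day blind spot.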

Remember: Incidents tend to happen at the seams and cracks of your organization, where the automation is incomplete, observability is not omniscient, and humans are still in the loop. Our blind spots are constantly evolving, and we must update our mental models of how to approach security accordingly.