GitGuardian announced the results of its 2021 State of Secrets Sprawl on GitHub report. The report, which is based on GitGuardian’s constant monitoring of every single commit pushed to public GitHub, indicates an alarming 20% year-over-year growth in the number of secrets found. A growing volume of sensitive data, or secrets, such as API keys, private keys, certificates, usernames and passwords ends up publicly exposed on GitHub, putting corporate security at risk, as the vast majority of organizations are either ignoring the problem or are poorly equipped to cope with it.
According to the report, 12% of leaks on GitHub occur within public repositories owned by organizations, and 85% occur in developers’ personal repositories. Secrets present in all these repositories can be either personal or corporate, and this is where the risk lies for organizations: some of their corporate secrets are exposed publicly through the personal repositories of their current or former developers.
Types of Secrets Found
- 27.6% Google keys
- 15.9% Development tools (Django, RapidAPI, Okta,...)
- 15.4% Data storage (MySQL, Mongo, Postgres,...)
- 12% Other (including CRM, Cryptos, identity providers, payments systems, monitoring)
- 11.1% Messaging systems (Discord, Sendgrid, Mailgun, Slack, Telegram, Twilio…)
- 8.4% Cloud provider (AWS, Azure, Google, Tencent, Alibaba…)
- 6.7% Private keys
- 1.9% Social network
- 0.8% Version Control Platform (GitHub, GitLab)
- 0.4% Collaboration tools (Asana, Atlassian, Jira, Trello, Zendesk...)
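To give a feel for how secrets of the types listed above can be spotted in code, here is a minimal Python sketch of pattern-based scanning. It is not GitGuardian's detection method: the `PATTERNS` table and the `scan_text` helper are hypothetical names, and the three regular expressions only cover commonly documented prefixes for Google API keys, AWS access key IDs and PEM private key headers. A real detector would need many more rules and additional validation to limit false positives.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive or official list).
PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return a list of (line_number, secret_type) for every pattern hit."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for secret_type, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, secret_type))
    return findings
```

Calling `scan_text` on the contents of a file returns the line numbers and candidate secret types, which could then be reviewed by hand or passed to a stricter check.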
Top 10 File Extensions
As you might expect, with the many programming languages, frameworks and coding practices adopted throughout the world, there is a very long list of file extensions that can contain secrets. Here is a view of the top 10.
- The top 10 file extensions account for 81% of all the results.
- The top 3 account for over 56% of the results.
- 27.7% Python
- 9.6% Environment variables file
- 7.5% JSON
- 4% Properties
- 3.6% PEM
- 3.2% PHP
- 2.7% YAML
- 2.2% XML
- 2% TypeScript
GitHub is more than ever “The Place to Be” for developers when it comes to innovating, collaborating and networking. GitHub gathers more than 50 million developers working on their personal and/or professional projects. When 60 million repositories are created in a year and nearly two billion contributions added, some risks arise for companies even if they don’t use GitHub or open source their code, because their developers do.
As architectures move to the cloud and rely more on components and applications, the growing number of commits and the wider use of digital authentication credentials have increased the number of secrets detected. To compound the problem, companies are pushing for shorter release cycles, developers have many technologies to master, and the complexity of enforcing good security practices increases with the size of the organization, the number of repositories, the number of developer teams and their geographical spread.
Companies can’t avoid the risk of secrets exposure even if they put in place centralized secrets management systems. Solutions are available to automate secrets detection and put proper remediation in place, but the market is far from mature on this subject. “The reality is most organizations are operating blind. Most leaks of organizations’ credentials on public GitHub occur on developers’ personal repositories, where organizations often have no visibility, let alone the authority to enforce any kind of preventive security measures,” said Jeremy Thomas, CEO of GitGuardian.
Some best practices can be followed to limit the risk of secrets exposure or the impact of a leaked credential:
- Never store unencrypted secrets in Git repositories
- Don’t share your secrets unencrypted in messaging systems like Slack
- Store secrets safely
- Restrict API access and permissions
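As one illustration of keeping secrets out of a repository, the Python sketch below reads a credential from the process environment at runtime instead of hardcoding it in source. The `get_secret` helper and the `PAYMENT_API_KEY` variable name are hypothetical, and environment variables are only one possible option; a dedicated secrets manager may be a better fit for larger teams.

```python
import os


def get_secret(name):
    """Read a secret from the process environment instead of the source code."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than falling back to a hardcoded default.
        raise RuntimeError(f"Missing required secret: {name}")
    return value


# Hypothetical usage: the variable is set outside the repository,
# for example by the deployment platform or a local, untracked file.
# api_key = get_secret("PAYMENT_API_KEY")
```

With this approach the committed code contains only the name of the secret, never its value.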
But following these best practices is not sufficient, and companies need to secure the software development life cycle (SDLC) with automated secrets detection. When choosing a secrets detection solution, they need to take into account:
- Capacity to monitor developers’ personal repositories
- Secrets detection performance - Accuracy, precision & recall
- Real-time alerting
- Integration with remediation workflows
- Easy collaboration between Developers, Threat Response and Ops teams
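For readers comparing tools on detection performance, precision and recall can be computed from a labeled test set as sketched below in Python. The `precision_recall` function name and the example counts are assumptions made for illustration; they are not figures from the report.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from raw detection counts.

    precision = share of flagged items that are real secrets
    recall    = share of real secrets that were flagged
    """
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 90 real secrets flagged, 10 false alarms, 30 secrets missed.
precision, recall = precision_recall(90, 10, 30)
```

Low precision means developers are flooded with false alerts, while low recall means real leaks go unnoticed, so both figures matter when evaluating a solution.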