A new report reveals that more than 30% of cloud data assets contain sensitive information.

Dig Security recently released findings from its first-ever State of Cloud Data Security 2023 Report, which analyzes more than 13 billion files stored in public cloud environments and reveals how sensitive data is at risk in the modern enterprise.

Dig’s researchers found that more than 30% of cloud data assets contain sensitive information. Personally identifiable information (PII) is the most common sensitive data type that organizations save. In a sample data set of 1 billion records, researchers found more than 10 million Social Security numbers (the sixth most common type of sensitive information) and almost 3 million credit card numbers (the seventh most common type).

Cloud adoption is driving widespread data sprawl, which introduces risk that leads to security and compliance breaches as data is constantly shared, copied, transformed and forgotten. Knowing where sensitive data is located makes it easier to manage risk and secure that data. According to Dig’s research, the PII that organizations most commonly save contains employee and customer data.

Additional report highlights

  • 91% of database services with sensitive data were not encrypted at rest, 20% had logging disabled and 1.6% were open to the public.
  • More than 60% of storage services were not encrypted at rest, and almost 70% were not logged.
  • 95% of principals with permissions are granted them through excessive privilege.
  • More than 35% of principals have some privilege to sensitive data assets. Almost 10% have admin access and almost 20% have consumer access to a sensitive asset.
  • For PCI data, almost 10% of principals have consumer permission and around 5% have admin access.
  • Almost 1% of sensitive assets are shared with third-party vendors, and more than 2% of sensitive data assets are at risk due to direct access from a remote account.

On average, sensitive data is accessed by 14 different principals, and 6% of companies have sensitive data that has been transferred to publicly open assets. Compounding the issue is the frequent flow of data across geolocations: more than 56% of sensitive data assets are accessed from multiple geographic locations, and 26% are accessed from five or more. As data flows, the risk grows; 77% of sensitive data assets have more than one cross-service flow.

  • 40% of data flows to data lakes (Hadoop and Snowflake). Hadoop ingests 37%, which duplicates sensitive data into an unmanaged environment.
  • Replication between storage assets is responsible for 30% of the activity involving sensitive data.
  • More than 50% of sensitive data assets are accessed by 5 to 10 applications, and almost 20% are accessed by 10 to 20 applications.