Not much thought has been given to protecting unstructured data because, historically, hackers, phishers and thieves have targeted large structured databases and, for years now, email. That isn’t anyone’s fault. It is merely a symptom of the immaturity of the tools that aim to tackle this unruly problem. Traditional enterprise data loss prevention (DLP) tools were not initially designed to protect unstructured data, encryption and policy are not centralized, and few organizations have taken advantage of the improvements made in recent years. In the meantime, unstructured data has piled up and keeps growing.
To target this problem, a new set of vendors and products has emerged with “data-centric” solutions, adding to the confusion. With so many vendors and such a variety of capabilities to choose from, how do you know which is right? Which vendor do you choose? The answer is to think first about what you want to accomplish and weigh the approaches. Understanding the differences between the three predominant approaches to data-centric security, and their unique benefits, will help you decide which is the best fit to strengthen the front lines of protecting unstructured data while complying with privacy and industry regulations and data governance policies.
The road to data centric security
Data-centric security encompasses a wide range of processes and tools, many with overlapping functionalities. Adding to this confusion was a flurry of gap-filling point solutions such as end-point protection and CASBs that aimed to address cloud adoption and mobility.
When you boil it down, data-centric security is about the protection, visibility and control of sensitive unstructured data. The majority of vendor solutions work to address each, but they do so to varying degrees, in different ways and with different levels of success, and often without first protecting the data at the file level.
Protecting the data first, at the file level, should be the foundation of your data-centric security policy in today’s threat environment. Visibility and control are critical, especially for privacy requirements, but unless the data is appropriately protected (encrypted, anonymized, redacted) at the file level, you might as well be doing nothing. This approach also helps cut through the confused world of data-centric everything and get to the task at hand: avoiding data, financial or reputational loss.
A quick way to draw out the distinctions between vendor solutions is to focus on what each solution aims to defend and the primary tools it uses to do so. You’ll find three predominant approaches in the market today:
- Flow-Centric (DLP)
- Folder-Centric
- File-Centric
Understanding these different approaches and their degree of integration with other processes and tools will help you navigate vendor engagements, get to the answers you need, and help you make decisions.
The flow-centric approach, also known as DLP, covers tools that monitor data as it moves within and outside of the network. DLP is a checkpoint at ingress and egress points (i.e. server, network, and endpoint locations) where high volumes of traffic occur, such as email messages and file transfers. It is based on rules that contain specific conditions, actions and exceptions and that filter messages and attachments based on their content. The typical actions are log, block or allow, although some more modern tools are leveraging AI/ML to augment or replace traditional rules.
This flow-centric or DLP approach ensures that the data you want to protect is blocked from specific actions. For example, if a user attempts to send a file with sensitive content through email, DLP will raise an alert and the email will be either blocked or encrypted, depending on policy. Likewise, if a user copies a file containing intellectual property onto a USB drive, DLP will alert management and block the action. In theory, these are good examples of preventing data from leaking, but they can also interrupt important workflows and slow productivity.
These tools are a good way to prevent information assets from leaking; however, they are not designed to protect all data types. They only act on files that violate policies, which leaves blind spots wherever policies are written incorrectly or files have not been classified. In fact, most organizations don’t really know what sensitive unstructured data they have, despite it representing at least 80% of data in the organization.
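As a rough illustration of the rule structure described above (conditions, actions and exceptions), here is a minimal Python sketch. The `Message` and `Rule` types, the Social Security number pattern and the `example.com` exception are all hypothetical and chosen only for illustration; real DLP products inspect far more than a message body and do not necessarily work this way.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical message model; a real DLP tool also inspects attachments, headers, etc.
@dataclass
class Message:
    sender: str
    recipient: str
    body: str

@dataclass
class Rule:
    name: str
    condition: Callable[[Message], bool]  # when the rule applies
    action: str                           # "log", "block" or "allow"
    exceptions: List[Callable[[Message], bool]] = field(default_factory=list)

# Illustrative content pattern: US Social Security number format.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def evaluate(message: Message, rules: List[Rule]) -> str:
    """Return the action of the first rule that matches and has no exception; default is "allow"."""
    for rule in rules:
        if rule.condition(message) and not any(exc(message) for exc in rule.exceptions):
            return rule.action
    return "allow"

rules = [
    Rule(
        name="block-ssn-to-external",
        condition=lambda m: bool(SSN_PATTERN.search(m.body)),
        action="block",
        # Assumed exception: mail that stays inside the (hypothetical) company domain.
        exceptions=[lambda m: m.recipient.endswith("@example.com")],
    ),
]
```

Calling `evaluate(message, rules)` on an outbound message returns the action for the first matching rule. Anything the rules do not anticipate falls through to "allow", which is the blind spot that incorrectly written policies create.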
Other drawbacks of this approach are that it can be:
- Difficult to deploy and manage, taking months or years,
- Difficult to scale as the number of endpoints increases and cloud adoption grows, and
- Cost prohibitive
Most importantly, this approach doesn’t protect the data itself – it merely prevents data from leaking out of the network.
Protecting data at the folder level is a relatively common approach, also known as the “walled garden.” It targets folders, file-shares and disks containing sensitive data. It typically:
- Encrypts folders
- Password-protects them, and
- Allows permissions to be set for specific users or groups across network shares
Some folder-centric tools use tags that work with user behavior monitoring and Identity and Access Management (IAM) tools to flag sensitive information and take appropriate action.
While in many ways this approach is reasonable, it comes with challenges that can be cumbersome and difficult to manage. To put things in perspective, when targeting folders, scale is a challenge. For example, in an average large enterprise, 1 terabyte of data can be spread across 50,000 folders. With permission control, any change to access requires finding the folder(s) that need changes and making changes to each individual file. Even where you have “parent” and “child” folders for business units or groups, if permissions get broken you will still need to change or fix the permissions on the “child” folders manually. So not only is scale a problem, but managing access control across an exorbitant number of folders is time consuming and carries a high risk of mistakes.
Monitoring user access and behavior is important because you need visibility to identify potential unauthorized access to sensitive data in folders. But look at the bigger picture and at what you wouldn’t see. If an authorized user accesses a folder, copies a file with sensitive information and sends it to an unauthorized user through email, the copy of the file is no longer visible. Even more alarming, you have lost visibility into that action altogether. Once the data leaves the secured folder, unless it is returned, it is no longer controlled or protected. Monitoring is only monitoring; it does not protect the data with encryption.
With folder-centric solutions, detecting anomalies through user behavior analytics can be useful in determining who accessed which folder, but it generates a multitude of alerts. An active stream of alerts, raised every time an event violates policy, can be burdensome for IT teams.
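To make the broken-inheritance problem concrete, here is a small Python sketch under simplified assumptions. The `Folder` type, the folder names and the users are hypothetical, and the model ignores how any particular file system or IAM product actually stores permissions. It revokes a user at the parent folder and then reports which folders still grant that user access because they no longer inherit from the parent.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Folder:
    name: str
    inherits: bool = True                      # False means inheritance is "broken"
    own_acl: Set[str] = field(default_factory=set)
    children: List["Folder"] = field(default_factory=list)

def revoke(root: Folder, user: str) -> List[str]:
    """Revoke `user` on the root folder and list folders where the user still has access."""
    root.own_acl.discard(user)
    stale: List[str] = []

    def walk(folder: Folder, inherited: Set[str]) -> None:
        # A folder with broken inheritance keeps its own, possibly outdated, ACL.
        acl = inherited if folder.inherits else folder.own_acl
        if user in acl:
            stale.append(folder.name)
        for child in folder.children:
            walk(child, acl)

    walk(root, root.own_acl)
    return stale

# Hypothetical share: "legal" has broken inheritance, and "contracts" inherits from "legal".
contracts = Folder("contracts")
legal = Folder("legal", inherits=False, own_acl={"bob", "carol"}, children=[contracts])
finance = Folder("finance")
root = Folder("share", inherits=False, own_acl={"alice", "bob"}, children=[finance, legal])

stale = revoke(root, "bob")  # folders that still need a manual fix
```

In this toy tree the revocation reaches "finance" automatically, but "legal" and everything beneath it must be found and fixed by hand. Across tens of thousands of folders, that manual step is where mistakes creep in.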
The target of a file-centric approach is just that: files. These are files or documents that are created organically in common applications like Microsoft Office, or derived from information that has been drawn out of secure structured databases, saved as PDF or TXT files and multiplied across users, devices, and clouds. Targeting files allows you to:
- Encrypt those that contain sensitive information
- Secure them regardless of their location (device, cloud location, server)
- Automate classification and protection
If you protect sensitive files with encryption, security travels with the file wherever it goes, so that it’s safe even in cloud locations, on file shares and on any device. There’s no need to apply separate protection for data in use, in transit, or at rest, because protection is now part of the DNA of the file.
A file-centric approach to document protection focuses on the content of the file, who has access to it, and which policy governs how the file is protected. If the file contains information that is considered sensitive, because it is either subject to regulatory mandates or considered intellectual property, appropriate protection is applied automatically regardless of where the data is or moves. Granular rights, such as access control or allowing and disallowing specific functions like copy/paste, screen capture and edit, can also be applied to individual files through information rights management.
To be effective, a file-centric approach should include specific automated functions, such as data discovery and data classification, and be controlled from a centralized policy management console. This promotes ease of use, frees users from the burden of making security decisions, and sets unstructured data security initiatives up for success from the start.
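The granular rights described above can be pictured as a small policy record attached to each file. The following Python sketch is only an assumption about how such a record might look. `FilePolicy`, the action names and the users are hypothetical and do not reflect any specific information rights management product.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass
class FilePolicy:
    classification: str                 # e.g. "intellectual-property"
    rights: Dict[str, FrozenSet[str]]   # user or group -> permitted actions

def is_allowed(policy: FilePolicy, user: str, action: str) -> bool:
    """Deny by default: an action is allowed only if it is explicitly granted to the user."""
    return action in policy.rights.get(user, frozenset())

# Hypothetical policy that would travel with one protected document.
policy = FilePolicy(
    classification="intellectual-property",
    rights={
        "alice": frozenset({"view", "edit"}),
        "bob": frozenset({"view"}),
    },
)
```

Because the check is made against the file’s own policy rather than against the folder it sits in, the same answer applies wherever the file has been copied.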
Automated discovery, classification and encryption process
To properly protect unstructured data, you need to know where it lives. With the adoption of cloud, you have essentially lost control and visibility over what information is being pulled from structured environments and saved or stored in other file types, such as Microsoft Word and PowerPoint documents, for legitimate business purposes. You do not know when files with sensitive information are manipulated, accessed by or shared with internal users or external third parties.
Keep in mind that changing your approach does not mean that investments in existing flow-centric solutions or DLP are wasted. In fact, if you are struggling to find a return on investment in existing solutions, a file-centric approach can finally provide that ROI because it fills the gaps that E-DLP and folder-centric approaches leave. It means that your unstructured data is secure regardless of where it is or where it goes.
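As a minimal sketch of what automated discovery and classification could look like, the Python example below walks a directory tree, labels text files by simple content patterns and maps each label to a protection from a central policy table. The patterns, the labels, the `POLICY` table and the restriction to `.txt` files are all assumptions made for illustration. Production tools parse many file formats and use far more robust detection, and the encryption step itself is not shown.

```python
import re
from pathlib import Path
from typing import Iterator, Set, Tuple

# Illustrative detection patterns; real classifiers are far more thorough.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Hypothetical central policy: label -> protection, listed from most to least strict.
POLICY = [("ssn", "encrypt"), ("email", "restrict")]

def classify_text(text: str) -> Set[str]:
    """Return the set of sensitivity labels whose pattern appears in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}

def protection_for(labels: Set[str]) -> str:
    """Pick the strictest protection that applies, or "none" if nothing matched."""
    for label, action in POLICY:
        if label in labels:
            return action
    return "none"

def discover(root: Path) -> Iterator[Tuple[Path, Set[str], str]]:
    """Walk `root` and yield (path, labels, protection) for every .txt file found."""
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        labels = classify_text(text)
        yield path, labels, protection_for(labels)
```

Iterating over `discover(Path("/some/share"))` (a placeholder path) yields an inventory of files with their labels and the protection the policy calls for. Keeping the policy in one table rather than in each user’s head is the point of centralized policy management.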