The Truth about Unstructured Data
The exponential growth of unstructured data is not a new phenomenon. While database breaches that expose credit card numbers or personally identifiable information dominate the headlines, the reality is that a large share of an organization’s intellectual property and sensitive information is stored in documents. Yet organizations are often unsure where that information resides or how much of it they have. Worse yet, it is accessed, shared, copied, and stored in an unprotected state.
Managing and controlling unstructured data is one of the most challenging data security issues for enterprises. All personally identifiable information and other sensitive information, corporate or otherwise, should be protected with encryption and persistent security policies so that only authorized users can access it. In this article, I will discuss the key drivers behind the influx of unstructured data in enterprises, the risks of not properly managing and securing that data, and best practices for document protection.
Unstructured data is not the same thing as dark data (although it can be, depending on your definition of dark data) or social media. It is the accumulation of documents (files), emails saved as files in folders, and the file sharing that takes place every day in businesses around the world. It is the ongoing creation of everyday information, pulled from structured databases and saved in a variety of formats, such as Microsoft Office files and PDFs, along with intellectual property such as CAD drawings, photos, and graphics. This material is created for internal use, drafted for external use, or published via social media and other channels, to name just a few categories.
According to Search Technologies, eighty percent of data is unstructured, yet the issue of securing unstructured data is still low on the security radar. Adding to the chaos of unstructured data are numerous challenges, including stricter regulatory requirements; protection of intellectual property (IP) and trade secrets; disparate security domains beyond traditional corporate WAN/LAN into cloud, mobile, and social computing; and preventing threats by insiders, both accidental and malicious.
Traditional security has focused on preventing a breach of the enterprise perimeter with layers of physical and electronic security, using a range of tools such as firewalls, filters, and anti-virus software to stop unauthorized access. Once those measures fail or are subverted, intruders gain access to all the (figurative) candy in the candy store and potentially the “crown jewels”.
The first attempts to deal with unstructured data came via Enterprise Digital Rights Management (EDRM) systems. Such dedicated systems typically didn’t work well with existing workflows, required training, needed staff time to manage, were often not realistically scoped, and had unforeseen negative impacts on other IT functions. At the end of the business day, EDRM projects were often stranded at the security doorstep.
A better approach is to accept the free-wheeling chaos of unstructured data and adopt technologies that find it in the enterprise, classify and prioritize it, and protect it via encryption, with policies governing who can see or access the data.
The first step is discovery: a scanning process analyzes files across the enterprise, identifies unprotected files, and looks for sensitive information. The scanner can be instructed (on an automated basis) to review certain types of files, such as Microsoft Office documents (Word, Excel, PowerPoint), images, PDFs, and CAD drawings, and to flag files whose names or contents match regular expressions or keywords. In addition, discovery can cover both unprotected data files and files that have already been encrypted by a protective process and “watermarked” with a digital rights management (DRM) token. Discovery of unstructured data is a constant process. It analyzes data in motion between computers and networks, data at rest (in storage), and data in use when a document is opened, at which point the data can be shared, printed, copied, or saved as a different file type (e.g., Word to PDF).
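The discovery step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the extension list, the pattern names, and the functions `scan_text` and `discover` are hypothetical, and the sketch reads files as raw text. A real scanner would need format-specific text extraction for Office, PDF, and CAD files, and would run continuously rather than once.

```python
import re
from pathlib import Path

# Hypothetical file types to review; adjust to the formats in your environment.
TARGET_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf", ".dwg", ".txt", ".csv"}

# Hypothetical patterns for illustration only; real rules would be tuned and validated.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    "keyword": re.compile(r"\b(?:confidential|trade secret)\b", re.IGNORECASE),
}


def scan_text(text):
    """Return the sorted names of all patterns that match the given text."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)
    )


def discover(root):
    """Walk root and map each matching file path to the patterns it triggered.

    Both the file name and the file's raw text are checked. Binary formats
    are read as-is, so their contents will mostly go unmatched in this sketch.
    """
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in TARGET_EXTENSIONS:
            continue
        try:
            content = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        hits = scan_text(path.name + "\n" + content)
        if hits:
            findings[str(path)] = hits
    return findings
```

Calling `discover("/some/share")` on a directory of your choosing returns a dictionary of file paths and the pattern names each one triggered, which could then feed the classification and protection steps.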
Securing unstructured data via encryption is a necessary and logical step, but encryption alone is not enough. A more robust approach embeds a unique “tag” or ID in the protected file as part of the encryption process. The tag provides the basis for tracking changes to and copies of files, and for applying user access policies through a centralized corporate file management process. The embedded tag can be used to restrict access to data in the encrypted file to a specific user or designated classes of users. It also makes it possible to trace the creation and migration of data from one computer within the enterprise to anywhere else inside or outside the enterprise, from endpoints to clouds and backup storage locations.
Unique tags can also be used within a centralized file management process to easily revoke access to all file derivatives and renamed copies wherever they reside within or even outside of the enterprise. Tags also provide the capacity to limit access to data regardless of location. As unstructured data is continuously on the move, it becomes one of the most at-risk assets in the organization.
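To make the tag idea concrete, here is a minimal sketch of a central registry keyed by file tag. It is an assumption-laden illustration, not a description of any particular product: the `TagRegistry` class and its method names are hypothetical, tags are plain UUIDs, and the encryption of the file itself is left out entirely. The point is only to show why a single tag shared by every copy makes access checks, tracing, and revocation a matter of one lookup.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TagRegistry:
    """Hypothetical central store: one tag per protected file and all its copies."""

    allowed_users: dict = field(default_factory=dict)  # tag -> set of user names
    locations: dict = field(default_factory=dict)      # tag -> list of known paths

    def register(self, path, users):
        """Issue a new tag for a protected file and record who may open it."""
        tag = str(uuid.uuid4())
        self.allowed_users[tag] = set(users)
        self.locations[tag] = [path]
        return tag

    def record_copy(self, tag, new_path):
        """Trace a copy or renamed derivative that carries the same tag."""
        self.locations[tag].append(new_path)

    def can_open(self, tag, user):
        """Check the policy for a tag; unknown tags are denied."""
        return user in self.allowed_users.get(tag, set())

    def revoke(self, tag):
        """Revoke access to every copy at once by clearing the tag's policy."""
        self.allowed_users[tag] = set()


# Usage: the paths and user names below are made up for illustration.
registry = TagRegistry()
tag = registry.register("/share/plan.docx", ["alice"])
registry.record_copy(tag, "/backup/plan-copy.docx")
print(registry.can_open(tag, "alice"))  # True while the policy stands
registry.revoke(tag)
print(registry.can_open(tag, "alice"))  # False for every copy after revocation
```

Because the policy lives with the tag rather than with any one file path, renaming or moving a copy does not change who can open it.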