Every second of every day, billions of people and countless more devices create data that can be found in relatively open places online. This level of data creation is unprecedented in human history — and it will continue to grow exponentially as more people connect to the internet, connected devices proliferate, and we live more of our lives in the digital sphere. But as the universe of publicly available information (PAI) expands, the challenge of connecting the dots grows.
Here’s how enterprise security teams can make sense of a data tsunami:
What is PAI?
PAI is an umbrella term that covers a wide range of data. Some of that data can be found on the open web, news sites and social media. PAI also lives on the dark web, although gaining visibility there can be more challenging than accessing PAI on the open web. Finally, PAI includes a broad range of commercially available data, such as public records.
Why do so many organizations struggle to maximize the value of PAI?
There are several reasons. To begin, using PAI at scale is a relatively new practice. As a result, many security professionals simply aren’t aware of PAI’s value.
But awareness is only one hurdle. While PAI is “open-source” by definition, enterprise security can’t afford to ignore sources that are more challenging to access. Searching for relevant data across the open and dark webs requires investments in tools as well as training.
Searching public records and using other forms of commercially available PAI likewise requires additional investments. While making these investments will pay dividends in terms of better organizational security, the reality is, enterprises often treat security as a cost center. Consequently, these investments are often deprioritized. When they are a priority, it’s typically after the enterprise has already suffered a major incident due to a security lapse.
That said, even enterprises that prioritize security investments struggle to harness the value of PAI because of cultural and linguistic challenges. The vast majority of PAI is written in languages other than English, and in a wide variety of them. Less than 5 percent of the world’s population speaks English as a native language. Meanwhile, the top 30 most spoken languages worldwide represent only 60 percent of the world’s population. But even these statistics understate the challenge considering the proliferation of emojis, GIFs and memes.
Finally, enterprises that prioritize security investments and overcome the cultural and linguistic barriers inherent in maximizing the utility of PAI still face the challenge of turning data into actionable intelligence. One reason: the amount of PAI an enterprise needs to sift through is staggering. But volume isn’t the only challenge here. The velocity of information can be especially overwhelming for the typical corporate security team, which is often understaffed and lacks adequate tools and training.
Even when an enterprise successfully manages the variety, volume and velocity associated with PAI, that intelligence is often processed in silos. A Fortune 500 company, for example, will leverage PAI that impacts multiple domains, ranging from the firm’s servers to the brand’s online presence, to physical locations as well as points along the supply chain. Unpacking these silos quickly and making that information actionable is difficult for even the most sophisticated enterprises.
Security starts with specific questions
While the totality of PAI can be overwhelming, enterprise security is a much more focused challenge. Rather than thinking about how an enterprise can make sense of all the PAI that’s out there in the world, security teams should identify areas where the enterprise faces potential threats and then formulate specific questions to focus their efforts. Each enterprise is unique, but here are some examples of questions that a security team might consider:
· What might threaten an upcoming event?
· How robust, or how vulnerable to disruption, is a proposed supply chain?
· Are our products being counterfeited, and what is the impact?
· How can we stop counterfeit goods from threatening our brand?
· Is anyone stealing our intellectual property?
· Is the enterprise leaking proprietary information?
Naturally, there are many more questions a security team may want to ask. But remember, the question is the starting point of a research project. Once you ask the question, you can begin to see the relevant context and identify meaningful clues.
For example, a question relating to potential threats for an upcoming event would naturally lead the security team to research the location, timing, attendees, and other details connected to the event. From there, the security team will want to know if any individuals or organizations have demonstrated intent to harm. Are these bad actors close enough geographically to cause harm? Do they have the capability to do so?
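The three questions about intent, proximity and capability can be sketched in a few lines of Python. This is only an illustration, not a description of any particular product: the Actor record, the triage function, the 50 km radius and the sample coordinates are all hypothetical, and in practice the intent and capability flags would come from an analyst's judgment rather than a boolean field.

```python
import math
from dataclasses import dataclass


@dataclass
class Actor:
    """Hypothetical record an analyst might keep for a person or group."""
    name: str
    lat: float
    lon: float
    stated_intent: bool   # has the actor expressed intent to harm?
    has_capability: bool  # does the actor plausibly have the means?


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def triage(actor, event_lat, event_lon, radius_km=50.0):
    """Answer the three questions; the radius is an assumed threshold."""
    distance = haversine_km(actor.lat, actor.lon, event_lat, event_lon)
    answers = {
        "intent": actor.stated_intent,
        "proximity": distance <= radius_km,
        "capability": actor.has_capability,
    }
    # Escalate only when all three answers are yes.
    answers["escalate"] = all(answers.values())
    return answers


# Illustrative event location and actors (made-up data).
EVENT_LAT, EVENT_LON = 38.7223, -9.1393
nearby = Actor("actor_a", 38.8029, -9.3817, True, True)
distant = Actor("actor_b", 41.1579, -8.6291, True, True)

for actor in (nearby, distant):
    print(actor.name, triage(actor, EVENT_LAT, EVENT_LON))
```

A simple yes/no on each question is a deliberate simplification; a real workflow would more likely record confidence levels and supporting sources for each answer.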
In contrast, a question about counterfeit goods will lead the team in a different direction that involves scanning PAI sources for places where those goods enter the stream of commerce. From there, the security team will need to determine whether the goods being sold are counterfeit or whether genuine articles are being sold in an unauthorized manner. Are there explicit dangers to the brand because of this activity? What are the defining characteristics of the goods in question? Who is selling them? Who is supplying the goods?
Complement human analysts with SaaS
One way to conceptualize the challenge of building relevant streams of information for the security team is to think in terms of trawling the open ocean for a certain type of fish. Several relevant pieces of information, like location, depth, and temperature, would make your efforts much more efficient. The same holds for finding PAI that is responsive to your organization’s needs.
Employing a sophisticated software-as-a-service (SaaS) platform built to solve the “three V” problem (Volume, Velocity and Variety) is the key to finding relevant signals inside the noise. Just as with the questions a security analyst uses to focus their research projects, the software should be adept at collecting, curating and filtering information from a broad array of sources and languages to address the unique requirements of each enterprise. The filtering capability, for instance, should at a minimum support factors such as location, specific languages and keyword combinations.
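To make the filtering idea concrete, here is a minimal Python sketch that screens a small static set of records by language, location and keyword combinations. It assumes each record already carries a language code and a location label; the Record type, the matches function and the sample data are hypothetical and do not reflect any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical PAI item with metadata already attached."""
    text: str
    language: str  # ISO 639-1 code such as "en" or "es"
    location: str  # place label assigned upstream


def matches(record, languages, locations, keyword_groups):
    """Return True if the record passes every active filter.

    keyword_groups is a list of keyword lists. A record passes when it
    contains every keyword from at least one group. Empty filters are
    treated as "no restriction".
    """
    if languages and record.language not in languages:
        return False
    if locations and record.location not in locations:
        return False
    if not keyword_groups:
        return True
    text = record.text.casefold()
    return any(
        all(keyword.casefold() in text for keyword in group)
        for group in keyword_groups
    )


# Made-up static data set for illustration.
records = [
    Record("Protest planned outside the conference venue", "en", "Lisbon"),
    Record("Manifestación frente al centro de conferencias", "es", "Lisbon"),
    Record("Conference venue catering menu announced", "en", "Lisbon"),
    Record("Protest planned outside the stadium", "en", "Madrid"),
]

hits = [
    r for r in records
    if matches(
        r,
        languages={"en", "es"},
        locations={"Lisbon"},
        keyword_groups=[["protest", "venue"], ["manifestación"]],
    )
]

for r in hits:
    print(r.language, r.location, r.text)
```

Simple substring matching is used here only to keep the sketch short; it will miss inflected forms, slang, emojis and images, which is part of why purpose-built tooling is needed.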
To understand how to deploy this functionality effectively, start with a static data set so that you can learn how to refine the data such that it is responsive to your requirements. From there, you can start to explore streams of dynamic information. To that end, cross-lingual search functionality (the ability to type in one language and get results in others) will be key to unlocking much of the value in PAI as most of it is not in English.
But remember, this is an iterative and ongoing process. Over time, a security team will build PAI capabilities. Along similar lines, while specific threats may come and go, the mission continues. By necessity, intelligence analysis is an ongoing workflow that must always consider the latest information.
It doesn’t really matter where an enterprise is in terms of sophistication when it begins that process. What matters is the degree to which the enterprise is willing to prioritize PAI capabilities as part of its larger security framework. In other words, the sooner an enterprise learns to crawl in this space, the better.