Calculating the Importance of Norms in Big Data
Fans of Douglas Adams’ “Hitchhiker’s Guide to the Galaxy” will recall that the “Answer to the Ultimate Question of Life, the Universe, and Everything” is 42. That’s it. Just 42. The Ultimate Question itself may be unknown, but the answer is certain. It is 42. No context. No meaning. 42.
As the Big Data enterprise begins to produce more “answers,” it is important that we not accept the sort of context-free results that the massive computer named “Deep Thought” produced in Hitchhiker. Instead, we need to be able to compare one set of results to another to see where we stand, whether we’re talking about loss prevention, incident rates, video analytics or any other dimension of security.
In this final installment of our four-part series, we’ll look at both the sources and uses of normative data in Big Security Data.
What is Normative Data?
In its simplest form, normative data can be thought of as statistical samples of large data sets. It provides answers to questions like:
- How many times does X happen per week at a commercial office building?
- How many times does X happen in retail stores vs. commercial property?
- What percentage of employees or visitors exhibit behavior Y nationwide?
- Has the percentage changed since last year? Or seasonally?
- Are incident rates worse at certain kinds of properties?
- Are my incident rates worse than national or regional norms?
- Is there anything out of the ordinary in this week’s data?
- What trends occur after public safety warnings? Hurricanes?
If Big Data could provide this sort of information to help our customers make decisions, wouldn’t that be a huge improvement in the way we practice security?
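To make a question like "Are my incident rates worse than national or regional norms?" concrete, here is a minimal Python sketch. It assumes you already have weekly incident counts for a peer group of comparable properties. The function name `compare_to_norm` and all of the numbers are hypothetical and purely illustrative, not real security data.

```python
from statistics import mean, stdev

def compare_to_norm(site_value, peer_values):
    """Return the site's z-score and percentile rank within a peer group."""
    mu = mean(peer_values)
    sigma = stdev(peer_values)  # sample standard deviation of the peers
    # Number of peer standard deviations the site sits above or below the norm
    z = (site_value - mu) / sigma if sigma > 0 else 0.0
    # Share of peers that report a strictly lower value than the site
    below = sum(1 for v in peer_values if v < site_value)
    percentile = 100.0 * below / len(peer_values)
    return z, percentile

# Hypothetical weekly incident counts at ten comparable office buildings
peer_incidents = [2, 3, 3, 4, 4, 5, 5, 6, 7, 11]
my_incidents = 9  # hypothetical count for "my" building this week

z, pct = compare_to_norm(my_incidents, peer_incidents)
print(f"z-score: {z:.2f}, percentile: {pct:.0f}")
```

A z-score well above zero, or a high percentile, would suggest the property is running hotter than its peers. Where to draw the line for "out of the ordinary" is a judgment call that depends on the measure being compared.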
How Can We Get It?
I had the pleasure of working in the healthcare informatics business for a number of years early in my career. In contrast to the security industry, healthcare is a field rich in normative data sources. The data are collected by doctors and hospitals and public agencies, reported to states and quality boards, and analyzed extensively by for-profit companies trying to give their clients an edge.
The result is that for almost any given situation, a consumer or provider or insurance company can compare performance and cost against known averages that are sliced and diced 10 ways from Sunday. This allows all stakeholders to have a more productive conversation about “the facts on the ground” and how they compare to current best practices, historical performance, comparable stakeholders, regional variations or any other measure deemed relevant.
Other real-market examples of norms that help improve overall industry performance include: airline on-time performance statistics; automobile quality ratings; manufacturing defect rates; consumer product safety ratings; advertising effectiveness measures; financial services performance; and the list goes on.
Today, however, the data in the security industry is largely fragmented and not available for analysis outside of a single enterprise. This makes any attempt at standardized norms or comparative evaluation a rather parochial exercise. This compartmentalization of data is largely a byproduct of the stovepipe system architectures that have dominated our software vendors, as well as the absence of any regulatory reporting requirement to draw the data out.
What is the Future of Security Norms?
Cloud computing is beginning to surmount the challenge of stovepipes, now that SaaS vendors in many verticals are able to anonymously aggregate data for the benefit of their entire customer base. If you look in the fine print of SaaS agreements, most of them will have one or more terms indicating your consent to anonymous data aggregation. This key legal term marks the starting point for deriving valuable information for the industry as a whole.
Of course, no one vendor will ever hold all the data, but that doesn’t mean individual SaaS services can’t still provide enormous benefit through Big Data offerings. As I witnessed in the healthcare industry, it was often “valuable enough” for a hospital to be able to compare itself to just a subset of other hospitals. That’s because a random sample of a group will tend to exhibit the same statistical properties as the whole group, or at least come close enough in many cases to support performance improvement.
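The sampling point can be illustrated with a short simulation. This is a sketch under stated assumptions: the population is synthetic (normally distributed values with a made-up mean and spread), the function name `sample_vs_population` is hypothetical, and the sample is drawn at random. A single vendor's customer base may not be a truly random slice of the industry, so real results could differ.

```python
import random
from statistics import mean, stdev

def sample_vs_population(population, sample_size, seed=0):
    """Compare the mean and spread of a random sample to the full population."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # draw without replacement
    return {
        "population_mean": mean(population),
        "sample_mean": mean(sample),
        "population_stdev": stdev(population),
        "sample_stdev": stdev(sample),
    }

# Synthetic population: 100,000 made-up measurements (assumed mean 100, spread 20)
rng = random.Random(42)
population = [rng.gauss(100, 20) for _ in range(100_000)]

# A vendor that sees only 5% of the population, chosen at random
result = sample_vs_population(population, sample_size=5_000)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With a sample this size, the sample mean and standard deviation should land very close to the population values, which is the sense in which a subset can be "valuable enough" for comparison.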
And the Answer You’ve Been Waiting for?
The answer is actually 48, not 42. Or at least that’s what 3 million of our anonymous users tell us about how often a door is used each day.