How Will Big Data Change Security?
In the first part of this series (published in April 2013) we discussed some of the major technologies that will play a role in the application of Big Data to the practice of physical security. We offered several conclusions:
Big Data is a natural fit for cloud computing because of the cloud’s virtually unlimited on-demand resources.
The ROI comes from unique insights not previously possible on smaller data sets or local servers.
Besides economic benefits, there is real potential to improve life safety and property protection with real-time analytics.
In this article, we look at some specific examples of how Big Data techniques can provide benefits at several scales of analysis. These examples are only the tip of the iceberg: illustrations of what can be done, not an exhaustive list. We are limited only by our imagination.
The data sets for these examples come from three different domains. The first two will be familiar to every enterprise security leader: access control and video analytics. Both produce copious amounts of data that contain many hidden patterns and predictive indicators. The third data source is one that may surprise you: log files from Web servers.
Population Statistics vs. Local Analytics
The analytic examples here are drawn at two different scales. The first covers large groups of users and facilities that may or may not be related, but are nonetheless correlated with one another through outside factors such as location, type of business, and time of year. We’ll call these “population statistics.” The second is based on data that is local to an organization or even a single facility, but that can benefit from the powerful analytics available in a Big Data context. We’ll call these “local analytics.”
The use of population statistics exploits the fact that security systems generate massive quantities of data that have business and social value beyond their immediate use in security management. Under the legacy model of one security system per building, or even one per enterprise, however, the data populations were never large enough to support this mode of analysis. Today, as more security systems are aggregated into larger cloud-hosted platforms, the combined data sets are far more interesting and valuable. The tools we discussed in our first installment allow us to realize this value.
Local analytics provide facility- or company-specific insights, but do so with tools usually reserved for much larger problems. That means more computing power and more sophisticated algorithms brought to bear at much lower cost. Again, this benefit is enabled by cloud architectures that leverage economies of scale and provide capabilities that simply aren’t feasible on the class of servers usually associated with security systems. As Microsoft Research has observed, “big data is inconvenient. It’s too big to fit on a screen, or in memory, or on disk.”
Big Data in Action: When Do People Get to Work?
A simple illustration of population statistics is to use Big Data techniques to answer the seemingly straightforward question, When do people get to work? The answer depends on many factors, but one specific hypothesis we can test is whether there is any significant difference in arrival time based on time zone. To get at this, we examined a data set of around two million anonymized access events for a single workday this year. The same analysis over one year would comprise approximately 1 billion data points – clearly more than the typical security system is designed to process.
Figure One (see Fig. 1, p. 30) shows this data set as a histogram of arrival times for each of three major U.S. time zones. Three waves of activity, one per time zone, are clearly visible, offset from one another by approximately one hour, as expected.
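A histogram like Figure One can be sketched in a few lines of Python. This is a minimal illustration, not the software that produced the figure: it assumes each access event has already been reduced to a (time zone, local hour) pair, and it uses hypothetical fixed UTC offsets that ignore daylight saving time. Placing every event on one shared clock (Eastern time here) is what makes each zone’s wave appear shifted relative to the others.

```python
from collections import Counter, defaultdict

# Hypothetical standard-time UTC offsets; daylight saving time is ignored.
# The article does not say which three zones Figure One uses.
UTC_OFFSET_HOURS = {"Eastern": -5, "Central": -6, "Mountain": -7, "Pacific": -8}


def hourly_histogram(events, display_zone="Eastern"):
    """Bin access events by hour of day on a single shared clock.

    events: iterable of (time_zone, local_hour) pairs, local_hour in 0-23.
    Returns {time_zone: Counter({display_hour: event_count})}.
    """
    display_offset = UTC_OFFSET_HOURS[display_zone]
    histogram = defaultdict(Counter)
    for zone, local_hour in events:
        # Local hour -> UTC -> display zone, wrapping around midnight.
        utc_hour = local_hour - UTC_OFFSET_HOURS[zone]
        display_hour = (utc_hour + display_offset) % 24
        histogram[zone][display_hour] += 1
    return histogram


# Toy data: every user badges in at 8 a.m. local time.
sample = [("Eastern", 8), ("Central", 8), ("Central", 8), ("Pacific", 8)]
for zone, counts in sorted(hourly_histogram(sample).items()):
    print(zone, dict(counts))
```

With this toy input, a Central-zone 8 a.m. arrival lands in the 9 a.m. Eastern bin, so identical local behavior shows up as waves shifted by the zones’ UTC offsets.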
The picture becomes more interesting when we dig further into the data and look at the statistics for morning arrival times in each time zone. As shown in Figure Two (see Fig. 2, p. 30), the median arrival time varies by region, but this variation becomes visible only when the data is aggregated across a statistically significant population.
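Per-zone medians like those in Figure Two could be approximated as follows. This is a sketch under stated assumptions: a user’s “arrival” is taken to be their earliest badge event before a hypothetical noon cutoff, times are expressed in minutes after local midnight, and the (user, zone, minute) event schema is invented for illustration.

```python
from collections import defaultdict
from statistics import median


def median_arrival_by_zone(events, morning_cutoff=12 * 60):
    """Median first-badge-in time (minutes after local midnight) per time zone.

    events: iterable of (user_id, time_zone, local_minute) for one workday.
    morning_cutoff: hypothetical cutoff; later events are not arrivals.
    """
    first_seen = {}  # (user_id, time_zone) -> earliest morning minute
    for user, zone, minute in events:
        if minute >= morning_cutoff:
            continue  # afternoon activity does not count as an arrival
        key = (user, zone)
        if key not in first_seen or minute < first_seen[key]:
            first_seen[key] = minute

    by_zone = defaultdict(list)
    for (_user, zone), minute in first_seen.items():
        by_zone[zone].append(minute)
    return {zone: median(minutes) for zone, minutes in by_zone.items()}


def as_clock(minutes):
    """Format minutes after midnight as HH:MM (fractions are truncated)."""
    whole = int(minutes)
    return f"{whole // 60:02d}:{whole % 60:02d}"
```

Using the median rather than the mean keeps a few very early or very late badge-ins from skewing a zone’s typical arrival time.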
How can a business make use of this insight? For one thing, it may suggest, with apologies to California, that if you want a workforce of early birds, you should open an office in the Midwest. The data shows that employees in the Central time zone arrive as much as 30 minutes earlier, on average, than their counterparts in the rest of the country. Overall, though, the data indicates that the regions are perhaps more alike than different. The shape of the activity graphs is remarkably similar across the full day, showing the same morning influx, mid-day lull, and subsequent trailing off.
If there’s one thing Big Data analysis shows us, it’s that data created for one purpose may have value for many other applications. The type of analysis shown in Figure Two could be extended to look at variations by day of the week or month, or at larger seasonal patterns. These results could in turn provide valuable insights into the workforce, commuting patterns, office occupancy, and more. The answers are there for those who have questions, and they won’t all be security users.
How Many of My People Are Showing Up?
If you provide security for a facility, a campus, or a nationwide chain of offices, a key factor you need to understand is how many people are accessing your premises on any given day. It is also important to understand regional and seasonal variations, as well as any departures from historical patterns.
To get a top-level view of this population, we looked at an event sample for roughly four million anonymized users for a single day. We then produced a frequency distribution of the number of times each user accessed one or more doors in their facility (see Fig. 3, p. 30). This tells us the level of activity for the day. We found that the single largest group, slightly more than one million people, never used their cards at all on the date analyzed. There were also a handful of people who went through a door an astonishing 400+ times in a single day. (It’s hard to see how they got any work done, but that’s another question.)
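The distribution in Figure Three is, at heart, a count of counts. One subtlety is that users who never badge in leave no trace in the event log, so the sketch below (with a hypothetical input format) takes the full cardholder roster as a separate input; without it, the largest group in the distribution would be invisible.

```python
from collections import Counter


def access_count_distribution(roster, access_events):
    """Return a Counter mapping 'door accesses that day' -> 'number of users'.

    roster: every active cardholder id. It must be supplied separately,
        because users with zero accesses never appear in the event log.
    access_events: one cardholder id per door access event.
    """
    per_user = Counter(access_events)  # user id -> access count
    # Counter returns 0 for ids with no events, so no-shows land in bucket 0.
    return Counter(per_user[user] for user in roster)


def no_show_share(distribution):
    """Fraction of cardholders with zero accesses that day."""
    total = sum(distribution.values())
    return distribution[0] / total if total else 0.0
```

Applied to a day like the one described above, `no_show_share` is the figure behind the “roughly one quarter” observation that follows.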
It’s easy to see how facility owners and others can benefit from this type of data. For example, knowing that roughly one quarter of the people don’t show up on any given day means that you might be over-allocating office space or other building resources. Perhaps it’s time for hoteling instead of assigned office space. Similarly, facilities that show lower or higher activity levels than this norm may have special security or maintenance requirements. At a municipal level, this type of data also has a lot to say about traffic, parking and city planning.
Is This Week Normal or Abnormal?
One of the biggest benefits from large data sets and strong analytical tools is the ability to detect patterns in data, and to show when there are exceptions. In the following example, we use longitudinal analysis of one year of data to find patterns that can be compared to the present.
Figure Four (see Fig. 4, p. 31) draws on a cloud-based history of video analytics produced by surveillance cameras in a commercial office setting. In this case, the data analysis software established a baseline moving average of the activity level observed by the cameras (green line). It then overlaid the observed event levels for an abnormal week (blue line) on top of this historical norm for the location. The obvious question is, Why is this week so much busier than the average? A better question might be, Would anyone have noticed that this was an abnormal week without the benefit of Big Data analytics? Compare this type of insight with what is possible with the “naked eye” or with the reporting functions usually available in local security systems.
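The article does not describe how the baseline in Figure Four was computed beyond calling it a moving average. One common way to flag an abnormal week, shown here as an assumption rather than the product’s actual method, is to compare each week with the mean and standard deviation of a trailing window of prior weeks. The window size and threshold below are illustrative choices.

```python
from statistics import mean, stdev


def flag_abnormal_weeks(weekly_counts, window=12, threshold=3.0):
    """Flag weeks whose event count departs sharply from a trailing baseline.

    weekly_counts: video-analytics event totals, one per week, oldest first.
    window: number of prior weeks in the moving-average baseline (assumed).
    threshold: deviations, in standard deviations, that count as abnormal (assumed).
    Returns a list of (week_index, count, baseline) for flagged weeks.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate spread")
    flagged = []
    for i in range(window, len(weekly_counts)):
        history = weekly_counts[i - window:i]  # prior weeks only
        baseline = mean(history)
        spread = stdev(history) or 1.0  # avoid dividing by zero on flat history
        if abs(weekly_counts[i] - baseline) / spread > threshold:
            flagged.append((i, weekly_counts[i], baseline))
    return flagged
```

The window deliberately excludes the week being tested, so an unusually busy week cannot inflate its own baseline and hide itself.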
How Do My Properties Compare?
One of the most common questions in an enterprise setting is, How do my facilities compare? In retail, for example, this question often drives store-to-store comparisons, but it could just as well apply whenever large numbers of facilities are brought under a single security umbrella. Such comparisons are possible only when the data is aggregated into a single analytical system with the speed and power to deliver answers in a timely fashion.
Figure Five (see Fig. 5, p. 31) shows security events at four different locations within the same enterprise. In this case, the data representation shows at a glance that all four of these facilities exhibit approximately similar security event traffic. None of them really stands out – and sometimes that’s all you need to know. The trick here is to produce that knowledge quickly, on demand, without lengthy user or IT involvement.
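A comparison like Figure Five can be reduced to a simple check: does any facility’s typical event volume differ markedly from its peers? The sketch below uses the median of the facility averages as the reference, so that a single unusual site does not drag the benchmark toward itself; the 25 percent tolerance and the input format are hypothetical.

```python
from statistics import mean, median


def compare_facilities(daily_events, tolerance=0.25):
    """Flag facilities whose average daily event volume stands out from peers.

    daily_events: {facility: [security events per day, ...]}
    tolerance: allowed fractional deviation from the peer median (hypothetical).
    Returns {facility: (average_per_day, ratio_to_peer_median, stands_out)}.
    """
    averages = {name: mean(counts) for name, counts in daily_events.items()}
    reference = median(averages.values())
    if reference == 0:
        raise ValueError("peer median is zero; ratios are undefined")
    report = {}
    for name, avg in averages.items():
        ratio = avg / reference
        report[name] = (avg, ratio, abs(ratio - 1.0) > tolerance)
    return report
```

When no facility is flagged, the result matches the Figure Five situation: none of the sites stands out.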
How Does This Year Compare to Last Year?
Companies often want to measure whether new initiatives are having a positive effect on outcomes. Again, Big Data visualizations make this a straightforward process that quickly yields results for large data sets. As Figure Six (see Fig. 6, p. 31) shows, 2012 has a clearly lower incident rate than 2011. A multi-dimensional analysis could extend this technique to compare changes across multiple facilities undergoing the same improvements.
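Because facilities grow and shrink, a year-over-year comparison is often fairer when incident counts are divided by some measure of exposure. The sketch below assumes access events as that denominator, which is a hypothetical choice; the article does not say how the incident rate in Figure Six is normalized.

```python
def incident_rate(incidents, exposure, per=1000):
    """Incidents per `per` units of exposure (here, assumed access events)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return incidents / exposure * per


def year_over_year(previous, current, per=1000):
    """Compare two years given an (incident_count, exposure) pair for each.

    Returns (previous_rate, current_rate, percent_change); a negative
    percent_change means the incident rate went down.
    """
    prev_rate = incident_rate(*previous, per=per)
    curr_rate = incident_rate(*current, per=per)
    if prev_rate == 0:
        raise ValueError("previous rate is zero; percent change is undefined")
    return prev_rate, curr_rate, (curr_rate - prev_rate) / prev_rate * 100
```

Calling `year_over_year` once per facility, with the same exposure measure, would give the kind of multi-facility comparison mentioned above.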
Improving the User Experience
Earlier, we noted that one unusual data source in the security domain is log files from Web servers. What makes it unusual is that it is not security data per se. Rather, for cloud-based security systems (or any system supporting front-end browsers), the data represents the behavior of the security personnel who actually use the system to perform their jobs. Why is this important? Consumer-oriented Web companies intensely study online user behavior to improve their applications, yet security system customers appear to be last in line for this kind of User Experience (UX) enhancement. That’s a detriment to both our customers and the people we protect.
A simple example of the kind of insight we can extract from a few billion Web clicks is which activities security managers perform most often, and therefore where we should focus our UX improvements. Figure Seven (see Fig. 7, p. 31) distills a year’s worth of online customer behavior into the five most commonly performed security actions.
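Ranking actions from Web server logs can start with something as simple as counting request paths. The sketch assumes logs in the widely used Common Log Format and a hypothetical table mapping URL prefixes to security actions; a real system would have its own URL scheme, and the action names here are invented for illustration.

```python
import re
from collections import Counter

# Matches the quoted request line of a Common Log Format entry,
# e.g. "GET /reports/access?site=12 HTTP/1.1"; the query string is dropped.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>[^ "?]+)[^"]*"')

# Hypothetical mapping from URL path prefixes to user-facing actions.
ACTIONS = {
    "/events": "view event log",
    "/cardholders": "manage cardholders",
    "/reports": "run report",
    "/video": "view video",
    "/doors": "lock/unlock door",
    "/schedules": "edit schedule",
}


def top_actions(log_lines, n=5):
    """Return the n most frequent actions as (action, count) pairs."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip malformed or non-request lines
        path = match.group("path")
        for prefix, action in ACTIONS.items():
            if path.startswith(prefix):
                counts[action] += 1
                break  # count each request once
    return counts.most_common(n)
```

At the scale of billions of clicks, the same counting step would typically run as a distributed aggregation, but the logic per log line stays this simple.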
Where to from Here?
The examples presented here are all retrospective: we compiled and analyzed events that had already happened. As mentioned earlier, they only scratch the surface of what is possible, which is limited solely by our imagination.
Next time, we’ll look at the predictive value of security information in a Big Data context. After all, it’s nice to understand the past, but our real goal is to understand and change the future.