Artificial intelligence (AI)-powered video analytics is all the rage. Gone are the days when human operators were responsible for overseeing the movements of visitors to shopping malls, parking garages and the like.

The main power and allure of AI-powered video analytics lies in its predictive qualities. The systems can be trained to understand how people normally behave in a given setting and what hazardous behavior could look like. They can, for example, be taught to spot someone in a school or mall who pulls out a gun. That threat still needs to be validated by a human operator (more about that later), but the first AI-powered 'sweep' can be made highly effective.
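
To make that first 'sweep' a little more concrete, here is a minimal sketch in Python. It assumes OpenCV for frame capture; detect_objects() is a placeholder for whatever model an analytics vendor would supply, and the RTSP address in the usage note is made up.

    import cv2
    from queue import Queue

    CONFIDENCE_THRESHOLD = 0.6  # below this, don't bother a human operator

    def detect_objects(frame):
        """Placeholder for the vendor's detection model.

        Expected to return a list of (label, confidence, bounding_box) tuples.
        """
        raise NotImplementedError("plug in the actual analytics model here")

    def sweep(rtsp_url, review_queue):
        """Run the AI 'sweep' on one camera and queue possible threats for review."""
        cap = cv2.VideoCapture(rtsp_url)
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            for label, confidence, box in detect_objects(frame):
                if label == "firearm" and confidence >= CONFIDENCE_THRESHOLD:
                    # The AI only flags; a human operator validates the threat.
                    review_queue.put({"label": label, "confidence": confidence,
                                      "box": box, "frame": frame})
        cap.release()

    # Hypothetical usage: sweep("rtsp://cam-01.local/stream", Queue())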

Companies and organizations looking to adopt the new video analytics technology face important choices about their setup. Legacy cameras more than 10 years old are of very limited use: they simply lack the 4K resolution the analytics software needs to work efficiently. Any camera that is not already 4K will have to be upgraded. That part is pretty straightforward.

Decisions become more complex when the computing architecture needed to process and store the data from the camera network comes into view. 

Centralized, on-premises servers

Centralized, on-premises servers, also called digital video recorders (DVRs), still provide the computing power behind the majority of video surveillance systems today. These servers have the advantage of being able to manage and store video for up to hundreds of cameras. However, for companies looking to employ AI-powered video analytics, they fall short: they cannot provide any kind of meaningful statistical analysis, search, or AI capability.

This is because these systems feature many drives for storing video, but typically a single processor manages both the drives and the video streams. That one processor cannot decode all of the video streams and simultaneously apply object classification to them in real time.
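
A rough back-of-envelope calculation shows why. The figures below are illustrative assumptions, not benchmarks of any particular recorder, but the imbalance they reveal is the point.

    # Illustrative numbers only: time available per frame on a single processor
    # versus time needed to decode a 4K frame and classify the objects in it.
    STREAMS = 64          # cameras handled by one recorder (assumed)
    FPS = 15              # frames per second per camera (assumed)
    DECODE_MS = 8         # assumed CPU time to decode one 4K frame
    CLASSIFY_MS = 40      # assumed CPU time to classify objects in one frame

    budget_ms = 1000 / (STREAMS * FPS)   # time the processor has per frame
    needed_ms = DECODE_MS + CLASSIFY_MS  # time each frame actually requires

    print(f"Available per frame: {budget_ms:.2f} ms")
    print(f"Required per frame:  {needed_ms} ms")
    print(f"Shortfall factor:    {needed_ms / budget_ms:.0f}x")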

Public cloud 

Working through a cloud service has the advantage that you may have access to thousands of GPUs for faster training on collected data. When that cloud is public, however, problems can arise. The public cloud, by its very nature, suffers from considerable latency: every frame has to travel over the public internet to a remote data center and back, which significantly delays the transit of the data.

This latency means organizations will have no real trouble storing recorded footage, but sharing real-time streams from a camera with other points of their network becomes highly problematic. That mall visitor who pulled out a gun? They are no longer where the streamed image says they are. Latency makes it impossible to manage the situation and contain the threat: the human operator who has to make a GO/NO GO decision on escalation will be at a loss, and streaming that stale image to the concerned store owners in the mall serves no purpose.
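
To put rough numbers on it, the sketch below totals an illustrative latency budget for a public-cloud relay versus an on-site path. Every figure is an assumption made for the sake of the argument, not a measurement of any particular service.

    # Illustrative latency budget: on-site (edge) path versus a public-cloud relay.
    EDGE_PATH_MS = {
        "camera to on-site processor": 5,
        "object classification": 50,
        "alert to on-site operator": 10,
    }
    CLOUD_PATH_MS = {
        "camera to cloud ingest": 120,
        "queueing and transcoding": 500,
        "object classification": 50,
        "alert back to on-site operator": 120,
    }

    print("On-site path:", sum(EDGE_PATH_MS.values()), "ms")
    print("Cloud path:  ", sum(CLOUD_PATH_MS.values()), "ms")
    # At walking pace (~1.4 m/s), the extra ~0.7 s of delay in this scenario puts
    # the person roughly a metre away from where the stream shows them.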

Public clouds also raise issues of privacy and data protection. Who can access the data once it has left for the public cloud? Will the cloud service provider share it with other parties? Will the provider secure it sufficiently against leaks? There is often very little transparency about which data and privacy protection policies are in place, and it is of course impossible to gauge to what degree those policies are actually followed.

Private cloud 

A private cloud service offers the same trainability, but in a setup where the company or individual remains in full control of its own data. Any AI-powered video analytics solution can still be trained on whatever data the video surveillance system provides, but the company keeps a complete view of how that data is then used and stored.

Edge computing

Through edge computing, companies place processing power very close to their cameras, at the “edge” of the network. The advantage is drastically reduced latency, which makes it possible to manage a hazardous situation effectively in real time. To return to the earlier example: the visitor is still carrying a gun. Thanks to the edge processors, the mall owner can now make images with very little latency available to the human operators who need to validate the threat, and that stream of current images can be shared with whoever else has a need to know.
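
As a sketch of what that hand-off could look like: the snippet below has an edge node push a flagged frame to an on-site alert service over the local network. The endpoint name is hypothetical and the detection step is assumed to have already happened; OpenCV does the JPEG encoding and the requests library the delivery.

    import cv2
    import requests

    ALERT_ENDPOINT = "http://operator-console.local/alerts"  # hypothetical on-site service

    def push_alert(frame, label, confidence):
        """Send a flagged frame to on-site operators with minimal delay."""
        ok, jpeg = cv2.imencode(".jpg", frame)
        if not ok:
            return
        # The frame never leaves the local network, so the operator sees the
        # suspect roughly where they actually are, not where they were seconds ago.
        requests.post(
            ALERT_ENDPOINT,
            data=jpeg.tobytes(),
            headers={"Content-Type": "image/jpeg",
                     "X-Label": label,
                     "X-Confidence": str(confidence)},
            timeout=2,
        )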

Edge processors are good at processing data but are not built to store it, which means that companies wanting to archive their video streams for longer periods will still need storage elsewhere. That is why almost all edge computing solutions work in tandem with either a public or private “edge-to-cloud” solution.
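
Below is a minimal sketch of such an edge-to-cloud hand-off, assuming S3-compatible object storage reached through boto3; the bucket name and local spool directory are made up.

    import os
    import boto3

    LOCAL_SEGMENT_DIR = "/var/spool/camera-segments"  # short-lived storage on the edge box
    ARCHIVE_BUCKET = "surveillance-archive"           # made-up bucket name

    s3 = boto3.client("s3")

    def archive_segments():
        """Ship finished recording segments to long-term storage, then free edge disk."""
        for name in sorted(os.listdir(LOCAL_SEGMENT_DIR)):
            path = os.path.join(LOCAL_SEGMENT_DIR, name)
            s3.upload_file(path, ARCHIVE_BUCKET, f"segments/{name}")
            os.remove(path)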

A side note: fog computing

With fog computing, every camera is linked directly to the cloud. Letting the cloud process and store data at the edge sounds good on paper, but for now it remains largely an academic exercise. Fog computing’s additional layer also brings the disadvantages of extra power consumption and tedious data management between nodes.

Organizations will need to decide what they want to accomplish with their surveillance and what budget is available before settling on any of the options discussed, or a combination of them.