The promise of Video Content Analysis (VCA) technology for surveillance systems is to restore an active role to security surveillance by alerting security personnel to what, where and when something worth further investigation has happened.

VCA processes the video stream from each camera in real-time, detecting and tracking moving objects and annunciating alarms upon conditions that indicate prohibited or suspicious events. Automatic analysis of surveillance video content facilitates the logging of all video events, alarms, behaviors and interactions between moving objects in a database, providing searchable storage that annotates and catalogues the video from each camera. With these automated capabilities, active surveillance promises faster situation assessment and faster dispatch and response, reducing property damage, losses, fines, liability, insurance premiums, and loss of life and injury.

Enterprise fit

But before getting too excited about VCA, it is useful to look briefly at where it fits into the enterprise security system. Any new technology debuts with promise, but not every promise is realized. Understanding VCA’s capabilities can help put the technology and its most imminent uses in perspective.

Behind any given security surveillance system, most companies have an enterprise-level security policy. Among other things, these policies often delineate the company’s business model, prioritize the importance of the model’s elements (physical, network, cash, human, mobile, etc.) and characterize threats in the context of operations.

A common architecture for an enterprise video surveillance system is shown in Figure 1 with the desired VCA components. This VCA system is a back-end process to the overall system, implying that the cameras, video distribution network and other components must be not only fully functional but also properly designed and sized for the task. But this is not always the case for existing systems.

Cameras and optics must be selected to perform optimally in their environment (day, night, all-weather, artificial illumination, etc.), to cover the area of interest, and to provide sufficient resolution to detect the smallest defined threat. The location and orientation of cameras should provide the best viewing geometry to detect objects of interest with minimal obstructions, minimizing areas of little interest, such as the sky. The video distribution system could be coax, fiber optic or wireless, depending on which data transfer conduit is best suited to the environment and distances involved.
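As a rough illustration of the resolution requirement, the pixel density on a target can be estimated from a camera’s horizontal resolution and field of view. The sketch below is a back-of-the-envelope pinhole-camera calculation; the numbers in the example (resolution, lens angle, target size and distance) are illustrative assumptions, not figures from any particular installation.

```python
import math

def pixels_on_target(h_resolution_px, h_fov_deg, distance_m, target_width_m):
    """Estimate how many horizontal pixels land on a target of a given width
    at a given distance, assuming a simple pinhole camera model."""
    # Width of the scene covered by the camera at that distance.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(h_fov_deg) / 2.0)
    return h_resolution_px * target_width_m / scene_width_m

# Illustrative check: a 704-pixel-wide (4CIF) camera with a 40-degree lens
# viewing a 0.5 m wide, person-sized target at 60 m.
px = pixels_on_target(704, 40.0, 60.0, 0.5)
print(f"{px:.1f} pixels on target")  # compare against the pixel count your VCA needs for detection
```

If the result falls below what the VCA needs for reliable detection, a narrower lens, higher-resolution camera or shorter standoff distance is indicated.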

However, special attention must be paid to bandwidth limitations to ensure the video at the head-end arrives at a sufficient frame rate and free of compression and transmission artifacts for optimal VCA processing. Most VCA systems require a minimum of 7 fps, although especially fast objects moving close to the camera may require higher rates. Settings for digital video encoders/decoders should be optimized to the available bandwidth and the VCA processing requirements. Most wireless video transmission systems have automatic bandwidth control mechanisms that drop so-called “lower frequency/priority” video data if bandwidth becomes restrictive. However, such dropped data usually leave video artifacts that can cause false detections for the VCA.
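A quick way to sanity-check whether a shared link can sustain the frame rate the VCA needs is to total the per-camera bit load against the available capacity. The sketch below is a simple budget check; the bits-per-frame figure and the headroom fraction are illustrative assumptions that would need to be replaced with measured values for a real link.

```python
def link_supports_cameras(num_cameras, fps, bits_per_frame, link_capacity_bps, headroom=0.75):
    """Return True if the aggregate video load fits within the usable share
    of the link, leaving headroom for retransmissions and control traffic."""
    aggregate_bps = num_cameras * fps * bits_per_frame
    return aggregate_bps <= headroom * link_capacity_bps

# Example: 8 cameras at the 7 fps floor many VCA engines expect,
# roughly 60 kbit per compressed CIF frame (assumed), over a 45 Mbit/s link.
ok = link_supports_cameras(8, 7, 60_000, 45_000_000)
print("link adequate" if ok else "link will drop frames")
```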

Bundled with DVRs

Some VCA systems are bundled with DVRs, requiring either replacement of the existing recording system or the addition of another recorder. Others are inserted into the video path, adding another potential point of failure. Ideally, the VCA should take only an uncorrupted sample of the video rather than inserting itself into the video channel, where it could compromise the existing video surveillance system.

It is the opinion of the authors that, for robustness, VCA systems are best loosely integrated with DVRs to query video playback of alarms and events, and in a manner that allows a specific server, camera or other component to be replaced or rebooted without bringing down and rebooting the entire system. Another, and more important, benefit of keeping the VCA component out of the direct video path is that interference with ongoing security operations is eliminated or easily minimized during installation, calibration, training and testing of the system.

The placement of VCA within security surveillance systems is the subject of much current research. Several approaches are being investigated, from embedding VCA in the camera itself – without compromising performance or substantially increasing the cost of the camera system – to platform-independent implementations.

Video Content Analysis for smart surveillance systems comprises three primary components: object detection and tracking, object classification and event recognition from video. These components entail several open problems in computer-vision research. Even though such problems may not have complete solutions, today’s smart surveillance systems help security personnel by restricting the application domain, specifically, the classes of objects and types of events that can be modeled by security policies and regulations.
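The three components can be thought of as stages in a per-frame processing pipeline. The sketch below shows one way to wire them together; the stage interfaces and names are illustrative only and do not describe any particular product’s architecture.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    object_id: int
    positions: list = field(default_factory=list)  # (frame_no, x, y) samples
    label: str = "unknown"                         # filled in by the classifier stage

class VcaPipeline:
    """Illustrative three-stage VCA pipeline: detect/track, classify, recognize events."""

    def __init__(self, tracker, classifier, event_recognizer):
        self.tracker = tracker                  # stage 1 component
        self.classifier = classifier            # stage 2 component
        self.event_recognizer = event_recognizer  # stage 3 component

    def process_frame(self, frame, frame_no):
        tracks = self.tracker.update(frame, frame_no)        # detect and track moving objects
        for track in tracks:
            track.label = self.classifier.classify(track)     # assign an object class to each track
        return self.event_recognizer.evaluate(tracks)         # turn classified tracks into events/alarms
```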

Object detection and tracking is the first step in any VCA system. While several tracking algorithms have been researched and published, only a handful of these approaches have made their way into product implementations. It is of utmost importance that objects are detected and tracked appropriately, as every subsequent process is highly dependent on accurate object tracks.

Each VCA product uses its own proprietary enhancements to one of the proven tracking algorithms, customized to its specific environment. For example, Los Angeles-based Northrop Grumman Corp.’s AlertVideo system uses an object detection and tracking algorithm with adaptive background subtraction, followed by foreground object segmentation using edge- and region-based correspondences. The background model is a mixture of Gaussians for the R, G and B channels to eliminate the effects of smooth lighting changes, rain, snow and wind-blown leaves. After background subtraction, candidate foreground object pixels are grouped using connected components, and object regions are reinforced by edges. Observed object velocities, color makeup and a few geometric features are used to track objects across consecutive frames. Each detected object is assigned a unique ID number, which is preserved as long as the object is tracked in a view.
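A generic version of this detect-and-track chain can be sketched with OpenCV’s mixture-of-Gaussians background subtractor and connected-components labeling. The nearest-centroid ID assignment below is a simplified stand-in for the proprietary correspondence logic described above; it is an assumption for illustration, not AlertVideo’s algorithm.

```python
import cv2
import numpy as np

class SimpleTracker:
    """Background subtraction + connected components + nearest-centroid ID tracking."""

    def __init__(self, min_area=150, max_match_dist=50.0):
        # Mixture-of-Gaussians background model; adapts to gradual lighting changes.
        self.bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=25,
                                                     detectShadows=True)
        self.min_area = min_area            # ignore tiny blobs (noise, bugs)
        self.max_match_dist = max_match_dist
        self.next_id = 0
        self.tracks = {}                    # object id -> last centroid (x, y)

    def update(self, frame):
        fg = self.bg.apply(frame)
        fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)[1]          # drop shadow pixels
        fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

        # Group foreground pixels into candidate objects.
        n, _, stats, centroids = cv2.connectedComponentsWithStats(fg, connectivity=8)
        detections = [tuple(centroids[i]) for i in range(1, n)
                      if stats[i, cv2.CC_STAT_AREA] >= self.min_area]

        # Assign each detection to the nearest existing track, else start a new ID.
        updated = {}
        for cx, cy in detections:
            best_id, best_dist = None, self.max_match_dist
            for tid, (px, py) in self.tracks.items():
                d = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
                if d < best_dist:
                    best_id, best_dist = tid, d
            if best_id is None:
                best_id = self.next_id
                self.next_id += 1
            updated[best_id] = (cx, cy)
        self.tracks = updated
        return self.tracks                  # id -> centroid for this frame
```

In practice, the matching step would also use velocity, color and shape cues, as the article notes, so that IDs survive brief occlusions.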

Object recognition/classification from still images has been an area of active research for several decades and has more recently been applied to video. Object classification facilitates better understanding of video events, and only the more advanced VCA products offer the capability. Some VCA systems use a learning-based approach to object classification to provide maximum flexibility and adaptability to the environment. Systems must distinguish between vehicle types, biological forms (a man from a deer), types of airplanes and watercraft, or a person’s posture – walking normally, crouching or prone. End users should note that systems attempting higher-level processing without adequate object detection, tracking and classification will yield higher false alarm rates in the event recognition phase.
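One common learning-based approach is to train a conventional classifier on simple shape and motion features extracted from each tracked blob. The feature set, the toy training data and the use of scikit-learn below are illustrative assumptions; commercial products rely on their own proprietary features and models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def blob_features(width, height, area, speed):
    """Simple per-object features: aspect ratio, fill ratio, size and speed."""
    aspect = height / max(width, 1.0)
    fill = area / max(width * height, 1.0)
    return [aspect, fill, area, speed]

# Toy training set (hand-labeled tracks): tall, slow blobs as "person",
# wide, fast blobs as "vehicle". The values are made up for illustration.
X = np.array([
    blob_features(20, 50, 700, 1.2),    # person
    blob_features(18, 45, 600, 0.9),    # person
    blob_features(80, 40, 2500, 8.0),   # vehicle
    blob_features(90, 45, 3000, 10.0),  # vehicle
])
y = np.array(["person", "person", "vehicle", "vehicle"])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([blob_features(22, 55, 800, 1.0)]))   # likely "person"
```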

In video event recognition, object classes and tracks are analyzed by event-understanding algorithms to detect anomalous behaviors. To understand the events in a video scene, both the individual behaviors of objects and the relationships among objects must be understood, and the elementary components of more complex behaviors need to be resolved. Again, several approaches, from rules-based to model-based and natural language understanding, exist in the market. This is one area in which the security policies of users greatly help to simplify the task at hand.
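A rules-based recognizer can be sketched as a set of predicates evaluated over each object’s track. The rectangular zone and the dwell-time threshold below are placeholders chosen for illustration; a deployed system would draw its zones and thresholds from the site’s security policy.

```python
import time

def point_in_rect(x, y, rect):
    """rect = (x1, y1, x2, y2); a stand-in for an arbitrary alarm-zone polygon."""
    x1, y1, x2, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2

def check_loitering(track, zone, dwell_seconds=60.0):
    """Alarm if an object has stayed inside the zone longer than dwell_seconds.
    track is a list of (timestamp, x, y) samples ordered by time."""
    inside = [(t, x, y) for t, x, y in track if point_in_rect(x, y, zone)]
    if not inside:
        return False
    return inside[-1][0] - inside[0][0] >= dwell_seconds

# Example: a person hovering near a doorway zone for 90 seconds triggers the rule.
now = time.time()
track = [(now + s, 105, 210) for s in range(0, 91, 5)]
print(check_loitering(track, zone=(100, 200, 150, 260)))   # -> True
```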

VCA benefits

Essential to this process is an extensive detection library, which should cover most required physical security and perimeter intrusion events (see Figure 2). Some elementary events can be used to describe more complex ones. Systems must also be able to identify key split and merge behaviors, where a single object splits into two or more objects or two or more objects merge into one. Trackers can identify split and merge behaviors and the parent/child objects in each such behavior. These behaviors serve as key components for several higher-level events such as package drop-off, exchanges between people, people getting out of cars or forming crowds, etc.
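Split and merge events can be flagged by comparing track membership between frames: one current region that absorbs several previous regions indicates a merge, and one previous region that spawns several new regions indicates a split. The overlap-based matching below is a simplified stand-in for a real tracker’s correspondence logic, included only to illustrate the parent/child bookkeeping.

```python
def rect_overlap(a, b):
    """Axis-aligned overlap area of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def detect_split_merge(prev_objects, curr_objects):
    """prev_objects / curr_objects map object id -> bounding box.
    Returns ('merge', child_id, [parent_ids]) and ('split', parent_id, [child_ids]) records."""
    events = []
    for cid, cbox in curr_objects.items():
        parents = [pid for pid, pbox in prev_objects.items() if rect_overlap(pbox, cbox) > 0]
        if len(parents) > 1:
            events.append(("merge", cid, parents))     # several objects became one
    for pid, pbox in prev_objects.items():
        children = [cid for cid, cbox in curr_objects.items() if rect_overlap(pbox, cbox) > 0]
        if len(children) > 1:
            events.append(("split", pid, children))    # one object became several
    return events

# Two people (ids 1 and 2) walk together and are detected as a single region (id 3).
prev = {1: (10, 10, 30, 60), 2: (28, 12, 50, 62)}
curr = {3: (12, 10, 48, 62)}
print(detect_split_merge(prev, curr))   # -> [('merge', 3, [1, 2])]
```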

Maximizing the benefits of VCA requires an integrated system examination of the end user’s current video security system, with specific attention paid to the items and issues discussed above. Whether it is an addition to an existing system or part of a brand-new, IP-based installation, a VCA system design must start from the company’s security policy and concept of operations to identify and prioritize threats. The VCA should place no restrictions on the sensor (camera) and optics needed for the specific physical infrastructure environment or on the resolution required to detect and identify the smallest threat. Camera orientation and location should be re-examined to ensure proper coverage and optimum viewing of objects in the scene. The video distribution system must provide a faithful representation of the video at the head-end, with sufficient frame rate for the threats to be detected.

The primary purpose of the VCA is to detect, track and identify moving objects so that when their behavior exhibits a threat or is unsafe, an alarm is annunciated. This requires the VCA to identify and classify the object from the video information available, even to the point that if like objects, say two cars or two people, come together and then split, each object maintains a unique identification. This aids tracking, but it also allows the VCA log to maintain a consistent register of all events and alarms exhibited by that object during its life in the camera view.

Simple behaviors like wrong direction, abandoned object, speed, intrusion, stopping or loitering can be an alarm trigger, but more complex events are also easily established. For example, the system can be programmed to allow vehicles at a control gate but alarm on pedestrians or bicyclists. A vehicle that parks with no occupant exiting can set off an alarm, as can an occupant proceeding away from the building. Memory and recognition also play a role: if a person loitering in an area temporarily leaves the camera view and soon returns, the same person is again identified, triggering an alarm. A consistent tracking thread for each object also makes searching the VCA log for forensic investigations much easier.
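Composite events like the parked-vehicle example can be expressed as a combination of elementary observations: the vehicle’s track goes stationary, and no person track appears near it within a grace period. The sketch below illustrates that composition; the speed, time and distance thresholds are assumed values for illustration only.

```python
def vehicle_parked_no_exit(vehicle_track, person_tracks, stop_speed=0.2,
                           grace_seconds=30.0, exit_radius=5.0):
    """vehicle_track: list of (t, x, y, speed); person_tracks: list of (t, x, y) lists.
    Alarm if the vehicle stops and no person appears within exit_radius of it
    during grace_seconds after the stop."""
    stops = [(t, x, y) for t, x, y, s in vehicle_track if s <= stop_speed]
    if not stops:
        return False
    t_stop, vx, vy = stops[0]
    if vehicle_track[-1][0] - t_stop < grace_seconds:
        return False                      # still inside the grace period, don't alarm yet
    for track in person_tracks:
        for t, px, py in track:
            if t_stop <= t <= t_stop + grace_seconds and \
               ((px - vx) ** 2 + (py - vy) ** 2) ** 0.5 <= exit_radius:
                return False              # someone exited near the vehicle
    return True                           # parked vehicle, nobody got out: alarm

# A vehicle stops at t=0 and sits for 60 s; no person track ever comes near it.
vehicle = [(t, 40.0, 20.0, 0.0) for t in range(0, 61, 5)]
print(vehicle_parked_no_exit(vehicle, person_tracks=[]))   # -> True
```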

False alarm reduction

A VCA system also has to be capable of handling the diversity found in outdoor and indoor environments (clouds, shadows, day/night transitions, background holes, swaying camera poles, flying bugs, etc.) to reduce false alarms and provide high confidence in the probability of detection and identification.

In addition to the normal alarm capabilities, end users are encouraged to select a VCA system that can learn its environment. The system should be able to adapt to environmental limitations and allow the operator to incorporate them when setting alarm triggers or flagging known sources of false alarms.
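One simple way such learning can work is to accumulate a spatial map of alarms the operator dismisses as false, and then demand higher confidence before alarming again in those regions. The mechanism below is an illustrative sketch of that idea, not a description of any vendor’s learning feature; the grid size, thresholds and penalty are assumptions.

```python
import numpy as np

class NuisanceMap:
    """Grid of operator-dismissed false alarms used to raise alarm thresholds locally."""

    def __init__(self, frame_w, frame_h, cell=32):
        self.cell = cell
        self.counts = np.zeros((frame_h // cell + 1, frame_w // cell + 1))

    def record_false_alarm(self, x, y):
        """Operator marked an alarm at (x, y) as a nuisance; remember where."""
        self.counts[int(y) // self.cell, int(x) // self.cell] += 1

    def should_alarm(self, x, y, confidence, base_threshold=0.5, penalty=0.05):
        """Require proportionally higher confidence in cells with a history of nuisance alarms."""
        nuisance = self.counts[int(y) // self.cell, int(x) // self.cell]
        return confidence >= base_threshold + penalty * nuisance

# A corner with swaying foliage accumulates dismissed alarms; marginal detections there are suppressed.
nm = NuisanceMap(704, 480)
for _ in range(6):
    nm.record_false_alarm(650, 40)
print(nm.should_alarm(650, 40, confidence=0.6))    # -> False (threshold raised to 0.8)
print(nm.should_alarm(100, 400, confidence=0.6))   # -> True elsewhere in the scene
```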

Selection of a VCA system should also place a priority on its system interfaces and development tools. Frequently, a VCA system will need to interface with integrated security monitoring systems or with a geo-information mapping system. For example, an interface to a camera’s pan/tilt/zoom (PTZ) control system can allow the VCA to automatically slew and zoom into an alarm area, or to apply different alarm policies for each PTZ preset position.
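The kind of PTZ hand-off described here ultimately reduces to converting an alarm’s pixel position in a fixed camera’s view into pan and tilt offsets for the moving camera. The pinhole geometry below is a simplified sketch that assumes the fixed and PTZ cameras are co-located; a deployed system would also calibrate for their relative positions and use the PTZ vendor’s actual control interface (the controller call in the comment is hypothetical).

```python
import math

def pixel_to_pan_tilt(x, y, frame_w, frame_h, h_fov_deg, v_fov_deg):
    """Convert a pixel position into pan/tilt offsets (degrees) from the camera's
    optical axis, assuming a pinhole model and a co-located PTZ camera."""
    pan = (x - frame_w / 2.0) / frame_w * h_fov_deg
    tilt = (frame_h / 2.0 - y) / frame_h * v_fov_deg
    return pan, tilt

# Alarm in the upper-right quadrant of a 704x480 view from a 50 x 37 degree lens.
pan, tilt = pixel_to_pan_tilt(600, 100, 704, 480, 50.0, 37.0)
print(f"slew PTZ by pan {pan:+.1f} deg, tilt {tilt:+.1f} deg")
# A hypothetical controller call might then be: ptz.move_relative(pan, tilt, zoom=2.0)
```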

Performance standards are usually not mentioned in the context of VCA systems today. But in the near future, such standards will likely be established for a myriad of variables, including the size of an object versus the probability of detection, false positive rates, and time-to-detection under various lighting and environmental conditions.

Sidebar: Hot Tech Honored

Intelligent video grabbed lots of attention at the International Security Conference. Such technology was among the winning products in the annual New Product Showcase, sponsored by the Security Industry Association.

The “Best of Show” winner, for example, was the A4Vision (Sunnyvale, Calif.) 3D facial enrollment camera. The A4Vision 3D facial recognition system controls physical access to buildings and entrances. It is also used for identification systems such as machine-readable passports, driver’s licenses, border control and national identification. Through a proprietary matching engine and algorithms, the system performs subject identification and verification and can be used in a standalone environment. Another Product Achievement Award winner in the ISC New Product Showcase came from 3VR of San Francisco. The 800i performs like a high-end DVR with the added support of face detection; tunable, intelligent motion detection; and a security architecture that links monitoring, analysis and forensics in one system. Each unit provides intelligent management of eight cameras via MPEG-4 video at 352 x 240 (CIF) or high-quality JPEG frames at 352 x 240 (CIF). The 800i also makes data searches possible by time/date, camera or face, and allows users to automate responses to security events.

Check at www.securitymagazine.com and its LINX service for more info on intelligent security video.