Proving value in the security guarding industry is complicated. When things are quiet, people tend to assume there are no problems; when incidents happen, management tends to blame the security personnel. External events and factors beyond our control should not be how we measure the ongoing performance of the guard force. Yet the traditional performance indicators favored by security managers, decision makers, and budget holders – spend versus budget, incident frequency and severity, response times – completely miss the true performance of the individual security professional.

So how do you measure that? Training and standards exist throughout the guarding industry: every guarding company, client, and state has its own version of them. Measuring adherence and performance is often subjective and sometimes misguided, because it is very difficult to observe and manage a decentralized, distributed workforce 24/7. Field supervisors and frontline managers can conduct surprise inspections and spot checks, and even review video footage to see how their team is performing. However, these methods only capture a snapshot in time. Management cannot truly measure a guard's performance in real time, nor aggregate performance metrics over many shifts. We theorized that we could do better through our App: by recording data such as check-ins and locations, then applying data analytics and standardized measurement criteria, we could objectively track the performance of a pool of guards.

Starting in January of this year, we implemented this system for over 500 security officers across 75 client sites in different geographies. Using mobile phones and predetermined scoring algorithms, we tracked five metrics: timeliness, reliability, trustworthiness, reporting, and client feedback. Formulas for calculating these scores are available in the images above.

Timeliness answers the question of how punctual a guard is. When a guard arrives on site and checks in through the mobile device, the App records the Guard Arrival Time and the GPS location. If the GPS location is within the acceptable geo-fence, we compare the Guard Arrival Time against the Shift Start Time to see whether the guard is late, and if so, by how much. This lateness is then mapped to a Timeliness score between 0 and 1 through a bounded quadratic function, as illustrated in Figure 1 above.
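
As a rough sketch of how such a mapping might look in code (the exact curve in Figure 1 is not reproduced here, and the 30-minute cutoff is our assumed parameter, not the study's):

```python
def timeliness_score(lateness_min: float, max_late_min: float = 30.0) -> float:
    """Map lateness in minutes to a Timeliness score in [0, 1].

    Hypothetical: the exact quadratic from Figure 1 is not reproduced
    here, and the 30-minute cutoff is an assumed parameter.
    """
    if lateness_min <= 0:              # on time or early: full score
        return 1.0
    if lateness_min >= max_late_min:   # at or beyond the cutoff: zero
        return 0.0
    # Bounded quadratic: the penalty grows faster the later the guard is.
    return 1.0 - (lateness_min / max_late_min) ** 2
```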

Reliability measures how often a guard calls out within a calendar week. When a guard cancels a shift scheduled within the next 4 weeks (i.e. 28 days), we record two variables: 1) N, where this cancellation is the N-th one in that week, and 2) the Call-out Notice Time, the number of days between the call-out and the shift date. Our system then uses these two variables as inputs to calculate a Reliability Sub-Score, as illustrated in Figure 2 above.
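
A minimal sketch of how these two inputs might combine; since Figure 2's actual formula is not reproduced here, we simply assume that longer notice is penalized less and that repeat call-outs in the same week are penalized progressively harder:

```python
def reliability_subscore(nth_callout: int, notice_days: int,
                         horizon_days: int = 28) -> float:
    """Sub-score for a single call-out, in [0, 1].

    Hypothetical: Figure 2's actual formula is not reproduced here. We
    assume more notice is penalized less, and repeat call-outs in the
    same calendar week are penalized progressively harder.
    """
    notice = max(0, min(notice_days, horizon_days))
    notice_factor = notice / horizon_days       # 0 (same day) .. 1 (4 weeks out)
    repeat_penalty = 1.0 / max(1, nth_callout)  # 2nd call-out counts half, etc.
    return notice_factor * repeat_penalty
```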

Trustworthiness aims to measure how much we can trust a guard to stay and complete a shift, particularly when there is no supervisor on-site. Through near-constant tracking of geolocation data, the App detects various suspicious behaviors, such as checking in off-site, checking in from a moving vehicle, abandoning the site, checking out from a moving vehicle, and checking out off-site. These data, displayed in the manager's dashboard as per Figure 3 above, then become the source of a Trustworthiness Score for each guard.
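
For illustration, here is a simple Python check for one of these behaviors – a check-in from a moving vehicle – inferred from consecutive GPS fixes. The 5 m/s speed threshold is our assumption, not the production rule:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes."""
    r = 6_371_000.0  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_moving_checkin(fixes, speed_threshold_mps=5.0):
    """Flag a check-in as suspicious if consecutive GPS fixes around it
    imply vehicle-like speed. `fixes` is a list of (timestamp_s, lat, lon);
    the 5 m/s (~18 km/h) threshold is an assumption, not the study's rule."""
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt > 0 and haversine_m(la0, lo0, la1, lo1) / dt > speed_threshold_mps:
            return True
    return False
```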

Reporting measures how well a guard follows client instructions. Through push notifications, the App prompts guards to submit status or incident reports, effectively having each guard keep a detailed electronic Daily Activity Report (DAR). Guards were evaluated on the percentage of prompted reports they completed.
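
This metric reduces to a simple completion rate; a one-function sketch (the "full credit when nothing was prompted" convention is our assumption):

```python
def reporting_score(reports_completed: int, reports_prompted: int) -> float:
    """Fraction of prompted reports the guard actually submitted.

    Convention assumed here: a guard who was never prompted gets full
    credit rather than a divide-by-zero.
    """
    return reports_completed / reports_prompted if reports_prompted else 1.0
```

For example, a guard who completes 9 of 10 prompted reports scores 0.9.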

Additionally, we used a Client Rating. Clients rated their security officers from 1 to 5 stars across four specific categories: appearance, professionalism, ability to follow instructions, and customer service. We required that any rating of 3 or below be accompanied by feedback identifying which category prompted it. We also recognized that each client may have their own bias – some are more tolerant, others less so – so we used an algorithm (see Formula 4 above) to derive an unbiased client evaluation of each guard.
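
Formula 4 is not reproduced here, but one common way to remove a rater's leniency or severity bias is to normalize each rating against that client's own rating history; a hedged sketch of that approach:

```python
from statistics import mean, pstdev

def debiased_rating(guard_ratings, client_all_ratings):
    """Normalize a guard's stars against one client's rating history.

    Hypothetical stand-in for Formula 4, which is not reproduced here:
    z-score normalization is a common way to remove a rater's
    leniency or severity bias.
    """
    mu = mean(client_all_ratings)
    sigma = pstdev(client_all_ratings) or 1.0  # guard against zero variance
    return mean((r - mu) / sigma for r in guard_ratings)
```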

Each metric, captured as both raw data and calculated scores, was stored in the cloud, then aggregated and audited routinely to ensure accuracy. Throughout this study, auto-generated emails kept guards apprised whenever the system recorded a late check-in, insufficient reporting, or anomalous behavior. The intent was to encourage real-time, continuous improvement in their performance.
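
As an illustration of how such alerts might be triggered, a sketch with invented field names and thresholds (the study's actual alerting rules are not reproduced):

```python
def performance_alerts(shift: dict) -> list[str]:
    """Collect alert messages for one guard's shift.

    Field names and thresholds here are invented for illustration; the
    study's actual alerting rules are not reproduced.
    """
    alerts = []
    if shift["lateness_min"] > 0:
        alerts.append(f"Checked in {shift['lateness_min']} minutes late.")
    if shift["reporting_score"] < 0.8:
        alerts.append("Reporting completion below standard.")
    for flag in shift.get("trust_flags", []):
        alerts.append(f"Anomalous behavior recorded: {flag}.")
    return alerts
```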

Based on the calculated scores, we could easily and objectively view guard performance and stratify the guard pool. Management could assess a guard's performance over time (whether they were improving or declining) and distinguish one-off mistakes from consistently poor performance. Guards who did not meet standards were counseled and, in some cases, removed from their position. Guards performing above standard were given more responsibilities, higher pay, and preferential shifts.
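
For illustration, a sketch of how the per-metric scores might roll up into a composite and stratify the pool; the equal weights and tier cutoffs are our assumptions, not the study's:

```python
# Assumed equal weighting across the five metrics; the study's actual
# weights were not published.
WEIGHTS = {"timeliness": 0.2, "reliability": 0.2, "trustworthiness": 0.2,
           "reporting": 0.2, "client": 0.2}

def composite_score(scores: dict) -> float:
    """Weighted roll-up of a guard's per-metric scores (each in [0, 1])."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def tier(score: float) -> str:
    """Stratify the guard pool; the cutoffs are illustrative only."""
    if score >= 0.85:
        return "above standard"   # more responsibility, preferential shifts
    if score >= 0.60:
        return "meets standard"
    return "counseling"           # consistent underperformance is reviewed
```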

These quantified metrics not only improved guard performance but made management more effective as well. Managers knew which officers needed more direction and focused post checks accordingly. The system also enabled better matching between guards and clients (i.e. top-tier clients matched with top guards). In the future, training could be tailored to specific individuals, or rolled out across the entire guard pool when the data indicates a common deficiency.

While we are continuing our research in this area, our initial findings support the hypothesis that this approach leads to improved guard and client satisfaction, a reduction in incidents, and improved security and safety. Going forward, we hope to add more metrics to the scoring methodology, including engagement, proficiency, and possibly experience.

They say the devil is in the details, but we have found the delight is in the data. Gathering, parsing, and aggregating more information allowed us to make better-informed decisions and ultimately raised the average quality of the guard force across the board.