Preventive Measures for Lower Cost and Risk within Physical Security Operations

If you manage physical security budgets or operations, a core year-over-year objective is to “do more with less.” Physical security device and system failures (and their repairs) are among the top three operational expenses and interruptions for physical security SOC teams, and should be one of the first budget line items you look to reduce.

Historically, physical security teams have operated on the premise “if it ain’t broke, don’t fix it,” which sounds reasonable. However, run-to-failure is a reactive management technique that waits for a device to fail before any maintenance or repair action is taken. It is therefore the most expensive method: security is sacrificed while a device or system is down, nuisance and false alarms carry a high cost and take a toll on SOC personnel, and emergency repairs and the additional patrols required while systems are down drive up labor overtime.

Fortunately, most physical security departments, especially those securing 10+ locations, have many cost reduction opportunities they can leverage. One significant solution is Vector Flow’s SOC Systems Health Dashboard, which automatically performs device configuration assessments. These assessments have shown that at least 30% of physical security devices are wrongly configured and about 5% of them require repairs. Proactively resolving these issues has been proven to reduce risk and costs through a significant reduction in emergency repairs and the associated personnel overtime.

Now you may be reading this and thinking, “no, our systems team and system integrator execute routine maintenance” or “we have a maintenance contract in place.” Maybe so, but do you know the number of nuisance alarms your SOC receives due to faulty devices? Are you aware of device failures between scheduled maintenance windows? Does your maintenance program drive pre-emptive repairs or upgrades? The reality is that most security departments do not have the ability to track these metrics and can’t measure the negative impact of physical security device and system failures on SOC operations.
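As a starting point for tracking the first of these metrics, the sketch below tallies device-fault alarms per device from an exported alarm log. The record fields (`device_id`, `cause`) and the `"device_fault"` label are assumptions made for illustration, not the schema of any particular alarm console.

```python
from collections import Counter

# Hypothetical alarm log export; field names and values are illustrative only.
alarms = [
    {"device_id": "door-12", "cause": "device_fault"},
    {"device_id": "door-12", "cause": "device_fault"},
    {"device_id": "cam-07", "cause": "forced_entry"},
    {"device_id": "cam-07", "cause": "device_fault"},
    {"device_id": "door-12", "cause": "device_fault"},
]


def nuisance_alarms_by_device(alarm_log):
    """Count alarms caused by device faults, grouped by device."""
    return Counter(
        alarm["device_id"] for alarm in alarm_log if alarm["cause"] == "device_fault"
    )


counts = nuisance_alarms_by_device(alarms)
# List the noisiest devices first so they can be prioritized for repair.
for device_id, total in counts.most_common():
    print(f"{device_id}: {total} device-fault alarms")
```

Even a simple count like this shows which devices generate the most console clutter and gives a baseline to measure improvement against.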

Here’s a roadmap to move you from a reactive to a proactive approach:

1. Predictive maintenance: Physical security systems and their associated sensors generate volumes of data, so you can identify issues long before a device fails and manage metrics such as Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR), % SLA compliance on repairs, and warranty coverage. Historically, this data has stayed siloed in manufacturers’ proprietary data formats and has not been easy to measure. Vector Flow marries physical security event and device usage data together at scale. It operates on the principle that device maintenance should be performed proactively when device indicators show signs of decreasing performance or an upcoming critical failure. For example, one of our customers leveraged Vector Flow to pull data from several different physical security systems and was able to identify device issues 2-4 weeks in advance, reducing labor, downtime, parts, and other related costs by 40%.
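To make the metrics above concrete, here is a minimal sketch of how MTBF, Mean Time To Repair (MTTR), and % SLA compliance could be computed for a single device from its failure and repair timestamps. The `Outage` record, the function names, and the 24-hour SLA are assumptions chosen for illustration; they are not Vector Flow’s data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One failure-and-repair cycle for a device (hypothetical record shape)."""
    failed_at: datetime
    repaired_at: datetime


def total_downtime(outages):
    """Sum the time the device spent in a failed state."""
    return sum((o.repaired_at - o.failed_at for o in outages), timedelta())


def mttr_hours(outages):
    """Mean Time To Repair: average downtime per outage, in hours."""
    if not outages:
        raise ValueError("need at least one outage")
    return total_downtime(outages).total_seconds() / 3600 / len(outages)


def mtbf_hours(outages, observed_from, observed_to):
    """Mean Time Between Failures: operating time divided by failure count."""
    if not outages:
        raise ValueError("need at least one outage")
    uptime = (observed_to - observed_from) - total_downtime(outages)
    return uptime.total_seconds() / 3600 / len(outages)


def sla_compliance_pct(outages, sla):
    """Percentage of repairs completed within the SLA window."""
    if not outages:
        raise ValueError("need at least one outage")
    on_time = sum(1 for o in outages if o.repaired_at - o.failed_at <= sla)
    return 100.0 * on_time / len(outages)


# Illustrative data: two outages over a 30-day observation window.
outages = [
    Outage(datetime(2024, 1, 5, 8), datetime(2024, 1, 5, 20)),  # 12 h down
    Outage(datetime(2024, 1, 20), datetime(2024, 1, 22)),       # 48 h down
]
window_start, window_end = datetime(2024, 1, 1), datetime(2024, 1, 31)

print("MTTR (h):", mttr_hours(outages))
print("MTBF (h):", mtbf_hours(outages, window_start, window_end))
print("SLA compliance (%):", sla_compliance_pct(outages, timedelta(hours=24)))
```

For this sample data the device was down 60 of 720 hours, so MTTR is 30 hours, MTBF is 330 hours, and one of the two repairs met the assumed 24-hour SLA.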

2. Device failure to repair work order: When a physical security device is failing or showing anomalous behavior, Vector Flow leverages physical security domain-specific machine learning (ML) algorithms to correlate real-time data with previous failures at similar devices and identify the root cause. Vector Flow then automates work order creation for the affected device for faster resolution. The design and implementation of Vector Flow System Health Dashboards is both data-centric and user-centric, which makes life easier for SOC and Systems Teams as well as external vendors such as your physical security systems integrator. Getting this right requires a good understanding of the practices involved and of everyone within the value chain.
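The flow described here (spot anomalous behavior, look at similar past failures, draft a work order) can be illustrated with a deliberately simple sketch. It uses a z-score against a device’s own history as a stand-in for the ML correlation, which is not described in enough detail to reproduce. All function names, field names, and the threshold are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the baseline formed by `history`."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


def draft_work_order(device_id, metric, latest, similar_failures):
    """Build a work order draft (hypothetical fields) for a human to approve."""
    # Suggest the root cause recorded most often for similar devices.
    causes = Counter(f["root_cause"] for f in similar_failures)
    probable = causes.most_common(1)[0][0] if causes else "unknown"
    return {
        "device_id": device_id,
        "status": "pending_approval",
        "summary": f"{metric} reading {latest} is outside the normal range",
        "probable_root_cause": probable,
    }


# Illustrative data: daily error counts reported by one door controller.
daily_errors = [2, 3, 2, 4, 3, 2, 3]
todays_errors = 15
past_failures = [
    {"root_cause": "worn door contact"},
    {"root_cause": "loose wiring"},
    {"root_cause": "worn door contact"},
]

if is_anomalous(daily_errors, todays_errors):
    order = draft_work_order("door-12", "daily error count", todays_errors, past_failures)
    print(order)
```

The draft is created in a `pending_approval` state so that a person reviews it before anything is dispatched.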

Here’s an example of how the maintenance process works without and with Vector Flow, based on a real customer experiencing a security device or system failure.

Physical Security Team
(No Data-Driven Predictive Maintenance) 

1.) Device failure occurs

2.) SOC receives a device alarm in the SOC alarm console

3.) SOC Operator can acknowledge the alarm but cannot “clear” the alarm since the device is not working.

4.) SOC continues to get device error alarms all day, cluttering their alarm console. SOC operators get frustrated.

5.) SOC Operators call and email their Systems Tech Team as well as the System Integrator. SOC adds additional guard tours to the affected area, requiring staff overtime.

6.) (2 days later) System Integrator dispatches a technician to the site for emergency repair. Technician’s truck rolls into the site for troubleshooting per their training (issue trees, etc.). After several tries, the technician is not able to resolve the issue and seeks help. The technician decides to just replace all the sub-parts (camera, door switch, etc.) and realizes they don’t have all the parts.

7.) (15 days later) Technician orders parts and comes back with replacement parts, resolving the issue.

8.) SOC is relieved to be free of nuisance alarms and unnecessary guard tours to the affected area.

Physical Security Team + Vector Flow
(Data-Driven System Health Monitoring & Predictive Maintenance)

1.) Vector Flow System Health monitoring detects anomalous device behavior 2 months prior to device failure (correlating historical failure data along with real-time data)

2.) Vector Flow flags the device as “Device Needs Repair” and creates a Trouble Ticket for the Systems Team to review and approve

3.) Upon approval, Vector Flow creates a work order in the company’s ticketing system (ServiceNow) and emails the work order to the company’s system integrator with device details, including recommendations to resolve the issue

4.) System Integrator schedules technician visit that now resolves problems in a single appointment. As part of the scheduling process, the technician gets all necessary information on the device they are servicing, including historical issues and probable root causes.

5.) SOC is informed that the device has been repaired proactively.

 

Impact

Physical Security Team (No Data-Driven Predictive Maintenance)

  1. High Risk – since the area cannot be electronically monitored while the device stays in the failed state.
  2. High Cost – as the technician had to make multiple trips to resolve the same problem, consuming many labor hours. Additional guard tours required overtime by the security officers.
  3. High Fatigue – and frustration for both the SOC and technicians, who are plagued with back and forth and higher downtime and are not able to understand the root cause.

Physical Security Team + Vector Flow (Data-Driven System Health Monitoring & Predictive Maintenance)

  1. Low Cost, and No Fatigue – as the system proactively identified the failing device and identified the root cause remotely, thereby reducing the number of truck rolls and eliminating any overtime-related costs.
  2. Low Risk – as the area was monitored at all times, since device failure was prevented.

3. Data-driven culture: One key and often overlooked obstacle to achieving cost reductions by leveraging data is the lack of buy-in from the SOC and systems teams to establish a data-driven operations culture. After all, most technicians and systems teams are rewarded for putting out fires as they fix things and problem-solve, as opposed to preventing device failures. Problem-solving is a great trait, but firefighting is a very costly response. Their energy is better directed toward prevention. This requires a cultural change in which technicians receive accolades for preventing fires (device failures) rather than fighting them. One of Vector Flow’s customers trained their systems teams to start each day by looking at the System Health dashboard and a data-driven “to do” list BEFORE they start working on the perceived “fire of the day.” This approach helped prioritize their efforts, and the systems teams now proactively collaborate with their SOC and system integrator. This has led to a significant reduction in emergency repairs, kudos from the COO of the company, and staff promotions. Getting people at your organization on board with the changes that come with a data-driven culture is therefore essential, though not necessarily easy. It’s important to get buy-in from your team to create a culture of success.

Without pertinent data, you can’t predict anything. If you don’t have a baseline of what’s normal for a security system or device, you can’t identify or predict anomalies. With the right roadmap in place, you can achieve remarkable results, as one of our customers has: a tenfold ROI, a 40% reduction in maintenance costs, an 80% decrease in device and system failures, and a significant reduction in emergency repairs. Physical security teams at leading companies have already enjoyed significant rewards from similar efforts. The use of data-driven System Health Dashboards is now a must-have for physical security teams and cannot be approached as a one-off effort. Instead, it’s best for physical security to treat this as part of an ongoing digital transformation, one that requires starting simple with one or two sites, incorporating lessons learned, and then expanding to all sites. Most importantly, look to establish a data-driven mindset across your physical security operation, from top leadership to security managers to SOC analysts to system integrators.

By: Vik Ghai, VP/CTO