Providing Out-of-Band Connectivity to Mission-Critical IT Resources

Network Resilience vs Redundancy vs Backups

Network resilience is an organization’s ability to continue delivering critical business services during adverse events, service degradation, and complete outages. Resilience is vitally important to a company’s revenue and reputation, and failure has serious consequences. For example, a popular file transfer appliance was recently hit with CL0P ransomware, resulting in the theft of more than 4 million healthcare patients’ sensitive data. SolarWinds’ high-profile breach resulted in legal action from the SEC due to a potential lack of resilience infrastructure and practices.

Many organizations already use redundant and backup systems for disaster recovery and assume this makes them resilient. However, these measures aren’t able to withstand many major events like ransomware attacks, supply chain failures, and WAN outages. This article compares network resilience vs. redundancy and backups and describes some of the tools and best practices for ensuring resilience.

Network resilience vs. redundancy vs. backups

Backups

Copies of data, configurations, and application code used in a hot or cold restore of a failed production system.

Redundancy

The duplication of critical systems, services, and applications so organizations can “fail over” during an outage.

Network Resilience

The ability to continue delivering critical business services during adverse events, service degradation, and complete outages.

What are backups?

Backups are extra copies of critical data, configurations, and application code that are made in case the originals are lost or compromised. Backups are usually stored off-site so that they’ll be available if the primary data center or business location suffers an outage. The backup site communicates with the primary systems to download data on a scheduled or continuous basis to maintain a secondary copy of data at its most current state. This connection, while necessary, also allows ransomware and other malware to infect backups, which limits their usefulness in recovery operations. Additionally, if that connection is interrupted by an outage or configuration error, backups may be incomplete or inaccessible.
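
One way to catch a tampered or incomplete backup before restoring it is to compare each file against a checksum manifest recorded when the backup was taken. The sketch below is a minimal illustration of that idea, not a description of any particular backup product. The function name `find_bad_files`, the in-memory file mapping, and the practice of storing the manifest apart from the backup are all assumptions made for the example.

```python
import hashlib
from typing import Mapping


def find_bad_files(files: Mapping[str, bytes], manifest: Mapping[str, str]) -> list[str]:
    """Return names whose content is missing or does not match the manifest.

    `manifest` maps file name -> expected SHA-256 hex digest, recorded at
    backup time and (by assumption) stored separately from the backup.
    """
    bad = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None:
            bad.append(name)  # file missing: the backup is incomplete
        elif hashlib.sha256(data).hexdigest() != expected:
            bad.append(name)  # content changed since the manifest was written
    return sorted(bad)


if __name__ == "__main__":
    original = {"router.cfg": b"hostname edge-1\n", "db.dump": b"rows"}
    manifest = {n: hashlib.sha256(d).hexdigest() for n, d in original.items()}

    # Simulate one file being altered and one going missing.
    restored = {"router.cfg": b"hostname changed\n"}
    print(find_bad_files(restored, manifest))
```

A check like this only detects changes made after the manifest was written; it cannot tell whether the data was already compromised when the backup ran.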

What is redundancy?

Redundancy involves duplicating the most critical systems, services, and applications so organizations can “fail over” to them if the primary systems go down or become inaccessible. Typically, a company will have redundant systems in one or more disaster recovery sites in different locations to prevent a regional ISP outage or weather event from affecting them all at the same time. If one site goes down, teams reroute traffic to a redundant site to continue delivering services. However, each redundant site is susceptible to the same risks as the primary site, and cybercriminals and malware could potentially jump from one site to another.
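
The rerouting decision described above can be sketched as a priority-ordered selection. This is a simplified illustration: the `Site` type, the `healthy` flag, and the site names are hypothetical, and a real deployment would derive health from monitoring rather than a stored boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    name: str
    healthy: bool  # assumption: in practice this comes from a health check


def pick_active_site(sites: list[Site]) -> Optional[Site]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if site.healthy:
            return site
    return None


if __name__ == "__main__":
    sites = [
        Site("primary-dc", healthy=False),
        Site("dr-site-east", healthy=True),
        Site("dr-site-west", healthy=True),
    ]
    active = pick_active_site(sites)
    print(active.name if active else "no healthy site available")
```

Note that the `None` case matters: if the same attack or outage reaches every site, redundancy alone leaves nothing to fail over to.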

What is network resilience?

Network resilience is the ability to continue operating and delivering core services, even if in a degraded state, during adverse events. Backups and redundancy contribute to resilience, but there are additional pieces to the puzzle. Teams also need the ability to recover data, rebuild systems, and perform security testing without worrying about ransomware reinfection or access disruption. Additionally, organizations must be able to protect management interfaces from cybercriminals on the network, or they could become completely cut off from vital systems and services.

The best way to improve resilience is by building a resilience system containing all the infrastructure, tools, and services needed to continue delivering services and recover failed or compromised systems. It must be isolated from the production network using isolated management infrastructure (IMI) to prevent malicious actors from compromising it and ensure teams have continuous remote access even if the primary network goes down.

Resilience systems use the following tools, technologies, and best practices to provide network resilience.

Network Resilience Tools, Technologies, and Best Practices

Alternative Networking

Routing, switching, Wi-Fi, VoIP, virtualization, and software-defined network overlays for SDN & SD-WAN

Alternative Compute

CPU/GPU compute, containers, virtual machines, and any other resources needed to run applications and deliver services during an outage

Storage & Storage Recovery

Enough storage to recover systems and applications, rebuild new systems, and support content delivery

Automation

Tools like zero-touch provisioning (ZTP) to facilitate speedy recovery while minimizing human error

Out-of-Band (OOB) Management

A separate, isolated management plane that ensures continuous remote access to troubleshoot and recover infrastructure during production network outages and attacks
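
The five components listed above can be tracked as a simple checklist. The sketch below is one possible way to model that, with hypothetical names throughout (`REQUIRED_COMPONENTS`, `Component`, `resilience_gaps`); it flags components that are absent and components that exist but are not isolated from the production network.

```python
from dataclasses import dataclass

# Component kinds taken from the list above; the identifiers are illustrative.
REQUIRED_COMPONENTS = (
    "alternative_networking",
    "alternative_compute",
    "storage_recovery",
    "automation",
    "oob_management",
)


@dataclass(frozen=True)
class Component:
    kind: str
    isolated: bool  # True if it runs on the isolated management infrastructure


def resilience_gaps(components: list[Component]) -> dict[str, list[str]]:
    """Report required kinds that are missing or present but not isolated."""
    present = {c.kind for c in components}
    isolated = {c.kind for c in components if c.isolated}
    return {
        "missing": [k for k in REQUIRED_COMPONENTS if k not in present],
        "not_isolated": [
            k for k in REQUIRED_COMPONENTS if k in present and k not in isolated
        ],
    }


if __name__ == "__main__":
    inventory = [
        Component("oob_management", isolated=True),
        Component("automation", isolated=False),
    ]
    print(resilience_gaps(inventory))
```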

Redundancy and backups are crucial for disaster recovery and contribute to your ability to continue operating during adverse events. But, because these rely on the underlying infrastructure, achieving network resilience requires a more comprehensive strategy. A resilience system using IMI allows you to continuously deliver critical services and provides teams with everything they need to safely recover.

Image: A diagram showing how to use ZPE to follow Gartner’s best practices for an isolated management infrastructure.

The Nodegrid platform from ZPE Systems streamlines network resilience by providing a vendor-neutral foundation for a resilience system. Nodegrid’s out-of-band management solutions enable redundancy while creating an isolated management plane capable of running your choice of third-party tools for troubleshooting, recovery, security validation, and more. You can even use Nodegrid’s powerful x86 compute architecture to host and deliver services while your primary systems are down.

Network resilience with ZPE Systems

Want to learn more about using Nodegrid to build network resilience that goes beyond redundancy and backups? Our Network Automation Blueprint provides a step-by-step guide to building an IMI resilience system.

Download the Network Automation Blueprint

The Future of Edge Computing

Edge computing moves computing resources and data processing applications out of the centralized data center or cloud, deploying them at the edges of the network and allowing companies to use their edge data in real-time. An explosion in edge data generated by Internet of Things (IoT) sensors, automated operational technology (OT), and other remote devices has created a high demand for edge computing solutions. A recent report from Grand View Research valued the edge computing market size at $16.45 billion in 2023 and predicted that it will grow at a compound annual growth rate (CAGR) of 37.9% through 2030.
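
To see what a growth rate like that implies, the compound growth formula is value × (1 + rate) ^ years. The sketch below applies it to the figures quoted above. It assumes compounding starts from the 2023 valuation and runs for seven annual periods to 2030; the report's exact forecast window is not stated here, so treat the result as an illustration of the arithmetic rather than the report's own projection.

```python
def project_value(start_value: float, annual_rate: float, years: int) -> float:
    """Compound `start_value` at `annual_rate` (e.g. 0.379 for 37.9%) for `years` periods."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return start_value * (1 + annual_rate) ** years


if __name__ == "__main__":
    # Assumed inputs: 2023 base of 16.45 (USD billions), 37.9% CAGR, 7 periods.
    projected = project_value(16.45, 0.379, 7)
    print(f"Illustrative 2030 value: ${projected:.1f} billion")
```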

The current edge computing landscape comprises solutions focused on individual use cases, lacking interoperability and central orchestration. The future of edge computing, as described by leading analysts at Gartner, depends on unifying the edge computing ecosystem with comprehensive strategies and centralized, vendor-neutral management and orchestration. This future relies on edge-native applications that integrate seamlessly with upstream resources, remote management, and orchestration while still being able to operate independently.

Where is edge computing now?

Many organizations already use edge computing technology to solve individual problems or handle specific workloads. For example, a manufacturing department may deploy an edge computing application to analyze log data and provide predictive maintenance recommendations for a single type of machine or assembly line. A single company may have a dozen or more disjointed edge computing solutions in use throughout the network, creating visibility and management headaches for IT teams. This piecemeal approach to edge computing results in what Gartner calls “edge sprawl”: many disparate solutions deployed without centralized control, security, or visibility. Edge sprawl increases management complexity and risk while decreasing operational efficiency, creating significant roadblocks for digital transformation initiatives.

Additionally, many organizations misunderstand edge computing by thinking it’s just about moving computing resources as close to the edge as possible to collect data. In reality, the true potential of the edge involves using edge data in real-time, gaining “cloud-in-a-box” capability that works in concert with the network’s upstream resources.

Anticipating the future of edge computing

At Gartner’s 2023 IT Infrastructure Operations & Cloud Strategies Conference, edge technology experts predicted that, by 2025, enterprises will create and process more than 50% of their data outside the centralized data center or cloud. Surging edge data volume will accelerate the challenges caused by a lack of strategy or orchestration.

Gartner’s 6 Edge Computing Challenges

Lack of extensibility

Many purpose-built edge computing solutions can’t adapt as use cases change or expand as the business scales, limiting agility and preventing efficient growth.

Inability to extract value from edge data

Much of the valuable data generated by edge sensors and devices goes unused because companies lack the resources needed to run their data analytics and AI apps at the edge, leaving them able to collect data but not to act on it.

Data storage constraints

Edge computing deployments are often smaller and have more data storage constraints than large data centers and cloud deployments, but quickly distinguishing between valuable data and destroyable junk is difficult with edge resources.

Knowledge debt from edge-native apps

Edge-native applications are designed for edge computing architectures from the ground up. Edge containers are similar to cloud-native apps, but clustering and cluster management work much differently, creating what’s known as “knowledge debt” and straining IT teams.

Lack of security controls, policies, & visibility

Edge deployments often lack many of the security features used in data centers, and sometimes other departments install edge computing solutions without onboarding them with IT for the application of security policies and monitoring agents, adding risk and increasing the attack surface.

Inability to remotely orchestrate, monitor, & troubleshoot

When equipment failures, configuration errors, or breaches take down edge networks, remote teams are often cut off and unable to troubleshoot or recover without traveling on-site or paying for managed services, increasing the duration and cost of the outage. Current edge solutions are novel and don’t connect to or integrate with the full networking stack.

At the Gartner conference, analyst Thomas Bittman gave multiple presentations echoing his advice from the Building an Edge Computing Strategy report published earlier in the year. In preparing for the future of edge computing, Bittman urges companies to proactively develop a comprehensive edge computing strategy encompassing all potential use cases and addressing the challenges described above. His recommendations include:

  • Enabling extensibility by utilizing vendor-neutral platforms that allow for expansion and integration, which supports growth and agility at the edge.
  • Looking for opportunities to deploy artificial intelligence, data analytics, and machine learning alongside edge computing units, for example, with system-on-chip technology or all-in-one edge networking and computing devices.
  • Anticipating data storage and governance challenges at the edge by defining clear policies and deploying AI/ML data management solutions that dynamically determine data value.
  • Reducing knowledge debt by utilizing vendor-neutral platforms that support familiar container and cluster management technologies (like Docker and Kubernetes).
  • Securing the edge with a multi-layered defense, including hardware security, frequent patches, zero-trust policies, strong authentication, network micro-segmentation, and comprehensive security monitoring.
  • Centralizing edge management and orchestration (EMO) with a vendor-neutral platform that unifies control, supports environmental monitoring, and uses out-of-band (OOB) management while interoperating with automated edge management workflows (such as zero-touch provisioning and infrastructure configuration management).

Bittman’s recommended edge computing strategy uses the central EMO as a hub for all the technologies, processes, and workflows involved in operating and supporting the edge. This strategy will prepare companies for the future of edge computing and support efficient, agile growth and innovation.
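
The data storage recommendation above (clear policies plus a way to rank data by value) can be sketched as a simple eviction policy. This example substitutes a fixed, hypothetical `value_score` for the AI/ML-based valuation the recommendation mentions; the `Record` type and `select_for_eviction` function are likewise invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    size_mb: float
    value_score: float  # hypothetical 0..1 score; higher means more valuable


def select_for_eviction(records: list[Record], capacity_mb: float) -> list[str]:
    """Return keys to delete, lowest value first, until the remainder fits."""
    used = sum(r.size_mb for r in records)
    evicted = []
    for record in sorted(records, key=lambda r: r.value_score):
        if used <= capacity_mb:
            break
        evicted.append(record.key)
        used -= record.size_mb
    return evicted


if __name__ == "__main__":
    records = [
        Record("anomaly-clip", size_mb=10, value_score=0.9),
        Record("idle-telemetry", size_mb=20, value_score=0.1),
        Record("daily-summary", size_mb=5, value_score=0.5),
    ]
    print(select_for_eviction(records, capacity_mb=20))
```

Keeping the scoring separate from the eviction logic means the score could later come from a model or a governance rule without changing the policy itself.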

Enter the future of edge computing with Nodegrid

Nodegrid is a vendor-neutral edge management and orchestration platform from ZPE Systems. Nodegrid easily interoperates with your choice of edge solutions and can directly run third-party AI, ML, data analytics, and data governance applications to help you extract more value from your edge data. The open, Linux-based Nodegrid OS can also host Docker containers and edge-native applications to reduce hardware overhead and knowledge debt.

Nodegrid devices protect your edge management interfaces with hardware security features like TPM and geofencing, support for strong authentication like 2FA, and integrations with leading zero-trust providers like Okta and PING. The Nodegrid OS and ZPE Cloud are Synopsys-validated to address security at every stage of the SDLC. Plus, you can run third-party security solutions for SASE, next-generation firewalls, and more.

Nodegrid edge networking solutions use out-of-band technology to give teams 24/7 remote visibility, management, and troubleshooting access to edge deployments. They freely interoperate with third-party solutions for infrastructure automation, monitoring, and recovery to support network resilience and operational efficiency. Nodegrid is like a cloud-in-a-box solution, incorporating edge computing and the full networking stack. Nodegrid’s edge management and orchestration platform provides single-pane-of-glass visibility, control, and resilience while supporting future edge growth.

Use Nodegrid for your Gartner-approved edge computing strategy

The Nodegrid EMO platform helps you anticipate the future of edge computing with vendor-neutral, single-pane-of-glass visibility and control. Watch a free Nodegrid demo to learn more.

Request a Demo

Distributed Edge Computing Use Cases

Across every industry, networks are decentralizing as organizations expand with remote business sites, Internet of Things (IoT) deployments, and mobile technologies. Distributed edge computing involves moving data processing systems and applications out of the centralized cloud or data center and distributing them around the network’s edges, where much of the data is generated. As defined by The Open Glossary of Edge Computing, edge native computing integrates with centralized cloud computing resources, local workloads, remote management, and orchestration while having the ability to operate independently.

Edge computing supports secure, real-time data analysis by reducing off-site data transmission. Edge native computing also enables the transition to digital transformation 2.0 by allowing companies to do something with their edge data in real-time, not just collect it. This post discusses six different use cases that could benefit from distributed edge computing, including healthcare, finance, energy, manufacturing, utilities/public services, and AI & machine learning.

Distributed edge computing use cases

Use cases for distributed edge computing include:

Healthcare

  • Mitigate security, privacy, and compliance concerns with local data processing, AI, and Zero Touch Provisioned Virtual Network Functions

  • Improve patient health outcomes with real-time alerts that don’t require Internet access

  • Enable emergency mobile medical intervention while reducing mistakes

Finance

  • Support distributed financial networks while reducing security and regulatory risks by managing scope through isolation and built-in change management

  • Get fast, localized business insights to improve revenue and customer service

  • Deploy AI-powered surveillance and security solutions without network bottlenecks

Energy

  • Enable real-time data processing without network access and ensure connectivity for air-gapped and isolated environments with IT and OT operations

  • Improve efficiency with predictive maintenance recommendations and other insights

  • Proactively identify and remediate safety, quality, and compliance issues

Manufacturing

  • Get real-time, data-driven insights to improve manufacturing efficiency and product quality

  • Reduce the risk of confidential production data falling into the wrong hands during transit

  • Ensure continuous communications and operations during network outages and other adverse events

Utilities/Public Services

  • Use IoT technology to deliver better services, improve public safety, and keep communities connected

  • Reduce the fleet management challenges involved in difficult deployment environments

  • Provide IT with reliable remote access to install critical security patches and maintain devices

  • Aid in disaster recovery and resilience

AI & Machine Learning

  • Get enhanced data analytics capabilities for any distributed edge computing use case

  • Improve AI/ML efficiency by eliminating network bottlenecks and reducing security risks

  • Use edge devices with a built-in networking stack to improve the agility, cost-effectiveness, and scalability of edge AI/ML

Migration from On Premises to Edge Computing

Image: Concrete use case that can work across all industries, showing the migration from on-prem computing to microservices at the edge, along with the associated level of security risk.

Healthcare

The healthcare industry quickly and enthusiastically adopted IoT technology for medical equipment like insulin pumps, pacemakers, and imaging devices to improve patient health monitoring and outcomes. These sensors generate massive quantities of data that healthcare organizations must transmit to applications in central data centers or the cloud for processing. This data can’t be transferred over the open Internet for security and compliance reasons, so it’s usually funneled through a central firewall via MPLS (for branches, clinics, and other physical sites), overlay networks, or SD-WAN (for wearable sensors and mobile EMS devices). The firewall becomes a bottleneck that increases latency and prevents real-time data processing, introducing potentially lethal delays in health monitoring and response.

Distributed edge computing for healthcare involves installing medical data processing applications closer to the sensors and devices generating most of the data. Edge computing occurs on the same local network or even the same onboard chip (using system-on-chip or SoC technology), which reduces security risks and latency. For example, software running on an implanted heart-rate monitor can analyze patient data in real time without a network connection. If it detects any concerning activity that falls outside of an established baseline, it uses multiple cellular and AT&T FirstNet connections to send alerts to the cardiologist without exposing any private patient data. Even if the application can’t establish a network connection at all, the device itself can alert the patient that there’s a problem so they can take immediate action.
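
The on-device baseline check described above might look something like the following sketch. It uses a simple standard-deviation test as a stand-in for whatever detection logic a real device uses; the threshold, function names, and alert fields are assumptions for illustration and have no clinical meaning.

```python
from statistics import mean, stdev


def is_outside_baseline(baseline: list[float], reading: float, max_sigma: float = 3.0) -> bool:
    """Return True if `reading` is more than `max_sigma` standard deviations
    from the mean of `baseline` (needs at least two baseline samples)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return reading != mu
    return abs(reading - mu) > max_sigma * sigma


def build_alert(device_id: str) -> dict[str, str]:
    """Build an alert that carries an opaque device ID and an event type only,
    with no raw readings or patient details (hypothetical message format)."""
    return {"device": device_id, "event": "heart_rate_outside_baseline"}


if __name__ == "__main__":
    resting = [60, 62, 61, 59, 63, 60, 61]  # made-up baseline samples (bpm)
    for bpm in (61, 120):
        if is_outside_baseline(resting, bpm):
            print(build_alert("device-0001"))
```

Because the comparison runs entirely on the device, only the small alert message ever needs to leave it.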

Another healthcare use case is mobile EMS units processing patient health data en route to the hospital using edge compute resources built into cellular edge routers. Edge native applications can help medics prevent allergic reactions and harmful medication interactions when administering treatment.

Finance

Finance industry networks are typically highly decentralized, using branches, web and mobile applications, and self-service ATMs to make their services accessible to customers around the world. Banks and other institutions know that edge data has value beyond the financial transactions being conducted, so they use data analytics software (often powered by AI & machine learning) to gain insights into how to improve their services and generate more revenue. However, there are enormous security, regulatory, and reputational risks involved in transmitting sensitive financial data, making it challenging to leverage cloud- or data center-based analytics software.

Distributed edge computing moves financial data processing applications to branches and remote PoPs (points of presence) to help mitigate the risks of transmitting data off-site. For example, financial institutions can install all-in-one branch gateway services routers with built-in edge compute functionality in networking closets, drive-up kiosks, or even inside an ATM’s housing. Running data analytics software from this device enables real-time data processing for business insights, surveillance, customer service improvements, and more. These routers should also include out-of-band (OOB) management technology to support infrastructure isolation and simplify compliance with PCI DSS 4.0 and other regulations.

Energy

Edge data in the oil and gas industry comes from IoT sensors and automated equipment deployed in remote sites, drilling rigs, and offshore platforms all over the world. Analyzing that data is crucial for productivity, safety, and compliance, but it’s often difficult to maintain a fast and reliable network connection with applications in data centers or the cloud.

Distributed edge computing allows oil and gas companies to effectively harness their data in challenging deployment environments, such as the middle of the ocean. For example, companies can tuck compact, cellular-enabled edge computing devices into maintenance closets or other small compartments to deploy software that analyzes equipment monitoring data, well logs, and borehole logs. This software can provide predictive maintenance recommendations, alert technicians to potential quality or safety issues, and deliver productivity forecasts and insights without requiring an Internet connection.
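
As a rough idea of how a predictive maintenance recommendation could be computed locally, the sketch below fits a straight line to recent sensor readings and extrapolates when the trend would reach a limit. A linear trend is a deliberately simple stand-in for real predictive models, and the function name, sample values, and limit are all made up for the example.

```python
from typing import Optional


def hours_until_limit(hours: list[float], readings: list[float], limit: float) -> Optional[float]:
    """Fit a least-squares line through (hours, readings) and estimate how many
    hours after the last sample the line reaches `limit`.

    Returns None when the trend is flat or falling.
    """
    n = len(hours)
    if n < 2 or n != len(readings):
        raise ValueError("need at least two samples of equal length")
    mean_x = sum(hours) / n
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, readings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - hours[-1])


if __name__ == "__main__":
    sample_hours = [0, 1, 2, 3]
    vibration = [1.0, 2.0, 3.0, 4.0]  # made-up readings
    print(hours_until_limit(sample_hours, vibration, limit=10.0))
```

Nothing here requires an Internet connection: the readings, the fit, and the resulting estimate can all stay on the edge device.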

Manufacturing

Companies across nearly every industry are increasingly automating their manufacturing to improve productivity, lower costs, and reduce errors. To further reduce human involvement, they use software to monitor equipment health, track production costs, schedule preventative maintenance, and perform quality assurance (QA) tasks. This software, which typically runs from the cloud or a centralized data center, relies on data generated by automated operational technology (OT) and other manufacturing machinery. As in the above use cases, transmitting OT data back and forth creates latency and security issues. There are additional risks associated with manufacturing operations located overseas, where political instability, disasters, and other external forces could interrupt communications.

Distributed edge computing enables real-time, data-driven insights to improve manufacturing efficiency and elevate product quality. Plus, some edge computing solutions, like the Nodegrid integrated branch services router, provide out-of-band (OOB) management access to remote equipment. OOB management creates a dedicated management network that’s completely isolated from the production network, ensuring continuous remote access to operational technology, monitoring systems, and edge native applications during Internet outages and other adverse events.

Utilities / public services

Many forward-thinking cities are deploying Internet of Things (IoT) devices to improve their utilities and public services and better connect their communities. These “smart cities” collect data from Internet-connected thermostats, parking meters, traffic lights, security cameras, and other devices deployed outdoors, in public facilities, and in citizens’ homes. However, local governments often find it challenging to keep up with fleet management, ensuring all these devices are connected, patched, and up-to-date to prevent breaches and failures.

Distributed edge computing reduces the networking and bandwidth requirements for IoT-enabled utilities, public services, and smart cities. Edge native applications can analyze data on the same sensor or device that generates it, reporting back to a centralized cloud or data center as needed to provide alerts, reports, and visualizations. All-in-one edge networking solutions combine connectivity with compute capabilities and are small enough to fit in utility cabinets, under public benches, or on top of street lights. They provide remote IT teams with easy access to monitor devices, deploy updates, and troubleshoot issues over a reliable, cellular OOB connection. An edge native networking solution should also enable automatic, zero-touch operations to streamline digital fleet management at scale.

AI & machine learning

Artificial intelligence (AI) and machine learning (ML) applications ingest data to train, operate, and make decisions. Much of that data originates at the network’s edges – in fact, there are AI & ML applications for every edge use case and industry listed above. Transmitting vast quantities of data to the cloud or a data center introduces network bottlenecks, latency, and security risks that can prevent organizations from getting the full value out of their AI investment.

Because artificial intelligence is very resource-hungry, edge native computing for AI/ML sometimes looks a little different than in other use cases. A typical edge computing deployment for AI & ML involves racks of high-performance machine learning processing units deployed in edge data centers on the same site as (or very nearby) the devices generating data. This approach works well for large machine-learning workloads occurring in a limited number of deployment sites. A more flexible approach involves using smaller graphics processing units (GPUs) or multi-purpose edge devices to handle individual AI/ML workloads in smaller and more distributed edge deployment sites. These “thin” or “nano” deployments are agile and cost-effective, scaling easily as organizations grow in size and geographic distribution.

Executive summary

  • Distributed edge computing for healthcare improves patient health outcomes and data privacy with SoC applications on wearable medical devices and cellular edge routers in mobile EMS units.
  • Distributed edge computing for the finance industry provides insights into how to improve services and revenue while helping to mitigate security and regulatory risks with on-site data processing and infrastructure isolation.
  • Distributed edge computing helps the energy sector effectively harness critical data from sensors and equipment in challenging deployment environments to improve quality, safety, and productivity.
  • Distributed edge computing for manufacturing helps companies process data from automated machinery and operational technology to improve manufacturing efficiency and elevate product quality.
  • Distributed edge computing for utilities/public services reduces the networking and fleet management challenges for IoT-enabled utilities, public services, and smart cities with all-in-one edge networking solutions, OOB, and zero-touch operations.
  • Distributed edge computing for AI & machine learning uses multi-purpose edge devices to handle individual workloads, improving the agility, scalability, and cost-effectiveness of edge AI/ML.

Distributed edge computing with Nodegrid

Nodegrid is a line of all-in-one edge networking solutions from ZPE Systems. Nodegrid’s vendor-neutral, integrated branch services routers combine edge gateway networking functionality with Gen 3 out-of-band management and edge computing capabilities. The Nodegrid platform streamlines distributed edge computing for any use case with consolidated hardware and software that reduce deployment costs and management headaches while improving efficiency.

See Nodegrid’s edge solutions in action

Nodegrid delivers streamlined, cost-effective solutions for distributed edge computing in healthcare, EMS, financial services, local governments, and more. To see how Nodegrid works for your edge computing use case, request a free demo.

Request a Demo

DORA Act: 5 Takeaways For The Financial Sector

The Digital Operational Resilience Act (DORA) is a regulatory initiative within the European Union that aims to enhance the operational resilience of the financial sector. Its main goal is to prevent and mitigate cyber threats and operational disruptions. The DORA Act outlines regulatory requirements for the security of network and information systems “whereby all firms need to make sure they can withstand, respond to and recover from all types of ICT-related disruptions and threats” (DORA Act website).

Who and What Are Covered Under the DORA Act?

The DORA Act is a regulation that covers all financial entities within the European Union (EU). It recognizes the critical role of information and communication technology (ICT) systems in financial services. DORA applies to financial services including payments, securities, credit rating, algorithmic trading, lending, insurance, and back-office operations. It establishes a framework for ICT risk management through technical standards, which are being released in two phases, the first of which was published on January 17, 2024. The DORA Act will go into effect in its entirety on January 17, 2025.

With cyberattacks constantly in the news cycle, it’s no surprise that governing bodies are putting forth standards for operational resilience. But without combing through this lengthy piece of legislation, what should IT teams start thinking about from a practical standpoint? Here are 5 takeaways on what the DORA Act means for the financial sector.

DORA Act: 5 Takeaways for the Financial Sector

1. Shore up your cybersecurity measures

The DORA Act emphasizes strengthening cybersecurity measures within the financial sector. It requires financial institutions, such as banks, stock exchanges, and financial infrastructure providers, to implement robust cybersecurity controls and protocols. These include adopting advanced authentication mechanisms, encryption standards, and network segmentation to protect sensitive financial data and critical infrastructure from cyber threats. Part of this will also require organizations to apply system patches and updates in a timely manner, which means automated patching will become necessary to every organization’s security posture.
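
A first step toward automated patching is simply knowing which systems have fallen behind. The sketch below flags hosts whose pending patch is older than a policy window. The 30-day default, host names, and dates are illustrative assumptions; the text above does not specify a deadline, so any real window should come from your own policy and the regulation's technical standards.

```python
from datetime import date, timedelta
from typing import Mapping


def overdue_patches(pending: Mapping[str, date], today: date, max_age_days: int = 30) -> list[str]:
    """Return hosts whose pending patch was released more than `max_age_days` ago.

    `pending` maps host name -> release date of the oldest unapplied patch.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(host for host, released in pending.items() if released < cutoff)


if __name__ == "__main__":
    pending = {
        "atm-gateway-01": date(2024, 12, 1),
        "core-switch-01": date(2025, 1, 10),
    }
    print(overdue_patches(pending, today=date(2025, 1, 17)))
```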

2. Implement resilience systems

Operational resilience is a key focus area of the DORA Act, aiming to ensure the continuity of essential financial services in the face of cyber threats, natural disasters, and other operational disruptions. Financial institutions are required to develop comprehensive business continuity plans, establish redundant systems and backup facilities, and conduct regular stress tests to assess their ability to withstand and recover from various scenarios. Implementing a resilience system helps with this, as it provides all the infrastructure, tools, and services necessary to continue operating during major incidents.

3. Conduct regular scans for vulnerabilities

The DORA Act requires financial institutions to implement robust risk management practices to identify, assess, and mitigate cyber risks and operational vulnerabilities. This includes conducting regular assessments, vulnerability scans, and penetration tests, and developing incident response procedures to quickly address threats. This is all part of taking a proactive approach to identify and mitigate cyber incidents, and reduce the impact that adverse events have on financial stability and consumer confidence.

4. Collaborate and share information with industry peers

The DORA Act encourages financial institutions to share cybersecurity threat intelligence, incident data, and best practices with industry peers, regulators, and law enforcement agencies. The ability to monitor systems and collect data will be crucial to this approach, and will require systems that can rapidly (and securely) deploy apps/services during ongoing incidents. This will help financial institutions to better understand emerging threats, coordinate responses to cyber incidents, and strengthen collective defenses against threats and operational disruptions.

5. Segment physical and logical systems to pass regular audits

Through the DORA Act, regulators are empowered to conduct regular assessments, audits, and inspections of systems. This will ensure that financial institutions are implementing adequate controls and safeguards to protect against cyber threats and operational disruptions. A crucial part of this will involve physical and logical separation of systems, such as through Isolated Management Infrastructure, as well as implementing zero trust architecture across the organization. These will help bolster resilience by eliminating control dependencies between management and production networks, which will also help to streamline audits.

Get the blueprint to help you comply with the DORA Act

DORA’s requirements are meant to help IT teams better protect sensitive data and the integrity of financial systems as a whole. But without a proper network management infrastructure, their production networks are too sensitive to errors and vulnerable to attacks. ZPE has created the blueprint that covers these 5 crucial takeaways outlined in the DORA Act. The architecture outlined in this blueprint has been trusted by Big Tech for more than a decade, as it allows them to deploy modern cybersecurity measures, physically and logically separated systems, and rapid recovery processes. Download the blueprint now.

Zero Trust Edge Solutions: Continuing the Zero Trust Journey

A glowing shield with a 0 on it overlays a glowing map of the world to represent zero trust at the edge.

The zero trust security methodology follows the principle of “never trust, always verify,” which assumes that any account or device could be compromised and should be forced to continuously establish trustworthiness. This sounds like an extreme approach, but with the frequency of high-profile data breaches and ransomware attacks steadily increasing, security teams must pivot their approach away from prevention and toward damage mitigation and recovery. Zero trust security limits the lateral movement of compromised accounts on the network by establishing micro-perimeters around network resources that continually assess an account’s behavior for suspicious activity.

Organizations also must extend zero trust security policies and controls to remote business sites at their network’s edges, such as branches, Internet of Things (IoT) deployments, and home offices. Zero trust edge solutions are software platforms that provide networking, access, and security capabilities designed specifically for the edge. This guide explains what zero trust edge solutions do and the challenges involved in using them before discussing how to build a unified ZTE platform.

What are zero trust edge solutions?

A zero trust edge solution combines edge-centric security functionality with remote access and networking capabilities. ZTE’s core feature is zero trust network access (ZTNA), which securely connects remote users to enterprise applications and resources, similar to a VPN. ZTNA is more secure than VPNs because it only allows users to authenticate to one resource at a time and prevents them from seeing or accessing anything else until they re-establish their identity and credentials. ZTE’s other features and capabilities vary depending on the vendor and deployment type. ZTE solutions come in three different forms:

  • As a service: Companies can purchase ZTE functionality as a cloud-based, vendor-managed service. Remote users connect to regional points of presence (POPs) to reach the ZTE stack in the cloud before being routed to enterprise resources. This style is easier to roll out for organizations with lots of users in the field but few (if any) physical edge locations to host security or networking solutions.
  • With SD-WAN: Some ZTE providers combine zero-trust features with software-defined wide area networking (SD-WAN) capabilities. SD-WAN creates a virtual network overlay that’s decoupled from the underlying WAN infrastructure, enabling centralized control and automation. Packaging ZTE and SD-WAN together helps organizations consolidate their tech stack at physical edge sites like branches, warehouses, and manufacturing plants while still offering ZTNA to work-from-home and field employees.
  • Build your own: Since there are very few mature ZTE providers on the market, and it can be difficult to find pre-made solutions with all the features needed for complex, distributed edge networks, many teams opt to build their own platform by combining tools from multiple vendors. Typically, these organizations have physical branches with existing WAN infrastructure that they use as regional POPs to host ZTNA and other security solutions.
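
To make the ZTNA behavior described above more concrete (one authenticated resource at a time, with nothing else reachable until the user re-establishes identity), here is a minimal Python sketch. The `ZtnaBroker` class, the policy table, and the resource names are hypothetical illustrations, not any vendor's API, and a real broker would also verify device posture and expire grants.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ZtnaBroker:
    """Toy access broker: one authenticated resource per user at a time."""

    # Hypothetical policy table: user -> resources that user may request.
    policy: Dict[str, Set[str]]
    # Current grant per user; a new grant replaces the previous one.
    active: Dict[str, str] = field(default_factory=dict)

    def authenticate(self, user: str, resource: str, verified: bool) -> bool:
        """Grant access to a single resource after identity verification."""
        if not verified or resource not in self.policy.get(user, set()):
            return False
        self.active[user] = resource  # any earlier grant is dropped
        return True

    def can_access(self, user: str, resource: str) -> bool:
        """Only the resource the user last authenticated to is reachable."""
        return self.active.get(user) == resource


broker = ZtnaBroker(policy={"alice": {"crm-app", "payroll-app"}})
broker.authenticate("alice", "crm-app", verified=True)
print(broker.can_access("alice", "crm-app"))      # True
print(broker.can_access("alice", "payroll-app"))  # False
```

The second check fails even though the policy allows it, because alice has not re-authenticated for that resource. This is the property that limits lateral movement compared with a VPN that exposes a whole network segment.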

Why build your own ZTE solution?

If pre-made solutions exist, why would companies go through the hassle of creating their own zero trust edge platform? Presently, there aren’t any “complete” ZTE solutions that offer full, zero-trust protection for branches and other physical edge sites.

For example, many ZTE platforms don’t protect management ports on the control plane, leaving critical edge infrastructure like servers, switches, and power distribution units (PDUs) exposed to cybercriminals. Additionally, branch ZTE solutions rely upon production network infrastructure, so if there’s an outage or ransomware attack, remote management teams are completely cut off from troubleshooting and recovery. These solutions also lack helpful edge networking features like fleet management and automation, and their closed ecosystems limit the ability to extend their capabilities.

Building your own zero trust edge platform allows you to combine all the security, networking, and management functionality you need to get full security coverage and streamline branch operations. The key to creating a robust and efficient ZTE solution is starting with a vendor-neutral platform that can unify the entire security architecture.

How Nodegrid simplifies ZTE

Nodegrid edge networking solutions from ZPE Systems provide the perfect vendor-neutral platform for integrated zero trust edge deployments. All-in-one edge gateway routers deliver a full stack of branch networking capabilities, including out-of-band (OOB) management. OOB creates a dedicated control plane on an isolated network so remote teams have continuous access to manage, troubleshoot, and repair edge infrastructure.

Nodegrid protects the management interfaces on the OOB network with robust, zero trust security processes and controls. For example, the encryption keys for each Nodegrid device are destroyed after provisioning so that only the public key is accessible when needed for authentication to our cloud. Nodegrid devices also use the Trusted Platform Module (TPM) as a hardware security module to prevent cybercriminals from tampering with the configuration or storage.

Our platform runs on the Linux-based, x86 Nodegrid OS, which supports VMs and Docker containers for third-party applications. That means you can deploy ZTNA, SD-WAN, and other zero trust edge solutions without purchasing or managing additional hardware at each branch. Nodegrid’s OOB and failover functionality ensure those security and access solutions remain operational during ISP outages, ransomware attacks, and other disruptions. Teams can also run their favorite tools for automation, troubleshooting, and recovery on the Nodegrid platform, streamlining edge operations and ensuring their toolbox is available on the OOB network. Nodegrid also simplifies fleet management with true zero-touch provisioning to securely and automatically deploy configurations at edge business sites.

Want to unify your zero trust edge solutions with Nodegrid?

Nodegrid provides a robust, vendor-neutral platform to unify and extend your zero trust edge capabilities. Request a free demo to see Nodegrid in action.

What to do if You’re Ransomware’d: A Healthcare Example

This article was written by James Cabe, CISSP, a 30-year cybersecurity expert who’s helped major companies including Microsoft and Fortinet.

Ransomware gangs target the innocent and vulnerable. They hit a Chicago hospital in December 2023, a London hospital in October the same year, and schools and hospitals in New Jersey as recently as January 2024. This is one of the biggest reasons I’m committed to stopping these criminals by educating organizations on how to re-think and re-architect their approach to cybersecurity.

In previous articles, I discussed IMI (Isolated Management Infrastructure) and IRE (Isolated Recovery Environments), and how they could have quickly altered outcomes for MGM, Ragnar Locker victims, and organizations affected by the MOVEit vulnerability. Using IMI and IRE, organizations find that the key to not only speedy recovery, but also to limiting the blast radius and attack persistence, is isolation.

Why is isolation (not segmentation) key to ransomware recovery?

The NIST Cybersecurity Framework (version 1.1) defines five core functions: Identify, Protect, Detect, Respond, and Recover. It's missing a crucial step, however: Isolate. Stay tuned for a full breakdown of this in my next article. Isolation is critical because attacks move at machine speed and are both pervasive and persistent. If your management network is not fully isolated from production assets, the infection spreads to everything. Suddenly, you're locked out completely and looking at months of tedious recovery. For healthcare providers, this jeopardizes everything from patient care to regulatory compliance.

Isolation is integral to building a resilience system, or in other words, a system that gives you more than basic serial console/out-of-band access and instead provides an entire infrastructure dedicated to keeping you in control of your systems — be it during a ransomware attack, ISP outage, natural disaster, etc. Because this infrastructure is physically and virtually isolated from production (no dependencies on production switches/routers, no open management ports, etc.), it’s nearly impossible for attackers to lock you out.

So, what really should you do if you’re ransomware’d? Let’s walk through an example attack on a healthcare system, and compare the traditional DR (Disaster Recovery) response to the IMI/IRE approach.

Ransomware in Healthcare: Disaster Recovery vs Isolated Recovery

Suppose you’re in charge of a hospital’s network. MDIoT, patient databases, and DICOM storage are the crown jewels of your infrastructure. Suddenly, you discover ransomware has encrypted patient records and is likely spreading quickly to other crown jewel assets. The risks and potential fallout can’t be overstated. Millions of people are depending on you to protect their sensitive info, while the hospital is depending on you to help them avoid regulatory/legal penalties and ensure they can continue operating.

The problem with Disaster Recovery

Though the word ‘recovery’ is in the name, the DR approach is limited in its capacity to recover systems during an attack. Disaster Recovery typically employs a couple of things:

  • Backups, which are copies of data, configurations, and code that are used to restore a production system when it fails.
  • Redundancy, which involves duplicating critical systems, services, and applications as a failsafe in the event that primaries go down (think cellular failover devices, secondary firewalls, etc.).

What happens when you activate your DR processes? It’s highly likely that you won’t be able to, and that’s because the typical DR setup relies on the production network. There’s no isolation.

Think about it this way: your backup servers need direct access to the data they’re backing up. If your file servers get pwned, your backup servers will, too. If your primary firewall gets hacked, your secondary will, too. The problem with backup and redundancy systems — and any system, for that matter — is that when they depend on the underlying infrastructure to remain operational, they’re just as susceptible to outages and attacks. It’s like having a reserve parachute that depends on the main parachute.
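
The shared-fate problem can be checked mechanically. As a rough sketch, assuming you can list what each system needs in order to function, a transitive dependency search shows which "recovery" systems go down with production. The system names and the `DEPENDS_ON` map below are hypothetical examples, not an inventory from a real hospital.

```python
from collections import deque

# Hypothetical dependency map: each system lists what it needs to function.
DEPENDS_ON = {
    "backup-server": ["file-server", "prod-switch"],
    "secondary-firewall": ["prod-switch"],
    "file-server": ["prod-switch"],
    "prod-switch": [],
    "oob-console": ["cellular-modem"],
    "cellular-modem": [],
}


def depends_on(system, target, graph):
    """Return True if `system` relies on `target`, directly or transitively."""
    seen = set()
    queue = deque(graph.get(system, []))
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node not in seen:  # the seen set also guards against cycles
            seen.add(node)
            queue.extend(graph.get(node, []))
    return False


for name in ("backup-server", "secondary-firewall", "oob-console"):
    shared = depends_on(name, "prod-switch", DEPENDS_ON)
    print(f"{name}: {'shares fate with production' if shared else 'isolated'}")
```

In this toy map, the backup server and secondary firewall both trace back to the production switch, while the out-of-band console does not. Any recovery asset that fails this check is the reserve parachute attached to the main one.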

And what about the rest of your systems? You just discovered the attack has encrypted your servers and is quickly bringing operations to a crawl. How are you going to get in and fight back? What if you try to log into your management network, only to find that you’re locked out? All of your tools, configurations, and capabilities have been compromised.

This is why CISA, the FBI, US Navy, and other agencies recommend implementing Isolated Management Infrastructure.

IMI and IRE guarantee you can fight back against ransomware

You discover that the ransomware has spread. Not only has it encrypted data and stopped operations, but it has also locked you out of your own management network and is affecting the software configurations throughout the hospital. This is where IMI (Isolated Management Infrastructure) and IRE (Isolated Recovery Environment) come in.

Because IMI is physically separate from affected systems, it guarantees management access so teams can set up communication and a temporary ‘war room’ for incident response. The IRE can then be created using a combination of cellular, compute, connectivity, and power control (see diagram for design and steps). Docker containers should be used to bring up each step.

Diagram showing a chart containing the systems and open-source tools that can be deployed for an Isolated Recovery Environment

Image: The infrastructure and incident response protocol involved in the Isolated Recovery Environment. These products were chosen from free or open source projects that have proven very useful in each of these stages of recovery. Each phase can be automated in pieces and then brought down with its Docker container to eliminate the risk of leakage during each phase.
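
One way to picture the "bring up per phase, then bring down" idea is a dry-run planner that emits the Docker commands for each phase in order. This is only a sketch: the phase names and image names are placeholders I made up, not the tools shown in the diagram, and the script prints commands rather than running them.

```python
import shlex

# Hypothetical phases and image names; substitute the tools from your own IRE design.
PHASES = [
    ("war-room", "example/chat-server:1.0"),
    ("forensics", "example/forensics-kit:1.0"),
    ("restore", "example/restore-tools:1.0"),
]


def start_cmd(phase, image):
    """docker run: detached, auto-removed once stopped, named after the phase."""
    return ["docker", "run", "--rm", "-d", "--name", f"ire-{phase}", image]


def stop_cmd(phase):
    """docker stop: because of --rm above, stopping also deletes the container."""
    return ["docker", "stop", f"ire-{phase}"]


def plan(phases):
    """Yield commands so each phase is torn down before the next one starts."""
    for phase, image in phases:
        yield start_cmd(phase, image)
        # The actual recovery work for this phase would happen here.
        yield stop_cmd(phase)


for cmd in plan(PHASES):
    print(shlex.join(cmd))  # dry run: print instead of executing
```

Running with `--rm` means nothing from one phase lingers on the host once it is stopped, which is the point of tearing each container down before moving on.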

Without diving too far into the technicalities, the IRE enables you to recover survivable data, restore software configurations, and prevent reinfection. Here are some things you can do (and should do) in this scenario, courtesy of the IRE:

Establish your war room

You can’t fight ransomware if you can’t securely communicate with your team. Use the IRE to create offline, break-the-glass accounts that are not attached to email. This allows you to communicate and set up ticketing for forensics purposes.

Isolate affected systems

There’s no use running antivirus if reinfection can occur. Use the IRE to take offline the switch that connects the backup and file servers. Isolate these servers from each other and shut down direct backup ports. Then, you can remote-in (KVM, iKVM, iDRAC) to run antivirus and EDR (Endpoint Detection and Response).

Restore data and device images

The key is to have backup data at its most current, both for patient data and device/software configurations. Because the IRE provides an isolated environment, and you’ve already pulled your backups offline, you can gradually restore data, re-image devices, and restore configurations without risking reinfection. The IRE ensures devices “keep away” from each other until they can be cleansed and recovered.
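
The "keep away until cleansed" rule amounts to an ordering constraint: isolate, scan, restore, and only then reconnect. Here is a small Python sketch of that gate, with hypothetical state names and a made-up `Device` class. It is an illustration of the sequencing, not a description of how any particular product enforces it.

```python
from enum import Enum


class State(Enum):
    ISOLATED = 1
    SCANNED_CLEAN = 2
    RESTORED = 3
    RECONNECTED = 4


# Allowed forward steps only; a device cannot skip scanning or restoring.
NEXT = {
    State.ISOLATED: State.SCANNED_CLEAN,
    State.SCANNED_CLEAN: State.RESTORED,
    State.RESTORED: State.RECONNECTED,
}


class Device:
    def __init__(self, name):
        self.name = name
        self.state = State.ISOLATED  # every affected device starts cut off

    def advance(self, to):
        """Move to the next state, rejecting any attempt to skip a step."""
        if NEXT.get(self.state) is not to:
            raise ValueError(
                f"{self.name}: cannot go from {self.state.name} to {to.name}"
            )
        self.state = to


server = Device("file-server")
try:
    server.advance(State.RECONNECTED)  # skipping scan and restore is refused
except ValueError as err:
    print(err)
```

Encoding the order this way makes the reinfection risk explicit: a device that has not passed through every step cannot be put back on the network by accident.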

Things You’ll Need To Build The IMI and IRE

Network Automation Blueprint

We’ve created a comprehensive blueprint that shows how to implement the architecture for IMI and IRE. Don’t let the name fool you. The Network Automation Blueprint covers everything from establishing a dedicated management network, to automating deployment of services for ransomware recovery. Get your PDF copy now at the link below.

Gen 3 Console Servers To Replace End-of-Life Gear

It’s nearly impossible to build the IMI or deploy the IRE using older console servers. That’s because these only give you basic remote access and a hint of automation capabilities. You’ll still need the ability to run VMs and containers. Gen 3 console servers let you do everything IMI and IRE require, such as full control plane/data plane separation, app hosting, and on-demand VM/container deployment. They’ve also been validated by Synopsys and have the built-in security features I’ve been talking about for years. Check out the link below for resources about Gen 3 and how we’ll help you upgrade.

Get in touch with me!

I’d love to talk with you about IMI, IRE, and resilience systems. These are becoming more crucial to operational resilience and ransomware recovery, and countries are passing new regulations that will require these approaches. Get in touch with me via social media to talk about this!