Providing Out-of-Band Connectivity to Mission-Critical IT Resources

Network Function Virtualization with Nodegrid – Tech Talk Tuesday from ZPE Systems

Todd Atherton (Channel Sales Director) and Marc Westberg (Channel Sales Engineer) walk you through the benefits of network function virtualization with Nodegrid. This 20-minute video covers virtualization best practices, requirements, and network considerations.

Run a lean, mean stack at the edge
You don’t have to deploy multiple dedicated devices that eat up valuable space and power at the edge. Nodegrid allows you to run applications at the edge, including monitoring, security, and more, all in a 1RU or lighter package.

Automate bringup of device images
Use Nodegrid’s zero touch provisioning capabilities to automatically deploy devices at the edge. Deploy the factory-default Nodegrid box, which then securely connects to the cloud and automatically installs the image.

Gain networking flexibility
Use a wide range of networking options, including passthrough, bridge, and NAT, with link types such as cellular and Ethernet to suit your use case.

See how we helped DigiCert consolidate 4 devices into 1, including Palo Alto Networks firewalls in a high-availability configuration, using the Nodegrid Net SR. Watch their testimonial video and download the case study here https://zpesystems.com/digicert-improving-critical-network-infrastructure/

Want more case studies? Visit our webpage https://zpesystems.com/resources/media-library/case-studies/

Are You a Partner Interested in Attending?

Visit the Tech Talk Tuesdays Page

ZPE Systems delivers innovative solutions to simplify infrastructure management at the datacenter, branch, and edge.

Learn how our Zero Pain Ecosystem can solve your biggest network orchestration pain points.

Network Management Best Practices

A collage of concepts related to network management best practices for resilience and security.

Network management involves administering, controlling, and monitoring an organization’s network. For most companies, the top priority for network teams is ensuring the continuous availability of critical business services, even during disruptive events like natural disasters, ransomware attacks, and infrastructure failures. Network resilience is the ability to continue operating (even if in a degraded state) and delivering digital services in the face of adversity. This guide discusses the network management best practices for improving and supporting network resilience.

Network Management Best Practices for Resilience

Isolated Management Infrastructure (IMI)

  • Moves management interfaces off the production network to protect them from cybercriminals

  • Out-of-band (OOB) management ensures continuous remote access to IMI even when production infrastructure is offline

  • Isolated Recovery Environments (IREs) allow teams to restore infrastructure and services without risking reinfection

Network Automation

  • Reduces the risk of failures or security breaches by eliminating human error in configuration changes

  • Simplifies fleet management tasks like connectivity checks, device location monitoring, and software patching

  • Application-aware routing, intelligent load balancing, and automatic failover ensure optimal performance and availability

Network Security

  • Zero trust security protects valuable data and resources from attackers already on the network

  • SASE and SSE extend enterprise security policies and tools to remote users, applications, and devices

  • AIOps provides enhanced security monitoring, threat detection, and remediation capabilities

Isolated Management Infrastructure (IMI)

Major ransomware attacks and breaches happen so frequently that cybersecurity professionals must now operate as if the network has already been compromised. This high-threat atmosphere led to the rise of the zero trust security methodology discussed below. It’s also why a recent CISA Binding Operational Directive outlines the best practice of isolating your management interfaces to a designated management network.

Moving all control functions for network infrastructure off the production LAN reduces the risk of cybercriminals accessing your management interfaces and “crown jewel” assets. This practice is known as isolated management infrastructure (IMI), and it separates the management plane from the data plane using designated network infrastructure. Doing so prevents attackers on the production network from finding and accessing the interfaces used to control servers, firewalls, routers, and other critical infrastructure devices. Combined with management network segmentation and zero-trust security controls, this isolation makes an IMI far harder for attackers to reach and compromise.

A diagram showing a multi-layered isolated management infrastructure.

The best practice is to use out-of-band (OOB) serial consoles (a.k.a. console servers or terminal servers) to help construct the IMI. An OOB management solution uses dedicated network interfaces (such as 4G/5G cellular or fiber) to provide an Internet connection for remote management access that doesn’t rely upon the primary production network at all. The benefit of using an OOB console server for the IMI is that teams have continuous access to monitor, manage, troubleshoot, and recover remote infrastructure when the production network is unavailable. Additionally, routing management ports to terminate on OOB terminal servers deployed top-of-rack creates multiple layers of management isolation to protect critical assets from criminals on the network.

A diagram showing the components of an isolated recovery environment.

Another network management best practice aided by IMI and OOB serial consoles is an isolated recovery environment (IRE). An IRE is built with designated infrastructure that is easily and quickly deployable, including an OOB control plane (such as a serial console), redundant storage and compute, and security and recovery tools. This gives teams a safe environment to recover from ransomware attacks without worrying about reinfection. Ideally, the IMI will use devices that consolidate network functions to enable easy deployment and scaling of IRE and OOB. Those devices should also be robust enough to host the apps, tools, and services required to rebuild systems and restore data.

Network automation

Modern networks are large, complex, and ever-expanding, with user expectations growing more demanding every day. Even the best network administrator sometimes makes mistakes, either through negligence or because they have an overwhelming amount of work to do. Maybe they copy and paste the wrong security setting for a particular firewall appliance in a rush to deploy a new site on time; perhaps they miss a critical device health alert because they’re responding to a separate incident. These human errors, while understandable, can have devastating consequences on network resilience by causing security breaches, equipment failure, and service outages.

Network automation reduces the opportunity for human error by carrying out network management tasks the same way every time. Automation streamlines the most tedious network and fleet management tasks so teams can improve efficiency without allowing anything to fall through the cracks. Automation tools also respond to changing network conditions faster than human administrators to optimize the performance and availability of critical systems and services.

Network Automation Examples

Infrastructure as Code (IaC) abstracts infrastructure configurations from the underlying hardware so they can be written and deployed as repeatable, automatable scripts.

Zero Touch Provisioning automatically downloads and installs new network device configurations with little to no human interaction to streamline remote deployments.

Software-Defined Wide Area Networking (SD-WAN) decouples WAN control functions from the underlying hardware to enable features like application-aware routing, intelligent load balancing, and automatic failover to improve performance and availability.

Automatic Patch Management ensures software vulnerabilities are closed before being exploited by cybercriminals while providing automatic recovery and rollback in case of issues.
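To make the Infrastructure as Code idea concrete, here is a minimal Python sketch. It is an illustration only, not how Nodegrid or any specific IaC tool works: the setting names, values, and the `plan_changes` and `apply_changes` helpers are all hypothetical. It shows the core pattern of declaring a desired state, computing the difference from the current state, and applying it so that a repeat run changes nothing.

```python
from typing import Dict, List

# Hypothetical desired state, as it might be stored in version control.
DESIRED: Dict[str, str] = {
    "hostname": "edge-01",
    "ntp_server": "10.0.0.1",
    "ssh_enabled": "true",
}


def plan_changes(current: Dict[str, str], desired: Dict[str, str]) -> List[str]:
    """List the settings that must change to reach the desired state."""
    changes = []
    for key, value in sorted(desired.items()):
        if current.get(key) != value:
            changes.append(f"set {key}={value}")
    return changes


def apply_changes(current: Dict[str, str], desired: Dict[str, str]) -> Dict[str, str]:
    """Return a new configuration with the desired settings applied."""
    updated = dict(current)
    updated.update(desired)
    return updated


# Hypothetical device that has drifted from the desired state.
device = {"hostname": "edge-01", "ntp_server": "192.168.1.1"}
print(plan_changes(device, DESIRED))   # two settings need to change
device = apply_changes(device, DESIRED)
print(plan_changes(device, DESIRED))   # second run: nothing left to do
```

Because the script only acts on differences, running it repeatedly is safe, which is what makes this style of configuration repeatable and automatable.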

Network security

As discussed above, network breaches occur so frequently that it’s now a security best practice to assume attackers are already on the network. This is part of the zero trust security methodology, which follows the principle of “never trust, always verify” regarding all the users, devices, and applications that access the network. Zero trust security uses strong authentication methods (e.g., 2FA or one-time passwords), hardware roots of trust, and network micro-segmentation. These methods prevent attackers from moving around the network and accessing valuable resources (such as management interfaces).
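As an illustration of the one-time passwords mentioned above, the following sketch generates time-based codes using the publicly documented HOTP (RFC 4226) and TOTP (RFC 6238) algorithms with HMAC-SHA1. The shared secret and timestamp are sample values chosen for the example, and the `verify` helper is a simplification that assumes the client and server clocks agree. A production deployment would rely on a vetted authentication library.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    return hotp(secret, unix_time // step, digits)


def verify(secret: bytes, submitted: str, unix_time: int) -> bool:
    """Compare the submitted code with the expected one in constant time."""
    return hmac.compare_digest(totp(secret, unix_time), submitted)


# Sample shared secret (the test secret used in the RFCs), not a real key.
SECRET = b"12345678901234567890"
code = totp(SECRET, 59)
print(code, verify(SECRET, code, 59))
```

The code changes every 30 seconds, so a stolen password alone is not enough to authenticate.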

Another security-related network management best practice is to extend zero-trust controls and policies to the network’s edges, such as to work-from-home devices, branch offices, and other remote business sites. This is achieved using edge-centric security solutions such as Security Service Edge (SSE) and Secure Access Service Edge (SASE). These technologies route remote, web-destined network traffic through a whole stack of cloud-based security solutions. This allows organizations to apply consistent security to edge traffic without creating bottlenecks at a centralized firewall or deploying additional security appliances at each site.

A diagram illustrating a basic SASE network security architecture.

Another emerging network management best practice, especially for complex, automated infrastructures, is using artificial intelligence (AI) and machine learning to aid security and recovery. For example, AIOps solutions analyze data pulled from various sources on the network, including monitoring platforms, security appliances, and system event logs. AIOps is excellent at detecting anomalies, extrapolating potential consequences, and positing solutions. It can find novel and zero-day threats on the network, spot the signs of an imminent device failure, and perform root-cause analysis (RCA) to discover the source of problems. AIOps enhances the management, automation, and security practices on this list to improve the overall efficiency and resilience of enterprise networks.
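Anomaly detection can be illustrated with a very simple statistical check. The sketch below flags samples that sit more than a chosen number of standard deviations from the mean. It is only a toy stand-in: real AIOps platforms use far richer models, and the latency values, the `find_anomalies` name, and the threshold are assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def find_anomalies(samples: List[float], threshold: float = 2.0) -> List[Tuple[int, float]]:
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return []  # all samples identical, so nothing stands out
    return [
        (i, x) for i, x in enumerate(samples)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical round-trip latency samples in milliseconds.
latency_ms = [20, 21, 19, 22, 20, 21, 20, 95, 21, 20]
print(find_anomalies(latency_ms))
```

Here only the 95 ms sample is flagged, which is the kind of signal that could trigger a closer look at a failing link or device.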

Network management FAQs

1. How do I ensure interoperability amongst network management solutions?

Managing a modern network requires many different solutions, often from many different vendors. All these solutions must work together to prevent the management plane from getting too complex and ensure there are no coverage gaps. One option is to stick within one vendor’s ecosystem, but you may miss out on beneficial features or pay for functionality you don’t need. The best approach is to use a vendor-neutral (a.k.a. vendor-agnostic) network management platform to unify all your tools. To learn more, read The Benefits of Vendor Agnostic Platforms in Network Management.

2. What’s the difference between network automation and orchestration?

Network automation and network orchestration are two concepts that are often referenced together, leading to some confusion about the difference between them. Network automation focuses on individual tasks and processes, such as deploying a single software update. Network orchestration involves coordinating and managing multiple tasks and processes, or even entire workflows, such as configuring and deploying all the software on a server. To learn more, read IT Automation vs Orchestration: What’s the Difference?
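The distinction can be sketched in a few lines of Python. The task functions below are hypothetical placeholders that only return status strings. In practice each would call a real tool or API. Each function represents one automated task, while `orchestrate` coordinates several of them as an ordered workflow.

```python
from typing import Callable, List

Task = Callable[[str], str]


def update_software(server: str) -> str:
    """Automation: a single task (placeholder for a real update call)."""
    return f"{server}: software updated"


def apply_config(server: str) -> str:
    """Another single task (placeholder)."""
    return f"{server}: configuration applied"


def run_health_check(server: str) -> str:
    """Another single task (placeholder)."""
    return f"{server}: health check passed"


def orchestrate(server: str, workflow: List[Task]) -> List[str]:
    """Orchestration: run several automated tasks in a defined order."""
    return [task(server) for task in workflow]


# Hypothetical workflow for bringing a server into service.
DEPLOY_WORKFLOW: List[Task] = [apply_config, update_software, run_health_check]

print(update_software("srv-01"))              # one automated task
for line in orchestrate("srv-01", DEPLOY_WORKFLOW):
    print(line)                               # a coordinated workflow
```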

3. Is network resilience the same as redundancy and backups?

Redundancy and backups are both critical to business continuity, but they do not equate to network resilience. Backups are copies of data, configurations, and code that are used to restore failed (or compromised) production systems. Redundancy duplicates services, applications, and systems so operations can “fail over” to the duplicates if the primary versions fail or are attacked. Resilience is an organization’s overall ability to recover or adapt when major disruptions occur. To learn more, read Network Resilience: What is a Resilience System?

Resilient network management with Nodegrid

These network management best practices represent the industry-leading solutions for addressing the most common resilience challenges facing organizations. The network resilience experts at ZPE Systems can help you implement these practices with Gen 3 out-of-band management solutions and a vendor-neutral network management platform that supports automation. ZPE’s Nodegrid platform is the perfect ransomware recovery multi-tool, providing an isolated control plane as well as access to all the tools and software needed to restore critical operations.

Network management best practices for ransomware recovery and resilience

Learn more about using Nodegrid to improve ransomware resilience by downloading our white paper, 3 Steps to Ransomware Recovery.

ZPE Systems offers various solutions to help you implement your enterprise network management strategy, including data center infrastructure management, critical remote infrastructure management, and a secure uCPE gateway for distributed branch and edge networks. To learn more, contact us online.

Network Resilience: What is a Resilience System?

A digital web of interconnected network resilience concepts being selected by a business person in a suit.

Network resilience means being able to withstand or recover from adversity, service degradation, and complete outages with minimal business disruption. The longer business-critical services are down, or systems are breached, the greater the risk of significant financial, reputational, and legal consequences. A resilience system is a set of technologies that enable an organization to continue operating while teams work to repair failures and recover from cyberattacks. But what exactly is a resilience system, and what does it look like? This guide to network resilience defines resilience systems, provides example use cases, compares them to related technologies like backups and redundant systems, and describes the key components required to build them.

What is a resilience system?

A resilience system provides all the infrastructure, tools, and services necessary to continue operating, even if in a degraded state, during major incidents. It also includes everything needed to recover data, rebuild systems, perform security testing, and continue delivering core business functionality. A resilience system is typically isolated from the production network, preventing cybercriminals from finding and compromising it and ensuring teams have continuous access even if the primary network goes down.

Resilience system use cases

Some examples of the challenges that resilience systems help overcome include:

1. Ransomware recovery

In a ransomware attack, cybercriminals infect systems with malware that spreads throughout the network and encrypts any data it encounters. Modern ransomware now uses packaged attacks that move at machine speed, instantly incapacitating entire networks. Organizations completely lose access to critical systems and data until they pay a ransom, often in untraceable cryptocurrency. Ransomware is an exceptionally tenacious form of malware and tends to reinfect backup data and rebuilt systems, significantly hampering recovery efforts and increasing the duration and cost of the attack. The best practice for resilience systems is to isolate them on an out-of-band (OOB) network, inaccessible to hackers who have breached the production in-band network. Doing so creates a safe, isolated recovery environment (IRE) where teams can restore critical data and systems without the risk of reinfection. The resilience system includes all the tools and hardware needed to restore critical business services and infrastructure. An IRE significantly accelerates ransomware recovery and minimizes downtime, so businesses can avoid paying ransoms and reduce the overall cost of attacks.

2. Network outages

Enterprise network architectures and supply chains are highly complex, with lots of moving parts that rely on external vendors to maintain availability. Just one of those vendors dropping the ball could take the entire organization offline, severely impacting network resilience. For example, in 2023, an expired cryptographic certificate caused Cisco’s Viptela SD-WAN appliances to fail on reboot, completely taking down affected networks until the issue was resolved. With a resilience system, Viptela customers could have potentially avoided this downtime by failing over to alternative network resources. For example, a resilience system with integrated cellular failover allows branches to continue connecting to and delivering critical business services while also providing a lifeline for remote teams to access and recover failed systems. A resilience system also provides observability and automatic notifications so teams are instantly alerted to issues like certificate expirations and can respond quickly to recover critical services.
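The certificate-expiry alerting described above can be sketched with Python’s standard library. This is an illustration, not a feature of any particular product: the `days_until_expiry` and `needs_alert` helpers, the 30-day warning window, and the sample dates are assumptions. The timestamp format is the one used for `notAfter` in certificates returned by `ssl.SSLSocket.getpeercert()`.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` looks like "Jan 11 00:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_alert(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within the warning window."""
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical check date and certificate expiry timestamps.
today = datetime(2030, 1, 1, tzinfo=timezone.utc)
print(needs_alert("Jan 11 00:00:00 2030 GMT", today))  # inside the window
print(needs_alert("Jun  1 00:00:00 2030 GMT", today))  # outside the window
```

Running a check like this on a schedule against every managed device gives teams warning well before a certificate lapses.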

3. Shift to remote work

Incidents like ransomware attacks and equipment failures happen frequently enough that companies can create detailed plans and proactively implement solutions to minimize their impact, but not all adverse events are so predictable. When the COVID-19 pandemic struck, the massive shift to remote work strained the network resources of most organizations. Instead of maintaining a limited number of branch offices, teams suddenly had to treat every employee as a new branch, leading to performance degradation and outages as they scrambled to reinforce the business’s remote capabilities. A resilience system gives teams the tools and resources they need to provision additional infrastructure, manage networking logic, deploy new security solutions, and more, even while the primary network is offline or under a heavy load. A resilience system is the key to quickly adjusting network performance and security to adapt to sudden changes like a transition to fully remote operations.

Do backups and redundancy equate to network resilience?

The short answer is no; backups and redundancy do not equate to network resilience, though they do contribute to making systems more resilient.

  • Backups are copies of data, configurations, and application code used to do a hot or cold restore when a production system fails. The underlying infrastructure must remain operational for teams to access and use backups, and unless additional resilience measures are taken, it’s easy for backups to become infected or compromised, severely hampering recovery efforts.
  • Redundancy involves duplicating critical systems, services, and applications as a failsafe in case the primaries go down. Organizations can “fail over” to the redundancies to continue critical business operations during outages. However, redundant systems are just as susceptible to failures and infections without additional resilience measures like out-of-band management and isolated management infrastructure.

Backups and redundancy are part of network resilience but alone are not enough to ensure business continuity. Resilience systems focus on maintaining the architecture of the production network while adding the ability to recover or adapt to adversity. The next section discusses all the tools and technologies that make up network resilience systems.

What does a resilience system look like?

A resilience system has four key components:

  • Alternative networking: full-stack routing and switching, Wi-Fi, VoIP, virtualization, and software-defined network overlays for SDN and SD-WAN
  • Alternative compute: full-stack compute, containers, virtual machines, and any other resources needed to run applications and deliver services
  • Storage and storage recovery: enough storage to recover systems and applications as well as support content delivery
  • Automation: tools like zero-touch provisioning (ZTP) to facilitate speedy recovery while minimizing human error

Alternative networking and compute resources ensure the organization can fail over in the event of a network failure or continue delivering services when production servers are unavailable. Teams also need enough storage to restore backup data, build new systems, and support the content delivery network (CDN). Automation solutions like zero-touch provisioning (ZTP), configuration management, and security validation tools accelerate the recovery process while mitigating the risk of human error. Combined, these components enable teams to reduce the frequency, severity, and duration of outages, improving overall network resilience.
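The failover behavior described above can be sketched as a simple selection rule: use the most preferred link that is currently healthy. The `Link` type, link names, and priorities below are hypothetical, and a real implementation would determine health from live probes rather than a stored flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    name: str
    priority: int   # lower number means more preferred
    healthy: bool   # in practice, set by periodic connectivity probes


def select_active_link(links: List[Link]) -> Optional[Link]:
    """Return the most preferred healthy link, or None if all are down."""
    candidates = [link for link in links if link.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda link: link.priority)


# Hypothetical branch with a failed primary WAN and a cellular backup.
branch_links = [
    Link("primary-wan", priority=1, healthy=False),
    Link("cellular-backup", priority=2, healthy=True),
]
active = select_active_link(branch_links)
print(active.name if active else "no link available")
```

When the primary link recovers and is marked healthy again, the same rule returns traffic to it automatically.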

Network resilience with ZPE Systems

A resilient network will continue delivering critical business services in the face of any challenge, whether from cybercriminals, supply chain issues, global events, or even plain human error. A resilience system is isolated from the production network to ensure security and availability, and it consists of all the tools and technologies needed to troubleshoot, recover, and deliver your most crucial data, applications, and infrastructure. The Nodegrid platform from ZPE Systems is the perfect foundation for a resilience system. Nodegrid is a vendor-neutral, out-of-band management solution capable of running your choice of third-party software. Nodegrid allows you to build a highly customizable IRE containing all the tools needed to safely recover from ransomware. You can even use Nodegrid to deliver services while the primary network or systems are down, making it your all-in-one network resilience multi-tool.

Want to ensure network resilience by accelerating ransomware recovery?

Minimize the business impact of ransomware with the help of our whitepaper, 3 Steps to Ransomware Recovery. Learn how to follow Gartner’s best practices to build an Isolated Recovery Environment.