Edge Management and Orchestration

by Jordan Baker | Sep 28, 2023 | Consolidation, Edge Computing, EdgeOps, Improve Network Security, Increase Productivity, NetDevOps, NetDevOps Transformation, Network Automation, Network Edge Orchestration, SD-WAN, Secure Access Service Edge (SASE), Security Service Edge (SSE), Vendor Neutral Platform, Zero Touch Provisioning (ZTP)

Organizations prioritizing digital transformation by adopting IoT (Internet of Things) technologies generate and process an unprecedented amount of data. Traditionally, the systems used to process that data live in a centralized data center or the cloud. However, IoT devices are often deployed around the edges of the enterprise in remote sites like retail stores, manufacturing plants, and oil rigs. Transferring so much data back and forth creates a lot of latency and uses valuable bandwidth. Edge computing solves this problem by moving processing units closer to the sources that generate the data.

IBM estimates there are over 15 billion edge devices already in use. While edge computing has rapidly become a vital component of digital transformation, many organizations focus on individual use cases and lack a cohesive edge computing strategy. According to a recent Gartner report, the result is what’s known as “edge sprawl”: many individual edge computing solutions deployed all over the enterprise without any centralized control or visibility. Organizations with disjointed edge computing deployments are less efficient and more likely to hit roadblocks that stifle digital transformation.

The report provides guidance on building an edge computing strategy to combat sprawl, and the foundation of that strategy is edge management and orchestration (EMO). Below, this post summarizes the key findings from the Gartner report and discusses some of the biggest edge computing challenges before explaining how to solve them with a centralized EMO platform.

Key findings from the Gartner report
Edge computing challenges
. Enabling extensibility
. Extracting value from edge data
. Governing edge data
. Supporting edge-native applications
. Securing the edge
. Edge management and orchestration
Edge management and orchestration with Nodegrid

Key findings from the Gartner report

Many organizations already use edge computing technology for specific projects and use cases – they have an individual problem to solve, so they deploy an individual solution. Since the stakeholders in these projects usually aren’t architects, they aren’t building their own edge computing machines or writing software for them. Typically, these customers buy pre-assembled solutions or as-a-service offerings that meet their specific needs.

However, a piecemeal approach to edge computing projects leaves organizations with disjointed technologies and processes, contributing to edge sprawl and shadow IT. Teams can’t efficiently manage or secure all the edge computing projects occurring in the enterprise without centralized control and visibility. Gartner urges I&O (infrastructure & operations) leaders to take a more proactive approach by developing a comprehensive edge computing strategy encompassing all use cases and addressing the most common challenges.

Edge computing challenges

Gartner identifies six major edge computing challenges to focus on when developing an edge computing strategy:

Gartner’s 6 edge computing challenges to overcome
Enabling extensibility so edge computing solutions are adaptable to the changing needs of the business.	Extracting value from edge data with business analytics, AIOps, and machine learning training.
Governing edge data to meet storage constraints without losing valuable data in the process.	Supporting edge-native applications using specialized containers and clustering without increasing the technical debt.
Securing the edge when computing nodes are highly distributed in environments without data center security mechanisms.	Edge management and orchestration that supports business resilience requirements and improves operational efficiency.

Let’s discuss these challenges and their solutions in greater depth.

Enabling extensibility – Many organizations deploy purpose-built edge computing solutions for their specific use case and can’t adapt when workloads change or grow. The goal is to attempt to predict future workloads based on planned initiatives and create an edge computing strategy that leaves room for that growth. However, no one can really predict the future, so the strategy should account for unknowns by utilizing common, vendor-neutral technologies that allow for expansion and integration.

Extracting value from edge data – The generation of so much IoT and sensor data gives organizations the opportunity to extract additional value in the form of business insights, predictive analysis, and machine learning training. Quickly extracting that value is challenging when most data analysis and AI applications still live in the cloud. To effectively harness edge data, organizations should look for ways to deploy artificial intelligence training and data analytics solutions alongside edge computing units.

Governing edge data – Edge computing deployments often have more significant data storage constraints than central data centers, so quickly distinguishing between valuable data and destroyable junk is critical to edge ROIs. With so much data being generated, it’s often challenging to make this determination on the fly, so it’s important to address data governance during the planning process. There are automated data governance solutions that can help, but these must be carefully configured and managed to avoid data loss.

Supporting edge-native applications – Edge applications aren’t just data center apps lifted and shifted to the edge; they’re designed for edge computing from the bottom up. Like cloud-native software, edge apps often use containers, but clustering and cluster management are different beasts outside the cloud data center. The goal is to deploy platforms that support edge-native applications without increasing the technical debt, which means they should use familiar container management technologies (like Docker) and interoperate with existing systems (like OT applications and VMs).

Securing the edge – Edge deployments are highly distributed in locations that may lack many physical security features in a traditional data center, such as guarded entries and biometric locks, which adds risk and increases the attack surface. Organizations must protect edge computing nodes with a multi-layered defense that includes hardware security (such as TPM), frequent patches, zero-trust policies, strong authentication (e.g., RADIUS and 2FA), and network micro-segmentation.

Edge management and orchestration – Moving computing out of the climate-controlled data center creates environmental and power challenges that are difficult to mitigate without an on-site technical staff to monitor and respond. When equipment failure, configuration errors, or breaches take down the network, remote teams struggle to meet resilience requirements to keep business operations running 24/7. The sheer number and distribution area of edge computing units make them challenging to manage efficiently, increasing the likelihood of mistakes, issues, or threat indicators slipping between the cracks. Addressing this challenge requires centralized edge management and orchestration (EMO) with environmental monitoring and out-of-band (OOB) connectivity.

A centralized EMO platform gives administrators a single-pane-of-glass view of all edge deployments and the supporting infrastructure, streamlining management workflows and serving as the control panel for automation, security, data governance, cluster management, and more. The EMO must integrate with the technologies used to automate edge management workflows, such as zero-touch provisioning (ZTP) and configuration management (e.g., Ansible or Chef), to help improve efficiency while reducing the risk of human error. Integrating environmental sensors will help remote technicians monitor heat, humidity, airflow, and other conditions affecting critical edge equipment’s performance and lifespan. Finally, remote teams need OOB access to edge infrastructure and computing nodes, so the EMO should use out-of-band serial console technology that provides a dedicated network path that doesn’t rely on production resources.

Gartner recommends focusing your edge computing strategy on overcoming the most significant risks, challenges, and roadblocks. An edge management and orchestration (EMO) platform is the backbone of a comprehensive edge computing strategy because it serves as the hub for all the processes, workflows, and solutions used to solve those problems.

Learn more about zero trust on the control plane

Edge management and orchestration (EMO) with Nodegrid

Nodegrid is a vendor-neutral edge management and orchestration (EMO) platform from ZPE Systems. Nodegrid uses Gen 3 out-of-band technology that provides 24/7 remote management access to edge deployments while freely interoperating with third-party applications for automation, security, container management, and more. Nodegrid environmental sensors give teams a complete view of temperature, humidity, airflow, and other factors from anywhere in the world and provide robust logging to support data-driven analytics.

The open, Linux-based Nodegrid OS supports direct hosting of containers and edge-native applications, reducing the hardware overhead at each edge deployment. You can also run your ML training, AIOps, data governance, or data analytics applications from the same box to extract more value from your edge data without contributing to sprawl.

In addition to hardware security features like TPM and geofencing, Nodegrid supports strong authentication like 2FA, integrates with leading zero-trust providers like Okta and PING, and can run third-party next-generation firewall (NGFW) software to streamline deployments further.

The Nodegrid platform brings all the components of your edge computing strategy under one management umbrella and rolls it up with additional core networking and infrastructure management features. Nodegrid consolidates edge deployments and streamlines edge management and orchestration, providing a foundation for a Gartner-approved edge computing strategy.

Want to learn more about how Nodegrid can help you overcome your biggest edge computing challenges?

Contact ZPE Systems for a free demo of the Nodegrid edge management and orchestration platform.

Dissecting the MGM Cyberattack: Lions, Tigers, & Bears, Oh My!

by Jordan Baker | Sep 25, 2023 | Improve Network Security, Micro-segmentation, Minimize Impact of Disruptions, Network Automation, Out of Band Management, SecOps, User Management, Zero Trust Security

This article was written by James Cabe, CISSP, whose cybersecurity expertise has helped major companies including Microsoft and Fortinet.

The recent MGM cyberattack reportedly caused the company to lose millions in revenue per day. The successful kill chain attack — originally a military tactic used to accomplish a particular objective — granted inside access to the attackers, who encrypted and held for ransom some of MGM’s most prized assets. These ‘crown jewel’ assets, as they’re called in the cybersecurity realm, are most critical to the accomplishment of an organization’s mission. Because ransomware attacks persist in corporate networks until fully cleared, organizations must be ready to “fight through” an attack using resilient systems and effective procedures. This should involve identifying these crown jewels and designing them in a way that ensures they can operate through attacks.

When these types of large-profile attacks occur, many cast their eyes at cybersecurity leaders for failing to fend off the bad guys. The reality is these leaders struggle to get budget, corporate buy-in, and digital assets that are required to build a strong defense for business continuity. For MGM, it’s likely they also faced difficulty operationalizing current assets across a gigantic digital estate, and ultimately lacked a plan to recover from a total outage of crown jewel assets.

From the attacker’s perspective, an exceptional level of intelligence and preparation are required in order to understand a target’s internal operations and architecture and execute a successful kill chain. Successfully attacking a sophisticated organization like MGM requires rapid information stealing to capture and leverage cloud credentials, as well as to lock up those resources and lock out the most important support staff in an organization. This is the crux of the issue: infostealers and ransomware automate the mass grabbing of resources and quickly set up a denial of services for the stakeholders that are responsible for fixing these systems.

How did the MGM cyberattack start? After MGM discovered the breach, how did the attacker stay one step ahead? What approach should organizations take to ensure they can recover if they’re targeted?

Who Started The MGM Cyberattack, and How?

The MGM cyberattack began after an adversary group named “Scattered Spider” used phishing over the phone, an approach called ‘vishing,’ to convince MGM’s customer support rep into granting them access with elevated privileges. Scattered Spider is the same group responsible for the SIM-swapping campaign that happened a few months ago, where they successfully subverted multifactor authentication. Their primary tactic involves social engineering, which they use to steal personal information from employees.

MGM and many other casinos currently use advanced Zero Trust identity security from Okta. However, the attacker was able to trick the service desk into resetting a password to gain access into the network. Even with newer Zero Trust identity solutions, most organizations unravel once attackers get to the real “chewy center” of the network: the humans operating them.

Okta is quoted saying, “In recent weeks, multiple US-based Okta customers have reported a consistent pattern of social engineering attacks against their IT service desk personnel, in which the caller’s strategy was to convince service desk personnel to reset all multi-factor authentication (MFA) factors enrolled by highly privileged users.” Okta further warned, “The attackers then leveraged their compromise of highly privileged Okta Super Administrator accounts to abuse legitimate identity federation features that enabled them to impersonate users within the compromised organization.”

The MGM cyberattack and those like it are more about processes than technology. Let’s explore how the attack progressed, and how the criminals were successful at staying persistent and ultimately hitting their goal.

How Did A Simple Authentication Attack Morph Into a Complex Attack?

The Scattered Spider threat actors use a platform written by UNC3944 or AlphaV (known by several names). This is a middleware developer for attack platforms that allow criminals to follow a specific set of instructions (a kill chain) to gain access and ultimately encrypt and exfiltrate data from a targeted company. AlphaV’s platform is called BlackCat, which they use to establish a foothold, establish Command and Control (C2) for the malware, and exfiltrate data, to ultimately get paid.

With elevated Okta privileges at MGM, Scattered Spider deployed a file containing a Java-based remote access trojan, which became a “vending machine” for other remote access trojans (RATs) that sought out other nearby machines to spread quickly. The AlphaV RAT would ‘pwn‘ MGM’s Azure virtual servers to gain access, then sniff for more user passwords and create dummy accounts.

These RATs leveraged a built-in tool called “POORTRY,” the Microsoft Serial Console driver turned malicious, to terminate selected processes on Windows systems (e.g., Endpoint Detection and Response (EDR) agents on endpoints). AlphaV, the platform maintainer, signed the POORTRY driver with a Microsoft Windows Hardware Compatibility Authenticode signature. This helped the malware to evade most Endpoint Detection software.

This tool was used to get elevated and persistent access to the Okta Proxy servers that were in the scope of the attack and accessible remotely by the attacker. This attack can evade a lot of detection tools. This access allowed them to capture AM\IAM accounts that allowed them greater access to the organization. This stealing of credentials from the Okta Proxy servers was confirmed by Okta responders as well as the threat actor on their blog. This is called a “living off the land” attack.

How Did MGM Discover the Cyberattack?

The first notification of the hack was dropped on the VXUnderground forums. The staff there verified through chat contact with the threat group UNC3944\AlphaV, who works in conjunction with the Scattered Spider threat actor, The attacker also confirmed this on their blog on the darknets.

On September 11, 2023, anyone attempting to visit MGM’s website was greeted by a message stating that the website was currently unavailable. The attack also stopped hotel card readers, gaming machines, and other equipment critical to MGM’s day-to-day operations and revenue generating activities.

How Did the Attacker Maintain Control?

The initial attack allowed AlphaV, who runs the C2 (Command and Control) networks for the RattyRat trojan, to have remote access to the VMware server farm that services the guest systems, the gaming control platforms, and possibly the payment processing systems. They maintained control despite all of MGM’s attempts to mitigate the problem, because they were able to establish elevated access in places the organization could not easily remove them from without removing access to the whole organization. They established something called “persistence.”

From the attacker’s blog on the darknet, “MGM made the hasty decision to shut down every one of their Okta Sync servers after learning that we had been lurking on their Okta Agent servers sniffing passwords of people whose passwords couldn’t be cracked from their domain controller hash dumps. At this point MGM being completely locked out of their local environment. Meanwhile the attacker continued having super administrator privileges to their Okta, along with Global Administrator privileges to their Azure tenant. They made an attempt to evict us after discovering that we had access to their Okta environment, but things did not go according to plan. On Sunday night, MGM implemented conditional restrictions that barred all access to their Okta (MGMResorts.okta.com) environment due to inadequate administrative capabilities and weak incident response playbooks. Their network has been infiltrated since Friday. Due to their network engineers’ lack of understanding of how the network functions, network access was problematic on Saturday. They then made the decision to ‘take offline’ seemingly important components of their infrastructure on Sunday. After waiting a day, we successfully launched ransomware attacks against more than 100 ESXi hypervisors in their environment on September 11th after trying to get in touch but failing.“

MGM tried many things to remove access into their network. However, because of an advanced attack that installed a shadow identity provider in their own Identity Solution, they were able to maintain access long enough to redeploy access to most of the assets they found to be the backbone of the company. AlphaV was then able to encrypt most of the crown jewels of MGM’s operations network.

Is There a Way to Stop These Types of Attacks?

The MGM cyberattack required physical reconnaissance, patience, and a lot of planning to set up the kill chain. Playbooks that can protect against this kind of attack are hard to create, because it can mean taking all guest services offline for a period, which requires very high authority in the organization. One of the comments from the attacker was that the organization did not act fast enough to take all remote access offline to their management framework that consisted of Okta Proxy Servers. When they did, the adversary was then able to lock them out by submitting a Multifactor Authentication Reset. To stall the attacker, they would have had to induce a full outage of their crown jewels while a formal assessment of all assets could be performed. Taking assets offline requires buy-in at the board level and executive level, which are difficult to come by even if an organization emphasizes its operational excellence, detection, and defense.

Organizations should have a plan to quickly recover from a total loss of a site, outside of backups (which can be lost) and disaster recovery sites. Organizations need to be properly hard-segmented into a full IMI (Isolated Management Infrastructure). Keeping crown jewels safe from an attacker that targets the chewiest part of an organization should be top of any list going from 2023 budget to 2024 planning.

The following is a light version of what can be done in a fully-automated response that can take mere hours instead of days for an outage (a full operations blueprint will be out in the near future).

An IMI can host an IRE (Isolated Recovery Environment), which is used to cut off all user data and remote access (except for OOB) to an entire infected site. A properly implemented recovery environment should automate most of these activities to speed up the recovery. One of the first considerations is the requirement for a secondary organization in your IAM that is not attached to normal operations. This is what is known as a set of “Break the Glass” accounts. These are known in military circles but have made it into formal practice as part of a strong playbook for ransomware. Once you do this, you can instantiate selected Zero Trust remote access to the site using credentials that are not in the scope of the attack, and then bring up a communications channel for a virtual war room using software like Rocket Chat, Jitsi, Slack, or other standalone communications tools that are installable on the IRE environment.

Avoiding normal authentication methods or IAM and normal communication channels is required for the integrity of the recovery and strengthens the recovery playbook. During this time, no email may be used that is associated directly with the organization. Ideally, email should never touch an account that is associated with it either.

The next step is to create a new set of clean side networks that do not directly connect to the main backbone or put it behind another firewall for triage good/bad. Using a sniffer software running on the IRE, the recovery team can then run a passive scan or an active scanner against all machines continuing to try to send email to exchange\M365. You can give access to people that are deemed good (not sending traffic) but lock off (with an EDR) the ability to open Outlook for a while, while keeping them on the web email. From there, continue working through to find all the sending drivers to see if they have a good backup. If not, back up the infected drive for offline data retrieval for later. Then reimage while scanning the UEFI BIOS during boot (if needed, run an IPMI scan). If the site has a list of assets that are considered crown jewels, prioritize these.

Once you have a segmented “clean side” established with all the network services required to operate the site (DNS, IAM, DHCP), then Internet access can be restored to this site on a limited basis; which means only out-bound communications, nothing in-bound. Restorative operations can continue apace. making sure that the infected side assets are captured in backup for later forensics following chain-of-custody if damages exceeding insurance limits are found to be the case. This is decided in the war room.

Get the Blueprint for Isolated Management Infrastructure

Maintaining control of critical systems is something security practitioners deal with in the Operational Technology (Industrial Control Systems) side of an organization. For them, the critical and most impactful part of the problem is the loss of control rather than the loss of data, a problem highlighted by the MGM cyberattack. Operational Technology Safety and Security teams set up and maintain Safety Systems as a fallback measure in case of any kind of disaster. This automation allows fallback of services safely, from which point they can recover operations. In 2023, most of our business is done on computers and networks. It is how to plan for business continuity. Now is the time that IT started following this safety system blueprint as well.

Download the Network Automation Blueprint now, which helps you lay the groundwork for your IMI so you can recover from any attack.

Download blueprint

Get in touch with me!

True security can only be achieved through resilience, and that’s my mission. If you want help shoring up your defenses, building an IMI, and implementing a Resilience System, get in touch with me. Here are links to my social media accounts:

What is a radio access network (RAN)?

by Jordan Baker | Aug 25, 2023 | EdgeOps, Failover Connectivity, Improve Network Security, Micro-segmentation, Network Automation, Network Edge Orchestration, Remote Network Management, SD-WAN, Secure Access Service Edge (SASE), Security Service Edge (SSE), Simplify Branch Infrastructure, User Management

This post provides an introduction to radio access networks (RAN) before discussing 5G RAN challenges, solutions, and use cases.

5G cellular technology is used for internet of things (IoT) deployments and operational technology (OT) automation across many different kinds of organizations, including city governments, global logistics companies, and healthcare providers. 5G access is provided by a radio access network (RAN) using mobile towers and small cells, but deploying these networks is challenging due to numerous factors, including poor public opinion. This post provides an introduction to radio access networks before discussing 5G RAN challenges, solutions, and use cases.

Table of Contents:

What is a Radio Access Network (RAN)?
What is 5G RAN?
5G Radio Access Network (RAN) challenges
A new approach to 5G deployments

What is a Radio Access Network (RAN)?

A radio access network (RAN) is the portion of a cellular network that connects smartphones and other end-user devices to the internet. Information is communicated back and forth between smartphones and the RAN’s transceivers via radio waves. Those wireless signals are translated into digital form, passed to the core network, and then to the global internet.

What is 5G RAN?

Every cellular generation has its own associated RAN technology. 4G RAN was the first generation based entirely on the internet protocol (IP) rather than older circuit-based technology. The newest generation, 5G, supports faster speeds, great capacity, and lower latency than previous generations. However, there are significant challenges in the way of 5G implementation.

5G Radio Access Network (RAN) challenges

There are three major hurdles to 5G implementation:

Public opinion – Thanks in part to misinformation and conspiracy theories, there has been a lot of resistance to 5G implementations. While many people already use smartphones with 5G technology, they tend to balk at the idea of giant cell towers and masts going up in their town or city.
mmWave limitations – Wireless frequencies in the mmWave (millimeter wave) spectrum provide the speed and capacity required for 5G, but they have a shorter range and difficulty penetrating walls. That makes 5G tricky in industrial settings and office buildings.
Remote recovery – A 5G RAN typically operates in cramped spaces without a continuous human presence, and administrators monitor and manage the equipment remotely over the cellular network. However, if that cell link goes down due to equipment failure or natural disaster, teams are cut off, and a truck must be rolled to fix the issue, adding significant costs and downtime.

Addressing these hurdles is complicated, as the solutions often create additional challenges. For example, the first two points can be addressed with 5G small cell technology. Small cells are typically compact enough to deploy on top of buildings or street furniture to extend 5G coverage into densely populated areas without a full-size mobile mast. This makes 5G small cell networks more palatable to city officials and the general public alike. However, small cells are still subject to planning restrictions, and the absence of a common 5G small cell framework makes the application process difficult and time-consuming.

In addition, some small cells are tiny enough to deploy indoors, improving 5G propagation and coverage in buildings. However, operators would need to deploy dozens or hundreds of small cells to achieve the speed and reliability needed for industrial IoT and high-tech use cases. Each one requires significant power resources as well as a fiber or wireless backhaul, and due to a lack of standardization, operators may even have to submit many individual planning applications. Plus, a small cell network of that size is complex to monitor and manage, requiring additional hardware and software solutions that add even more costs and complexity.

Addressing the third point requires an out-of-band network connection to 5G RAN deployments. For example, a 4G/LTE serial console provides an alternative internet connection so teams can remotely access RAN equipment during 5G outages. A serial console directly connects to radio access network infrastructure so remote administrators can do things like reboot a hung device or refresh DHCP even if the local network is down.

However, many serial consoles suffer from vendor lock-in, meaning they don’t connect to all devices or support third-party management, troubleshooting, and recovery tools. This either limits an administrator’s ability to remotely recover from outages or forces them to deploy additional hardware and software solutions to gain all the remote functionality required, adding to the expense and complexity of 5G RAN deployments.

A new approach to 5G deployments

The upgrade from 4G to 5G is proving to be more fraught than previous transitions between generations, so it’s clear that a new approach is needed. Small cell technology is a good start, but a lack of standardization severely hampers its adoption. Help is on the way, though – a group called the Small Cell Forum (SCF), which is made up of wireless leaders like AT&T, Cisco, Qualcomm, and Samsung, is working to establish a set of common definitions and recommendations to help the industry standardize 5G small cell networks.

In their definitional report, the SCF highlights the need for vendor-neutral hardware that’s customizable and swappable for various 5G use cases. Architectural design and planning applications are simpler when all of a small cell network’s equipment supports the same common 5G interface. Multi-functional devices combining networking, out-of-band access, and third-party application hosting significantly reduce expenses and management complexity.

Let’s examine some potential 5G use cases that could benefit from this new approach.

Smart cities

A smart city is the ideal use case for a 5G small cell network. Since wireless clients are packed into densely populated areas, an array of 5G small cells should provide sufficient coverage without the need for a full-sized mast. Deploying a small, vendor-neutral, multi-functional device like the Nodegrid Mini Services Router alongside small cells provides flexible backhaul options, out-of-band remote management, and application hosting. Installing small cells and Mini SRs on streetlamps, parking structures, and other public infrastructure gives teams everything they need to remotely monitor, operate, and recover 5G smart city infrastructure without adding more complexity to the network.

Global asset tracking and logistics

The internet of things (IoT) makes it possible for large, global enterprises to streamline asset tracking and supply chain logistics. Organizations use IoT-enabled devices to handle inventory management, fulfillment, shipment tracking, quality control, and more. 5G small cell technology provides the necessary speed, coverage, and bandwidth, but the sheer number of devices – and their global distribution – creates a lot of management complexity.

All-in-one solutions like Nodegrid reduce the tech stack by combining networking, management, and application hosting in a single box. Plus, Nodegrid provides a centralized management platform that can unify all connected devices, apps, and services in a single place. Administrators get a single pane of glass to monitor, control, troubleshoot, and automate the entire global architecture, reducing costs and streamlining operations.

Building automation

Many large property management companies rely on building automation systems that use operational technology (OT) to control door locks, lighting, HVAC, and more with very little human intervention. 5G’s improved speed and lower latency open up even greater automation capabilities, especially in warehouses and manufacturing plants.

Nodegrid’s compact, vendor-neutral solutions give remote operators a reliable, out-of-band connection to automated building systems to keep businesses running 24/7, even during 5G outages or LAN failures. You can deploy the Mini SR in cramped or semi-outdoor spaces to extend monitoring, security, and management coverage to every part of the 5G deployment. Nodegrid enables end-to-end building automation and makes 5G networks more resilient to failure.

Simplifying 5G with Nodegrid

A 5G radio access network (RAN) provides internet access to 5G-enabled systems, such as smartphones and IoT devices. While 5G deployments are proving complicated and fraught with issues, these challenges are overcome using small cell technology and vendor-neutral, multi-function devices like Nodegrid. Nodegrid’s integrated services routers deliver all-in-one networking, out-of-band management, backhauling, and application hosting capabilities to simplify 5G deployments without compromise.

Learn how Nodegrid can help deliver simplified 5G with out-of-band management!

Request a free Nodegrid demo to see how vendor-neutral solutions simplify 5G radio access network (RAN) deployments.

Data Center Migration Checklist

by Jordan Baker | Aug 18, 2023 | Data Center Management, Data Center Resilience, DevOps, NetDevOps Transformation, Network Automation, Remote Network Management, Scripting, Streamline Deployments, Zero Touch Provisioning (ZTP)

A data center migration is represented by a person physically pushing a rack of data center infrastructure into place

Various reasons may prompt a move to a new data center, like finding a different provider with lower prices, or the added security of relocating assets from an on-premises location to a colocation facility or private cloud.

Despite the potential benefits, data center migrations are often tough on enterprises, both internally and from the client side of things. Data center managers, systems administrators, and network engineers must cope with the logistical difficulties of planning, executing, and supporting the move. End-users may experience service disruptions and performance issues that make their jobs harder. Migrations also tend to reveal any weaknesses in the actual infrastructure that’s moved, which means systems that once worked perfectly may require extra support during and after the migration.

The best way to limit headaches and business disruptions is to plan every step of a data center migration meticulously. This guide provides a basic data center migration checklist to help with planning and includes additional resources for streamlining your move.

Data center migration checklist

Data center migrations are always complex and unique to each organization, but there are typically two major approaches:

Lift-and-shift. You physically move infrastructure from one data center to another. In some ways, this is the easiest approach because all components are known, but it can limit your potential benefits if gear remains in racks for easy transport to the new location rather than using the move as an opportunity to improve or upgrade certain parts.

New build. You replace some or all of your infrastructure with different solutions in a new data center. This approach is more complex because services and dependencies must be migrated to new environments, but it also permits organizations to simultaneously improve operational processes, cut costs, and update existing tech stacks.

The following data center migration checklist will help guide your planning for either approach and ensure you’re asking the right questions to prepare for any potential problems.

Quick Data Center Migration Checklist

Conduct site surveys of the current and the new data centers to determine the existing limitations and available resources, like space, power, cooling, cable management, and security.
Locate – or create – documentation for infrastructure requirements such as storage, compute, networking, and applications.
Outline the dependencies and ancillary systems from the current data center environment that you must replicate in the new data center.
Plan the physical layout and overall network topology of the new environment, including physical cabling, out-of-band management, network, storage, power, rack layout, and cooling.
Plan your management access, both for the deployment and for ongoing maintenance, and determine how to assist the rollout (for example, with remote access and automation).
Determine your networking requirements (e.g., VLANs, IP addresses, DNS, MPLS) and make an implementation plan.
Plan out the migration itself and include disaster recovery options and checkpoints in case something changes or issues arise.
Determine who is responsible for which aspects of the move and communicate all expectations and plans.
Assign a dedicated triage team to handle end-user support requests if there are issues during or immediately after the move.
Create a list of vendor contacts for each migrated component so it’s easier to contact support if something goes wrong.
If possible, use a lab environment to simulate key steps of the data center migration to identify potential issues or gaps.
Have a testing plan ready to execute once the move is complete to ensure infrastructure integrity, performance, and reliability in the new data center environment.

1. Site surveys

The first step is to determine your physical requirements – how much space, power, cooling, cable management, etc., you’ll need in the new data center. Then, conduct site surveys of the new environment to identify existing limitations and available resources. For example, you’ll want to make sure the HVAC system can provide adequate climate control – specific to the new locale – for your incoming hardware. You may need to verify that your power supply can support additional chillers or dehumidifiers, if necessary, to maintain optimal temperature ranges. In addition to physical infrastructure requirements, factors like security and physical accessibility are important considerations for your new location.

2. Infrastructure documentation

At a bare minimum, you need an accurate list of all the physical and virtual infrastructure you’re moving to the new data center. You should also collect any existing documentation on your application and system requirements for storage, compute, networking, and security to ensure you cover all these bases in the migration. If that documentation doesn’t exist, now’s the time to create it. Having as much documentation as possible will streamline many of the following steps in your data center move.

3. Dependencies and ancillary services

Aside from the infrastructure you’re moving, hundreds or thousands of other services will likely be affected by the change. It’s important to map out these dependencies and ancillary services to learn how the migration will affect them and what you can do to smooth the transition. For example, if an application or service relies on a legacy database, you may need to upgrade both the database and its hardware to ensure end-users have uninterrupted access. As an added benefit, creating this map also aids in implementing micro-segmentation for Zero Trust security.

4. Layout and topology

The next step is to plan the physical layout of the new data center infrastructure. Where will network, storage, and power devices sit in the rack and cabinets? How will you handle cable management? Will your planned layout provide enough airflow for cooling? This is also the time to plan the network topology – how traffic will flow to, from, and within the new data center infrastructure.

5. Management access

You must determine how your administrators will deploy and manage the new data center infrastructure. Will you enable remote access? If so, how will you ensure continuous availability during migration or when issues arise? Do you plan to automate your deployment with zero touch provisioning?

Learn more about enabling 24/7 remote access and end-to-end automation with a Gen 3 out-of-band (OOB) serial console.
Read the details about how Vapor IO deployed edge data centers in one hour while also providing 24/7 remote access.

6. Network planning

If you didn’t cover this in your infrastructure documentation, you’ll need specific documentation for your data center networking requirements – both WAN (wide area networking) and LAN (local area networking). This is a good time to determine whether you want to exactly replicate your existing network environment or make any network infrastructure upgrades. Then, create a detailed implementation plan covering everything from VLANs to IP address provisioning, DNS migrations, and ordering MPLS circuits.

7. Migration & build planning

Next, plan out each step of the move or build itself – the actions your team will perform immediately before, during, and after the migration. It’s important to include disaster recovery options in case critical services break, or unforeseen changes cause delays. Implementing checkpoints at key stages of the move will help ensure any issues are fixed before they impact subsequent migration steps.

8. Assembling a team

At this stage, you likely have a team responsible for planning the data center migration, but you also need to identify who’s responsible for every aspect of the move itself. It’s critical to do this as early as possible so you have time to set expectations, communicate the plan, and handle any required pre-migration training or support. Additionally, ensure this team includes dedicated support staff who can triage end-user requests if any issues arise during or after the migration.

9. Vendor support

Any experienced sysadmin will tell you that anything that could go wrong with a data center migration probably will, so you should plan for the worst but hope for the best. That means collecting a list of vendor contacts for each hardware and software component you’re migrating so it will be easier to contact support if something goes awry. For especially critical systems, you may even want to alert your vendor POCs prior to the move so they can be on hand (or near their phones) on the day of the move.

Additional Data Center Migration & Management Resources

10. Lab simulation

This step may not be feasible for every organization, but ideally, you’ll use a lab environment to simulate key stages of the data center migration before you actually move. Running a virtualized simulation can help you identify potential hiccups with connection settings or compatibility issues. It can also highlight gaps in your planning – like forgetting to restore user access and security rules after building new firewalls – so you can address them before they affect production services.

11. Post-migration testing

Finally, you need to create a post-migration testing plan that’s ready to implement as soon as the move is complete. Testing will validate the integrity, performance, and reliability of infrastructure in the new environment, allowing teams to proactively resolve issues instead of waiting for monitoring notifications or end-user complaints.

Streamlining your data center migration

Using this data center migration checklist to create a comprehensive plan will help reduce setbacks on the day of the move. To further streamline the migration process and set yourself up for success in your new environment, consider upgrading to a vendor-neutral data center orchestration platform. Such a platform will provide a unified tool for administrators and engineers to monitor, deploy, and manage modern, multi-vendor, and legacy data center infrastructure. Reducing the number of individual solutions you need to access and manage during migration will decrease complexity and speed up the move, so you can start reaping the benefits of your new environment sooner.

Want to learn more about Data Center migration?

For a complete data center migration checklist, including in-depth guidance and best practices for moving day, click here to download our Complete Guide to Data Center Migrations or contact ZPE Systems today to learn more.
Contact Us Download Now

What is an Application Delivery Platform?

by Jordan Baker | Jul 26, 2023 | Application Hosting, Edge Computing, EdgeOps, Improve Network Security, Network Automation, Remote Network Management, SD-WAN, Simplify Branch Infrastructure, Virtualization, Zero Touch Provisioning (ZTP)

An illustration showing a breakout of various software application components to highlight the need for an application delivery platform

Modern software architectures are highly complex and often very difficult to maintain and operate. A single enterprise application comprises hundreds (or even thousands) of individual services, technologies, and toolchains while requiring a lot of underlying infrastructure, such as servers, routing and load balancing rules, and security controls. All of this complexity increases overhead costs and adds to the ever-growing workloads of software, network, and infrastructure teams, especially when you multiply this effort across dozens or hundreds of software deployments.

Platform engineering is a new discipline introduced by Gartner to address these challenges by reducing the complexity of software engineering, network operations, and application delivery. The platforms built by these engineers are known by several names, including internal developer platforms, internal developer portals, and application delivery platforms. This guide defines an application delivery platform, discusses the underlying technology, and highlights a leading platform engineering solution.
.

Table of Contents:

What is an application delivery platform?
What is the importance of an application delivery platform?
What technology makes up an application delivery platform?
Introducing ZPE Systems’ Services Delivery Platform

What is an application delivery platform?

An application delivery platform is a suite of technologies that handles all the services that support an application, including security, traffic management, load balancing, and data management. Platform engineers combine all these services into a common toolset used to deploy applications at customer sites, so there’s no need to build a new architecture every time. This streamlined experience makes application delivery cost-effective by significantly reducing workloads and deployment timelines.

What is the importance of an application delivery platform?

The goal of an application delivery platform is to reduce deployment and management complexity. Deployment complexity leads to a greater risk of human error when configuring things like security controls and access policies, and any mistakes are likely to be found and exploited by cybercriminals. Management complexity makes it harder to stay on top of patch schedules. Unpatched software often contains vulnerabilities that are exploited by cybercriminals; for example, known ransomware groups targeted unpatched IBM software earlier this year.

By reducing complexity, an application delivery platform also reduces the attack surface, improving an organization’s overall security posture.

What technology makes up an application delivery platform?

By its very nature, an application delivery platform is highly customized to fit the needs of the applications being supported. Here are some examples of the services and technologies that are often included.

Server storage & compute: The platform needs storage (usually solid-state) and processing units (CPUs or GPUs) to run the applications and store necessary data. Ideally, the OS and computing architecture will support containers (e.g., Docker) for microservices applications.
Automation tools: A key feature of application delivery platforms is the ability to automatically provision and deploy new environments, apps, and network services as well activate services licenses and service chaining. That means the platform should host automation tools for configuration management, code delivery, and software-defined networking (SDN).
Security: The ideal platform makes it possible to deliver applications without configuring security every time. That means it provides unified management and repeatable deployments for security services like firewall traffic inspection, access control lists, and advanced authentication.
Routing & load balancing: A lot of backend networking goes into the typical application deployment to ensure traffic is routed correctly and optimized for performance. An application delivery platform should support network functions virtualization (NFVs) and SDN so standard network configurations can be easily deployed alongside the applications being delivered.

Management tools: Engineers need a way to remotely access, manage, and troubleshoot application deployments, even (and especially) during major service disruptions. The ideal platform includes out-of-band serial console management and supports third-party troubleshooting tools so remote teams can quickly recover systems and applications without an expensive on-site visit.

While this list is far from exhaustive, it covers the foundational technology that supports an application delivery platform. Platform engineering is still in its infancy, and many organizations struggle to efficiently execute it because of how many moving pieces need to be considered. The goal is to find a solution that provides the best framework of hardware and software capabilities that platform engineers can build upon, so they can create a fully customized application delivery platform without reinventing the wheel.

Introducing ZPE Systems’ Services Delivery Platform

The Services Delivery Platform from ZPE Systems is the perfect foundation for any platform engineering initiative. Nodegrid edge routers serve as the hardware backbone, providing networking and failover capabilities, OOB serial console management, and plenty of memory, storage, and CPU headroom for additional apps and services. You can build a fully customized hardware platform with the modular Net Services Router (NSR), extending your storage or compute capabilities or adding more ports to support your application deployment.

The vendor-neutral, Linux-based Nodegrid OS can run your custom applications as well as third-party automation, security, DevOps, and management tools. Plus, Nodegrid unifies all connected services and applications under a single management umbrella, allowing teams to oversee and orchestrate all of their deployments from one convenient portal.

Read about how ZPE’s Services Delivery Platform streamlined operations for DigiCert in our case study: Simplifying Critical Infrastructure for the Leading Digital Security Enterprise

Ready to Learn More?

The Services Delivery Platform from ZPE Systems simplifies platform engineering with powerful, multipurpose hardware and an open, vendor-neutral OS. Contact us today to learn more about using Nodegrid for your application delivery platform!

The Biggest Ransomware Attack You Haven’t Heard of…Yet

by Jordan Baker | Jul 6, 2023 | Data Logging, Improve Network Security, Micro-segmentation, Minimize Impact of Disruptions, Network Automation, Out of Band Management, SD-WAN, SecOps, Secure Access Service Edge (SASE), Security Service Edge (SSE), User Management, Zero Trust Security

This article was written by James Cabe, CISSP, whose cybersecurity expertise has helped major companies including Microsoft and Fortinet.

MOVEit over SolarWinds — The largest and most successful ransomware attack ever recorded is happening. Right now. It’s attacking healthcare and financial institutions with high rates of success, and recently stole sensitive data of 4 million more healthcare patients. It uses something called CL0P ransomware, and the threat actor is a well-known criminal group with the name FIN11. Many organizations are finding it difficult to stop the attack because they have no way to access infected devices, take them offline, patch, or even replace them. So, what exactly is going on?

The group responsible for the attack

FIN11 is a cybercriminal group that has been active since 2016 or before, originating from the Commonwealth of Independent States (CIS). While the group has historically been associated with widespread phishing campaigns, their focus has shifted towards other initial access vectors. FIN11 often runs high-volume operations targeting industries in North America and Europe for data theft and ransomware deployment, primarily leveraging CL0P (aka CLOP).

FIN11 is responsible for multiple widespread, high-profile intrusion campaigns leveraging zero-day vulnerabilities, and the group likely has access to the networks of many more organizations than it is able to successfully monetize. Despite this, they’re currently attacking MOVEit, a well-known SaaS provider who relies on a file transfer appliance called Accellion lFile Transfer Appliance (FTA). This legacy product remains unpatched, which has led to the breach of many Fortune 100 companies and state and federal agencies.

How did the ransomware attack start?

The ransomware attack began with several Accellion FTA customers, including those in industries like healthcare, legal, finance, retail, and telecom. Companies such as Jones Day Law, Kroger, Singtel, and many others had no idea that they had been attacked, because the initial breach was quiet and headless.

Their only indication came after receiving a threatening email aimed at extortion.

In this email, the group threatened to publish stolen data on the “CL0P^_- LEAKS” .onion website, according to an investigation from Accellion. The Federal Bureau of Investigation (FBI) and the Cybersecurity and Infrastructure Security Agency (CISA) are releasing this joint CSA to disseminate known CL0P ransomware IOCs and TTPs identified through FBI investigations as recently as June 2023.

According to the investigation, four zero-day security holes were exploited in the attacks:

CVE-2021-27101 – SQL injection via a crafted Host header
CVE-2021-27102 – OS command execution via a local web service call
CVE-2021-27103 – SSRF via a crafted POST request
CVE-2021-27104 – OS command execution via a crafted POST request

And, the published victim data appears to have been stolen using a “WEB SHELL”. These web shells give remote administrative access to the web server and create a jumping off point to attack the rest of the internal network. Mandiant, a well-known cyber investigation arm of Google, added, “The exfiltration activity has affected entities in a wide range of sectors and countries” (Threatpost). Exfiltration is the unauthorized removal of important or damaging data from an organization.

However the biggest problem is that these web shells are what researchers call “PERSISTENCE”. This means that an attacker can remain in your network indefinitely to continue damaging and attacking your resources. Researchers call these “APTs,” or Advanced Persistent Threats.

Why is the ransomware attack still going strong?

The ransomware attack is still going strong because there’s no patch available. According to open source information, beginning on May 27, 2023, CL0P Ransomware Gang began exploiting a previously unknown SQL injection vulnerability (CVE-2023-34362) in Accelion’s appliance that is the backbone of a solution known as Progress Software’s MOVEit Transfer service. Internet-facing MOVEit Transfer web applications were infected with a web shell named LEMURLOOT, which was then used to steal data from underlying MOVEit Transfer databases. In similar spates of activity, TA505, which is the group responsible for the Dridex trojan and Locky ransomware, conducted zero-day-exploit-driven campaigns against Accellion FTA devices in 2020 and 2021, and Fortra/Linoma GoAnywhere MFT servers in early 2023.

What most organizations want to know is: How do you quickly respond to issues like these? How can you be properly prepared to respond to an issue you didn’t cause or didn’t expect?

Patching is a good response. However, it takes an average of 205 days to patch a recently known zero-day exploit like the MOVEit vulnerability. While patching alone is typically the ideal response, it isn’t automatic nor can it be done quickly.

Another approach involves removing the offending software or appliance, or cutting off access to the software or appliance. But once you remove this access, how do you continue normal operations, and how can you easily bring the software/appliance back online? Without adequate infrastructure in place, physically deploying to each site is not practical, especially for distributed organizations.

CISA and the FBI encourage organizations to implement the recommendations in the Mitigations section of this CSA to reduce the likelihood and impact of CL0P ransomware and other ransomware incidents. The Mitigations section describes many approaches, including patching, removing software/appliance access, and implementing a recovery plan. But all of these take too much time and too many resources, which leaves organizations vulnerable as they scramble to create an adequate response.

The great news is, organizations can cover all their bases without having to reinvent the wheel. This approach is recommended in one of CISA’s recent directives, and gives organizations somewhat of a silver bullet that allows them to quickly defeat ransomware and remain prepared for any future attack.

What approach does CISA recommend to address ransomware attacks?

CISA’s recent directive (23-02), which addresses the vulnerability of Internet-exposed management interfaces, calls for organizations to create an isolated management infrastructure (IMI) via out-of-band connectivity. This is a drop-in solution that the military, telcos, and hyperscalers/cloud companies use to respond to widespread ransomware and other issues impacting security and resilience. This approach — which ZPE Systems has perfected in the last decade with the help of Big Tech — gives organizations a completely separate control plane through which they can monitor and manage their entire IT infrastructure in a safe and dedicated fashion.

What is isolated management infrastructure?

Isolated management infrastructure consists of the hardware and software that create a management network that’s fully separate from other production and management networks. The key to this is in out-of-band connectivity, which is defined as connectivity other than TCP/IP. Out-of-band can include direct USB, serial, or even non-routed zero-trust connections to crown-jewel assets.

Essentially, the IMI gives an organization complete oversight and control of their widespread IT infrastructure, in a way that is secure and accessible only to their IT teams.

In this diagram, the production infrastructure (blue ring) sits at each distributed location. The out-of-band infrastructure for LAN (OOBI-LAN) is the green ring and surrounds the production infrastructure with one layer of isolated management. The OOBI-WAN (orange ring) is what provides a second layer of isolated management, which teams can access from a central or remote location, to gain access to the OOBI-LAN and ultimately the production infrastructure.

Knowing these assets and providing access across the organization can be easy and does not have to disrupt current operations.

How can IMI stop the FIN11 ransomware attack?

In the ongoing FIN11 ransomware attack, Internet-facing applications are targets of the zero-day exploit. This means that no amount of security solutions can pre-mitigate the attack (i.e., there’s nothing you can do to stop it). This is where IMI shines.

Remember the OOBI-LAN/OOBI-WAN diagram? Here’s a zoomed-in view of the isolated management infrastructure sitting beside the production infrastructure. The IMI connects via serial, Ethernet, and USB to production gear, and provides the necessary functions (routing, storing golden images, hosting jumpbox tools, etc.) to recover from attack. But how?

IT teams can use OOBI-WAN to remotely access their OOBI-LAN and production gear. They can pull affected devices offline and bring them in for forensics, which takes place in an Isolated Recovery Environment (IRE). This means these assets and networks are still reachable by analysts and responders, but isolated from other vulnerable assets. This allows an organization to quickly and even automatically deploy tools and resources inside of this environment through devices like ZPE Systems’ Nodegrid.

To combat the FIN11 attack, organizations don’t need to unplug cables or shut their devices off. They can instead deploy their IMI as the framework for closing the attack surface while maintaining access and critical data to aid in recovery.

Get the blueprint for isolated management infrastructure

Don’t wait until the next attack to shore up your defenses. ZPE Systems has worked with Big Tech for ten years developing the isolated management infrastructure. It’s now available inside the Network Automation Blueprint, and walks you through how to implement your own IMI. Download the blueprint now to stay ready for any attack.

Download blueprint

Get in touch with me!

« Older Entries

Next Entries »

ZPE Solution Pathways

Discover Nodegrid

Edge Management and Orchestration

Table of Contents

Key findings from the Gartner report

Edge computing challenges

Edge management and orchestration (EMO) with Nodegrid

Want to learn more about how Nodegrid can help you overcome your biggest edge computing challenges?

Dissecting the MGM Cyberattack: Lions, Tigers, & Bears, Oh My!

Who Started The MGM Cyberattack, and How?

How Did A Simple Authentication Attack Morph Into a Complex Attack?

How Did MGM Discover the Cyberattack?

How Did the Attacker Maintain Control?

Is There a Way to Stop These Types of Attacks?

Get the Blueprint for Isolated Management Infrastructure

Get in touch with me!

What is a radio access network (RAN)?

What is a Radio Access Network (RAN)?

What is 5G RAN?

5G Radio Access Network (RAN) challenges

A new approach to 5G deployments

Smart cities

Global asset tracking and logistics

Building automation

Simplifying 5G with Nodegrid

Learn how Nodegrid can help deliver simplified 5G with out-of-band management!

Data Center Migration Checklist

Data center migration checklist

Quick Data Center Migration Checklist

1. Site surveys

2. Infrastructure documentation

3. Dependencies and ancillary services

4. Layout and topology

5. Management access

6. Network planning

7. Migration & build planning

8. Assembling a team

9. Vendor support

10. Lab simulation

11. Post-migration testing

Streamlining your data center migration

Want to learn more about Data Center migration?

What is an Application Delivery Platform?

What is an application delivery platform?

What is the importance of an application delivery platform?

What technology makes up an application delivery platform?

Introducing ZPE Systems’ Services Delivery Platform

Ready to Learn More?

The Biggest Ransomware Attack You Haven’t Heard of…Yet

The group responsible for the attack

How did the ransomware attack start?

Why is the ransomware attack still going strong?

What approach does CISA recommend to address ransomware attacks?

What is isolated management infrastructure?

How can IMI stop the FIN11 ransomware attack?

Get the blueprint for isolated management infrastructure

Get in touch with me!