Image: MSP remote access relies on the very infrastructure it manages.
This architecture has real benefits. It centralizes access control for each customer environment, simplifies credential management to a degree, and lets security teams enforce authentication policies before engineers reach sensitive systems.
But remote access relies on the assumption that all of this production infrastructure remains operational.
What happens when it fails?
When In-Band Management Breaks: Common Failure Scenarios
VPNs and jump hosts operate entirely in-band, meaning they rely on the same network infrastructure they are meant to manage.
We covered this dependency at length in our last MSP article. In short, in-band management is cut off by the very failures it is needed to fix, which turns small issues into long outages that eat into MSP margins. Many kinds of failure can trigger this. Here are a few of the common scenarios that lead to long outages and truck rolls:
- Routing failures can entirely remove the path between engineers and the environment. A BGP misconfiguration, OSPF failure, or even a bad firmware update can drop VPN sessions instantly. The device causing the issue may still be running, but without access, engineers can’t fix it.
- Firewall policy errors often block management traffic. A single misapplied rule or automated update can cut off access to internal systems. The firewall is online but unreachable, making a simple rule change impossible without on-site help.
- WAN or ISP outages eliminate remote connectivity altogether. Even if the internal network is still functioning, engineers outside the environment have no way in. What should be a quick fix becomes a truck roll.
- Authentication failures can lock engineers out of jump hosts, even when systems are otherwise healthy. If identity services like Active Directory or LDAP are unavailable, login attempts fail and troubleshooting stops.
- Core service failures, such as DNS or certificate validation issues, can also break access indirectly. Devices may still be reachable, but the tools used to connect to them stop working.
We’ll break these scenarios down further in a separate article, but the pattern is clear: Even when infrastructure is still running, engineers lose the ability to reach it when it matters most.
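To make the pattern concrete, here is a minimal sketch that models an in-band session as a chain of dependencies, any one of which can block access while the target device itself stays up. The chain order, the dependency names, and the helper functions are illustrative assumptions, not a description of any specific product or customer environment.

```python
from typing import Dict, Optional

# Hypothetical dependency chain for an in-band session, listed in the order
# a remote engineer's connection relies on them. Names are illustrative only.
IN_BAND_CHAIN = [
    "wan_link",          # WAN or ISP connectivity
    "routing",           # BGP/OSPF path to the environment
    "firewall_policy",   # rules permitting management traffic
    "core_services",     # DNS, certificate validation
    "identity_service",  # Active Directory or LDAP
]


def first_blocker(health: Dict[str, bool]) -> Optional[str]:
    """Return the first unhealthy dependency on the path, or None if all are up.

    A dependency missing from `health` is treated as down.
    """
    for dependency in IN_BAND_CHAIN:
        if not health.get(dependency, False):
            return dependency
    return None


def in_band_access_available(health: Dict[str, bool]) -> bool:
    """In-band access works only if every dependency in the chain is healthy."""
    return first_blocker(health) is None


if __name__ == "__main__":
    # The target device is running, but a misapplied firewall rule blocks access.
    health = {name: True for name in IN_BAND_CHAIN}
    health["firewall_policy"] = False
    print(first_blocker(health))             # firewall_policy
    print(in_band_access_available(health))  # False
```

Note that the health of the device being managed never appears in the check: access is decided entirely by the path in front of it.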
Why the Problem Gets Worse as MSPs Scale
Let’s set aside the fragility of this in-band remote access model and talk strictly about scale. When you’re managing dozens of customer environments, each one adds its own VPN gateways, firewall policies, routing domains, and identity integrations.
That simple remote access model turns into a highly distributed patchwork of VPN tunnels, jump hosts, bastion servers, and authentication systems spanning multiple networks. It doesn’t take a large leap of the imagination to see why this doesn’t scale.
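A rough back-of-the-envelope sketch shows why. It assumes every dependency on an access path fails independently and with the same availability, which real networks do not guarantee, and the figures in the usage loop are hypothetical placeholders rather than measured values.

```python
def chain_availability(per_dependency: float, dependencies: int) -> float:
    """Availability of one access path whose dependencies must all be up.

    Assumes independent failures (a simplification for illustration).
    """
    if not 0.0 <= per_dependency <= 1.0:
        raise ValueError("per_dependency must be between 0 and 1")
    if dependencies < 0:
        raise ValueError("dependencies must be non-negative")
    return per_dependency ** dependencies


def all_customers_reachable(
    per_dependency: float, dependencies: int, customers: int
) -> float:
    """Probability that every customer's access path is up at the same time."""
    if customers < 0:
        raise ValueError("customers must be non-negative")
    return chain_availability(per_dependency, dependencies) ** customers


if __name__ == "__main__":
    # Hypothetical inputs: 99.9% availability per dependency, 4 per customer.
    for customers in (1, 10, 50):
        probability = all_customers_reachable(0.999, 4, customers)
        print(f"{customers:>3} customers: {probability:.3f}")
```

Under these assumptions, the chance that every environment is reachable at once shrinks with each customer added, even when each individual component is highly reliable.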
Access Is Fragmented
Engineers rarely connect to a single management environment (unless, of course, they’re using ZPE Cloud). Instead, they maintain a separate access path for each customer, which typically means:
- Different VPN clients or portals
- Separate credential sets
- Unique bastion hosts
- Different network segmentation models
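One way to picture this fragmentation is as a record per customer, with a count of how many distinct items an engineer has to keep track of. The `AccessPath` fields mirror the list above; the class, the `distinct_items` helper, and all sample values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class AccessPath:
    """One customer's access path. Field names are illustrative assumptions."""
    customer: str
    vpn_client: str
    credential_set: str
    bastion_host: str
    segmentation_model: str


def distinct_items(paths: List[AccessPath]) -> Dict[str, int]:
    """Count the distinct values an engineer must track for each attribute."""
    return {
        f.name: len({getattr(path, f.name) for path in paths})
        for f in fields(AccessPath)
        if f.name != "customer"
    }


if __name__ == "__main__":
    # Placeholder data: three customers, two of which share a VPN client.
    paths = [
        AccessPath("customer-a", "vpn-client-1", "creds-a", "bastion-a", "flat"),
        AccessPath("customer-b", "vpn-client-2", "creds-b", "bastion-b", "vlan-per-site"),
        AccessPath("customer-c", "vpn-client-1", "creds-c", "bastion-c", "vlan-per-site"),
    ]
    print(distinct_items(paths))
    # {'vpn_client': 2, 'credential_set': 3, 'bastion_host': 3, 'segmentation_model': 2}
```

Even in this small example, credentials and bastion hosts grow one-for-one with the customer count, while tooling overlaps only by coincidence.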