Managed service providers rely on remote access to keep customer environments running. VPNs, jump hosts, and centralized access tools make it possible to manage infrastructure across dozens or hundreds of sites without leaving the operations center.
But during outages, these tools can become part of the problem. When remote access depends on the production network, even routine failures can cut off the access engineers need to fix issues. What should be a quick recovery turns into a prolonged outage that requires on-site intervention.
Here are some of the most common failure scenarios MSPs face, and a look at the architecture that helps overcome them.
Routing Failures
Many routing failures stem from human error. According to 2025 research from the Uptime Institute, almost 40% of organizations suffered a major outage due to human error in the last three years. If a core router experiences a misconfiguration, control-plane crash, or routing instability, the network paths that connect engineers to the environment may disappear entirely.
Common examples include:
- BGP route leaks or policy errors that remove upstream connectivity
- OSPF adjacency failures that break internal routing between segments
- VRF or VLAN misconfigurations that isolate management subnets
- Routing table corruption during firmware upgrades
In these situations, VPN sessions drop immediately because the path between the engineer and the VPN gateway no longer exists. Worse, the router responsible for the failure may be fully operational from a hardware perspective and need nothing more than a configuration correction. But with the network path gone, engineers can’t reach the router’s console remotely to make that correction.
What should have been a 30-second configuration rollback becomes a multi-hour recovery effort.
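To make the "30-second rollback" idea concrete, here is a minimal Python sketch of a guarded change: the new configuration is kept only if a reachability check still passes afterward. The `GuardedConfig` class, its method names, and the configuration keys are hypothetical and exist only for illustration. Real devices implement this on the device itself with a timer (Junos `commit confirmed` is one example), so treat this as a model of the pattern rather than any vendor's API.

```python
from typing import Callable, Dict


class GuardedConfig:
    """Hypothetical stand-in for a router's running configuration."""

    def __init__(self, running: Dict[str, str]) -> None:
        self.running = dict(running)
        self._last_good: Dict[str, str] = dict(running)
        self._pending = False

    def apply(self, changes: Dict[str, str]) -> None:
        # Snapshot the known-good configuration before changing anything.
        self._last_good = dict(self.running)
        self.running.update(changes)
        self._pending = True

    def confirm_or_rollback(self, still_reachable: Callable[[], bool]) -> bool:
        """Keep the pending change only if the reachability check passes.

        Returns False only when a pending change was rolled back.
        """
        if self._pending and not still_reachable():
            # The change cut off access: restore the snapshot.
            self.running = dict(self._last_good)
            self._pending = False
            return False
        self._pending = False
        return True


# Usage: a policy change that severs the management path is undone.
router = GuardedConfig({"bgp.export_policy": "ALLOW-CUSTOMER"})
router.apply({"bgp.export_policy": "DENY-ALL"})
kept = router.confirm_or_rollback(still_reachable=lambda: False)
```

The limitation is the one described above: a guard like this only helps if it was armed before the change. Once the path is gone and no rollback is pending, someone still has to reach the console another way.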
Firewall Policy Errors
Firewall misconfigurations are one of the most common causes of remote access loss. Modern firewall policies are often pushed automatically through orchestration systems, policy templates, or automated compliance updates. These systems are great for consistency, but they introduce new failure modes.
A few examples include:
- A security policy update accidentally blocking VPN management traffic
- A zone-based firewall rule preventing internal device access
- A NAT configuration error breaking inbound VPN connections
- An automated policy sync overwriting existing allow rules
Often, the firewall itself remains online and functional. The only issue is a misconfigured rule. But because the firewall sits directly in the remote access path, it becomes unreachable, just like the router in the previous section. Engineers may be able to confirm the outage through monitoring systems, but without access to the firewall CLI or console, there is no way to correct the configuration remotely.
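One way to picture how a single rule can lock engineers out is a pre-change check that replays the management flows against a proposed rule set. The sketch below is a simplified model that assumes first-match-wins evaluation with an implicit default deny. The `Rule` type, the function names, and the documentation-range addresses are all hypothetical; real firewalls have richer matching (zones, NAT, application IDs) that this does not capture.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple

Flow = Tuple[str, str, int]  # (source IP, destination IP, destination port)


@dataclass(frozen=True)
class Rule:
    action: str          # "allow" or "deny"
    src: str             # source network in CIDR notation
    dst: str             # destination network in CIDR notation
    port: Optional[int]  # None matches any port


def evaluate(rules: List[Rule], flow: Flow) -> str:
    """Return the action of the first rule matching the flow."""
    src_ip, dst_ip, port = flow
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and rule.port in (None, port)):
            return rule.action
    return "deny"  # assumed implicit default deny


def management_access_survives(rules: List[Rule], flows: List[Flow]) -> bool:
    """True only if every management flow is still allowed."""
    return all(evaluate(rules, flow) == "allow" for flow in flows)


# Usage: an automated sync places a broad deny above the existing allow rule.
vpn_flow: Flow = ("198.51.100.20", "203.0.113.10", 443)
current = [Rule("allow", "198.51.100.0/24", "203.0.113.10/32", 443)]
proposed = [Rule("deny", "0.0.0.0/0", "203.0.113.10/32", None)] + current
safe_to_push = management_access_survives(proposed, [vpn_flow])
```

In this example `safe_to_push` is false even though the original allow rule is still present, because the new deny rule matches first. That is the same effect as the overwritten or shadowed allow rules listed above.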
WAN or ISP Outages
Many MSP environments rely on customer WAN circuits to provide remote management access. Failures on these circuits cut remote connectivity regardless of the health of the internal infrastructure. Fiber cuts, for example, are one of the most common causes of outages that last 48 hours or longer.
Common scenarios include:
- Carrier fiber cuts (looking at you, backhoe operators 😜)
- Last-mile circuit failures at branch locations
- ISP routing incidents causing upstream blackholing
- DDoS mitigation events that disrupt inbound traffic
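The sketch below illustrates why a circuit failure hides the state of everything behind it when monitoring shares the same path. It walks an ordered list of hops and reports the first one that fails a probe; every hop after that is "unknown" rather than "down". The hop names and the `classify_path` function are hypothetical, and the probe results are supplied by hand rather than measured.

```python
from typing import Dict, List, Optional, Tuple


def classify_path(
    path: List[str], reachable: Dict[str, bool]
) -> Tuple[Optional[str], List[str]]:
    """Return (first unreachable hop, hops whose health is unknown).

    Hops missing from `reachable` are treated as unreachable.
    """
    for index, hop in enumerate(path):
        if not reachable.get(hop, False):
            # Everything past the first failure cannot be observed in-band.
            return hop, path[index + 1:]
    return None, []


# Usage: the customer circuit is cut; internal devices may be perfectly healthy.
path = ["isp-edge", "customer-wan-circuit", "site-firewall", "core-switch"]
probes = {
    "isp-edge": True,
    "customer-wan-circuit": False,
    "site-firewall": False,
    "core-switch": False,
}
failed_hop, unknown_hops = classify_path(path, probes)
```

Here the firewall and core switch show as failed probes, but the only thing the data supports is that the circuit is down. Their actual health cannot be determined over the same path.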