Hyperscale Use Case 2

Large Scale Fiber Restoration

As Data Centers become integral to ever more aspects of our lives, keeping them operational and accessible 24/7 becomes increasingly critical. High availability is typically delivered through active-active configurations, redundancy, and diverse connectivity between locations.

In an ideal world, each Data Center would be connected to multiple fiber routes, with all fibers on all routes patched through to the servers, storage and routers within the DC, providing multiple levels of redundancy and restoration capability. In practice this is often too expensive, so systems are designed to withstand a single point of failure, with the capability of restoring protection to the services as soon as possible.

 

This becomes particularly challenging at the physical infrastructure level. With some DCs having 10,000 or more fibers per route, restoring service from one route to another can take days. Service prioritization is also difficult, because it requires accurate, up-to-date records of which services are running over which fiber and their relative priority. The ability to re-patch thousands of fibers accurately and quickly, while capturing and recording every change for the inevitable roll-back once the initial fault is repaired, is highly desirable if it can be done cost-effectively.

When controlled by a higher-level software system such as an SDN Controller, a passive optical fabric can be created in which all the optical circuits are switched automatically and accurately by the SDN Controller. Links can be prioritized in order of restoration, and records are always up to date and reflect the exact configuration of the physical infrastructure, even half-way through a failover. Unlike a manual switchover, which can take days and introduce multiple patching errors, an automated robotic switch running multiple robotic engines in parallel can complete the entire fiber failover in less than 2 hours. It also allows a fully managed roll-back, including moving different services at pre-determined windows, maintaining the accuracy of the configuration management, and ensuring the system is restored exactly as it was before the failover.
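
As a rough illustration of the prioritized failover and logged roll-back described above, the Python sketch below models the idea in miniature. It is not a real SDN Controller or robotic switch API: every name in it (Circuit, Fabric, plan_failover, fail_over, roll_back) is hypothetical, the convention that a lower priority number is restored first is an assumption, and the parallel robotic engines are simulated sequentially, one move per engine per round.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Circuit:
    """One optical circuit and the route it is currently patched to (hypothetical record)."""
    circuit_id: str
    priority: int   # assumed convention: lower number = restore first
    route: str


@dataclass
class Fabric:
    """Toy stand-in for the switched optical fabric plus its configuration records."""
    circuits: Dict[str, Circuit]
    # Each entry is (circuit_id, old_route, new_route), in the order moves were made.
    change_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def move(self, circuit_id: str, new_route: str) -> None:
        # Log and update in the same step so records match the fabric after every move.
        circuit = self.circuits[circuit_id]
        self.change_log.append((circuit_id, circuit.route, new_route))
        circuit.route = new_route


def plan_failover(fabric: Fabric, failed_route: str, engines: int) -> List[List[str]]:
    """Return one work queue per engine, each ordered highest priority first."""
    affected = sorted(
        (c for c in fabric.circuits.values() if c.route == failed_route),
        key=lambda c: (c.priority, c.circuit_id),
    )
    # Round-robin assignment so every engine starts on the most urgent circuits.
    return [[c.circuit_id for c in affected[i::engines]] for i in range(engines)]


def fail_over(fabric: Fabric, failed_route: str, backup_route: str, engines: int) -> None:
    """Move every circuit on failed_route to backup_route in priority order."""
    queues = plan_failover(fabric, failed_route, engines)
    rounds = max((len(q) for q in queues), default=0)
    for round_index in range(rounds):
        for queue in queues:
            if round_index < len(queue):
                fabric.move(queue[round_index], backup_route)


def roll_back(fabric: Fabric, count: Optional[int] = None) -> None:
    """Undo the most recent `count` logged moves (all by default), newest first."""
    remaining = len(fabric.change_log) if count is None else min(count, len(fabric.change_log))
    for _ in range(remaining):
        circuit_id, old_route, _ = fabric.change_log.pop()
        fabric.circuits[circuit_id].route = old_route
```

Calling roll_back with a count undoes only part of the log, which is one simple way to model moving services back in separate pre-determined windows. Because every move is logged as it happens, the records stay consistent with the fabric even if the failover is interrupted half-way.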