The company said the global outage that took Facebook and its other platforms offline for hours was caused by an error during routine maintenance.
Facebook’s Vice President of Infrastructure, Santosh Janardhan, said in a blog post that the outage that took Facebook, Instagram and WhatsApp dark “was not due to malicious activity, but due to an error of our own making.”
The problem occurred while engineers were doing routine work on Facebook’s global backbone network: the computers, routers and software in its data centers around the world, as well as the fiber-optic cables that connect them.
“During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity,” Janardhan wrote on Tuesday, “which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally.”
Janardhan said Facebook’s systems are designed to audit commands like this one to catch such mistakes, but in this case a bug in the audit tool prevented it from stopping the command.
The resulting loss of connectivity also caused a second problem that made Facebook’s servers unreachable, even though they were still operational.
Janardhan said engineers were sent to the data centers to fix the problem on site, but this took time because of the facilities’ extra layers of security. Data centers are “hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them.”
Once connectivity was restored, services were brought back gradually to avoid a surge in traffic that could cause further failures.