Yesterday, a major outage hit all the applications and services of the Facebook group. The company says the problems were caused by “configuration changes to routers that coordinate traffic between data centers”. These changes had a range of consequences for communication between the American group's servers. “We want to assure you that we believe that the cause of this error is a bad configuration change. We have no indication that user data was compromised during the outage,” Facebook adds, seeking to reassure users.
However, these explanations remain vague, and it is Cloudflare's account that gives a clearer picture of what happened.
Cloudflare is a content delivery network that mainly acts as an intermediary between Internet users and servers, in particular to protect sites against DDoS attacks. Drawing on its technical expertise, Cloudflare has published a detailed explanation of the outage that Facebook experienced.
Before going any further, however, a few basic concepts need to be explained.
DNS provides the IP address of a website, that is, its location, while BGP provides the path to reach that destination. DNS, the Domain Name System, is a service at the foundation of the Internet that translates a domain name such as “Facebook.com” or “mobile-magazine.it” into an IP address, such as 18.104.22.168, so that your browser knows which address to contact. BGP, the Border Gateway Protocol, tells the networks that carry your traffic which route to take to reach that destination address.
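The split between the two roles can be illustrated with a toy model. The sketch below is only an illustration, not how real resolvers or routers are implemented: the name, the addresses (taken from ranges reserved for documentation), the peer labels and the helper functions `resolve` and `next_hop` are all hypothetical.

```python
import ipaddress

# Hypothetical DNS records: host name -> IP address (documentation range).
DNS_RECORDS = {"example.com": "203.0.113.10"}

# Hypothetical routing table, as a router might learn it through BGP:
# announced prefix -> neighbour to forward traffic to.
ROUTES = {
    ipaddress.ip_network("203.0.113.0/24"): "peer-A",
    ipaddress.ip_network("203.0.0.0/16"): "peer-B",
}

def resolve(name):
    """DNS step: map a host name to an IP address (None if unknown)."""
    return DNS_RECORDS.get(name.lower())

def next_hop(address, routes):
    """Routing step: choose the most specific prefix containing the address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in routes if ip in net]
    if not matches:
        return None  # no known path: the destination is unreachable
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

address = resolve("example.com")      # "where is it?"
hop = next_hop(address, ROUTES)       # "how do I get there?"
print(address, "via", hop)
```

The first call answers the question of location, the second the question of path. If either step fails, the site cannot be reached even though the server itself may be running normally.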
Facebook was still online
Specifically, a change to the BGP configuration appears to be what took the Facebook services down. “BGP allows a network, such as Facebook, to advertise its presence to the other networks that make up the Internet. Access providers and other networks can no longer find the Facebook network, and it is therefore unavailable,” said Cloudflare. In essence, Facebook's servers stayed online, but Internet operators no longer knew how to reach them.
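The effect Cloudflare describes can be sketched with a simplified simulation. This is a toy model under stated assumptions, not real BGP: the `ToyRouter` class, the prefix, the server address and the AS number (all taken from ranges reserved for documentation) are hypothetical.

```python
import ipaddress

class ToyRouter:
    """Remembers which prefixes neighbouring networks have announced."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, origin):
        # A network advertises that it can be reached through this prefix.
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        # The network stops advertising the prefix.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def can_reach(self, address):
        ip = ipaddress.ip_address(address)
        return any(ip in net for net in self.routes)

isp = ToyRouter()
server = "203.0.113.10"  # hypothetical server; it keeps running throughout

isp.announce("203.0.113.0/24", origin="AS64500")  # documentation AS number
before = isp.can_reach(server)

isp.withdraw("203.0.113.0/24")  # the announcement disappears
after = isp.can_reach(server)

print(before, after)
```

Nothing in the sketch ever switches the server off. Only the announcement changes, which is enough to make the address unreachable from the provider's point of view.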
For Facebook employees, this change to the BGP configuration had other consequences that explain why the fix took so long: they were unable to communicate by email and unable to reconfigure the servers remotely.
Everything returned to normal once managers were able to enter the data centers, where they could correct the BGP configuration and make the various services visible to the rest of the Internet again.