Why were Facebook, Instagram and WhatsApp down for over 6 hours last night?

Facebook has said that configuration changes on the backbone routers that coordinate network traffic between its data centres caused issues that led to an outage of more than six hours across its services, including WhatsApp and Instagram. Facebook added that the cause of the outage also affected its internal tools and systems, complicating its attempts to quickly diagnose and resolve the problem.

Vikrant Shekhawat : Oct 05, 2021, 11:25 AM
Technical Desk: Facebook, its messaging platform WhatsApp and its photo-sharing app Instagram were all hit by a massive outage impacting millions of users worldwide before being restored after more than six hours. Facebook said late on Monday that it had been working to restore access to its services and was “happy to report they are coming back online now.” The company apologised and thanked its users for bearing with it but did not say what might have caused the outage, which began around 8:45 pm IST and was one of the longest failures in recent memory. Downdetector, which monitors internet issues, said the Facebook outage was the largest it had seen, with more than 10.6 million reports worldwide.

Shares of Facebook fell 4.9 per cent on Monday, their biggest daily drop since last November. According to ad measurement firm Standard Media Index, Facebook was losing about $545,000 in US ad revenue per hour during the outage. Some of Facebook’s internal applications, including the company’s own email system, were also hit. Bloomberg reported that, according to Twitter and Reddit users, employees at the company’s Menlo Park, California, campus were unable to access offices and conference rooms that required a security badge.

Facebook acknowledged that “some people are having trouble accessing (the) Facebook app” and said it was working on restoring access, but did not elaborate on either the reason behind the outage or the number of users affected. Instagram head Adam Mosseri tweeted that it felt like a “snow day”, and Mike Schroepfer, Facebook’s outgoing chief technology officer, blamed “networking issues”.

Reuters cited several Facebook employees, who declined to be named, as saying they believed the outage was caused by an internal routing mistake to an internet domain. The failure of internal communication tools and other resources that depend on that same domain added to the problem, they said. Several security experts said the Facebook, WhatsApp and Instagram disruption could be the result of an internal mistake, adding that sabotage by an insider would be theoretically possible. "Facebook basically locked its keys in its car," tweeted Jonathan Zittrain, director of Harvard's Berkman Klein Center for Internet & Society.

The message on Facebook's webpage suggested an error in the Domain Name System (DNS), which allows web addresses to take users to their destinations by converting domain names like “facebook.com” into the actual internet protocol (IP) addresses of the corresponding website. According to Wired, an error in DNS records can make it impossible to connect to a website.
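The lookup described above can be illustrated with a minimal Python sketch. It assumes the standard library's `socket.getaddrinfo` as the resolver; the helper name `resolve`, the two stub lookups, the hostname `example.test` and the address `192.0.2.10` (from a range reserved for documentation) are all hypothetical and are not taken from Facebook's systems. It shows only that when the name cannot be translated into an address, the client has nowhere to connect.

```python
import socket
from typing import Callable, List


def resolve(hostname: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the IP addresses a hostname maps to, or [] if the lookup fails."""
    try:
        records = lookup(hostname, None)
    except socket.gaierror:
        # The name could not be translated into an address, so a browser
        # or app would have no destination to connect to.
        return []
    # Each record is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return sorted({record[4][0] for record in records})


# Hypothetical stand-ins for a resolver, so the sketch runs without network access.
def stub_lookup_ok(hostname, port):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0))]


def stub_lookup_broken(hostname, port):
    raise socket.gaierror(-2, "Name or service not known")


print(resolve("example.test", stub_lookup_ok))      # ['192.0.2.10']
print(resolve("example.test", stub_lookup_broken))  # []
```

Calling `resolve("facebook.com")` with the default resolver would perform a real lookup on a machine with network access.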

Alex Stamos, a former chief security officer at Facebook, told Wired that the cause of the issue was “probably a bad configuration or code push to the network management system.” “This isn’t supposed to happen,” Stamos added. “Facebook's outage appears to be caused by DNS; however that's just a symptom of the problem,” Troy Mursch, chief research officer of cyberthreat intelligence company Bad Packets, told Wired. The fundamental issue, according to Mursch and other experts, is that Facebook has withdrawn the so-called Border Gateway Protocol (BGP) route that contains the IP addresses of its DNS nameservers.
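Why a withdrawn route makes DNS fail can be sketched with a toy routing table in Python. This is a simplified illustration and not an implementation of BGP. The class `RoutingTable`, the prefix `192.0.2.0/24`, the nameserver address `192.0.2.53` and the label `AS64500` are hypothetical values drawn from ranges reserved for documentation, not Facebook's real prefixes or network numbers.

```python
import ipaddress
from typing import Dict, Optional


class RoutingTable:
    """Toy model of the routes one network has learned from its neighbours."""

    def __init__(self) -> None:
        self.routes: Dict[ipaddress.IPv4Network, str] = {}

    def announce(self, prefix: str, next_hop: str) -> None:
        # A neighbour advertises that it can deliver traffic for this prefix.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        # The neighbour retracts the advertisement; the route disappears.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str) -> Optional[str]:
        # Longest-prefix match: choose the most specific route that covers
        # the address, or None if no route covers it.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("192.0.2.0/24", "AS64500")  # prefix holding the nameserver
print(table.lookup("192.0.2.53"))          # AS64500

table.withdraw("192.0.2.0/24")
print(table.lookup("192.0.2.53"))          # None
```

In this model the nameserver itself may still be running after the withdrawal, but no other network knows how to reach it, so queries for the domain go unanswered.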

Several internet infrastructure experts told Wired that the likeliest answer was a misconfiguration on Facebook’s part. “It appears that Facebook has done something to their routers, the ones that connect the Facebook network to the rest of the internet,” John Graham-Cumming, CTO of internet infrastructure company Cloudflare, said.