Prince, who has a history of being open about his company's mistakes, said he had confirmed "that the issue was caused by a mistaken configuration we were applying to a router during a routine update".
He said there was no attack. "It was not a failure of the router software." In a detailed blog post, Cloudflare's John Graham-Cumming said: "A configuration error in our backbone network caused an outage for Internet properties and Cloudflare services that lasted 27 minutes.
"We saw traffic drop by about 50% across our network. Because of the architecture of our backbone, this outage didn’t affect the entire Cloudflare network and was localised to certain geographies.
Mistakes happen. The root problem was we didn’t have systems in place to keep them from causing a widespread issue. That’s a problem of leadership that I am more responsible for than the engineer who made the typo.— Matthew Prince (@eastdakota) July 18, 2020
"This configuration contained an error that caused all traffic across our backbone to be sent to Atlanta. This quickly overwhelmed the Atlanta router and caused Cloudflare network locations connected to the backbone to fail."
He said the locations affected were San Jose, Dallas, Seattle, Los Angeles, Chicago, Washington, DC, Richmond, Newark, Atlanta, London, Amsterdam, Frankfurt, Paris, Stockholm, Moscow, St. Petersburg, São Paulo, Curitiba, and Porto Alegre. Other locations were not affected.
Clearly Iran hacked a cloudflare engineer's brain with satellite based mind control in order to DDoS the internet via a typo— MalwareTech (@MalwareTechBlog) July 18, 2020
Graham-Cumming provided the following timeline for the incident:
20:25: Loss of backbone link between EWR and ORD (6.25am Saturday AEST)
20:25: Backbone between ATL and IAD is congesting
21:12 to 21:39: ATL attracted traffic from across the backbone
21:39 to 21:47: ATL dropped from the backbone, service restored
21:47 to 22:10: Core congestion caused some logs to drop, edge continues operating
22:10: Full recovery, including logs and metrics
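As a quick check on the figures, the phase lengths can be worked out from the timeline itself. The sketch below is illustrative only: it assumes the listed times are UTC on 17 July 2020 (the article does not state the zone or date for the timeline), and the phase labels and helper names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the timeline is in UTC on 17 July 2020.
DAY = (2020, 7, 17)


def utc(hhmm: str) -> datetime:
    """Parse an 'HH:MM' timeline entry as a UTC datetime on the assumed day."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(*DAY, hour, minute, tzinfo=timezone.utc)


def minutes(start: str, end: str) -> int:
    """Whole minutes elapsed between two timeline entries."""
    return int((utc(end) - utc(start)).total_seconds() // 60)


# Phases copied from the timeline; the labels are shortened paraphrases.
phases = [
    ("ATL attracted traffic from across the backbone", "21:12", "21:39"),
    ("ATL dropped from the backbone, service restored", "21:39", "21:47"),
    ("Core congestion, some logs dropped", "21:47", "22:10"),
]

durations = {label: minutes(start, end) for label, start, end in phases}

# Fixed UTC+10 offset: eastern Australia is on standard time in July.
AEST = timezone(timedelta(hours=10))
first_event_local = utc("20:25").astimezone(AEST)

for label, mins in durations.items():
    print(f"{mins:>2} min  {label}")
print(first_event_local.strftime("%A %H:%M"))
```

Under those assumptions the first phase comes to 27 minutes, which matches the outage length quoted from the blog post, and 20:25 converts to 6.25am on Saturday at UTC+10.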
Just realized I missed Twitter doing round 2 of "let's attribute a random outage to a non-existent DDoS attack citing some meme pewpew map as evidence"— MalwareTech (@MalwareTechBlog) July 18, 2020
Asked on Twitter by one Adriano Maia whether he would react the same way if there were a major breach, Prince said the company had operated the same way for a long time and pointed to a 2012 blog post to illustrate his point.
Cloudflare outage was a config error. Mistakes happen. Outage was about 20 mins.— Kevin Beaumont (@GossiTheDog) July 17, 2020
Cloudflare usually very open about mistakes and learnings. https://t.co/rW5Hm8NOSX