Many of these routes instead transited through DQE Communications, a small company in Pennsylvania.
The teams at @verizon and @noction should be incredibly embarrassed at their failings this morning which impacted @Cloudflare and other large chunks of the Internet. It’s absurd BGP is so fragile. It’s more absurd Verizon would blindly accept routes without basic filters.— Matthew Prince (@eastdakota) June 24, 2019
Cloudflare's Tom Strickx said in a blog post that the problem had been magnified by the involvement of a so-called BGP optimiser product from a company known as Noction. The problems began at about 10:30 UTC (8:30pm Monday AEST) and were resolved about two hours later.
"This was the equivalent of Waze routing an entire freeway down a neighbourhood street — resulting in many websites on Cloudflare, and many other providers, to be unavailable from large parts of the Internet," Strickx said. "This should never have happened because Verizon should never have forwarded those routes to the rest of the Internet."
"For example, our own IPv4 route 104.20.0.0/20 was turned into 104.20.0.0/21 and 104.20.8.0/21. It’s as if the road sign directing traffic to 'Pennsylvania' was replaced by two road signs, one for 'Pittsburgh, PA' and one for 'Philadelphia, PA'.
"By splitting these major IP blocks into smaller parts, a network has a mechanism to steer traffic within their network, but that split should never have been announced to the world at large. When it was, it caused today’s outage."
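The split Strickx describes can be sketched with Python's standard-library ipaddress module. This uses a private example block rather than Cloudflare's real prefix, but the mechanics are the same: one /20 becomes two more-specific /21s.

```python
import ipaddress

# Illustrative only: a private /20, split the way the BGP optimiser
# split Cloudflare's block into two more-specific halves.
block = ipaddress.ip_network("10.20.0.0/20")
halves = list(block.subnets(prefixlen_diff=1))
print(halves)  # [IPv4Network('10.20.0.0/21'), IPv4Network('10.20.8.0/21')]
```

Each /21 covers exactly half of the /20, which is why announcing them internally can steer traffic without changing what the outside world sees, so long as they stay internal.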
DQE was using a BGP optimiser and announced these more specific routes to its customer, Allegheny Technologies, which in turn passed them on to its other transit provider, which happened to be Verizon, Strickx said.
Verizon advertised these "better" routes to the entire Internet; they were "better" because they were more granular and more specific.
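The reason more-specific routes are "better" is that routers forward on the longest matching prefix, regardless of where the route came from. A minimal sketch, again using private example prefixes rather than the real ones, shows a leaked /21 winning over the legitimate /20 for any address both cover:

```python
import ipaddress

# Hypothetical routing table: the legitimate /20 plus a leaked /21.
# Routers pick the longest (most specific) prefix that matches.
routes = {
    ipaddress.ip_network("10.20.0.0/20"): "legitimate origin",
    ipaddress.ip_network("10.20.0.0/21"): "leaked more-specific",
}

def lookup(addr):
    matches = [net for net in routes if ipaddress.ip_address(addr) in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]

print(lookup("10.20.3.7"))  # covered by both prefixes; the /21 wins
print(lookup("10.20.9.1"))  # only the /20 covers it
```

This is why a leak of more-specific routes pulls traffic away globally: every router that hears the /21 prefers it, which is also why filtering such announcements at the transit edge matters.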
"The leak should have stopped at Verizon. However, against numerous best practices outlined below, Verizon’s lack of filtering turned this into a major incident that affected many Internet services such as Amazon, Fastly, Linode and Cloudflare," Strickx said.
More than eight hours after the incident, Strickx said Cloudflare had not heard back from Verizon despite trying to make contact both through phone calls and email.
Launtel chief executive Damian Ivereigh told iTWire that Australian sites were not affected a great deal by the incident.
"From what I can tell we were not affected very much, I was not personally online at the time and I haven't seen any posts in our Facebook Users Group complaining of slow speeds etc," the head of the small Launceston-based ISP said in response to an inquiry.
He said: "Given the anatomy of the leak - namely a route in the US being badly optimised by a small provider in the US - it would make sense that we would not be affected much.
"Cloudflare uses different addresses for different areas of the world. The optimiser would have only been manipulating routes for destinations close to it (because it only wants to optimise routes to high-traffic areas - people in the US don't send much traffic to Cloudflare in Oz).
"Thus it would have only been the US cloudflare servers whose routes would have been affected. Given that Australian users don't often talk to US cloudflare servers (they talk to local ones), it would not have affected us much."
iTWire has contacted Verizon for comment.
Disclosure: iTWire uses Cloudflare's services.