Wednesday, 05 June 2019 08:28

Google says wrongly applied configuration change caused outage

Image by 200 Degrees from Pixabay

The outage that Google experienced on Monday AEST was caused by a configuration change that was pushed out to more servers than intended, the company says in a blog post.

Engineering vice-president Benjamin Sloss wrote on Tuesday US time that the configuration change was meant to be pushed out to a small number of servers in a single region, but was incorrectly sent to a larger number across several neighbouring regions.

This caused those regions to stop using more than half of their available network capacity.

Sunday's incident affected multiple services in Google Cloud, G Suite and YouTube.

A little more than 10 years ago, Google had a major hiccup due to a configuration snafu, when it reported every search result as being from a site that was infested with malware.

As iTWire's David Williams explained at the time, "It seems one poor Google tech — who I suspect now has to bring in donuts for the rest of the office — accidentally added a single "/" to the list of bad sites, on a line all by its lonesome self.

"When Google's search results check for matches the solitary slash actually rings true for everything – every URL with a slash in it registered as malware according to this search term. The solution wasn't to reboot anything, it was to take out that little one liner from the bad sites register."
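The failure mode Williams describes can be reproduced in a few lines. What follows is a minimal sketch that assumes the check was a plain substring match against each blocklist entry; Google's actual matching logic is not described in the source, and the `is_flagged` helper, the blocklist entry and the `example.test` URLs are all hypothetical.

```python
def is_flagged(url, bad_patterns):
    """Return True if any blocklist pattern occurs anywhere in the URL."""
    return any(pattern in url for pattern in bad_patterns)


# Hypothetical blocklist and URLs, for illustration only.
bad_patterns = ["malware.example.test/payload"]
urls = [
    "https://www.example.test/",
    "https://news.example.test/story/42",
    "https://malware.example.test/payload.exe",
]

# With the intended blocklist, only the genuinely bad URL matches.
flagged_before = [u for u in urls if is_flagged(u, bad_patterns)]

# Adding a lone "/" makes every URL match, because every URL contains a slash.
flagged_after = [u for u in urls if is_flagged(u, bad_patterns + ["/"])]

print(len(flagged_before), "flagged before;", len(flagged_after), "flagged after")
```

Under that assumption, removing the single "/" entry restores normal behaviour without any restart, which is consistent with the fix Williams describes.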

Of Sunday's screw-up, Sloss wrote: "The network traffic to/from those regions then tried to fit into the remaining network capacity, but it did not.

"The network became congested, and our networking systems correctly triaged the traffic overload and dropped larger, less latency-sensitive traffic in order to preserve smaller latency-sensitive traffic flows, much as urgent packages may be couriered by bicycle through even the worst traffic jam."
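The triage Sloss describes can be pictured with a toy model. This is a rough sketch, not Google's implementation: the `Flow` type, the `triage` function, the capacity figure and the flow names are invented for illustration, and it assumes a simple policy of admitting latency-sensitive flows first and smaller flows before larger ones until the remaining capacity is used up.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    size: int                # bandwidth demand, arbitrary units
    latency_sensitive: bool


def triage(flows, capacity):
    """Keep flows that fit in `capacity`, favouring latency-sensitive and small ones."""
    # Sort so latency-sensitive flows come first, then by ascending size.
    ordered = sorted(flows, key=lambda f: (not f.latency_sensitive, f.size))
    kept, dropped, used = [], [], 0
    for flow in ordered:
        if used + flow.size <= capacity:
            kept.append(flow)
            used += flow.size
        else:
            dropped.append(flow)
    return kept, dropped


# Hypothetical traffic mix competing for a reduced capacity of 40 units.
flows = [
    Flow("backup-sync", 60, False),
    Flow("interactive-rpc", 5, True),
    Flow("batch-export", 30, False),
    Flow("dns", 1, True),
]
kept, dropped = triage(flows, capacity=40)
print("kept:", [f.name for f in kept])
print("dropped:", [f.name for f in dropped])
```

In this model the small latency-sensitive flows always get through, while the largest bulk transfer is the first to be shed.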

He said the issue was detected "within seconds" but took much longer to fix than the target of a few minutes.

"Once alerted, engineering teams quickly identified the cause of the network congestion, but the same network congestion which was creating service degradation also slowed the engineering teams’ ability to restore the correct configurations, prolonging the outage," Sloss said.

"The Google teams were keenly aware that every minute which passed represented another minute of user impact, and brought on additional help to parallelise restoration efforts."

As to the impact, Sloss said YouTube had seen a 2.5% drop in views for an hour, while Google Cloud Storage showed a 30% reduction in traffic. About 1% of Gmail users had experienced issues.

"With all services restored to normal operation, Google’s engineering teams are now conducting a thorough post-mortem to ensure we understand all the contributing factors to both the network capacity loss and the slow restoration," he said.

"We will then have a focused engineering sprint to ensure we have not only fixed the direct cause of the problem, but also guarded against the entire class of issues illustrated by this event."

But his statement about the end of the issue may be a little premature. The Google Cloud status page has the following legend at the end: "We're investigating an issue with Google Compute Engine Persistent Disk in us-east4-b and us-east4-c. Affected customers may observe IO errors on Persistent Disks attached to instances and/or may fail to create PD snapshots in us-east4-b and us-east4-c.

"The issue should be resolved for majority of users and we expect a full resolution in the near future. We're waiting on our final changes to propagate. We will provide another status update by Tuesday, 2019-06-04 17:10 US/Pacific with current details." That update should be out at 10.10am AEST if it is on time.
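The time conversion can be checked with a short calculation. The sketch below assumes fixed offsets rather than a time zone database: US/Pacific is on daylight time (UTC-7) in early June, and AEST is UTC+10. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for this date: Pacific Daylight Time and AEST.
PDT = timezone(timedelta(hours=-7), "PDT")
AEST = timezone(timedelta(hours=10), "AEST")

# The update time promised on the Google Cloud status page.
promised = datetime(2019, 6, 4, 17, 10, tzinfo=PDT)

# Convert the same instant to Australian Eastern Standard Time.
in_aest = promised.astimezone(AEST)
print(in_aest.strftime("%Y-%m-%d %H:%M %Z"))
```

The 17-hour difference between the two offsets puts the promised update at 10.10am on 5 June in AEST.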




Sam Varghese


Sam Varghese has been writing for iTWire since 2006, a year after the site came into existence. For nearly a decade thereafter, he wrote mostly about free and open source software, based on his own use of this genre of software. Since May 2016, he has been writing across many areas of technology. He has been a journalist for nearly 40 years in India (Indian Express and Deccan Herald), the UAE (Khaleej Times) and Australia (Daily Commercial News (now defunct) and The Age). His personal blog is titled Irregular Expression.


