Engineering vice-president Benjamin Sloss wrote on Tuesday US time that the configuration change was meant to be pushed out to a small number of servers in a single region, but was incorrectly sent to a larger number of servers across several neighbouring regions.
This caused those regions to stop using more than half of their available network capacity.
Sunday's incident affected multiple services in Google Cloud, G Suite and YouTube.
As iTWire's David Williams explained at the time, "It seems one poor Google tech — who I suspect now has to bring in donuts for the rest of the office — accidentally added a single '/' to the list of bad sites, on a line all by its lonesome self.
"When Google's search results check for matches the solitary slash actually rings true for everything – every URL with a slash in it registered as malware according to this search term. The solution wasn't to reboot anything, it was to take out that little one liner from the bad sites register."
Of Sunday's screw-up, Sloss wrote: "The network traffic to/from those regions then tried to fit into the remaining network capacity, but it did not.
"The network became congested, and our networking systems correctly triaged the traffic overload and dropped larger, less latency-sensitive traffic in order to preserve smaller latency-sensitive traffic flows, much as urgent packages may be couriered by bicycle through even the worst traffic jam."
He said the issue was detected "within seconds" but took much longer to fix than the target of a few minutes.
"Once alerted, engineering teams quickly identified the cause of the network congestion, but the same network congestion which was creating service degradation also slowed the engineering teams’ ability to restore the correct configurations, prolonging the outage," Sloss said.
"The Google teams were keenly aware that every minute which passed represented another minute of user impact, and brought on additional help to parallelise restoration efforts."
As to the impact, Sloss said YouTube had seen a 2.5% drop in views for an hour, while Google Cloud Storage showed a 30% reduction in traffic. About 1% of Gmail users had experienced issues.
"With all services restored to normal operation, Google’s engineering teams are now conducting a thorough post-mortem to ensure we understand all the contributing factors to both the network capacity loss and the slow restoration," he said.
"We will then have a focused engineering sprint to ensure we have not only fixed the direct cause of the problem, but also guarded against the entire class of issues illustrated by this event."
But his statement about the end of the issue may be a little premature. The Google Cloud status page has the following legend at the end: "We're investigating an issue with Google Compute Engine Persistent Disk in us-east4-b and us-east4-c. Affected customers may observe IO errors on Persistent Disks attached to instances and/or may fail to create PD snapshots in us-east4-b and us-east4-c.
"The issue should be resolved for majority of users and we expect a full resolution in the near future. We're waiting on our final changes to propagate. We will provide another status update by Tuesday, 2019-06-04 17:10 US/Pacific with current details." That update should be out at 10.10am AEST if it is on time.