Google index grows to one trillion pages

The current index would be even bigger than that astonishing 1 trillion number if Google did not actively filter out multiple URLs that serve exactly the same page content. "Even after removing those exact duplicates, we saw a trillion unique URLs," Alpert and Hajaj say, adding that "the number of individual web pages out there is growing by several billion pages per day."
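To make the idea of filtering exact duplicates concrete, here is a minimal Python sketch. It is not Google's method, which the quoted statement does not describe; it simply assumes that two URLs count as duplicates when their page content is byte-for-byte identical, and it compares content hashes to find them. The function name `unique_urls`, the sample URLs and the (url, content) input shape are all illustrative assumptions.

```python
import hashlib


def unique_urls(pages):
    """Return the URLs whose content has not been seen before.

    `pages` is an iterable of (url, content) pairs. The name and the
    input shape are assumptions made for this illustration only.
    """
    seen_digests = set()
    kept = []
    for url, content in pages:
        # Hash the content so that only a short digest is stored per page.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            continue  # exact duplicate of a page already kept
        seen_digests.add(digest)
        kept.append(url)
    return kept


# Hypothetical sample: two URLs serve identical content.
sample = [
    ("http://example.com/a", "hello world"),
    ("http://example.com/a?ref=1", "hello world"),
    ("http://example.com/b", "something else"),
]
print(unique_urls(sample))
```

Only the first URL for each distinct piece of content survives, so the second sample entry is dropped. Pages that are merely similar rather than identical would pass straight through a filter like this one.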

The truth is that nobody knows exactly how big the web is or how many truly unique pages it contains. Any figure can only ever be a best guess, because even Google has to admit it simply does not have the resources or time to look at them all.

"Strictly speaking," Google says, "the number of pages out there is infinite." By way of example it offers the case of web calendars, which often incorporate a link to the next day's activities. If Google followed these, it argues, it would be stuck in an endless search loop. "We're not doing that, obviously, since there would be little benefit to you."
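The calendar problem can be illustrated with a small Python sketch. It assumes a hypothetical site where every calendar page links to the following day, and it uses a depth limit as one simple way a crawler might avoid the loop. Google does not say how it actually handles such pages, so `calendar_links`, `crawl` and `max_depth` are invented names for illustration, not a description of its crawler.

```python
from collections import deque
from datetime import date, timedelta

BASE = "http://example.com/calendar/"  # hypothetical site


def calendar_links(url):
    """Simulated page: each calendar day links only to the next day."""
    day = date.fromisoformat(url.rsplit("/", 1)[1])
    return [BASE + (day + timedelta(days=1)).isoformat()]


def crawl(start_url, get_links, max_depth):
    """Breadth-first crawl that stops following links at max_depth."""
    visited = set()
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        if depth >= max_depth:
            continue  # do not follow links any further from here
        for link in get_links(url):
            queue.append((link, depth + 1))
    return visited


pages = crawl(BASE + "2000-01-01", calendar_links, max_depth=3)
print(sorted(pages))
```

Without the depth check the queue would never empty, because every page produces one more unseen URL. The visited set alone does not help here, since no calendar URL ever repeats.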

In fact, Google did not index every one of those trillion pages either, because many of them are reported to be very similar to each other, or to contain auto-generated content that is not of much interest to the general web-searching public.

Google does, however, claim to have the most comprehensive index of any search engine, and we have no inclination to argue with them there. But imagine just how much better it could be if it were to index the so-called Deep Web.

Back in the year 2000, when, remember, the Google index hit a billion pages, a University of Michigan study was claiming that the Deep Web contained something in the region of 550 billion individual documents.

Do the math on that to take account of the new 1 trillion page Google index figure, and you get what we call really big...
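For anyone who wants to do that math literally, here is a rough back-of-the-envelope sketch in Python. It rests on one large assumption that nothing above confirms: that the Deep Web has kept the same size relative to Google's figure as it had in 2000. The variable names are illustrative, and the result is an extrapolation, not a measurement.

```python
# Figures quoted in the article.
INDEX_2000 = 1_000_000_000        # Google index in 2000: a billion pages
DEEP_WEB_2000 = 550_000_000_000   # Deep Web estimate in 2000: 550 billion documents
URLS_NOW = 1_000_000_000_000      # Google's new figure: a trillion unique URLs

# Assumption: the Deep Web to index ratio has stayed constant since 2000.
ratio = DEEP_WEB_2000 // INDEX_2000
deep_web_estimate = URLS_NOW * ratio

print(f"Deep Web documents per indexed page in 2000: {ratio}")
print(f"Extrapolated Deep Web size: {deep_web_estimate:,} documents")
```

On that assumption the extrapolation comes out at 550 trillion documents.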