Google’s Misleading Blog Post: The Size Of The Web And The Size Of Their Index
Posted by: admin in Company and Industry News
In a blog post today, Google says they’ve identified 1 trillion unique URLs on the web. It’s actually more, they say, but some web pages have multiple URLs with exactly the same content, or URLs that are auto-generated copies of each other.
What they note way down in the fourth paragraph, however, is that they don’t actually index all of those pages, so you can’t find them on Google. Estimates put the true size of the Google index at a mere 40 billion pages or so.
Why don’t they index all the pages they’ve found? Some of them are spam. But it’s also very expensive to index sites. And the fact that Google indexes many news sites, blogs and other rapidly changing web sites every 15 minutes makes all that indexing even more costly. So they make value judgments about what to index and what to skip. And most of the web is left out.
Google also says “But we’re proud to have the most comprehensive index of any search engine.”
That may be true today, but it probably won’t be true next week (check back here then). Google knows that as well as we do, and that’s why they posted this today.