When it comes to Google, the size of the web and the size of their index are apparently very different.
What’s interesting to recognise here is that Google cannot afford to index ALL of the web. Coupled with the fact that we are irrevocably losing information that defines us as a larger humanity, or as identity groups and individuals, this raises the question of whether all this information has contributed to an equal growth in knowledge.
I think not.
I’ve raised a number of questions that trouble me greatly as someone deeply interested in saving the knowledge generated, used, abused and ignored in a peace process. Terabytes of information hugely pertinent to researchers, historians and scholars of a process as multi-faceted and complex as peacebuilding are often locked in disparate proprietary systems with limited access, stored in proprietary formats whose encryption keys reside with people who are themselves at risk of being killed, held in badly managed archives, kept on perishable media, or simply not backed up, to name just a few of the problems.
I was struck by the fact that what people consider the web is actually what Google defines as the web:
But it’s also very expensive to index sites. And the fact that Google indexes many news sites, blogs and other rapidly changing web sites every 15 minutes makes all that indexing even more expensive. So they make value judgment on what to actually index and what not to. And most of the web is left out.
I find that last bit positively frightening.