Cognition for the web?

When we designed the first secure collaboration platform for Sri Lankan peace stakeholders during the Ceasefire Agreement (CFA), using Groove Virtual Office (GVO) as part of the One Text process, a significant problem we faced was searching for and indexing information. GVO didn't support tags, so the taxonomy we settled on had to work for everyone. That worked better for those experienced in searching for information online than for those who were new to PCs.

During operations, we used a single alpha version of the Semantic Navigator by ISX Corporation, a tool designed to make sense of information. It never really got off the ground. For starters, it was massively power- and memory-hungry, which ruled it out for anything but the highest-spec machines.

Peacebuilding lends itself to a semantic database. In both theory and praxis, it is interlinked with a range of issues, actors, processes and places. Traditional keyword-based search makes managing information related to a peace process incredibly frustrating, simply because keyword-based search engines, including Google, cannot and do not understand the nature of the information they index.

As Ars Technica notes,

If you aren’t familiar with the concept of semantic search, it is, in a nutshell, the exploration and harnessing of the meaning of words to provide more effective search results. There is a growing perception that current keyword- and link-based technologies used by most of the large and even not-so-large search companies like Google, Yahoo, and even Ask.com, have outgrown their usefulness because they don’t understand anything about the actual words used in a query. A word like “cold” could mean many things, from the physical state of an environment to having the sniffles.
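To make the "cold" example concrete, here is a toy sketch contrasting plain keyword matching with a crude sense-aware match. It is not how Cognition or any real search engine works: the sense inventory `SENSES`, the function names and the sample documents are all hypothetical, and the disambiguation step is a bare-bones word-overlap heuristic in the spirit of the simplified Lesk algorithm.

```python
import re

# Hypothetical, hand-built sense inventory for the ambiguous word "cold".
# Each sense lists "signature" words that tend to appear near it.
SENSES = {
    "cold/temperature": {"weather", "winter", "snow", "temperature", "freezing"},
    "cold/illness": {"sniffles", "fever", "cough", "symptoms", "virus"},
}


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_search(keyword, docs):
    """Return ids of docs containing the keyword, ignoring what it means."""
    return [doc_id for doc_id, text in docs.items() if keyword in tokens(text)]


def sense_of(text):
    """Guess a sense by counting overlap with each signature (None if no overlap)."""
    words = set(tokens(text))
    best, best_overlap = None, 0
    for sense, signature in SENSES.items():
        overlap = len(words & signature)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


def sense_search(keyword, context, docs):
    """Return ids of docs using the keyword in the same sense as the query context."""
    target = sense_of(context)
    return [
        doc_id
        for doc_id, text in docs.items()
        if keyword in tokens(text) and sense_of(text) == target
    ]


# Hypothetical sample documents.
docs = {
    "weather": "Cold weather and snow closed the roads in winter.",
    "health": "A cold usually brings a cough, the sniffles and a mild fever.",
    "other": "The negotiations went cold after the ceasefire collapsed.",
}

print(keyword_search("cold", docs))                        # ['weather', 'health', 'other']
print(sense_search("cold", "sniffles and a cough", docs))  # ['health']
```

The keyword search returns every document that contains "cold", while the sense-aware version only returns the one about illness. A real system would need a far larger lexicon and a much better disambiguation model.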

 

Enter Cognition

Cognition’s technology is built on over 20 years of research into the semantics of the English language, and “understands” four million semantic contexts (word meanings that create the context for interpreting other related words), over 536,000 word senses (word and phrase meanings), 75,000 concept classes (or synonym classes of word meanings), 7,500 nodes in the technology’s ontology or classification scheme, and 506,000 word stems (roots of words) for the English language.
Cognition Semantics
Check out this video to see just how amazing Cognition’s capabilities are. The narration is stilted and goes for the heavy sell, but it’s easy to see the potential of this technology for complex, long-term, multi-stakeholder processes such as peacebuilding and conflict transformation.
For a live example of Cognition, click here
After years of honing my search skills with Boolean operators on Google, I found it strangely difficult at first to search in natural language. Questions like “What are the ethnic groups in Sri Lanka?” return results from Wikipedia that closely match what a well-crafted Boolean search on Google would give you.
Another interesting demonstration of Cognition’s technology can be found in this index of US case law.
Applying Cognition’s natural language processing technology to an online peace library would be astounding, the one obvious limitation being that it would only work with content in English. Still, this is the future of web search and indexing.

How much of information is too much information?

When it comes to Google, the size of the web and the size of its index are apparently very different.

What’s interesting to recognise here is that Google cannot afford to index ALL of the web. Add to that the fact that we are irrevocably losing information that defines us as a larger humanity, as identity groups and as individuals, and it raises the question of whether all this information has contributed to an equal growth in knowledge.

I think not.

I’ve raised a number of questions that trouble me deeply as someone committed to saving the knowledge generated, used, abused and ignored in a peace process. Terabytes of information highly pertinent to researchers, historians and scholars of a process as multi-faceted and complex as peacebuilding are often scattered across disparate proprietary systems with limited access, locked in proprietary formats whose encryption keys are held by people who are themselves at risk of being killed, kept in badly managed archives or on perishable media, and not backed up, to name just a few of the problems.

I was struck by the fact that what people consider the web is actually what Google defines as the web:

But it’s also very expensive to index sites. And the fact that Google indexes many news sites, blogs and other rapidly changing web sites every 15 minutes makes all that indexing even more expensive. So they make value judgment on what to actually index and what not to. And most of the web is left out.

Emphasis mine. 

I find that last bit positively frightening.