The Defense Advanced Research Projects Agency (DARPA) isn’t known for thinking small, and DARPA has turned its attention (and budget) to a massive task: developing a set of software engines that can transcribe, translate, and summarize both text and speech without training or human intervention. The program, called the Global Autonomous Language Exploitation (GALE), attempts to address the lack of qualified linguists and analysts who know important languages like Mandarin and Arabic.
When bid solicitations went out last year, they told interested parties that DARPA wanted three separate modules built. The first handles the transcription of spoken languages into text. The second is a translation module that can convert foreign text into English, and the third is a “distillation” engine that can answer questions and summarize information provided by the other two modules.
This excerpt, from a Wired news story today, is exciting for a number of reasons. Before we go into them, however, the story also raises the question: don’t US government agencies ever talk to each other? I blogged earlier about an equally ambitious and related research project that is developing software to let the government monitor negative opinions of the United States or its leaders in newspapers and other publications overseas.
Now if only Homeland Defense and DARPA collaborated instead of competing, we might actually have something half-decent instead of millions of dollars being spent on overlapping research. And we are talking about millions of dollars: US$ 16 million spent on GALE in the first year alone, with organisations that are part of it (such as IBM Corp.) commanding, at least at present, research budgets that run into billions of dollars. As another news story on GALE points out:
GALE’s goal is to deliver, by 2010, software that can almost instantly translate Arabic and Mandarin Chinese with 90 to 95 percent accuracy.
DARPA’s project, in its sheer ambition, sets out to achieve a task that, in civilian use, could revolutionise the manner in which information is translated into knowledge for, say, foreign or third-party mediators brokering a peace agreement in the Middle East, or in any region that communicates in a language they are not well versed in. At Strong Angel III, I saw an inkling of what’s possible today with existing technology that transcribes, in real time, Arabic TV broadcasts into English. If this were developed further, and GALE is a harbinger of the very large scale information analysis systems to come, we may well see the emergence of expert systems that can be tailor-made to deliver contextual, on-demand feedback on the complex questions peace negotiators face at every stage of a peace process.
This is, of course, based on the assumption that this technology will eventually make it into civilian hands. It would be terribly wasteful if the billions of dollars spent on research were limited to the development of embargoed tools.
In my earlier post, I explored the power and potential of a search mashup that combined Podzinger, Riya, Clusty and The Living Library. I should add GALE to the list.