The past few years have shown that U.S. government intelligence goes only so far. One of the biggest challenges is recognizing vital information in foreign languages — and acting quickly on it. That’s why the military would love software that can listen to TV broadcasts or phone conversations and read Web sites in Arabic and Chinese, translate them into English and summarize the key elements for humans.
But each of those steps has long bedeviled computer scientists. Perfecting them and combining them — well, that is “DARPA hard.” That means it’s difficult even by the extreme standards of the Pentagon’s next-generation technology arm, the Defense Advanced Research Projects Agency.
Last year DARPA launched a project that aims to create that real-time translation software. It’s called GALE, for Global Autonomous Language Exploitation. And on top of GALE’s technical challenges, DARPA added some twists.
It hired three teams of researchers to chase the problem for up to five years. Each year, their progress would be evaluated, and the worst-performing team could be eliminated. Or the program could be shut down entirely.
DARPA often threatens to cut — “downselect” in its lingo — people from a project. But in the small world of speech-to-text and machine-translation researchers, being booted off the GALE island would be an unfamiliar blow.
That’s because DARPA’s three choices for GALE contestants were among the best of the best: IBM Corp., backed by a $6 billion annual research budget; SRI International, a $300 million, nonprofit research organization based in Silicon Valley; and BBN Technologies Inc., a $200 million research contractor headquartered in Cambridge, Mass.
Principal Scientist Ralph Weishedel, left, Chief Scientist and GALE Principal Investigator John Makhoul, center, and Principal Scientist Richard Schwartz are seen at BBN Technologies in Cambridge, Mass.
GALE might have threatened the most havoc at BBN — which let The Associated Press observe its progress for a rare peek into the frequently secretive work done for DARPA.
BBN executives and other investors purchased the company from its former owner, Verizon Communications Inc., in 2004, and are fighting to grow it in hopes of remaining independent. While BBN wants to diversify — it gets about 80 percent of its revenue from the military — this is not the time to be losing big deals like GALE, which brought BBN $16 million in the first year.
Being ejected would be “unthinkable,” said John Makhoul, the head of BBN’s GALE team.
“I cannot entertain that idea right now,” he said several months before DARPA’s first evaluation. “It’s just so drastic that we just don’t think about it.”
GALE commanded about two dozen of BBN’s roughly 400 researchers, so Makhoul felt sure that if DARPA booted BBN, a lot of people would lose their jobs, possibly even him. That sounded like an extreme prospect for one of the leading minds in his field, but Makhoul wasn’t so sure: “Given the pressures the company is under right now, I don’t know.”
Creating the teams
A display in the lobby boasts that BBN is “where wizards work.” The company — formerly called Bolt Beranek and Newman, after its founders — might best be known for its seminal 1960s work on the computer network that became the Internet. More recently, BBN drew acclaim early in the Iraq war when it developed and deployed a sniper-detection system in just two months.
But the company also is a longtime hub for speech-recognition and translation technologies. In fact, the IBM and SRI teams in GALE were headed by men who had come to their current employers from BBN, where each worked with Makhoul.
Even with all this expertise, BBN, SRI and IBM needed help. In a frenzy of phone calls and e-mails shortly after GALE was announced, representatives from each site raced to line up subcontractors at top university labs around the world — including people who had been rivals during previous government projects.
“This is a little like the making of sausage,” said David Israel, who headed SRI’s team.
BBN nabbed people at Cambridge University, the universities of Maryland and Southern California and a French lab, among others. IBM got Carnegie Mellon, Johns Hopkins, Brown and Stanford, plus researchers at Maryland not tied to BBN. SRI’s links included European and Asian schools, Columbia and the universities of California and Washington.
In fact, for all of GALE’s linguistic complexity, the harder challenge for each team may have been simply integrating the work of all the outside partners it brought aboard. Each partner would focus on particular steps — encoding an aspect of Chinese or Arabic grammar into a computer algorithm, for example.
“We’ve never had a project of this complexity,” BBN researcher Owen Kimball said in April. “You’re going to see people ripping their hair out.”
The GALE evaluation was still months off, but the team — heavily made up of immigrant engineers who had undertaken their own personal language projects in coming to America — was hunkering down.
“Put it this way: You can get your e-mail answered right away at 3 a.m. — by a lot of people,” said computer scientist Long Nguyen.
His colleague Spyros Matsoukas was hoping to do as much as possible before his wife had their third baby in June, right when BBN would be fine-tuning its software for the GALE test. He offered a what-can-you-do sort of smile. The last time BBN faced a huge project deadline, he had to tell his children he couldn’t take them swimming for a while.
Nguyen chimed in that he once slept at the office three nights in a row. That’s nothing, someone added: Rich Schwartz, a longtime BBN researcher, stayed up three nights in a row, no sleep. On multiple occasions.
“He’s too old to do that now,” Makhoul cracked.
GALE’s goal is to deliver, by 2010, software that can almost instantly translate Arabic and Mandarin Chinese with 90 to 95 percent accuracy.
That might be impossible. Humans might not even be that precise. Consider all the ways we mishear each other, or fail to grasp idioms, or apply one subjective interpretation instead of another. Why else do new translations of “Don Quixote” keep emerging, 400 years after it was written?
Fortunately for the GALE teams, they didn’t have to be near 95 percent right away. In the first year, they were expected to translate Arabic and Mandarin speech with 65 percent accuracy; with text the goal was 75 percent.
How hard was that? Before GALE, BBN boasted that it could automatically translate foreign news broadcasts with better than 80 percent accuracy. But DARPA wants translations not only from such controlled, well-articulated sources. GALE incorporates man-on-the-street interviews and raucous colloquial chats on the Web.
That’s where things get tricky. Background noise, dialects, accents, slang, short words like “on” or “of” that most speakers don’t bother to clearly enunciate — these are the stuff of nightmares for speech-recognition and machine-translation engineers.
Not to mention that Chinese and Arabic are structured very differently from English, making them a pain to translate.
“Arabic has this property: ‘He gave it to her’ would be one word. Little pieces in the one word capture lots of meaning,” said Salim Roukos, IBM’s GALE chief. Meanwhile, tense and gender are absent in Chinese.
To wring improvements from their translation software, the GALE teams fed their computers huge pools of sample broadcasts and texts in Arabic and Chinese. As the machines were exposed to more and more foreign sentences, they analyzed the content and structure, compiling an ever-deeper library of how words are spoken and the rules governing the languages.
Or so the researchers hoped. The name of the game is to fine-tune the computer process, known as an algorithm, that does the language analysis. Programming missteps can cause a computer to gain minimal insight from the new language data it is fed. It could even get worse at its translation task.
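The learn-from-examples process described above can be sketched with a word-alignment algorithm typical of that era’s statistical translation systems, IBM Model 1, trained by expectation-maximization. The tiny parallel corpus below is invented for illustration; it is not GALE data, and this is not a claim about BBN’s actual algorithms:

```python
# Toy sketch of IBM Model 1: learn word-translation probabilities
# t(e|f) from a parallel corpus via expectation-maximization.
from collections import defaultdict

# Invented miniature parallel corpus (foreign sentence, English sentence).
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("maison bleue".split(), "blue house".split()),
]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}

# Start with uniform translation probabilities.
t = {(e, f): 1.0 / len(e_vocab) for f in f_vocab for e in e_vocab}

for _ in range(10):  # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in corpus:
        for e in es:
            z = sum(t[(e, f)] for f in fs)  # normalize over this sentence
            for f in fs:
                c = t[(e, f)] / z           # expected alignment count
                count[(e, f)] += c
                total[f] += c
    for (e, f) in t:                        # M-step: re-estimate t(e|f)
        t[(e, f)] = count[(e, f)] / total[f]

# The model should now prefer "house" as the translation of "maison".
best = max(e_vocab, key=lambda e: t[(e, "maison")])
print(best)
```

Real systems of the time layered far more machinery on top of this idea, but the core loop is the same: more parallel text in, sharper probability estimates out.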
“It’s sort of trial and error, guided by intuitions and some knowledge,” BBN’s Schwartz said.
Though that’s not how it gets described in computer scientists’ meetings. “Rewrote the forward pass of the decoder algorithm to be a recursive traversal over the hypergraph, rather than a loop over spans,” one BBN programmer assured his team in a May presentation.
Tweaking the algorithms
Speech recognition, machine translation and language distillation don’t harbor many secret recipes. Everyone knows what everyone else is trying to do — tweak algorithms over and over.
The defining element of GALE — the government’s evaluation — was on the honor system, in keeping with the field’s open nature. The teams got the test in June — thousands of hours of audio and millions of pages in Arabic and Mandarin — and were expected to turn in their results later.
DARPA judges scored the computer translations by counting the number of human edits the sentences needed to convey the correct meaning. By this measure, the results largely met DARPA’s demands of 75 percent accuracy for text translation and 65 percent for speech.
The BBN-led team produced 75.3 percent accuracy with Arabic text, 75.2 percent in Chinese. It scored 69.4 percent in Arabic speech; 67.1 percent in Mandarin. IBM scored higher with Arabic text and SRI scored higher in Mandarin.
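The human-edit measure can be approximated with word-level edit distance: count the minimum insertions, deletions and substitutions needed to turn a system translation into a human-corrected version, then convert that into an accuracy figure. This is a simplified sketch of the idea, not DARPA’s exact metric, and the sentences are invented:

```python
def word_edits(hyp, ref):
    """Minimum word-level edits (Levenshtein distance) from hyp to ref."""
    h, r = hyp.split(), ref.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete a word
                          d[i][j - 1] + 1,        # insert a word
                          d[i - 1][j - 1] + cost)  # substitute or match
    return d[len(h)][len(r)]

def accuracy(hyp, ref):
    """Rough accuracy: fraction of the reference not requiring an edit."""
    return max(0.0, 1.0 - word_edits(hyp, ref) / len(ref.split()))

# Invented example: one spurious word costs one edit.
system_out = "police clashed with the militants in kuwait city"
corrected = "police clashed with militants in kuwait city"
print(round(accuracy(system_out, corrected), 3))
```

By a measure like this, a translation needing one fix in a seven-word sentence scores about 86 percent; the GALE teams had to hit such numbers across thousands of hours of audio and millions of pages.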
Then came the distillation section: open-ended questions posed to each team’s computers — based on 600,000 documents in Arabic, Chinese and English.
“How did Israel react to the Hamas election victory?” was one such question. “Describe attacks in Kuwait,” was another.
DARPA wanted to see how well the computers replicated human performance on such questions, including how precisely they could recall certain facts.
Here, too, the computers managed some articulate responses. “Since Jan. 10 (2005), police have clashed with Muslim fundamentalists and pursued them around the country, killing eight militants and arresting scores of others,” went one BBN response to the Kuwait question.
But it was not until three months later — after all three teams began working on year two of GALE in case they were picked to continue — that the researchers got DARPA’s ruling about who passed.
Tightening the screws
So who got rejected? No one.
At least not yet.
DARPA Director Anthony Tether and GALE program manager Joseph Olive decided each team had shown significant progress worth continuing to track.
But they did tighten the screws. In addition to expecting better translation accuracy in each of GALE’s four remaining years, DARPA will measure that performance more stringently. Now a high level of accuracy must be sustained over a very high percentage of a document. A bad patch of computer translation cannot be averaged away.
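The tightened criterion can be illustrated with a simple check: instead of averaging a document’s segment scores, require that a high fraction of its segments individually clear the accuracy bar. The threshold, coverage requirement and scores below are invented for illustration, not DARPA’s actual parameters:

```python
def passes(segment_scores, bar=0.75, coverage=0.9):
    """True if at least `coverage` of segments each score at or above `bar`."""
    good = sum(1 for s in segment_scores if s >= bar)
    return good / len(segment_scores) >= coverage

doc_a = [0.80, 0.78, 0.30, 0.82, 0.79]  # one bad patch drags it down
doc_b = [0.76, 0.77, 0.75, 0.78, 0.80]  # uniformly adequate

# doc_a has a respectable average but fails; doc_b passes.
print(passes(doc_a), passes(doc_b))
```

Under the old averaging approach, doc_a’s strong segments would have masked its bad patch; under a sustained-accuracy rule, they cannot.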
Just days after being informed of the new framework, Makhoul already had his eye on the next GALE evaluation, in June, and how his team would deliver the performance DARPA — and BBN — needed.
“It’s the same feeling again,” he said. “The pressure — it’s not off. It’s higher, in fact. Now the goals are harder for the second year than they were before.”