How Microsoft is breaking the universal language barrier

S&L: Discussion of matters pertaining to theoretical and applied sciences, and logical thought.

Moderator: Charon

rhoenix
The Artist formerly known as Rhoenix
Posts: 7998
Joined: Fri Dec 22, 2006 4:01 pm
Location: "Here," for varying values of "here."

#1 How Microsoft is breaking the universal language barrier

Post by rhoenix »

popsci.com wrote:Earlier this week, roughly 50,000 Skype users woke up to a new way of communicating over the Web-based phone- and video-calling platform, a feature that could’ve been pulled straight out of Star Trek. The new function, called Skype Translator, translates voice calls between different languages in real time, turning English to Spanish and Spanish back into English on the fly. Skype plans to incrementally add support for more than 40 languages, promising nothing short of a universal translator for desktops and mobile devices.

The product of more than a decade of dedicated research and development by Microsoft Research (Microsoft acquired Skype in 2011), Skype Translator does what several other Silicon Valley icons—not to mention the U.S. Department of Defense—have not yet been able to do. To do so, Microsoft Research (MSR) had to solve some major machine learning problems while pushing technologies like deep neural networks into new territory.

Their lofty goal: To make it possible for any human on Earth to communicate with any other human on Earth, with no linguistic divide. “Skype has always been about breaking down barriers,” says Gurdeep Pall, Skype’s corporate vice president. “We think with Skype Translator we’ll be able to fill a gap that’s existed for a long time, really since the beginning of human communication.”

Microsoft has a long institutional relationship with machine translation, one that reaches back to MSR’s earliest days. The machine learning group is one of the oldest within MSR, says MSR Strategy Director Vikram Dendi. Bill Gates funded the group and made it a priority.

The “a computer on every desk and in every home” mantra that ruled Microsoft’s thinking at the time created a challenge for MSR. More data was being created in more places—and in more languages—than ever before, Dendi says, and Microsoft researchers were tasked with creating translation engines to tackle the problem. To this day, Dendi says, one of the largest troves of untouched machine-translated text on the Internet is Microsoft’s help forums, which are translated into dozens of languages using translation engines developed in-house.

But that’s text. Translating spoken language—and especially doing so in real time—requires a whole different set of tools. Spoken words aren’t just a different medium of linguistic communication; we compose our words differently in speech and in text. Then there’s inflection, tone, body language, slang, idiom, mispronunciation, regional dialect and colloquialism. Text offers data; speech and all its nuances offers nothing but problems.

To create a working speech-to-speech translation technology, MSR researchers knew they would have to teach their system to not only translate one word to the same word in another language based on a standard set of rules, but to understand the meaning of words and sentences. They would have to teach the machine, and the machine would have to learn.

There’s more than one way to train a computer on language, says MSR Corporate Vice President Peter Lee, but there’s also more than one way for human language to trip up a computer. MSR took a multi-faceted approach. “It’s a combination of understanding the language—syntax and structure and meaning—but also a statistical matching process,” he says. “If I say ‘I like ice cream,’ you know that it probably means what it means. But if I say ‘oh, that fumble was the straw that broke the camel’s back,’ if you do a word-for-word translation into another language it probably wouldn’t make much sense.”

This gets at the core of the machine translation problem: understanding and translating meaning, not just words. MSR researchers get around this by mapping words and whole phrases across languages using statistical probability. They started building their body of knowledge using text, any text that’s already been translated—textbooks, EU parliament speeches, etc. That allows the translation engine to set a baseline and begin figuring out which phrases—even those that don’t translate literally—overlap.

To translate an English phrase like “the straw that broke the camel’s back” into, say, German, the system looks for probabilistic matches, selecting the best solution from a number of candidate phrases based on what it thinks is most likely to be correct. Over time the system builds confidence in certain results, reducing errors. With enough use, it figures out that an equivalent phrase, “the drop that tipped the bucket,” will likely sound more familiar to a German speaker.
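
As a rough illustration of the candidate selection described above (not Microsoft's actual engine), here is a minimal Python sketch. It assumes a hypothetical phrase table that simply counts how often each candidate translation has been observed, then picks the most frequent one along with its relative frequency. The function names and the counts in the usage note are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical phrase table: counts[source][candidate] is how many times
# a candidate translation has been observed for a source phrase.
counts = defaultdict(Counter)


def observe(source, candidate, n=1):
    """Record n more observations of `candidate` as a translation of `source`."""
    counts[source][candidate] += n


def best_translation(source):
    """Return (candidate, relative_frequency) for the most-observed candidate,
    or None if the source phrase has never been seen."""
    seen = counts.get(source)
    if not seen:
        return None
    total = sum(seen.values())
    candidate, hits = seen.most_common(1)[0]
    return candidate, hits / total
```

Usage, with made-up counts: after `observe(phrase, literal_german, 3)` and `observe(phrase, idiomatic_german, 17)`, `best_translation(phrase)` returns the idiomatic candidate. More observations shift the relative frequencies, which is a toy version of the system "building confidence" over time.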

This kind of probabilistic, statistical matching allows the system to get smarter over time, but it doesn’t really represent a breakthrough in machine learning or translation (though MSR researchers would point out that they’ve built some pretty sophisticated and unique syntax parsing algorithms into their engine). And anyhow, translation is no longer the hardest part of the equation. The real breakthrough for real-time speech-to-speech translation came around in 2009, when a group at MSR decided to return to deep neural network research in an effort to enhance speech recognition and synthesis—the turning of spoken words into text and vice versa.

Deep neural networks (DNNs)—biologically inspired computing paradigms designed more like the human brain than a classical computer—enable computers to learn observationally through a powerful process known as deep learning. But at the beginning of the last decade building DNN-based systems proved difficult. Many researchers turned to other solutions with more near-term promise.

For something like a decade, machine translation performance stagnated. “There was a full 10-year period in which we were working really hard and discovering new things every day, but the quality of our system wasn’t improving,” Lee says. “Then we finally hit a tipping point.” MSR had never fully walked away from DNN research, and when a group of machine translation researchers began actively pursuing them as a means of creating faster, more efficient speech recognition engines, they experienced the breakthrough they’d long sought. DNN technology had come a long way, and scientists at MSR and elsewhere had by this point been able to develop sophisticated machine learning models via DNNs that performed more like neurons in the human brain than traditional computers. “Returning to DNNs was crucial,” Dendi says. “If there is a single breakthrough, that’s it.”

New DNN-based models that learn as they go proved capable of building larger and more complex bodies of knowledge about the data sets they were trained on—including things like language. Speech recognition accuracy rates shot up by 25 percent. Moreover, DNNs are fast enough to make real-time translation a reality, as 50,000 people found out this week.

Not that users would notice. All this technological wizardry happens in the background. When one party on a Skype Translator call speaks, his or her words touch all of those pieces, traveling first to the cloud, then in series through a speech recognition system, a program that cleans up unnecessary “ums” and “ahs” and the like, a translation engine, and a speech synthesizer that turns that translation back into audible speech. Half a beat after that person stops speaking, an audio translation is already playing while a text transcript of the translation displays within the Skype app.
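
The chain of stages in that paragraph can be sketched as a simple series of function calls. This is only a toy outline in Python: every stage below is a hypothetical stand-in (the "recognizer" just passes a transcript through, the "translator" is a one-entry lookup table, the "synthesizer" returns a tagged string), and only the filler-word cleanup does real work.

```python
import re

# Matches standalone filler words such as "um", "uh", "ah", "er".
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)

# Toy lookup table standing in for a real translation engine (assumption).
TOY_TRANSLATIONS = {"i like ice cream": "me gusta el helado"}


def recognize_speech(audio):
    # Stand-in: pretend the "audio" is already a transcript string.
    return audio


def remove_fillers(text):
    # Drop filler words, then collapse any leftover runs of whitespace.
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()


def translate(text):
    # Stand-in: return the table entry, or the input unchanged if unknown.
    return TOY_TRANSLATIONS.get(text.lower(), text)


def synthesize(text):
    # Stand-in: a real system would return audio, not a tagged string.
    return f"<audio:{text}>"


def run_call_pipeline(audio):
    """Run the four stages in series; return (spoken_output, text_transcript)."""
    transcript = recognize_speech(audio)
    cleaned = remove_fillers(transcript)
    translated = translate(cleaned)
    return synthesize(translated), translated
```

Returning both the synthesized output and the translated text mirrors the article's description of an audio translation playing alongside a text transcript.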

Skype Translator isn’t perfect. It still gets hung up on idioms it doesn’t understand, or turns of phrase that are uncommon, or the fact that most of us speak our mother tongues with a certain degree of disregard for proper pronunciation, sentence structure, or diction. Lee and his colleagues at Skype aren’t bothered by this. They’re more interested to see how the system evolves with tens of thousands of users not only testing its limitations but teaching it new aspects of speech and human interaction that MSR hasn’t yet considered.

“We feel pretty good about it,” Lee says. “But when this thing gets out in the wild, who knows what happens?”
This... I have to admit I'm very impressed by. Yes, it's being funded and developed by Microsoft, the same company responsible for Windows ME, Microsoft Bob, and Windows 8. I acknowledge this. However, what they're doing with Skype is very impressive to me.

In fact, to me, this is an example of the true power of the Internet - giving everyone a common platform upon which to communicate with and share information with one another.
"Before you diagnose yourself with depression or low self-esteem, make sure that you are not, in fact, just surrounded by assholes."

- William Gibson


Josh wrote:What? There's nothing weird about having a pet housefly. He smuggles cigarettes for me.
Josh
Resident of the Kingdom of Eternal Cockjobbery
Posts: 8114
Joined: Mon Jun 06, 2005 4:51 pm
Location: Kingdom of Eternal Cockjobbery

#2 Re: How Microsoft is breaking the universal language barrier

Post by Josh »

Microsoft has achieved some awesome shit, they just mix it in with flopping commercial failures.
When the Frog God smiles, arm yourself.
"'Flammable' and 'inflammable' have the same meaning! This language is insane!"
GIVE ME COFFEE AND I WILL ALLOW YOU TO LIVE!- Frigid
"Ork 'as no automatic code o' survival. 'is partic'lar distinction from all udda livin' gits is tha necessity ta act inna face o' alternatives by means o' dakka."
I created the sound of madness, wrote the book on pain
White Haven
Disciple
Posts: 752
Joined: Sat May 20, 2006 10:45 am
Location: Richmond Virginia, the Capitol of Treason

#3 Re: How Microsoft is breaking the universal language barrier

Post by White Haven »

Screw you, man, Bob was awesome.
Chronological Incontinence: Time warps around the poster. The thread topic winks out of existence and reappears in 1d10 posts.

Out of Context Theatre, this week starring rhoenix
-'I need to hit the can, but if you wouldn't mind joining me for number two, I'd be grateful.'
Comrade Tortoise
Exemplar
Posts: 4832
Joined: Thu Jun 09, 2005 1:33 am
Location: Land of steers and queers indeed

#4 Re: How Microsoft is breaking the universal language barrier

Post by Comrade Tortoise »

Given the vast inferiority of Bing Translate compared to Google for text translation, my expectations for the accuracy of real-time translation are... not high.
"Nothing in biology makes sense except in the light of evolution."
- Theodosius Dobzhansky

There is no word harsh enough for this. No verbal edge sharp and cold enough to set forth the flaying needed. English is to young and the elder languages of the earth beyond me. ~Frigid

The Holocaust was an Amazing Logistical Achievement~Havoc
Josh
Resident of the Kingdom of Eternal Cockjobbery
Posts: 8114
Joined: Mon Jun 06, 2005 4:51 pm
Location: Kingdom of Eternal Cockjobbery

#5 Re: How Microsoft is breaking the universal language barrier

Post by Josh »

Pretty much nothing starts out of the gate in perfected form. The fact that we're moving in this direction is pretty huge, all the same.

Technology is continually making the world a smaller place. Which is fine with me, because that just points us up to where we need to be going.