Ziff Davis Enterprise
Tuesday, January 29, 2008 1:13 PM/EST

Recognizing Speech Recognition

Speech recognition technology has made advances in the past few years, but has that growth been enough to call it a success?


Early last year I launched the eWEEK Emerging Technology site with a list of 10 emerging technologies that flopped. While the list included some technologies that no one argued with--hi there, Microsoft Bob and the CueCat--there were some choices that elicited more than a few protests.

Speech recognition, in particular, was one technology that drew more than a few cries of foul. Several readers pointed out the importance of speech recognition to people with disabilities, and I received several invitations from speech recognition software leader Nuance to take a fresh look at the current state of speech recognition.

That is exactly what I've done, and I have to say that I am impressed in many ways with the current state of speech recognition technology. However, I don't know if I'm impressed enough to change my perception of it as a technology that hasn't lived up to its promises--a flop, if you will.

On the plus side, speech recognition is seeing a boom in hype and high-profile implementations that hasn't been seen since its heyday in the late 1990s. Leading the charge are the omnipresent commercials and ads touting the Microsoft Sync feature found in some cars, which makes it possible to control music and other car features with simple voice commands (what the ads don't say is that the underlying technology for Sync comes from Nuance).

A Commanding Voice


In many ways, command is the top functionality for speech recognition. In most cases, it doesn't require training, and even people who don't think they want to dictate memos and letters to a computer see the value of being able to use simple voice commands like "call Bob" or "play Icky Thump."

While I didn't get a chance to try out speech commands on a Sync-enabled car, I did test them using Nuance's Dragon NaturallySpeaking 9.5 and a smart phone with voice command features enabled.

In my tests, voice commands worked well, at least as far as recognizing what I wanted done. On the PC with Dragon installed, I sometimes had to repeat commands, but all in all, it worked.

The voice commands on the phone were both more impressive and more frustrating. Using voice commands to dial numbers is a classic win-win situation and worked well in my tests, making it possible to say "call Jane Morris mobile" and have the phone dial her cell number.

The phone I tested also had Nuance-enabled voice command features that made it possible to do many different tasks, including sending e-mails, doing Web searches, and adding calendar appointments.

This feature worked well when it came to recognition, but was frustrating in delivery. That's because the actual technology is server-based. This means I would say a command, it would route into the cloud, and then come back to my phone. In tests, most commands took 30 to 45 seconds before delivering a result. When you need to be hands free, this is a necessary inconvenience, but in most other cases it was much too long to wait.

But all in all, I was impressed with voice command capabilities, especially in non-PC areas. On the PC it worked well, though to be honest, the voice command features in my old OS/2 Warp system were nearly as good.

Speaking Clearly


So what about speech recognition? In this area I can unequivocally say that the results and experience were greatly superior to the voice recognition I used several years ago.

One of the biggest improvements was in training. In the old days, training the system to your voice could take hours or even days. In Dragon NaturallySpeaking 9.5, I was done training in about 15 minutes.

After this short training session, the results I had in dictation were pretty good. I did several tests, including some long dictated texts, and the error rate was what I considered acceptable, really not much worse than if I typed a couple of paragraphs without going back to fix errors.
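For readers who want to put a number on "acceptable," dictation accuracy is commonly scored as word error rate: the count of substituted, dropped and inserted words divided by the number of words actually spoken. The sketch below is one minimal way to compute it. The function name `word_error_rate` and the sample sentences are made up for illustration and are not part of Dragon or any Nuance tool, and the comparison assumes plain lowercase words with no punctuation handling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only; not an API from any speech product.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped by the recognizer
                curr[j - 1] + 1,     # extra word inserted by the recognizer
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: what was said versus what appeared on screen.
spoken = "the quick brown fox jumps over the lazy dog"
transcribed = "the quick brown box jumps over lazy dog"
wer = word_error_rate(spoken, transcribed)
print(f"word error rate: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

Here one word is substituted and one is dropped out of nine spoken, so two errors are counted against nine words. Running the same comparison on a longer dictated passage gives a figure that can be compared across sessions as the software adapts to a speaker.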

So after using the current generation of speech recognition, I can clearly say that it is much improved and works very well. So why doesn't that change my disappointed view of speech recognition?

Well, in one area it's the current reality versus the original promise. In the mid-1990s, many people claimed that speech recognition would take over offices, that you'd walk into a business and everyone would be talking to their computers instead of typing. Sorry, but that isn't likely to happen anytime soon.

However, an even bigger issue is that, while speech recognition has improved over the past 10 or so years, I don't think it has improved enough.

If you look back to other technologies from 1997, such as the Web, enterprise applications or mobile phones, the changes have been radical. Compared to these technologies, speech recognition has seen only modest gains.

In part I blame this on the lack of competition. In the 1990s, there were several major companies competing in the area of speech recognition. However, these competitors either failed (some spectacularly, as in the case of Lernout & Hauspie) or turned their attention elsewhere (IBM, I'm looking at you).

While this limited competition has been good for dominant vendors like Nuance, it generally isn't good for innovation. Nothing spurs a technology area into interesting and rewarding innovations quite like tough competition.

But yes, I've heard the calls and, yes, I agree that speech recognition is improved over past capabilities.

However, if the question is whether speech recognition will deliver on its original grand promises and become ubiquitous in all forms of computing, then I think the answer is still too garbled to know for sure.

Comments (7)

Dear Jim,

Here (website above) is one area where speech recognition is being utilised in a more all-encompassing way.

Cheers,

Adrian

Andy Green :

Hi Jim,

Everyone would love 100% accuracy. The reality is that never happens. Because of that, the acceptability of accuracy rates is highly subjective. What one person finds satisfactory, another will call horrible.

Can you tell us what was acceptable to you? Was it 90%, 75%, 60%?

Thanks!

Andy

Jim Rapoza :

I'll run another test when I get back to the system that has Dragon on it, but it was around 85 percent. It definitely improves the more you use it, especially as you get used to how it wants to hear you talk.

Frank Eden Dibble :

Fascinated by the technology, I have used NaturallySpeaking since version 1. One should never buy version 1 of anything but at least it was good for a laugh. At that time there was only an American version and I soon learnt of the enormous difference between the American language and the English language.

I now couldn't live without it. I am retired. Before that I had people to do typing for me. If you can type well and like it, don't bother. If you don't like typing, it is now very good. If you have problems, it is probably due to your poor enunciation. With version 1 I wondered if I should take elocution lessons. The more powerful your computer is, the better it runs and this was a problem in the early days. You had to really wait for your dictation to appear on the screen. It cannot be instant, but it is pretty close to it and it likes you to dictate in long phrases. It likes conversational language. Consequently, when I dictate like a lawyer, that is, thinking carefully about each word, it is not quite as good but otherwise I am able to sit back, relax and dictate enormously long e-mails to bore my friends with.

I do not use it a great deal for commands. It is usually quicker to use a keyboard and mouse, although when in one program, such as the NaturallySpeaking word processing program, I will often command, for example, "Start Microsoft Word".

All this has been dictated with NaturallySpeaking directly into this window which is not designed for voice recognition but it was a lot easier than typing.

I thoroughly recommend it for anyone like me. By the way, buy the "Preferred". The cheaper version is a bit light on and the dearer ones you don't need. The kit includes a good headset and you do need a good one so don't bother stealing the program unless you have one.

Jack :

February 28, 2008

Hello and hi, Jim; loved your review of the speech recognition software. You know, an installed automobile amplification product (I have to look that up soon) for the speech recognition capable telephone could result in a faster response time, but I don't know what product (if one exists) would be best suited for such a "process".

I'm thinking of a device that focuses WiFi and/or cell tech signals and channels them directly to a base network server or automatically seeks the fastest way through multiple channels - a la James Bond or Get Smart or Matt Helm (smile). Also, in a more practical sense, the late 80's and 90's sonic amp "boom box" tech for CD's and multiple CD players in cars may offer some help along such lines, ya think? (smile) Peace

Joe Jeffrey :

Let me offer a contrarian view. There are many cases in which a product is devised by people in love with the technology for its own sake, at the expense of actual usability. Digital car radio tuning comes to mind, in which you have to hit a button 10 or 20 times, rather than making the ergonomically much easier move of reach-twist. I suspect that in most cases speech recognition is such a case: it's primarily a technical stunt, to solve a problem that is not really a problem. There are obvious exceptions, of course, such as for people with disabilities. But, empirically, more and more people can type faster and faster, a trend that can only increase as younger and younger children learn it, and if you type fast speech recognition is of no value. And even perfect voice recognition will not handle homonyms: hair/hare, meet/meat, beet/beat, bye/buy, etc. That's the much, much harder problem of language understanding.

Bob LeMay :

I agree somewhat with Mr. Jeffrey--there are limited applications for speech recognition, which is why there is limited (or no) competition.

I also tried Dragon NaturallySpeaking 1.0, and it worked well enough. However, mouse and keyboard is much faster (for those so trained).

For those situations where hands-free is beneficial--driving--or required--handicap or injury--speech recognition is useful. Or for automated response, transcription, or translation it can reduce the human effort involved.

But for everyday word processing and computer commands...well, I wouldn't want to work in an office where everybody was constantly whispering to their computer all day. It's bad enough when they use their speakerphones.
