Computer-based speech recognition, whereby you speak to a computer and the computer understands what you're saying, is often portrayed as a thing of the future. For instance, "Star Trek" fans have seen Captain Picard walk up to a replicator on the Enterprise and say, "Earl Grey, hot." Within seconds, a steaming cup of tea appears, just the way he likes it. But you don't have to wait for the 24th century to use speech recognition. The technology exists now, and you can use it on your PC.
I tested a selection of the major players in the two main categories of speech recognition products. Among dictation packages, I tested Dragon Systems' NaturallySpeaking, IBM's ViaVoice Gold, and Kurzweil VoicePro, which Alpha Software licenses from Lernout & Hauspie. The three command recognition programs I tested were Alpha's Kurzweil Voice Commands, Listen for Windows from Verbex, and VoicE-Mail from Wizzard Software. Most of these products work primarily with word processors rather than other business applications.
The Talk of the Town
Ideally, PC-based speech recognition products would serve a vast potential market, including poor typists, disabled users who have limited use of their hands, and people who suffer from, or want to avoid, repetitive strain injuries. And let's not forget people who hate being chained to their keyboards.
Based on my experiences, however, I think most users will want to wait. As Computer Currents' Ergonomic Office columnist also found (see her columns at www.currents.net/magazine/national/1603/ergo1603.html and in this issue at www.currents.net/magazine/national/1605/ergo1605.html), only someone who's willing to spend weeks or months with the software will achieve the accuracy the programs promise. Today's best examples of usable speech recognition exist in telephony and other products; read all about them in the "Speech Recognition's State of the Art" sidebar.
Figures of Speech
The terms "speech recognition" and "voice recognition" are often used interchangeably when talking about these products, but they're not synonymous. Speech recognition software reacts to what you say, transcribing your words or running an application. Voice recognition has more to do with using your voice for identification. Instead of using a PIN or your mother's maiden name to access your bank account over the phone, for instance, you'd use your voiceprint. Products such as Keyware Technology's VoiceGuardian (www.keywareusa.com) and Nuance Communications' Verifier (www.nuance.com) take a voiceprint of your speech over the telephone to verify your identity.
There are two categories of PC-based speech recognition software: dictation and command recognition. Dictation is the more compelling of the two. An effective dictation program could make typing into your word processor a thing of the past. Instead, all you'd do is speak into a microphone and watch the computer transform your words into text on-screen. Command recognition lets you cut the sacred link between you and your mouse: You speak system and program commands to your computer, and the computer carries them out.
PC-based speech recognition still has a long way to go, because its job isn't an easy one. It takes a lot of computing power to "read" your voice and assign meaning to the sounds. And depending on one's accent and speech patterns, the same word can sound different when spoken by four different people. That's why the programs need to be trained so extensively in order to recognize your particular voice. Today's applications are powerful enough to handle these challenges-but only just. The six programs I tried required constant training. I had to stop frequently to correct errors, and their accuracy was nowhere near what the companies promised in their promotional materials.
The Sound and the Fury
Most PC-based speech recognition products have certain characteristics in common. First, you need a powerful computer to run them. For instance, Kurzweil VoicePro requires a 90MHz Pentium with 24MB of RAM. But these requirements are modest compared to those for IBM's ViaVoice Gold, which recommends a 150MHz Pentium MMX with 32MB of RAM. You also need a 16-bit sound card and a microphone (most programs include one). Unlike their predecessors, current products do not require special add-in cards. Also note that you generally can't use these products right out of the box. You'll have to train the program by talking to it until it develops a training profile of your speech patterns.
Going through the training sessions required for a speech recognition product can be incredibly tedious, and it's only the beginning. Once you finish the initial training, the program continues to learn as you speak to it. You may have to put up with recognition errors and the tiresome process of correcting them for weeks until the program accumulates enough samples to recognize most of what you say without error. Sometimes new errors creep in because people aren't always consistent in the way they pronounce the same word. And you can't ignore any errors, because the computer will consider a neglected error to be correct.
As with any program, you have to learn its command structure. That's a bit harder with speech recognition programs, since the computer is always listening. Commands have to be spoken quickly so they aren't confused with text; for instance, you'd say "close parenthesis" as if it were one word rather than two. If you have to fumble for the right command, extraneous noise may turn into unwanted text or actions on the screen.
Say What?
Speech recognition programs have several built-in limitations. Because they strive to catch every nuance of your speech, they are very sensitive to background noise. Don't plan on using a speech recognition program on a factory floor; you have to work in a quiet place. If the phone rings, or if someone comes into your office to chat, you'll have to turn off your mike or you may get some strange results on the screen. Even in a quiet environment, you have to speak clearly and steadily to the computer. Otherwise, the program may try to interpret your heavy breathing, coughs, and stray "ums" and "ers" as words. I wonder how distracting a row of office cubicles filled with people talking to their PCs would be.
Most of the programs I reviewed address the noise problem by including a noise-canceling microphone with an attached headset. These microphones really don't cancel noise; at best they may reduce it a bit. But the software is often optimized for the included microphone, so you should use that one or one recommended by the vendor. I found none of the headsets especially comfortable; they all had tight headbands.
Just as it's not a great idea to share your toothbrush, it's not smart to let anyone else speak to the computer using your training profile. Speech recognition programs use this profile to adapt to your voice and pronunciation. Let someone else use it, and the program will get confused. Luckily, all the speech recognition programs I tested can store multiple training profiles, so each user can develop his or her own. A tip: Be sure to back up your voice profile often. (Many products offer backup capabilities.) If you lose these files, which are constantly being updated as the system learns more about your voice, you'll have to start the painful training process from scratch.
Finally, although these products may save wear and tear on your hands and arms by reducing keyboard and mouse use, you still have to worry about straining your voice from speaking all day. And whereas correcting a typing error is simple (just retype the word), correcting errors with speech recognition software involves multiple steps, which could drive even the most patient person bananas.
Computer, Take a Memo
Until recently, most dictation products relied on discrete speech. This means you had to dictate one ... word ... at ... a ... time, pausing between each word. The latest products understand continuous speech, so you can speak at a more or less normal rate. Continuous speech products, as you might guess, require more computing power than discrete speech recognition products do.
The three major dictation engines come from IBM, Dragon, and Lernout & Hauspie, which uses the Kurzweil name. IBM and Dragon have continuous speech products on the market. L&H licenses a discrete speech product to Alpha Software, and its first branded continuous speech product is scheduled for release in the near future.
All dictation programs come with a dictionary, but you'll have to teach the program words it doesn't know, such as technical terms or proper names. The more words the computer keeps in its active memory, the more computing power you'll need. You must dictate all punctuation marks, capitalization, and information about where to begin new paragraphs. You have to learn a dictation style that distinguishes between text and formatting instructions. Some programs make you dictate into their own modified word processors and cut and paste the results into your regular application. Others let you dictate directly into any word processing program.
Testing, 1, 2, 3, 4, 5, 6, 7 ...
Ideally, testing these products would take months, which would be enough time to undergo rigorous and constantly refined training with each one. Alas, I did not have that luxury. In each case, I went through the required training for each program and then spent several hours dictating and correcting text. I read each program passages from two documents: one business and one literary. For business, I used the beginning of the letter to shareholders from the 1996 annual report of the U.S. Surgical Corporation. For literature, I dictated the G-rated beginning of D.H. Lawrence's "Lady Chatterley's Lover." Both pieces had a few tricky words, but nothing out of the realm of normal language. Because speech recognition programs' dictionaries are supposedly geared toward the language of business, however, I expected to get more errors with the literary passage. I read each passage four times, correcting errors after each session.
Dictation programs include a spell mode, in which you spell out any unfamiliar words using the phonetic alphabet (A = alpha, B = bravo, and so on). For my tests, I decided to forgo this hand-holding option. Instead, I taught the computer unfamiliar words by pronouncing them. This method made it easier for me to determine how well the programs learned words they were unfamiliar with.
I tested each product using its own recommended microphone; my computer, a 266MHz Pentium II with 64MB of memory; and my voice. Even though I'm a native Philadelphian (yo!), my voice has a typical American accent. Furthermore, I am a trained singer and actor who went through years of voice and diction lessons.
After each reading of the business or literary sample, I counted the errors the program made. Ideally, of course, a speech recognition program would make no errors at all. But given the relative newness of this PC genre, I would accept the error level of an accurate typist: no more than three mistakes per 100 words. None of the programs I tested attained this level of accuracy until the third or fourth try, if at all. After all the testing, my favorite product was ... none of them. The best dictation product I tested was an Internet-based system called CyberTranscriber. (See the "CyberTranscriber Speaks Your Language" sidebar.)
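The error counts in this review were made by hand. A standard way to automate the same kind of count is word error rate: the fewest substituted, dropped, and extra words needed to turn the original text into the program's output, divided by the number of words in the original. The Python sketch below is an illustration only, not how these tests were scored. It assumes lowercase text with punctuation ignored, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only words; punctuation is ignored (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(reference: str, hypothesis: str) -> int:
    """Fewest word substitutions, deletions, and insertions turning reference into hypothesis."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    # prev[j] holds the distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # reference word dropped (deletion)
                          curr[j - 1] + 1,     # extra word in the output (insertion)
                          prev[j - 1] + cost)  # match or wrong word (substitution)
        prev = curr
    return prev[-1]


def meets_typist_standard(errors: int, words: int, max_per_100: float = 3.0) -> bool:
    """True if the error count is no more than max_per_100 errors per 100 words."""
    return errors * 100 <= max_per_100 * words


print(word_errors("Ours is essentially a tragic age",
                  "Hours is essentially a tragic age"))  # 1
print(meets_typist_standard(10, 119), meets_typist_standard(3, 119))  # False True
```

By this yardstick, ViaVoice Gold's first business reading (10 errors in 119 words, about 8.4 per 100) fails the typist standard, while its final reading (3 errors, about 2.5 per 100) passes.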
Dragon NaturallySpeaking Deluxe
Dragon NaturallySpeaking Deluxe, a continuous speech product, costs nearly $500 more than any other application I reviewed. While it has a superior interface, it fails to offer significantly better recognition.
This "deluxe" package includes NaturallySpeaking, a continuous dictation product, and DragonDictate, a discrete dictation package that can be used concurrently to issue system and program commands. I tested only NaturallySpeaking. NaturallySpeaking is also available as a stand-alone Personal Edition for $159, but the Deluxe Edition has some additional features, including multiuser capability.
NaturallySpeaking loads slowly, but once it's up and running, it sports a much better designed interface than IBM's ViaVoice Gold, especially in correction mode. Unlike ViaVoice Gold, NaturallySpeaking lets you use voice commands to move to a misunderstood word, issue a command to correct it, and choose the right word from a list. You have to type a word only if it's not on the list. If the program confuses two words, you can train it on the spot to hear how you pronounce each one. The program will also read text back to you, in a computer voice with a British accent.
A good multimedia tutorial familiarizes you with the program. Training is still boring, but the samples you read, including funny excerpts by Scott Adams and Dave Barry, liven things up. In fact, I had to pause training until I stopped laughing.
NaturallySpeaking can also read word processing files to learn new words. One great feature: When it finds unfamiliar words, it shows you each word in context. When you choose words to add to the dictionary, the program asks you to pronounce them.
The first reading of the business document yielded eight errors. A second training trimmed the errors to seven, but they were all new mistakes. The third time wasn't the charm: I got eight errors again. The program confused "are" and "our." I trained the program in the differences in pronunciation, but it still confused them. The final test yielded an acceptable three errors.
Results for the first reading of the literary sample were terrible and unintentionally amusing. The 25 errors included "ecstatic ozone" for "the cataclysm" and "Muslim" for "must live." After I corrected all the errors, the next session yielded 20 mistakes. The program had learned "Chatterley" and "cataclysm" but still could not distinguish between "ours" and "hours" or "knew" and "new." The third trial still contained 20 errors, as did the final test, a far worse performance than either IBM's or Alpha's. NaturallySpeaking never did get "ours" or "Constance" right. List price: $695. Dragon Systems Inc., 800/825-5897, www.naturalspeech.com.
IBM ViaVoice Gold
This continuous speech program costs $546 less than Dragon NaturallySpeaking Deluxe, but matched NaturallySpeaking's accuracy on the business document. It did significantly better with the literary sample. Its interface isn't as well designed as NaturallySpeaking's, however; error correction is too dependent on the mouse.
Training consists of reading a series of sentences: first a tutorial on the program and then a short ghost story by Mark Twain. You can dictate within major word processors or the program's own dictation pad. But IBM's system is the clunkiest of the three I evaluated. Double-click an error and a dialog box pops up. The dialog box often, but not always, contains a list of words that the program thinks could be correct. If the word is on the list, you click it. You can also click to capitalize or delete a word. When you click a word, you'll hear how you pronounced it. If the word is not on the list, you must type it in an input box. If the program does not know the word, it asks you to pronounce it so it can add the word to the dictionary.
In the first test of the business letter, ViaVoice Gold made 10 errors in 119 words. After correcting those errors, a second reading yielded eight errors-some new errors as well as repeats. After a third session, errors dropped to three, and in the final test there were also three errors.
ViaVoice must not have been an English major. Its first reading of D.H. Lawrence yielded 26 errors out of 150 words. Of course, it had to be taught words like "Chatterley." After I corrected the first set of errors, the second reading yielded 11 errors, and the program learned Mrs. Chatterley's name. The third reading, after another round of corrections, brought 13 errors, and the final test yielded six: better, but hardly perfect. The program had trouble with "Flanders," and I was frustrated because I could not drill that word into its memory. Also, it learns pronunciation only during dictation correction. Unlike NaturallySpeaking, ViaVoice has no separate procedure to teach the program how you pronounce a word.
The program does have some well-thought-out features. You can create shortcuts, macro phrases that type multiple words. And like NaturallySpeaking, ViaVoice can read word processing files to add new words to its dictionary. However, it doesn't prompt you to teach it how to pronounce them. The reading module recognizes only a few word processing formats: text, RTF, and Word 6. The program can also read your document to you in typical computer-like speech. Documentation is adequate, but there is no tutorial. List price: $149. IBM, 800/426-4968, www.ibm.com/viavoice.
Kurzweil VoicePro
Alpha's top-of-the-line product, Kurzweil VoicePro, is licensed from Lernout & Hauspie. The program combines voice dictation with command recognition for major Windows programs. Unlike the Dragon and IBM products, this one recognizes only discrete speech and therefore has lower system requirements. However, it also yielded the most errors on the first reads of both samples. Even after four sessions, it still made an unacceptable number of errors on what should have been the easiest sample-the business document.
VoicePro comes with a good tutorial. Initial training is limited mainly to teaching the program system commands. If you want to use this program to reduce repetitive strain injury, however, be aware that the initial training involves more than 900 mouse clicks.
With discrete speech products, you have to pause between each word. If you don't, the program will try to make one word out of several you spoke. Therefore, dictation takes a lot more time.
With VoicePro, you can correct errors on the spot. Or if you use Word or WordPerfect, you can correct them later by placing your cursor at the beginning or end of the incorrect word. To manage system resources, you can set how many words the program keeps in memory for later correction. If you encounter too many recognition problems, you can fine-tune how the program hears your words by adjusting accuracy (at the expense of speed), controlling the microphone input level, and changing the time gap between words.
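Why does a discrete speech program merge words you run together? One simple way a system could find word boundaries is to look for stretches of silence. The Python sketch below assumes the audio has already been reduced to one loudness (energy) value per short frame, and it treats a quiet run of at least `min_gap` frames as the end of a word, so a shorter pause glues two words into one segment. Raising `min_gap` is loosely analogous to lengthening VoicePro's time gap between words, but this is an illustrative assumption, not Lernout & Hauspie's actual algorithm, and the function and parameter names are hypothetical.

```python
def segment_words(energies: list[float], threshold: float,
                  min_gap: int) -> list[tuple[int, int]]:
    """Return (start, end) frame spans of speech; `end` is exclusive.

    A span closes only after `min_gap` consecutive frames fall below
    `threshold`, so shorter dips count as part of the same word.
    """
    segments: list[tuple[int, int]] = []
    start = None        # first loud frame of the span in progress
    last_loud = 0       # most recent loud frame seen
    quiet_run = 0       # consecutive quiet frames since last_loud
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_loud = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:   # pause long enough: close the word
                segments.append((start, last_loud + 1))
                start = None
                quiet_run = 0
    if start is not None:              # speech ran to the end of the input
        segments.append((start, last_loud + 1))
    return segments


# Two words with a 3-frame pause, then the same words with a 1-frame pause.
paused = [0, 5, 6, 0, 0, 0, 7, 8, 0]
rushed = [0, 5, 6, 0, 7, 8, 0]
print(segment_words(paused, threshold=1, min_gap=3))  # [(1, 3), (6, 8)]
print(segment_words(rushed, threshold=1, min_gap=3))  # [(1, 6)]
```

In this sketch, calling `segment_words(rushed, threshold=1, min_gap=1)` splits the rushed input back into two segments. That is the trade-off such a setting controls: a shorter required pause lets you speak faster, but makes the program more likely to split a single word at a brief dip in loudness.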
Correcting on the fly works best, but it's a slow process. A wizard lets you specify the type of error the program made and then tries to solve it, but it requires far too much work on your part. For instance, the program had confused "hostels" and "hospitals," so I told the wizard that two words were being confused. To correct the error I had to scroll through the entire dictionary twice: first to choose the correct word and then to find the word VoicePro thought it was. Then I had to say each word three times, clicking the mouse each time.
In the initial business-letter test, VoicePro made an unacceptable 20 errors, including words it failed to transcribe at all. After retraining, VoicePro's errors dropped to 13. I learned that you have to watch the screen constantly, because sometimes the program fails to hear a word and just stops, resulting in missing words. Errors dropped to eight with the third training session but climbed to nine with the fourth lesson.
In the initial literary reading, the program generated a hefty 32 errors. After I taught it words such as "Chatterley" and "Constance," errors fell to 22. I shut down the computer at this point. When I fired up VoicePro for its third training session, the program initiated the second stage of training, called enrolling, which consists of reading 400 different words and clicking the mouse after each one. Then I read a series of numbers twice, necessitating even more mouse clicks. Enrollment helped, though; for the third reading, errors dropped to nine, and the final test had only seven errors.
There is no doubt that VoicePro can learn, but its discrete dictation method is far too slow. Also, VoicePro's correction routine requires that you use the mouse too much (unless you correct every error on the fly). This program is poorly suited for a business environment, and even home users will find its slowness frustrating. List price: $200. Alpha Software, 800/451-1018, www.alphasoftware.com.
Something to Talk About
I gave one of my speech recognition programs to my UPS delivery man. He was eager to try it, since he loves computers but hates to type. About a week later, I asked him how it went. Frustrated with the training and the high error rate, he had erased the program from his system. "I guess it's back to Mavis Beacon Teaches Typing," he said.
I sympathized. Whether I used the $149 ViaVoice Gold or the $695 NaturallySpeaking, there were far too many errors, and those errors often involved simple words. Because the engines are different, their recognition errors differed, too. Words that baffled one program sailed right through on another.
Unfortunately, the companies' ambitious promises for these programs didn't pan out. IBM says ViaVoice Gold is great for business, home, or school. Dragon's NaturallySpeaking Deluxe purportedly offers "advanced speech recognition." And Alpha claims that, with VoicePro, "you can start dictating immediately, and out-of-the-box accuracy can be 90 percent or higher." And they all imply there are no restrictions on the type of material you can throw at these programs. None of these claims rang true after my tests.
The ultimate problem is that these dictation programs require far too much training to work accurately. It will take several more generations before such software becomes a useful business or productivity tool. So unless you have a lot of time on your hands or special physical needs, I wouldn't bother with a dictation product yet.
Obey My Commands ... Please?
Speech recognition programs that translate what you say into system commands have a less daunting task because they have fewer words to learn. But they don't work straight out of the box. You'll have to spend time training them and correcting errors.
Kurzweil Voice Commands for Microsoft Word
This is a niche program for people who use Microsoft Word 7.0 or Word 97 for Windows 95. Its purpose is to let you format and manipulate text within Word using your voice. But-and this is a big but-you can't dictate new material. All you can do is manipulate what you've already written.
Voice Commands uses continuous speech and requires little training. For once, a vendor's claims were justified. The program recognizes key words in your speech, so you can speak naturally and request the same action in multiple ways. For instance, you can say, "Increase the font size by three points" or "Make the size of the font bigger by three." If you're stuck, a good help system will guide you.
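Voice Commands' tolerance for different phrasings can be illustrated with simple keyword spotting: ignore word order and look only for the words that matter. The Python sketch below handles a single hypothetical action and assumes the speech has already been turned into text. The keyword lists, the `('font_size', delta)` result, and the function name are assumptions for illustration, not Alpha's implementation.

```python
import re
from typing import Optional

# Hypothetical keyword lists; a real product would need many more.
GROW_WORDS = {"increase", "bigger", "larger", "raise"}
SHRINK_WORDS = {"decrease", "smaller", "reduce", "lower"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def parse_font_size_command(utterance: str) -> Optional[tuple[str, int]]:
    """Map a free-form request to ('font_size', signed point change), or None."""
    words = re.findall(r"[a-z0-9]+", utterance.lower())
    vocabulary = set(words)
    # Both key nouns must appear, in any order.
    if not {"font", "size"} <= vocabulary:
        return None
    if vocabulary & GROW_WORDS:
        sign = 1
    elif vocabulary & SHRINK_WORDS:
        sign = -1
    else:
        return None
    # Use the first number spoken, whether as digits or as a word.
    for word in words:
        if word.isdigit():
            return ("font_size", sign * int(word))
        if word in NUMBER_WORDS:
            return ("font_size", sign * NUMBER_WORDS[word])
    return None


print(parse_font_size_command("Increase the font size by three points"))
print(parse_font_size_command("Make the size of the font bigger by three."))
# Both print ('font_size', 3)
```

Because the sketch checks which words are present rather than their order, both of the phrasings quoted above map to the same command.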
The program knows most major Word commands. You can change font size and type style, adjust spacing and alignment, add pagination, and create columns, but you can't teach the program Word macros or new commands. While the program knows how to change the font to Arial or Times New Roman, you can't teach it to change to a font that doesn't come with Windows, such as Palatino. The best it can do is open the Format menu. Then you choose the new font with your mouse.
If the program has difficulty understanding a word, you can train it on the fly. There is no loathsome training session. If you have a problem, toll-free technical support is available.
When I first ran the program, it wouldn't work at all. The support person was stymied until he discovered that I had Internet Explorer 4.0 on my system. The two programs conflict, but a patch at Alpha's Web site fixed the problem.
The big question is who would want this program? It only works with existing documents, and all it can do is format them. Formatting a document is not that big a deal. But if you find it burdensome, Voice Commands will certainly do the job for you. Street price: $60. Alpha Software, 800/380-1234, www.alphasoftware.com.
Listen for Windows
Voice Commands is a one-trick pony. Listen for Windows is much more ambitious. It contains 20,000 voice commands for Windows 95, its applets, and applications such as Word, Works, Netscape Navigator, Quicken, WordPerfect, and Quattro Pro. Since there are so many commands, you will have to go through a long and boring training session: "Font size 22. Font size 23. Font size 24."
On the plus side, the program is context sensitive, displaying a menu of relevant commands for the application you've loaded. You can add commands to that menu. You can also create customized menus of commands and develop complex voice-activated macros.
Some of the command sets seem silly. Why would you want to use voice commands for a paint program? Also, far too many of the command sets are for old versions of software such as WordPerfect 6.1 and Quattro Pro 5. You'd have to develop customized menus for newer applications.
Listen does work pretty well. The computer responded to my requests, but it took time for me to learn a whole new command set for each application, and I had to constantly check the command list to see if the option I wanted was available.
Setting up new commands is doable, but it may require a fair amount of editing, training, and checking. Remember, you can't enter data; you can only execute program commands. For instance, Navigator commands include scrolling, going to your home location, or requesting a Net search. You can't go to a specific URL without setting up a macro. If you want to surf, it's back to your keyboard and mouse.
With Listen, you won't throw away your mouse and keyboard, but you can use them less often. Personally, I think this program's training process makes it more trouble than it's worth, but if the idea of commanding your PC with the sound of your own voice is appealing, it's the most versatile command package I've tried. The company also sells a separate Windows 3.1 version. Note that this is the only program I tested that does not include a microphone. The company recommends a Bell Labs product on its Web site. Street price: $50. Verbex, 800/275-8729, www.verbex.com/lfwspec.htm.
VoicE-Mail
Email would seem to be a natural fit for speech recognition products. If it were done right, you'd just tell the computer to get your mail and read it to you. Then you could dictate replies or send new email. VoicE-Mail is designed to automate the email process, but it does not succeed.
The first problem is compatibility or lack thereof. VoicE-Mail isn't an add-on to any major email program but a stand-alone application-and a poor one at that. It offers no email filters or HTML support for inserting hot links into your messages. And for some reason, the program's window for reading and composing letters is narrow.
Several features that you'd expect in a voice-activated program are missing. You can't speak your email recipient's name; you must click names from the address book with your trusty mouse. And the program can't read your mail to you, a capability that's been around for years.
The program uses IBM's older discrete-speech recognition engine, which means you have to dictate slowly, word by word, and make corrections with the mouse. The program claims 90 percent accuracy out of the box; I obtained nowhere near that. The ultimate result: You'll spend a lot of time training the software through an obsolete technology.
I was able to ask the program to get my mail, but sending it was another story. It tried to send and then stopped, even though all my settings were correct. A call to technical support did not help, and the technician wanted my email password, which I would not give. He then promised to investigate the problem and call back the next business day. I never heard from him. But even if that problem had been solved, this is a program to avoid. List price: $50. Wizzard Software, 412/621-0902, www.wizzardsoftware.com.
All Talk and No Action
Right now, despite vendor claims, speech recognition programs are not instant productivity tools. They're a lot of work, even after initial training, because you have to continually retrain the software to improve its accuracy.
This software will become truly useful only when you can operate it without training or profiles. The technology is getting closer. Many of the telephony projects I found require no training, although they, too, are far from perfect.
If you like to tinker and the idea of dictation or command-activation software appeals to you, by all means go for it. Voice Commands, with its specialized command set, does an especially good job for a narrow audience. But if you are looking for a real productivity tool, wait a few years.
© 1998 Saul Feldman. All rights reserved.
A frequent contributor to Computer Currents, Saul Feldman (sdfeldman@earthlink.net) talks to his PC, regardless of whether a microphone is attached.
Speech Recognition Report Card
None of the speech recognition products I tested would go to the head of the class, but some were better than others in the quality of training, accuracy of recognition, and ease of error correction. The dictation products, being more complex, were graded more rigorously for accuracy than the command products were. Each program's final error count for the business sample made up 80 percent of its grade. The final error count for the literary sample made up 20 percent of its grade.
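DICTATION
Dragon NaturallySpeaking: Price $695; Training B; Accuracy C+; Error Correction A-
IBM ViaVoice Gold: Price $149; Training B-; Accuracy B+; Error Correction C
Kurzweil VoicePro: Price $200; Training C; Accuracy C-; Error Correction B-
COMMAND
Voice Commands: Price $60; Training A; Accuracy A; Error Correction A
Verbex Listen for Windows: Price $50; Training D; Accuracy A; Error Correction A-
Wizzard VoicE-Mail: Price $50; Training B-; Accuracy C; Error Correction C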
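The grading note above gives the 80/20 weighting between the business and literary samples, but not how error counts became letter grades. As a rough check, the Python sketch below converts each dictation program's final counts to errors per 100 words and applies the same 80/20 weighting. It assumes both samples had the lengths reported in the ViaVoice Gold tests (119 and 150 words). The constants and function name are hypothetical, and this is not the grading method used for the report card.

```python
BUSINESS_WORDS = 119  # assumed length of the business sample (from the ViaVoice test)
LITERARY_WORDS = 150  # assumed length of the literary sample (from the ViaVoice test)


def weighted_errors_per_100(business_errors: int, literary_errors: int,
                            business_weight: float = 0.8) -> float:
    """Blend errors per 100 words: 80% business sample, 20% literary sample."""
    business_rate = 100 * business_errors / BUSINESS_WORDS
    literary_rate = 100 * literary_errors / LITERARY_WORDS
    return business_weight * business_rate + (1 - business_weight) * literary_rate


# Final (fourth-reading) error counts reported in this review.
final_counts = {
    "IBM ViaVoice Gold": (3, 6),
    "Dragon NaturallySpeaking": (3, 20),
    "Kurzweil VoicePro": (9, 7),
}
for name, (business, literary) in final_counts.items():
    print(f"{name}: {weighted_errors_per_100(business, literary):.1f}")
```

This prints roughly 2.8 for ViaVoice Gold, 4.7 for NaturallySpeaking, and 7.0 for VoicePro. That is the same order as the Accuracy grades in the report card.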
Training Helps, But It's Not Always Enough
None of the three dictation packages I tested made it through the samples error-free, even after four tries. The errors that cropped up in the final samples shown below illustrate the inaccuracies you may encounter with any speech recognition product.
Business sample (from the 1996 Annual Report of the U.S. Surgical Corporation)
The original text:
The health care industry continues to change and polarize.
Cost containment, efficiency of operation, and shorter hospital stays are the forces that drive hospital administration today. In this restrictive environment, United States Surgical Corporation improved its financial position significantly, increased sales, and expanded its marketing programs to help customers achieve their objectives. Major technological breakthroughs and product innovations in new surgical specialties provided revenue during 1996 and a solid platform for further growth.
Kurzweil VoicePro's final try (nine errors):
The health care industry continues to change and polarize.
Cost containment, efficiency of operation, and shorter hospital stays are the forces that Dr. hospital administration today. You this restrictive environment, United States Surgical Corp. improved its financial position significantly, increased sales, and expanded marketing programs to help customers achieved their objectives. Major technological reasons and product innovations he knew surgical [missing word] provided revenue during 1996 and a solid .4 for further growth.
Literary sample (from "Lady Chatterley's Lover" by D. H. Lawrence)
The original text:
Ours is essentially a tragic age, so we refuse to take it tragically. The cataclysm has happened, we are among the ruins, we start to build up new little habits, to have new little hopes. It is rather hard work: there is now no smooth road to the future: but we go round, or scramble over the obstacles. We've got to live, no matter how many skies have fallen.
This was more or less Constance Chatterley's position. The war had brought the roof down over her head. And she had realized that one must live and learn.
She married Clifford Chatterley early in 1917, when he was home for a month on leave. They had a month's honeymoon. Then he went back to Flanders: to be shipped over to England again six months later, more or less in bits. Constance, his wife, was then 23 years old, and he was 29.
Dragon NaturallySpeaking's final try (20 errors):
Hours is essentially a tragic age, so we've refuse to take it tragically. The cataclysm has happened, we're among the rulings, we started buildup knew little habits, to have knew little hopes. It is rather hard work: there is now know smooth road into the future: only go around, or scramble over the obstacles. We've got to live, no matter how he skies to fallen.
This was more [missing word] less Constants Chatterley's position. The war had brought the roof down overhead. This you realize that one was living learn.
She married Clifford Chatterley [missing word] in 1917, when he was home for among phone we've. Data months honeymoon. Anyone back to Flanders: to be shipped over to England again six months later, more [missing word] less in bits. Constants, his wife, was then 23 years old, and he was 29.
CyberTranscriber Speaks Your Language
After my disappointment with the PC-based dictation products I tested, I dictated the same two documents again. I didn't use a fancy noise-canceling microphone. I didn't train the software to recognize how I pronounced any word. After one session, the result was 100 percent accuracy. There was no teaching it how to spell "Chatterley" or being frustrated because the program couldn't distinguish between "hour" and "our." And I performed this test with one of the worst microphones around: my cheap telephone. I had my wife read the same material. Her results were just as good.
What were we using? CyberTranscriber (www.cybertranscriber.com), a speech recognition program that works over the Internet. Simply dial a toll-free number and read your document over the telephone. You can also dictate using a microphone connected to your PC or a Voice-It tape recorder. Then save the session as a WAV file and email it to CyberTranscriber. In about six hours (depending on the speed of Internet mail and your place in the queue), you'll receive the transcribed document via email in RTF format. Word for Windows users can download software that plays back the dictated file word for word.
The system was originally developed for the British military, for use under battlefield conditions. The processing is done by networked Pentiums in England. The service isn't cheap: You pay $29.95 to join and $3.50 per double-spaced page, with a $9.95 monthly minimum. But you can try it free with no obligation; the company doesn't even ask for your credit card number.
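To make the pricing concrete, here is a rough Python sketch of what a first month might cost. It rests on two assumptions the article doesn't spell out: that the $9.95 minimum is applied to a month's per-page charges, and that the $29.95 membership fee is billed only once. The function names are illustrative only.

```python
from decimal import Decimal

# Prices as quoted in the article (late 1990s, U.S. dollars).
SIGNUP_FEE = Decimal("29.95")       # one-time membership fee (assumed one-time)
PRICE_PER_PAGE = Decimal("3.50")    # per double-spaced page
MONTHLY_MINIMUM = Decimal("9.95")   # assumed to apply to per-page charges

def monthly_charge(pages: int) -> Decimal:
    """Per-page charges for one month, raised to the monthly minimum if lower."""
    return max(pages * PRICE_PER_PAGE, MONTHLY_MINIMUM)

def first_month_cost(pages: int) -> Decimal:
    """Membership fee plus the first month's transcription charges."""
    return SIGNUP_FEE + monthly_charge(pages)

for pages in (1, 3, 10):
    print(pages, first_month_cost(pages))
# 1 39.90
# 3 40.45
# 10 64.95
```

Under these assumptions, a single page still costs the $9.95 minimum, while anything beyond two pages a month is billed at the straight per-page rate.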
Dictating over the phone has one drawback: You can't correct errors as you go. You can, however, pause, restate anything you misspoke, and fix it in the file you get back. CyberTranscriber isn't fast, but its accuracy runs rings around the three PC-based dictation products I tested.
-SF
Speech Recognition's State of the Art
Speech recognition is one of the hottest areas of high technology. Forbes magazine estimates that within two years, speech recognition will be a $2 billion industry. Most of that money will be spent on telephone applications and on voice control of the microprocessors found in everything from cars to medical instruments.
Some of the most exciting work in this field is found in telephony. United Parcel Service uses this technology to let customers track packages. Just call 800/742-5877, tell the computer your tracking number, and in seconds it will give you the package's shipping status. UPS estimates that this technology reduces the average tracking call from three minutes to 90 seconds.
You can also dial 888/729-3366 and say the name of a stock; a computer recognizes your request and reads you an up-to-date quote. This free experimental service is run by Applied Language Technologies (www.altech.com). Granted, it isn't perfect. The computer couldn't tell the difference between "Shurgard" and "SunGard" or between "Harland" and "Harley-Davidson." But when I asked for the prices of a group of stocks by giving the computer their ticker symbols, it didn't miss a beat.
The company is also demonstrating the fully automated Music Mall. Call 617/428-6989 and tell the computer the name of your favorite recording artist. It will read you a list of that artist's CDs. Choose one, and you'll hear a sample until you say, "enough." Then you can buy the CD or ask to hear another performer.
If you call the Computer Currents switchboard after hours, you have to spell the name of the person you want on your phone's keypad, get the extension, and then enter it. With PureRequest, a Pentium-powered system from PureSpeech (www.speech.com), you just say the person's name and the computer connects you. You could also use this system to name a product, service, or department and get routed to the right place.
Intellivoice (www.intellivoice.com) is offering a speech recognition system for truly hands-free car phones. With its TalkDial system, you simply say the number or the name of the party you're calling, and you're connected.
Unisys is developing the Natural Language Assistant. The company claims its system will let people ask questions in a normal conversational style, adapting to the caller rather than making the caller adapt to the system. Unisys' first such product is the Mortgage Assistant. You ask and answer questions about mortgages, and the computer gives you relevant data. The Mortgage Assistant will keep a record of each call and transfer you to a human being on request, or if it hears certain trigger phrases, such as "jumbo mortgage." Unisys demonstrates this system in RealAudio at its Web site (www.marketplace.unisys.com/nlu/homenla.html).
Speech recognition is also a hot topic at universities. You'll find good reports on current research at sites maintained by schools such as Carnegie Mellon University (www.speech.cs.cmu.edu/speech), MIT (www.sls.lcs.mit.edu), U.C. Berkeley (www.icsi.berkeley.edu/real/speech.html), and Mississippi State University (www.isip.msstate.edu).
-SF