
Tweaking Text To Speech (TTS)

We're evaluating Text To Speech (TTS) as a way of quickly and economically adding 'voice' to courses. Mileage definitely varies on the quality of "natural voices". If you're not familiar with it, you can buy voices based on gender and language (even "American" versus "UK" English). Right now we're evaluating NeoSpeech's VoiceText, NextUp's TextAloud, and hopefully Loquendo's TTS software.
Anyone familiar with any of these TTS apps?  Any suggestions on sites to find information on comparing the different packages or tweaking the text to get the best inflections and pronunciations?
Here are the three we're evaluating at this time:
NextUp's TextAloud seems like the most practical and inexpensive option. It's currently $29.95 for the basic software plus $25 for two AT&T Natural Voices, so $54.95 total for a single license. It had some trouble with acronyms, but I could create and store custom pronunciations. It has a 30-day trial, but the trial doesn't include the AT&T Natural Voices, so it sounds very robotic. I'm considering buying it just to see how the AT&T voices sound.
NeoSpeech's VoiceText seems very simple, with little ability to tweak, and it's pricey. Even for us, as internal users who add no resale value to the licenses, it's 1500 annually for three licenses (pricing is based on intended usage as well as the number of licenses needed). The interface is simpler than that of the less expensive TextAloud, and doesn't seem as intuitive. They have a 30-day trial (a bit convoluted to install). I was able to create 33 fairly short WAV files, and could tweak the text to get decent-sounding audio. I'm not sold on it though, and certainly not at that price. I'm waiting to find out whether the "server" component, which is where pronunciation is edited, is an additional cost. My content contains chemical abbreviations and terms and lots of acronyms, so custom pronunciation is a must.
Loquendo's TTS demo on their website had by far the best pronunciation and inflection, and it even has emotional emphasizers and human-related sound effects (coughing, etc.). While I could tell it was an artificial voice, it was much more natural than the others. It supports more languages as well, which is a definite plus for me. However, I'm still waiting on pricing details and on whether or not they have an evaluation version. The company is in Italy, so it's taking some time to get responses.
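Since custom pronunciation of acronyms and chemical abbreviations matters for all three products, one tool-independent workaround is to respell problem terms before the text ever reaches the TTS engine. The following is only a minimal sketch of that idea in Python. The `LEXICON` entries and the `apply_lexicon` name are hypothetical and have nothing to do with the pronunciation editors built into TextAloud, VoiceText, or Loquendo.

```python
import re

# Hypothetical respellings for illustration only; build your own list
# from the terms your chosen voice actually mispronounces.
LEXICON = {
    "NaCl": "sodium chloride",
    "TTS": "T T S",
    "WAV": "wave",
}


def apply_lexicon(text, lexicon=LEXICON):
    """Replace whole-word, case-sensitive matches of each term with its respelling."""
    if not lexicon:
        return text
    # Try longer terms first so a long term is not shadowed by a shorter one.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    )
    return pattern.sub(lambda m: lexicon[m.group(1)], text)


if __name__ == "__main__":
    print(apply_lexicon("Add NaCl to the TTS script."))
```

Keeping the respellings in one place outside any particular product means the same list could be reused if you switch TTS packages later, though a voice may still need per-engine adjustments.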


Testers wanted for text to speech (TTS) mod

I would really appreciate a few people taking time out to help me test a basic text to speech facility I've written for TGE. I'd like some feedback about installation and other problems before I submit it as a resource. It was developed on a Win2K/VC6 system and should be compatible with the others, but you never know. It should be easy enough to install, provided you are set up to compile the SDK. TTS makes no changes to the standard TGE files, and only one file needs to be compiled and linked with TGE. Linux and Mac users must also compile the speech engine, but that was developed on Linux and has the necessary make files.


Hi, I would be interested in

Hi, I would be interested in testing your Text To Speech software. I am currently a student at Nanyang Technological University, and own licenses to both TGE and TGEA, although I am working mainly on TGEA. My current project involves using TGEA to create multiuser environments for educational purposes. I would like to check: is this TTS engine based on the flite resource on the Garage Games website? That one has produced unsatisfactory results for us.


Exactly. As an in-house

Exactly. As an in-house elearning group, our budget doesn't allow for high-end production, and a lot of the material is needed with quick turnaround. I just tested the link you sent, and the voice inflection is fairly good, but it sounds very tinny. I'm waiting on the voices from the TextAloud purchase to test those, and I'm hoping they compare with VoiceText, as the tool seems more flexible and costs a fraction of the price. Still no details on an eval version of Loquendo. If you're curious, click on the link for Loquendo to test some text. I'm hoping it's cheaper than NeoSpeech's VoiceText, because it sounds much better. And I'll keep everyone posted. This could be an inexpensive way for courses that might not otherwise be budgeted for voice to include it.


IBM ViaVoice

Just to make sure I am on the same page, you are trying to use a TTS tool to “convert” text to a natural sounding speech recording.  You would then take this electronic audio version and synchronize it with your learning endeavor.  So essentially you are saving the cost and time of using voice talent to record the audio.  Correct?  If so, you are way ahead of the curve.
I haven’t looked at TTS in quite a few years.  Back in the dark ages (2004), IBM had the best product (ViaVoice) for what you are considering.  Here is a link to a real-time demo: 
You might try putting in some of your more challenging phrases and terms.  
My only concern with IBM’s product is cost.  In the past, it was… expensive.  Also support could be an issue.  I would definitely look for forums or secondary sources of support in addition to formal IBM support contracts.
Please post back what you have learned!!!  You have a very interesting concept and we would LOVE to hear how you resolved the technical pronunciation issue.


Pricing and Sound Quality

In terms of pricing, Loquendo is apparently very premium. We have not been able to get a quote or a demo, but we also indicated we are not a reseller, so it's possible they just don't think we're worth it. An eLearning shop we work with that is a reseller was quoted 12K for Loquendo's TTS product, plus some yearly fees.
Apparently we may be able to get a better deal on NeoSpeech's VoiceText, too, if we negotiate with them. That's good, because their interface is extremely basic, and for 1500 annually I expect some bells and whistles, not having to dig into the application files to make simple changes that affect a single WAV file. NextUp has a user forum for the products they develop as well as resell, so that seems to be a good source on TTS and voices. Their TextAloud is extremely cheap in comparison (under $60 for one license and two natural voices).
I did find out some tips on improving pronunciation and inflection for TTS, a lot of which involves (creative) use of punctuation.  In a week or two, perhaps I can put together a blog post on this for ELC; I do plan on one for my personal blog. 
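The specific tips are not listed in this post, so the following Python sketch is only a guess at the kind of punctuation-based tweaking meant here. It assumes two things that many TTS voices tend to do: read space-separated capitals letter by letter, and pause at commas. The function names and the default word list are hypothetical, and results will vary by engine and voice.

```python
import re


def spell_out_acronyms(text, sep=" "):
    """Split runs of 2-5 capital letters so they are read letter by letter."""
    return re.sub(
        r"\b[A-Z]{2,5}\b",
        lambda m: sep.join(m.group(0)),
        text,
    )


def add_pause_before(text, words=("but", "however")):
    """Insert a comma before the given words unless punctuation is already there."""
    alternatives = "|".join(re.escape(w) for w in words)
    # Match a single space that follows a word character and precedes one of the words.
    pattern = re.compile(r"(?<=\w) (?=(?:" + alternatives + r")\b)")
    return pattern.sub(", ", text)


if __name__ == "__main__":
    line = "Save the TTS output as WAV but keep the source text"
    print(add_pause_before(spell_out_acronyms(line)))
```

Running the tweaks as a separate pass keeps the original course text untouched, so the same script can be regenerated if a different voice needs different punctuation.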
Using TTS could be a great, economical way of adding value to courses, from being able to quickly update audio for content that is still being developed to speeding up production time.


Which product did you go with?

Hey Jenn, I've just started using a TTS with Character Builder to make Flash objects to include in projects. I'm interested in hearing how you decided to handle it and how learners felt about the technology.
