Robot sounds in text

Industry-Leading TTS Voices

At ReadSpeaker, we have a passion for developing high-quality TTS voices. In fact, third-party industry observers rate the US English ReadSpeaker TTS voice as the most accurate on the market. The enthusiastic feedback we receive from our customers confirms that we deliver the very best TTS solutions for successful online, offline, embedded, and server-based applications around the world. Our commitment to providing outstanding TTS solutions is made possible by our uncompromising production process, designed to guarantee the quality levels that have earned ReadSpeaker TTS the trust of customers across countries and markets.

How Our TTS Voices Are Made

To create our speech personas, we select and record professional voice talents. Once a voice talent has been selected, she or he works with our voice development team for several days or weeks, depending on the type of voice and the voice technology we want to use. The recordings follow a diverse script designed to contain all the sound patterns of the language in development. The team closely monitors the recording process to check for consistency in pronunciation, accentuation, and style.

USS Voices

Until about 2019, all our high-quality voices were made using a technology called Unit Selection Synthesis (USS). These voices are still used in most of our SaaS solutions, such as webReader and docReader. To create a USS voice, the audio recorded from the voice talent is segmented into smaller units, such as sentences, words, syllables, and phonemes (speech sounds such as individual vowels and consonants).
A rich mark-up is added to this database of speech units, which is to say information is added to the units about the stress (did the unit come from a stressed or from an unstressed syllable?), the position in the word or sentence, etc.
The technical team refines this process using a powerful combination of artificial intelligence and machine learning technologies, applied to large amounts of data, to optimize the annotations. Our state-of-the-art methodologies are augmented by the linguistic expertise of our team. The ReadSpeaker TTS engine then uses the resulting database to convert text into speech spoken by the TTS voice: segments (units) of speech are selected and joined together in such a way that high-quality synthetic speech is produced.
This is how a new ReadSpeaker TTS voice persona is born. However, the process doesn't end there. One of ReadSpeaker's unique characteristics is our ongoing improvement process. Through a system of high-quality feedback and a thorough Quality Assurance process by mother-tongue experts, imperfections are continuously corrected.
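
To make the idea of annotated units being selected and joined more concrete, here is a minimal Python sketch of the textbook formulation of unit selection: each candidate unit gets a target cost (how well its annotations match the requested context) and a join cost (how smoothly it connects to its neighbor), and a dynamic-programming search picks the cheapest sequence. This is an illustration only, not ReadSpeaker's implementation. The `Unit` fields, the two cost functions, the pitch values, and the tiny database are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One annotated speech unit (all fields are hypothetical)."""
    phoneme: str        # speech sound this unit realizes
    stressed: bool      # did it come from a stressed syllable?
    position: str       # "initial", "medial", or "final" in the word
    pitch_start: float  # illustrative pitch (Hz) at the left edge
    pitch_end: float    # illustrative pitch (Hz) at the right edge


def target_cost(unit, stressed, position):
    """Count annotation mismatches between a unit and the requested context."""
    return float((unit.stressed != stressed) + (unit.position != position))


def join_cost(left, right):
    """Penalize a pitch jump at the seam between two consecutive units."""
    return abs(left.pitch_end - right.pitch_start) / 100.0


def select_units(targets, database):
    """Find the cheapest unit sequence with a Viterbi-style search.

    targets is a list of (phoneme, stressed, position) tuples.
    Returns (chosen_units, total_cost).
    """
    if not targets:
        return [], 0.0

    # One layer of (candidate, target_cost) pairs per requested phoneme.
    layers = []
    for phoneme, stressed, position in targets:
        candidates = [u for u in database if u.phoneme == phoneme]
        if not candidates:
            raise ValueError(f"no unit for phoneme {phoneme!r}")
        layers.append([(u, target_cost(u, stressed, position)) for u in candidates])

    # costs[k] = cheapest total cost of any path ending in candidate k.
    costs = [tc for _, tc in layers[0]]
    back = []  # back[i - 1][k] = best predecessor index for candidate k of layer i
    for i in range(1, len(layers)):
        new_costs, pointers = [], []
        for unit, tc in layers[i]:
            options = [costs[j] + join_cost(prev, unit)
                       for j, (prev, _) in enumerate(layers[i - 1])]
            j_best = min(range(len(options)), key=options.__getitem__)
            new_costs.append(options[j_best] + tc)
            pointers.append(j_best)
        costs = new_costs
        back.append(pointers)

    # Walk the back-pointers from the cheapest final candidate.
    k = min(range(len(costs)), key=costs.__getitem__)
    total = costs[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [layers[i][k][0] for i, k in enumerate(path)], total


# Invented database: two candidates each for "ae" and "t".
DATABASE = [
    Unit("k", False, "initial", 120.0, 125.0),
    Unit("ae", True, "medial", 126.0, 130.0),
    Unit("ae", False, "medial", 180.0, 175.0),
    Unit("t", False, "final", 131.0, 120.0),
    Unit("t", False, "initial", 200.0, 190.0),
]
TARGETS = [("k", False, "initial"), ("ae", True, "medial"), ("t", False, "final")]

units, cost = select_units(TARGETS, DATABASE)
print([(u.phoneme, u.stressed, u.position) for u in units], round(cost, 2))
```

In this toy database the search prefers the stressed "ae" and the word-final "t", because they match the requested annotations and their pitch values line up at the seams. A production system would use far richer annotations and acoustic measures than the two hypothetical costs shown here.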

Neural Voices

In parallel, ReadSpeaker creates so-called neural voices, using techniques based on deep learning. This method maps linguistic properties to acoustic features using Deep Neural Networks (DNNs). An iterative learning process minimizes objectively measurable differences between the predicted acoustic features and the acoustic features observed in the training set. One advantage of the DNN TTS method is that the acoustic database can be much smaller than for a USS voice: a neural voice needs only a few hours of recorded speech, whereas a good-quality USS voice needs at least three times as many. The resulting speech is also generally smoother and more human-like. This makes developing new ReadSpeaker TTS voices with more lifelike, expressive speech and customizable intonation faster than ever.
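
The iterative learning process described above can be illustrated with a deliberately tiny Python sketch: gradient descent that reduces the mean squared error between predicted and observed acoustic values. To keep it short and dependency-free, a single linear layer stands in for the deep neural network, so this shows the training loop only, not a real DNN and not ReadSpeaker's model. The two linguistic features (`stressed`, `phrase_final`) and the normalized pitch targets are invented for the example.

```python
def predict(weights, bias, features):
    """Map linguistic features to one acoustic value (linear stand-in for a DNN)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mean_squared_error(weights, bias, data):
    """Average squared gap between predicted and observed acoustic values."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)


def train(data, steps=500, learning_rate=0.1):
    """Plain gradient descent on the mean squared error.

    Returns the learned weights, the bias, and the loss after each step.
    """
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    history = [mean_squared_error(weights, bias, data)]
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            error = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * error * xi / len(data)
            grad_b += 2 * error / len(data)
        # Step against the gradient to shrink the prediction error.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
        history.append(mean_squared_error(weights, bias, data))
    return weights, bias, history


# Invented training set: ([stressed, phrase_final], normalized pitch).
TRAINING_SET = [
    ([0.0, 0.0], 1.0),
    ([1.0, 0.0], 1.4),
    ([0.0, 1.0], 0.8),
    ([1.0, 1.0], 1.2),
]

weights, bias, history = train(TRAINING_SET)
print(f"loss before: {history[0]:.4f}, loss after: {history[-1]:.2e}")
```

In this made-up data, stress raises pitch and phrase-final position lowers it, and the loop recovers that pattern as the loss shrinks. A real neural voice replaces the linear layer with a deep network and predicts many acoustic features per frame, but the principle of iteratively reducing a measurable prediction error is the same.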

If your strategy is to offer an exclusive customer experience and you want to take your brand appeal to a new level, one of the most powerful ways to differentiate yourself is by using a custom voice to represent you. A custom voice sets your brand apart and creates a powerful bond with your customers across your various communication touchpoints. If a preferred celebrity or other talent reflects your brand best and you want to be able to use their voice anytime you need it, ReadSpeaker can create a custom TTS voice powered by our leading-edge speech engine, to give your brand instant recognition in the voice user interface.