Text-to-speech (TTS) is a form of speech processing in which written text is converted into spoken audio. TTS systems are text-driven speech synthesizers: they generate synthetic speech from a text, with the aim that the result sounds intelligible and natural.
TTS systems analyze the input text and assemble phonemes, the smallest units of speech that distinguish meaning, into spoken words and sentences. To do this, the text is analyzed linguistically and broken down into its smallest written units, called graphemes, which are then converted into phonemes. After this conversion, the synthetic speech is given rhythm and stress.
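The steps described above can be illustrated with a minimal sketch, assuming a tiny hand-written pronunciation lexicon and a naive letter-to-sound fallback. All names here (LEXICON, LETTER_TO_SOUND, FUNCTION_WORDS, tokenize, to_phonemes, synthesis_plan) are hypothetical and chosen for illustration; real systems use large lexicons, trained grapheme-to-phoneme models, and far richer rhythm and stress rules.

```python
import re

# Hypothetical toy lexicon: word -> phoneme symbols (ARPAbet-like, for illustration).
LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
}

# Hypothetical, deliberately naive grapheme-to-phoneme fallback for unknown words.
LETTER_TO_SOUND = {"a": "AE", "c": "K", "m": "M", "s": "S", "t": "T"}

# Assumption: function words are left unstressed, all other words are stressed.
FUNCTION_WORDS = {"the", "a", "an", "on", "in", "of"}

PUNCTUATION = {".", ",", "!", "?"}


def tokenize(text):
    """Split text into lowercase words and single punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?]", text.lower())


def to_phonemes(word):
    """Look the word up in the lexicon, else map it letter by letter."""
    if word in LEXICON:
        return list(LEXICON[word])
    return [LETTER_TO_SOUND[ch] for ch in word if ch in LETTER_TO_SOUND]


def synthesis_plan(text):
    """Return phonemes per word plus simple stress and pause marks."""
    plan = []
    for token in tokenize(text):
        if token in PUNCTUATION:
            # Rhythm: a comma gives a short pause, sentence-final marks a long one.
            plan.append({"pause": "short" if token == "," else "long"})
        else:
            plan.append({
                "word": token,
                "phonemes": to_phonemes(token),
                "stressed": token not in FUNCTION_WORDS,
            })
    return plan


if __name__ == "__main__":
    for step in synthesis_plan("The cat sat."):
        print(step)
```

The sketch stops at a symbolic plan. Turning phonemes, stress, and pauses into an audio waveform is a separate stage that is not shown here.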
Text-to-speech is commonly used within a voice user interface (VUI), where it provides the spoken output. TTS systems can offer different types of speakers with different voice tones. The spectrum ranges from children's voices to women's and men's voices, and the output can be varied in pitch and mood.
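One common way to request a particular voice type and pitch is the W3C Speech Synthesis Markup Language (SSML), which is not mentioned in the text above and is used here only as an illustration. The sketch below builds an SSML string with the standard voice and prosody elements; the helper name build_ssml and its defaults are hypothetical, and whether a given TTS engine honors each attribute is an assumption that has to be checked against that engine.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, gender="female", pitch="medium", rate="medium"):
    """Wrap text in SSML that asks for a voice gender, pitch, and speaking rate."""
    return (
        '<speak version="1.1" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice gender={quoteattr(gender)}>"
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{escape(text)}"  # escape &, <, > so the markup stays well-formed
        "</prosody></voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Your train leaves at nine.", gender="male", pitch="low"))
```

The resulting string would be passed to an SSML-capable synthesizer. Mood is not covered by this sketch, since core SSML has no dedicated element for it.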