Automatic speech recognition (ASR), also called automatic voice recognition (AVR), is the technical application of speech analysis to the automatic interpretation of human speech. It covers the recognition of speech, keywords, and sentences together with their meaning, as well as the identification of a speaker for security-relevant functions such as access authorization.

Automatic speech recognition systems are characterized by the size of their stored vocabulary, their ability to understand different speakers, their ability to learn new words, and their recognition accuracy.
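Accuracy is commonly quantified as the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The following is a minimal sketch of that calculation; the function name word_error_rate and the example sentences are illustrative assumptions and are not taken from any particular ASR toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # reference word was dropped
                curr[j - 1] + 1,                       # extra word was inserted
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("turn on the light", "turn of the light"))
```

A lower value means better recognition. Because insertions are counted as well, the rate can exceed 1.0 when the recognizer outputs far more words than were spoken.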

Human pronunciation is not exact. It depends on the speed of speech, on individual pronunciation, and on the flow of speech, which has no clear boundaries between sounds, syllables, and words. An automatic speech recognition system must therefore cope with this variation in speech and pronunciation to some extent. Such systems work with natural language processing (NLP), which deals with emphasis, understanding, and the semantics of words and sentences, and thus mediates between computer and natural language, between human and machine.

Recognition is based either on whole words or on individual phonemes. In the whole-word method, words are spoken aloud in advance and stored as patterns. When speech is input, the computer searches its memory for a matching word pattern. Phoneme-based speech recognition works differently: it divides each word into individual phonemes and is much more precise.
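The whole-word method can be sketched as a template search. The text does not say how the stored patterns are compared with the input. This sketch assumes dynamic time warping (DTW), a classic choice for isolated-word matching, because it tolerates differences in speaking speed. The names dtw_distance, recognize_word, and TEMPLATES are hypothetical, and the one-dimensional number sequences stand in for real acoustic features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize_word(features, templates):
    """Return the stored word whose template is closest to the input."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Hypothetical stored word patterns (toy feature tracks, not real audio).
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 2.0],
    "no": [5.0, 5.0, 1.0, 0.0],
}

# A slower utterance of "yes": the same shape, stretched in time.
spoken = [1.0, 1.0, 3.0, 3.0, 4.0, 4.0, 2.0]
print(recognize_word(spoken, TEMPLATES))
```

Every word needs its own stored pattern, so the vocabulary is limited to what has been recorded. A phoneme-based system instead builds words from a small, shared inventory of sound units.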

A typical use case for automatic speech recognition is communication with a digital voice assistant. Through automatic speech recognition, the assistant recognizes that the user wants to address it, for example when the user says "Hello Alexa".

