Convolutional neural networks (CNNs) are machine learning models used in artificial intelligence for image recognition, audio processing and speech recognition. Their design is loosely modeled on the way the visual cortex of the brain processes stimuli. They consist of several types of layers: the convolutional layer, the pooling layer and the fully connected (fully meshed) layer. Convolutional and pooling layers can be repeated several times in succession, forming stacked subnetworks.
The convolutional layer applies a set of filters to the input data, each of which responds to a particular feature. For images, such filters can respond to edges, specific colors, or geometric shapes. Each filter produces its own feature map, and the maps differ because each filter weights the input values differently. In speech recognition, the extracted features can be, for example, phonemes, pitches, speech rhythm, or other characteristic properties of speech.
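The filtering step of a convolutional layer can be illustrated with a minimal sketch, assuming a single-channel image, stride 1 and no padding. The function name `conv2d`, the toy image and the hand-picked vertical-edge filter are hypothetical choices for illustration; in a trained network the filter weights are learned rather than set by hand.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a 2-D kernel over a 2-D image (stride 1, no padding).

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weight the patch under the kernel and sum it up
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 5x5 image (assumption): dark on the left (0), bright on the right (1)
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hand-picked filter (assumption) that responds to dark-to-bright vertical edges
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = conv2d(image, vertical_edge)
print(feature_map)
# Each row is [0. 3. 3.]: no response in the uniform dark area,
# a strong response where the window covers the edge.
```

Applying a second filter with different weights to the same image would give a different feature map, which is what makes the maps distinguishable from one another.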
In the pooling layer, the feature maps are downsampled: neighboring values are summarized, for example by their maximum or their average, so that the amount of data shrinks while the most prominent features are kept. Discarding the less relevant detail in this way increases the processing speed. The third and last layer, the fully connected (fully meshed) layer, links every feature produced by the preceding convolutional and pooling layers to the output features.
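The two steps described above can be sketched as follows, assuming non-overlapping 2x2 max pooling and a final fully meshed (fully connected) layer without an activation function. The function names `max_pool2d` and `fully_connected`, the example feature map, and the weights and bias are hypothetical values for illustration only; real networks learn the weights during training.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling.

    Rows and columns that do not fill a complete window are dropped.
    """
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    trimmed = feature_map[:h_out * size, :w_out * size]
    # Split into (h_out, size, w_out, size) windows and take the max of each
    windows = trimmed.reshape(h_out, size, w_out, size)
    return windows.max(axis=(1, 3))

def fully_connected(features, weights, bias):
    """Connect every input feature to every output via a weight matrix."""
    flat = features.reshape(-1)
    return weights @ flat + bias

# Example 4x4 feature map (assumption)
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 6],
], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)   # [[4. 2.]
                #  [2. 6.]]

# Hypothetical weights: 2 output features, 4 pooled input features
weights = np.array([
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)
bias = np.array([0.0, 0.5])

output = fully_connected(pooled, weights, bias)
print(output)   # [10.   4.5]
```

Pooling reduces the 16 input values to 4, and the last step maps those 4 values onto 2 output features, each of which depends on all pooled values through its row of weights.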