Deep Learning - Convolutional Neural Network
A Convolutional Neural Network, or CNN, is a deep learning artificial neural network commonly used for image analysis, facial detection, machine translation, and even text summarisation, among many other tasks.
What is a Convolutional Neural Network (CNN)?
Well, it is an artificial neural network, one of the charms of deep learning, that has a particular specialization: it is able to pick out, or detect, patterns and make sense of them. This is what makes it so useful for image analysis, facial recognition, and other things like that.
What makes it different from a standard multilayer perceptron (MLP)?
A CNN has hidden (convolutional) layers, and these layers are precisely what make a CNN stand out from the rest. A CNN does have other, non-convolutional layers as well, but its main component, or we could say its 'specialization', is the convolutional layers, which are what give it its edge in image analysis and pattern detection. These are the base of a CNN.
What are convolutional layers? How do they work? What is the convolution operation?
Well, just like other layers, a convolutional layer receives input, transforms it in some way, and then turns it into some kind of output; this transformation is called the convolution operation. But unlike other layers, a convolutional layer has a specified number of filters, each one there to detect shapes and patterns in an image.
In other words, convolutional neural networks consist of an input layer, an output layer, and several hidden layers. A CNN's hidden layers usually consist of a series of convolutional layers that convolve their input with a kernel using element-wise multiplication or another dot product. The activation function is usually a ReLU layer, and it is followed by additional layers such as pooling layers, fully connected layers, and normalization layers. They are called hidden layers because their inputs and outputs are masked by the activation function and the final convolution. The whole working process of a CNN is explained below, starting with a minimal end-to-end sketch.
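Before walking through each layer, here is a minimal sketch of the stack just described, written in PyTorch. The 28 x 28 grayscale input and the 10 output classes are my own illustrative assumptions, not requirements of the architecture:

```python
import torch
import torch.nn as nn

# A minimal sketch of the pipeline described above:
# convolution -> ReLU -> pooling (twice), then a fully connected classifier.
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3),   # convolution layer
    nn.ReLU(),                                                 # activation
    nn.MaxPool2d(kernel_size=2),                               # pooling layer
    nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),                                              # flatten the feature maps
    nn.Linear(16 * 5 * 5, 10),                                 # fully connected layer
)

x = torch.randn(1, 1, 28, 28)   # one 28 x 28 grayscale image
print(model(x).shape)           # torch.Size([1, 10]) - one score per class
```

Each of the pieces in this sketch is unpacked in the sections that follow.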
1. Convolution Layer:
When programming a CNN, the input is a tensor with shape (number of images) x (image height) x (image width) x (image depth). Then, after passing through the convolution layer, the image is abstracted into a feature map of shape (number of images) x (feature map height) x (feature map width) x (feature map channels). The convolutional layer within the neural network should have the following properties (a shape-checking sketch follows the list):
- Convolution kernels defined by a width and a height (hyper-parameters).
- A number of input channels and output channels (hyper-parameters).
- The depth of the convolution filter (its input channels) must be equal to the number of channels (depth) of the input feature map.
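As a quick, hedged illustration of those properties, the following PyTorch snippet checks the shapes. Note that PyTorch orders tensors as (number of images) x (depth) x (height) x (width), i.e. channels-first, while the text above lists depth last; the sizes involved are otherwise the same:

```python
import torch
import torch.nn as nn

images = torch.randn(4, 3, 32, 32)          # 4 RGB images, 32 x 32, depth 3

# Kernel width/height = 5, input channels = 3, output channels = 16.
# The filter depth (3) must match the depth of the input feature map.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)

feature_maps = conv(images)
print(feature_maps.shape)                   # torch.Size([4, 16, 28, 28])
```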
The convolution layer convolves the input and passes the result to the next layer. This is similar to the response of neurons in the visual cortex to specific stimuli: each convolutional neuron processes data only for its receptive field. A fully connected feedforward neural network can also be used to learn features and classify data, but applying that architecture to images is not practical. Due to the very large input size associated with an image, where each pixel is a separate variable, a very large number of neurons would be required even in shallow (as opposed to deep) architectures. For example, a fully connected layer for a (small) image of size 100 x 100 has 10,000 weights for each neuron in the second layer. The convolution operation provides a solution to this problem, as it reduces the number of free parameters, allowing you to make the network deeper with fewer parameters. For example, a tiling of regions of size 5 x 5, each with the same shared weights, needs only 25 learnable parameters regardless of the image size. Using regularized weights over fewer parameters also avoids the vanishing gradient and exploding gradient problems that appear during backpropagation in traditional neural networks. A hand-rolled sketch of the operation follows.
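To make the operation concrete, here is a hand-rolled sketch of a single-filter convolution in NumPy. The stride of 1 and lack of padding are my own simplifying assumptions. Note that the filter carries only 25 learnable weights no matter how large the image is:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and take
    the dot product at each position to build the feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(100, 100)   # the 100 x 100 image from the example above
kernel = np.random.rand(5, 5)      # one 5 x 5 filter: only 25 learnable weights
feature_map = convolve2d(image, kernel)
print(feature_map.shape)           # (96, 96)
```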
2. Pooling Layer:
Convolutional networks can include local or global pooling layers to streamline the underlying computation. A pooling layer reduces the dimension of the data by combining the outputs of a cluster of neurons in one layer into a single neuron in the next layer. Local pooling typically combines small clusters of size 2 x 2, while global pooling acts on all the neurons in the convolutional layer. Pooling can also compute either a maximum or an average: max pooling uses the maximum value of each cluster of neurons in the previous layer, while average pooling uses the average value of each cluster of neurons in the previous layer. A sketch of both follows.
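Here is a minimal NumPy sketch of both pooling variants, assuming non-overlapping 2 x 2 clusters (the typical local pooling case described above):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Reduce each non-overlapping `size` x `size` cluster of values
    to a single number (its maximum or its average)."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            window = feature_map[i:i+size, j:j+size]
            out[i // size, j // size] = window.max() if mode == "max" else window.mean()
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, mode="max"))   # [[ 5.  7.] [13. 15.]] - cluster maxima
print(pool2d(fm, mode="avg"))   # [[ 2.5  4.5] [10.5 12.5]] - cluster averages
```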
3. Fully Connected Layer:
A fully connected layer connects every neuron in one layer to every neuron in another layer. In principle, it is the same as a conventional multilayer perceptron (MLP). The flattened feature matrix is passed through the fully connected layers to classify the image, as in the sketch below.
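As a sketch, flattening plus one fully connected layer reduces to a single matrix-vector product. The 16 x 5 x 5 input shape and the 10 classes below are illustrative assumptions, chosen to match the earlier end-to-end sketch:

```python
import numpy as np

# Suppose the last pooling layer produced a 16 x 5 x 5 stack of feature maps.
feature_maps = np.random.rand(16, 5, 5)

flattened = feature_maps.reshape(-1)             # 400-dimensional vector

# One fully connected layer: every input connects to every output neuron.
num_classes = 10                                 # illustrative assumption
W = np.random.rand(num_classes, flattened.size)  # 10 x 400 weight matrix
b = np.random.rand(num_classes)

scores = W @ flattened + b                       # one score per class
print(scores.shape)                              # (10,)
```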
Receptive Fields:
In a neural network, each neuron receives input from some locations in the previous layer. In a fully connected layer, each neuron receives input from every element of the previous layer; in a convolutional layer, neurons receive input only from a restricted subarea of the previous layer. Usually, the subarea is square (e.g. of size 5 x 5). A neuron's input area is called its receptive field. So, in a fully connected layer, the receptive field is the entire previous layer, while in a convolutional layer the receptive field is smaller than the entire previous layer. The subarea of the original input image covered by the receptive field grows bigger and bigger as the network architecture gets deeper. This is because the network repeatedly applies convolutions that take into account the value of a specific pixel together with the values of some surrounding pixels, as computed in the sketch below.
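The growth of the receptive field with depth can be computed with the standard recurrence below; the stride-1, 5 x 5 stack is an illustrative assumption matching the example size above:

```python
def receptive_field(kernel_sizes, strides):
    """Receptive field of one output neuron after a stack of conv layers,
    using the recurrence r += (k - 1) * jump, where jump is the product
    of the strides of all earlier layers."""
    r, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        r += (k - 1) * jump
        jump *= s
    return r

# Stacked 5 x 5 convolutions with stride 1: the receptive field grows
# from 5 to 9 to 13 pixels of the original image as depth increases.
for n in range(1, 4):
    print(n, receptive_field([5] * n, [1] * n))   # 1 5, 2 9, 3 13
```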
Weights of Neurons:
Each neuron in a neural network computes an output value by applying a specific function to the input values coming from the receptive field in the previous layer. The function applied to the input values is determined by a vector of weights and a bias (usually real numbers). In neural networks, training takes place by iteratively adjusting these biases and weights.
The vector of weights and the bias are called a filter and represent a specific feature of the input (for example, a specific shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces memory usage because a single bias and a single vector of weights are used across all the receptive fields that share the filter, as opposed to each receptive field having its own bias and weight vector. A rough count of the savings follows.
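Here is a rough, illustrative count of what weight sharing saves, reusing the 100 x 100 image and the 5 x 5 filter from the earlier example:

```python
image_pixels = 100 * 100      # the 100 x 100 image from earlier
hidden_neurons = 96 * 96      # one neuron per valid 5 x 5 position

# Fully connected: every neuron has its own weight for every input pixel.
fc_weights = hidden_neurons * image_pixels   # 92,160,000 weights

# Convolutional with weight sharing: all 96 x 96 receptive fields reuse
# the same single 5 x 5 filter (plus one shared bias).
conv_weights = 5 * 5 + 1                     # 26 parameters

print(fc_weights, conv_weights)              # 92160000 26
```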
Now, let's talk about the basic mathematical formulas behind a CNN
Neural networks can be divided into two kinds: biological neural networks are one kind, and artificial neural networks are the other. Here we mainly discuss artificial neural networks. An artificial neural network is a data model that processes information and is similar in structure to the synaptic connections in the brain. A neural network is composed of many neurons, and the output of one neuron can be used as the input of the next. The corresponding formula is as follows.
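The formula could not be rendered in the editor; a standard reconstruction for a single neuron with three inputs and a sigmoid activation (the form that matches the logistic regression remark below, numbered (1) by me for continuity) is:

```latex
% (1) Output of a single neuron with inputs x_1, x_2, x_3:
%     a weighted sum plus a bias, passed through the sigmoid activation f.
h_{W,b}(x) = f\left(W^{\mathsf{T}} x + b\right)
           = f\left(\sum_{i=1}^{3} W_i x_i + b\right),
\qquad
f(z) = \frac{1}{1 + e^{-z}}
```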
This unit is also called the logistic regression model. When many such neurons are linked together and arranged in layers, the structure can be called a neural network model. Figure 1 shows a neural network with a hidden layer.
In this neural network, x1, x2, x3 are the inputs of the neural network. +1 is the offset node, also known as the intercept term. The leftmost column of this neural network model is the input layer of the network, and the rightmost column is its output layer. The middle layer of the network model is a hidden layer, which is fully connected to both the input layer and the output layer; it is called hidden because its values are not observed in the training sample set. By looking at this neural network model, we can see that it contains a total of 3 input units, 3 hidden units, and 1 output unit.
Now, use n_l to represent the number of layers in the neural network; in this neural network, n_l = 3. Next, label each layer: layer l is written as L_l, so that L_1 is the input layer of the neural network and L_{n_l} is its output layer. In this neural network, the following parameters exist:
W_ij^(l) is the connection parameter (weight) between the j-th unit of layer l and the i-th unit of layer l+1, and b_i^(l) is the bias of the i-th unit of layer l+1. In this neural network model, a_i^(l) represents the output value (activation) of unit i in layer l, where l denotes the layer and i the unit within that layer.
Given that the sets of parameters W and b have been fixed, we can use the function h_W,b(x) to calculate the output of this neural network. The following formulas are the calculation steps:
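Again, the formulas could not be embedded in the editor; a standard reconstruction of forward propagation for the 3-3-1 network above is given below, with the compact recurrence numbered (3) so that the reference in the next paragraph resolves (the numbering is my own):

```latex
% (2) Activations of the three hidden units:
a_1^{(2)} = f\left(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)}\right)
a_2^{(2)} = f\left(W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)}\right)
a_3^{(2)} = f\left(W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)}\right)

% Output of the network:
h_{W,b}(x) = a_1^{(3)} = f\left(W_{11}^{(2)} a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)}\right)

% (3) The same steps written compactly, layer by layer:
z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}, \qquad a^{(l+1)} = f\left(z^{(l+1)}\right)
```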
The calculation of forward propagation is as shown in equation (3). Training a neural network is similar to training a logistic regression model, but because the network has multiple layers, gradient descent also needs to be combined with the chain rule of differentiation (backpropagation). A runnable sketch of the forward pass follows.
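Here is a minimal runnable sketch of these steps in NumPy, with randomly initialized parameters standing in for trained values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation for the 3-3-1 network above.
x = np.array([0.5, -1.0, 2.0])                       # inputs x1, x2, x3 (illustrative)
W1, b1 = np.random.randn(3, 3), np.random.randn(3)   # layer 1 -> layer 2 (hidden)
W2, b2 = np.random.randn(1, 3), np.random.randn(1)   # layer 2 -> layer 3 (output)

a2 = sigmoid(W1 @ x + b1)    # hidden activations a^(2), per equation (3)
h = sigmoid(W2 @ a2 + b2)    # network output h_{W,b}(x)
print(h)
```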
Too long to read (-.-) ZZZ? I kept the first half of the article as a general briefing and the second half as the mathematical treatment.
(Note: the LinkedIn article editor has a few limitations when it comes to adding formulas.)