Introduction to Convolutional Neural Networks (CNNs).pptx

MACHINE LEARNING – CONVOLUTIONAL NEURAL NETWORK

Introduction to Computer Vision
● Computer vision is concerned with the automatic extraction, analysis and
understanding of useful information from a single image or a sequence of
images.
- The British Machine Vision Association and Society for Pattern Recognition (BMVA)
(or)
● It is an interdisciplinary field that deals with how computers can be made to
gain high-level understanding from digital images or videos.
- Wikipedia

What is a CNN (Convolutional Neural Network)?
● A CNN is a class of deep learning model.
● Convolutional neural networks (ConvNets or CNNs) are one of the main
categories of models for image recognition, image classification, object
detection, face recognition, etc.
● A CNN is similar to a basic neural network: it also has learnable
parameters, i.e., weights and biases.
● CNNs are heavily used in computer vision.
● There are 3 basic components that define a CNN:
○ The Convolution Layer
○ The Pooling Layer
○ The Output Layer (or) Fully Connected Layer

Basic Structure of CNN
• Input Layer: Accepts input images as
pixel data.
• Convolutional Layer: Applies filters
to extract features.
• ReLU Layer: Introduces non-linearity
to the network.
• Pooling Layer: Reduces spatial
dimensions of feature maps.
• Fully Connected Layer: Final layer for
classification.

Convolutional Layer
• Filters/Kernels: Detect specific features in input images.
• Stride: Controls the movement of filters across the input.
• Padding: Adds pixels around the input to maintain dimensions.
• Output: Produces feature maps indicating detected features.
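A minimal sketch of how a filter, stride and padding produce a feature map, written in plain Python with no libraries. The function name `conv2d`, the sample image and the vertical-edge filter are illustrative assumptions, not taken from the slides. As in most deep learning libraries, the filter is applied without flipping (cross-correlation).

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a 2-D `image` (list of rows) and return the feature map."""
    # Zero-pad the image on all four sides.
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * width for _ in range(padding)]

    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(padded) - k_h) // stride + 1
    out_w = (width - k_w) // stride + 1

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the patch under it, then sum.
            total = 0
            for a in range(k_h):
                for b in range(k_w):
                    total += padded[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x4 image whose left half is dark (0) and right half is bright (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A simple vertical-edge filter: responds where brightness rises left to right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[3, 3], [3, 3]]
```

Calling `conv2d(image, kernel, padding=1)` returns a 4x4 map (the input size is preserved), while `conv2d(image, kernel, stride=2)` shrinks the output to 1x1.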

Convolution Layer
Images source: Analytics
Vidhya

Padding in CNN
• Zero Padding: Adds zeros
around the input image to
preserve dimensions.
• Valid Padding: No padding,
reduces the size of output
feature maps.
• Role: Helps preserve edge
information during
convolution.

The concept of stride:
● When the filter (weight matrix) moves 1 pixel at a time, this is called a stride of 1 (as in
the case above).
What if we increase the stride value?

• Increasing the stride value decreases the size of the output
feature map (which may cause features of the image
to be lost).
• Padding the input image helps here: for higher stride
values, we add more than one layer of zeros around
the image.

• When a 6x6 input is padded with a border of zeros, the output keeps the same
dimensions of 6x6. This is known as ‘Same Padding’.
● The middle 4x4 values of the output remain the same as without padding; we have retained
more information from the borders and also preserved the size of the image.
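The effect of stride and padding on output size can be checked with the standard formula floor((n + 2p - f) / s) + 1. The sketch below assumes a square n x n input and a 3x3 filter; the filter size is not stated on the slides but is consistent with the 6x6 to 4x4 reduction described above. The function name `output_size` is an illustrative choice.

```python
def output_size(n, f, padding=0, stride=1):
    """Output width/height for an n x n input and an f x f filter."""
    return (n + 2 * padding - f) // stride + 1


print(output_size(6, 3))                       # 4: no padding ("valid")
print(output_size(6, 3, padding=1))            # 6: "same" padding
print(output_size(6, 3, stride=2))             # 2: larger stride, smaller output
print(output_size(6, 3, padding=1, stride=2))  # 3
```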

Pooling Layer
• Purpose: Reduces dimensionality
and computation in the network.
• Max Pooling: Selects the maximum
value from each pooling region.
• Average Pooling: Takes the average
value from each pooling region.
• Impact: Retains important features
while reducing overfitting.
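A small sketch of max and average pooling in plain Python, assuming a 2x2 window with stride 2 (a common choice, not specified on the slide). The function name `pool2d` and the sample values are illustrative.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling over size x size windows."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values inside the current pooling region.
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]
print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 2.25], [3.5, 5.0]]
```

Either way, the 4x4 feature map shrinks to 2x2, so later layers process a quarter of the values.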

Basic Mathematics of CNN (B&W Image)
• Convolution: Applies a filter matrix
across the image to detect features.
• Example: Sliding a 3x3 filter over a
grayscale image, producing a feature
map.
• ReLU: Applies non-linearity after
convolution.
• Pooling: Reduces the size of the
resulting feature map.
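The ReLU step listed above can be sketched as follows: every negative value in the convolution output is replaced by zero and positive values pass through unchanged. The sample `conv_output` values are made up for illustration.

```python
def relu(feature_map):
    """Replace negative values with 0; keep positive values unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


conv_output = [
    [3, -1, 0],
    [-4, 2, -2],
    [1, 0, 5],
]
print(relu(conv_output))  # [[3, 0, 0], [0, 2, 0], [1, 0, 5]]
```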

Basic Mathematics of CNN (Colored Image)
• Convolution: Applies a filter whose depth matches the 3 RGB
channels, with one 2-D kernel slice per channel.
• Result: Produces a single combined feature map from
all channels.
• Example: Sliding a filter across an RGB image and
summing the per-channel results at each position.
• Pooling: Reduces the size of the resulting feature
map while preserving important information.
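A minimal sketch of convolution over a colored image, assuming the image is given as three separate 2-D channels (R, G, B) and the filter as three matching 2-D kernel slices. Each channel is convolved with its own slice and the three results are summed into one feature map. All names and values are illustrative.

```python
def conv2d_single(channel, kernel):
    """Convolve one 2-D channel with one square 2-D kernel (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(channel) - k + 1
    out_w = len(channel[0]) - k + 1
    return [
        [
            sum(channel[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv2d_rgb(image, filt):
    """`image` and `filt` are lists of three 2-D arrays, ordered R, G, B."""
    per_channel = [conv2d_single(c, k) for c, k in zip(image, filt)]
    # Sum the three per-channel results position by position.
    return [
        [sum(values) for values in zip(*rows)]
        for rows in zip(*per_channel)
    ]


red = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
green = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
blue = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

filt = [
    [[1, 0], [0, 1]],  # slice applied to the red channel
    [[1, 0], [0, 0]],  # slice applied to the green channel
    [[0, 1], [1, 0]],  # slice applied to the blue channel
]

print(conv2d_rgb([red, green, blue], filt))  # [[6, 2], [2, 6]]
```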

Fully Connected Layer
• Purpose: Takes the flattened output of the last convolution or pooling
layer and feeds it into one or more fully connected layers.
• Function: Combines features for final classification.
• Uses: Softmax or sigmoid activation functions for output.
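The flatten, fully connected and softmax steps can be sketched in plain Python as below. The weights, biases and the two-class setup are hypothetical values chosen for illustration; in a trained network they would be learned.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one flat vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


feature_maps = [[[1, 0], [0, 1]]]          # one 2x2 feature map
weights = [[1, 0, 0, 1], [0, 1, 1, 0]]     # hypothetical weights, 2 classes
biases = [0, 0]

scores = dense(flatten(feature_maps), weights, biases)
probabilities = softmax(scores)
print(scores)         # [2, 0]
print(probabilities)  # the first class gets the higher probability
```

For a single yes/no output, a sigmoid on one score would play the role that softmax plays here for several classes.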

Types of CNN
● Different CNNs are used in computer vision depending on the problem
being solved.
● Five major computer vision tasks can be addressed using
CNNs:
■ Image Classification
■ Object Detection
■ Object Tracking
■ Semantic Segmentation
■ Instance Segmentation

Types of CNN
Image Classification:
● In image classification, we can use traditional CNN models or one of the
many architectures designed to decrease the error rate, often by
increasing depth or the number of trainable parameters.
■ LeNet (1998)
■ AlexNet (2012)
■ ZFNet (2013)
■ GoogLeNet (2014)
■ VGG-16 (2014)

LeNet-5 Architecture
• Designed for handwritten digit
recognition (MNIST dataset).
• Structure: 2 convolutional
layers, 2 subsampling layers, 2
fully connected layers.
• Key Feature: Simple and
efficient, early CNN model.

AlexNet Architecture
• Winner of the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) in 2012.
• Structure: 5 convolutional layers, 3
fully connected layers.
• Features: Uses ReLU, dropout, and
data augmentation.
• Impact: Revolutionized deep
learning and computer vision.

VGG-16 Architecture
• Uses 16 layers (13
convolutional, 3 fully connected).
• Features: Smaller filters (3x3)
with deeper networks.
• Strength: Achieves high
accuracy with a simple structure.
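One way to see why small 3x3 filters pair well with deeper networks is simple arithmetic: two stacked 3x3 layers cover the same 5x5 region of the input as one 5x5 layer, with fewer weights. The sketch below assumes stride 1, ignores biases, and uses a hypothetical channel count of 64 for both input and output.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights (ignoring biases) in one convolutional layer."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return num_layers * (kernel_size - 1) + 1


channels = 64  # hypothetical channel count, same for input and output
two_3x3 = 2 * conv_weights(3, channels, channels)
one_5x5 = conv_weights(5, channels, channels)

print(stacked_receptive_field(3, 2))  # 5
print(two_3x3, one_5x5)               # 73728 102400
```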

ResNet Architecture
• Introduces Residual Learning to
combat vanishing gradients.
• Structure: Skip connections or
shortcuts between layers.
• Impact: Allows very deep
networks (e.g., ResNet-50,
ResNet-101).
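The skip connection can be sketched as output = ReLU(F(x) + x). The example below is a deliberate simplification: it uses two small linear layers for F instead of the convolutions and normalization found in real ResNet blocks, and all names and weights are illustrative.

```python
def relu(vector):
    """Element-wise ReLU on a flat vector."""
    return [max(0.0, v) for v in vector]


def linear(vector, weights):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F is two small linear layers."""
    fx = linear(relu(linear(x, w1)), w2)
    # The skip connection adds the unchanged input back to F(x).
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0]
```

Even when the layers inside the block contribute nothing (all-zero weights here), the input still passes through the shortcut, which is what lets signals and gradients travel through very deep stacks.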

Inception (GoogLeNet) Architecture
• Introduces Inception modules:
parallel convolutional filters.
• Structure: Multiple filter sizes (1x1,
3x3, 5x5) in parallel.
• Impact: Efficient and scalable for
large-scale image recognition.

Transfer Learning
• Concept: Uses a pre-trained model on a new but related
task.
• Benefits: Speeds up training, requires less data, and
improves performance.
• Example: Using a pre-trained model like ResNet for a new
image classification task.

Object Localization
• Purpose: Identifies the location of objects within an image.
• Methods: Bounding box regression, Region Proposal
Networks (RPNs).
• Applications: Object detection, image segmentation.

Landmark Detection
• Definition: Detects specific key
points or landmarks within an image.
• Applications: Facial recognition,
medical imaging (e.g., key anatomical
points).
• Methods: CNNs used to detect and
regress the position of landmarks.

Applications of Computer Vision
● Computer vision, an AI technology that allows computers to understand
and label images, is now used in convenience stores, driverless car
testing, daily medical diagnostics, and in monitoring the health of crops
and livestock.
● Use cases for computer vision include the following:
■ Retail and Retail Security
■ Automotive
■ Healthcare
■ Banking
■ Agriculture

Conclusion
• CNNs have revolutionized computer vision tasks.
• Architectures like LeNet, AlexNet, VGG, ResNet, and
Inception paved the way for modern image processing.
• Transfer learning, object localization, and landmark
detection expand the versatility of CNNs.
