SlideShare a Scribd company logo
You Only Look Once (YOLO):
Unified Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
University of Washington, Allen Institute for AI, Facebook AI Research
~ Ashish
Previously : Object Detection by Classifiers
● DPM (Deformable Parts Model)
○ Sliding window → classifier (evenly spaced locations)
● R-CNN
○ Region proposal --> potential BB
○ Run classifiers on BB
○ Post processing (refinement, eliminate, rescore)
● YOLO
○ Resize image, run convolutional network, non-max suppression
YOLO : Object Detection as Regression Problem
● output: Bounding box coordinates and Class Probabilities
● Single Neural Network
● Benefits:
○ Extremely Fast (one NN + 45 frames per sec), twice more mAP.
○ Global Reasoning (knows context, less background errors)
○ Generalizable Representations (train natural images, test art-work, applicable new domain)
Unified Detection
● Feature Extraction
○ Predict all class BB simultaneously
● SxS Grid
○ Each cell predicts B bounding boxes + Confidence Score
● Confidence Score
○ Confidence is IOU between predicted box and any ground truth box =
● Class Probability
● Tensor
Detection Process (YOLO) Grid SXS
S = 7
Confidence Score
Each grid cell predicts B bounding boxes and confidence scores for those boxes.
If a cell has an object , then confidence score = Intersection over union (IOU)
between the predicted box and the ground truth.
Detection Process (YOLO)
Each cell predicts B boxes(x,y,w,h) and
confidences of each box: P(Object)
.(x,y)
w
h
B = 2
Prob. that box contains an
object P1, P2
No
Object
Each cell predicts Bounding Boxes and Confidence
.(x,y)
Each cell also predicts class probability
Bicycle
Dog
Car
E.g. Dog :
0.8
Car : 0
Bicycle : 0
E.g. Dog : 0
Car : 0
Bicycle : 0.7
E.g. Dog : 0
Car :
0.7
Bicycle : 0
Bounding Boxes + Class Prediction
.(x,y)
P (class) = P (class|object) x P(object) Thresholding
Model
These predictions are encoded
as Tensor of dimension
(SxSx(Bx5+C))
SxS grid,
C = class probability,
B= no of bounding boxes.
Network Design
● Inspired by the GoogLeNet (image classification)
● 24 convolutional layers followed by 2 fully connected layers
● Fast YOLO uses 9 convolutional layers (instead of 24)
Training
1. Pretrain on ImageNet 1000 dataset
2. 20 convolutional layers + an average pooling layer + a fully connected layer
3. Trained for 1 week, accuracy 88% (ImageNet 2012 validation dataset)
4. Convert model to perform detection
5. Added 4 convolutional layer + 2 fully connected layer + increased input resolution from 224 x 224 to
448 x 448.
6. Final layer predicts class probabilities + BB.
7. Linear activation function (final layer), Relu (all other layers)
8. Sum of squared error as loss function (easy to optimise)
Loss Function
Training - Validation
1. Train network for 135 epochs on the training and validation data sets from PASCAL
VOC 2007 AND 2012
2. Testing data VOC 2007 & 2012
3. Batch size = 64, momentum = 0.9, decay = 0.0005
4. Learning rate :
a. First few epochs , raise LR 10^-3 to 10^-2
b. Model diverges if starting LR is high due to unstable gradient
c. first 75 epoch, LR 10^-2
d. next 30 epochs, LR 10^-3
e. next 30 epochs, LR 10^-4
5. To avoid overfitting:
a. Dropout layer with rate 0.5
b. For Data Augmentation, scaling and translation up to 20% of original image size
Inference
● On PASCAL VOC YOLO predicts 98 BB per image and class probability for
each box.
● Objects near border are localised by multiple cells
○ Non Maximal suppression can be used to fix these multiple detections (Non-max suppression is a
way to eliminate points that do not lie in important edges. )
■ Adds 2 to 3% to mAP
Limitation of YOLO
● Struggle with small objects
● Struggles with difference aspects and ratio of objects
● Loss function treats error in different size of boxes same
Comparison with other Real time Systems:
● DPM : disjoint pipeline (sliding window, features, classify, predict BB) -
YOLO concurrently
● R-CNN : region proposal , complex pipeline ( predict bb, extract
features, non-max suppression) - 40 sec per image (2000 BB) : YOLO
: 98 BB
● Deep Multibox : cnn, cannot do general detection
● OverFeat : cnn, disjoint system, no global context
● MultiGrasp : similar in design (YOLO) , only find a region
Experiments
● PASCAL VOC
2007
● Realtime :
○ YOLO VS DPM 30
Hz
VOC 2007 Error Analysis
Combining Fast R-CNN and YOLO
● YOLO makes fewer background
mistakes than Fast R-CNN
● This combination doesn’t benefit
from the speed of YOLO since
each model is run separately and
then combine the results.
VOC 2012 Results
● YOLO struggles with small objects (bottle, sheep, tv/monitor)
● Fast R-CNN + YOLO : Highest performing detection methods
Generalizability: Person Detection in Artwork
● YOLO has good performance on VOC 2007
● Its AP degrades less than other methods when applied to artwork.
● Artwork / Natural Images are very different on a pixel level but very similar in terms of size and
shape, so YOLO predicts good bounding boxes and detections.
Results
Darknet (YOLO) Results on random images
Ad

More Related Content

What's hot (20)

Yolo
YoloYolo
Yolo
Sourav Garai
 
Yolo
YoloYolo
Yolo
Bang Tsui Liou
 
You only look once
You only look onceYou only look once
You only look once
Gin Kyeng Lee
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
Jinwon Lee
 
Yolo releases gianmaria
Yolo releases gianmariaYolo releases gianmaria
Yolo releases gianmaria
Deep Learning Italia
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
Usman Qayyum
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)
Universitat Politècnica de Catalunya
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
Wenjing Chen
 
Yolov5
Yolov5 Yolov5
Yolov5
Hochschule Bonn-Rhein-Sieg
 
Yolov3
Yolov3Yolov3
Yolov3
SHREY MOHAN
 
Yolov3
Yolov3Yolov3
Yolov3
VincentWu105
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
Jihong Kang
 
Yol ov2
Yol ov2Yol ov2
Yol ov2
Bang Tsui Liou
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
Preferred Networks
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
Amar Jindal
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
Sushant Shrivastava
 
YOLO v1
YOLO v1YOLO v1
YOLO v1
오 혜린
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
Jinwon Lee
 
End-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersEnd-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
Seunghyun Hwang
 
Deep learning based object detection
Deep learning based object detectionDeep learning based object detection
Deep learning based object detection
chettykulkarni
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
Jinwon Lee
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
Usman Qayyum
 
You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)
Universitat Politècnica de Catalunya
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
Wenjing Chen
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
Jihong Kang
 
A Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi KerolaA Brief History of Object Detection / Tommi Kerola
A Brief History of Object Detection / Tommi Kerola
Preferred Networks
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
Amar Jindal
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
Sushant Shrivastava
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
Jinwon Lee
 
End-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersEnd-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
Seunghyun Hwang
 
Deep learning based object detection
Deep learning based object detectionDeep learning based object detection
Deep learning based object detection
chettykulkarni
 

Similar to You only look once (YOLO) : unified real time object detection (20)

“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
Edge AI and Vision Alliance
 
Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...
Universitat de Barcelona
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
CHENHuiMei
 
Deep image retrieval learning global representations for image search
Deep image retrieval  learning global representations for image searchDeep image retrieval  learning global representations for image search
Deep image retrieval learning global representations for image search
Universitat Politècnica de Catalunya
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
Yoonho Na
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Anchor free object detection by deep learning
Anchor free object detection by deep learningAnchor free object detection by deep learning
Anchor free object detection by deep learning
Yu Huang
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
Edge AI and Vision Alliance
 
Eye deep
Eye deepEye deep
Eye deep
sveitser
 
Original SOINN
Original SOINNOriginal SOINN
Original SOINN
SOINN Inc.
 
Review: You Only Look One-level Feature
Review: You Only Look One-level FeatureReview: You Only Look One-level Feature
Review: You Only Look One-level Feature
Dongmin Choi
 
Anatomy of YOLO - v1
Anatomy of YOLO - v1Anatomy of YOLO - v1
Anatomy of YOLO - v1
Jihoon Song
 
Classification of Object Detection Algorithms
Classification of Object Detection AlgorithmsClassification of Object Detection Algorithms
Classification of Object Detection Algorithms
VaishuRaj4
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
Daniel Cahall
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
Charles Deledalle
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
fahmi324663
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
Jinwon Lee
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
ssuser3aa461
 
3D Multi Object GAN
3D Multi Object GAN3D Multi Object GAN
3D Multi Object GAN
Yu Nishimura
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
Edge AI and Vision Alliance
 
Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...Deep image retrieval - learning global representations for image search - ub ...
Deep image retrieval - learning global representations for image search - ub ...
Universitat de Barcelona
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
CHENHuiMei
 
Deep image retrieval learning global representations for image search
Deep image retrieval  learning global representations for image searchDeep image retrieval  learning global representations for image search
Deep image retrieval learning global representations for image search
Universitat Politècnica de Catalunya
 
object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
Yoonho Na
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Anchor free object detection by deep learning
Anchor free object detection by deep learningAnchor free object detection by deep learning
Anchor free object detection by deep learning
Yu Huang
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
Edge AI and Vision Alliance
 
Original SOINN
Original SOINNOriginal SOINN
Original SOINN
SOINN Inc.
 
Review: You Only Look One-level Feature
Review: You Only Look One-level FeatureReview: You Only Look One-level Feature
Review: You Only Look One-level Feature
Dongmin Choi
 
Anatomy of YOLO - v1
Anatomy of YOLO - v1Anatomy of YOLO - v1
Anatomy of YOLO - v1
Jihoon Song
 
Classification of Object Detection Algorithms
Classification of Object Detection AlgorithmsClassification of Object Detection Algorithms
Classification of Object Detection Algorithms
VaishuRaj4
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
Daniel Cahall
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
Charles Deledalle
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
fahmi324663
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
Jinwon Lee
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
ssuser3aa461
 
3D Multi Object GAN
3D Multi Object GAN3D Multi Object GAN
3D Multi Object GAN
Yu Nishimura
 
Ad

More from Entrepreneur / Startup (13)

R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networks
Entrepreneur / Startup
 
Machine Learning Algorithms in Enterprise Applications
Machine Learning Algorithms in Enterprise ApplicationsMachine Learning Algorithms in Enterprise Applications
Machine Learning Algorithms in Enterprise Applications
Entrepreneur / Startup
 
OpenAI Gym & Universe
OpenAI Gym & UniverseOpenAI Gym & Universe
OpenAI Gym & Universe
Entrepreneur / Startup
 
Build a Neural Network for ITSM with TensorFlow
Build a Neural Network for ITSM with TensorFlowBuild a Neural Network for ITSM with TensorFlow
Build a Neural Network for ITSM with TensorFlow
Entrepreneur / Startup
 
Understanding Autoencoder (Deep Learning Book, Chapter 14)
Understanding Autoencoder  (Deep Learning Book, Chapter 14)Understanding Autoencoder  (Deep Learning Book, Chapter 14)
Understanding Autoencoder (Deep Learning Book, Chapter 14)
Entrepreneur / Startup
 
Build an AI based virtual agent
Build an AI based virtual agent Build an AI based virtual agent
Build an AI based virtual agent
Entrepreneur / Startup
 
Building Bots Using IBM Watson
Building Bots Using IBM WatsonBuilding Bots Using IBM Watson
Building Bots Using IBM Watson
Entrepreneur / Startup
 
Building chat bots using ai platforms (wit.ai or api.ai) in nodejs
Building chat bots using ai platforms (wit.ai or api.ai) in nodejsBuilding chat bots using ai platforms (wit.ai or api.ai) in nodejs
Building chat bots using ai platforms (wit.ai or api.ai) in nodejs
Entrepreneur / Startup
 
Building mobile apps using meteorJS
Building mobile apps using meteorJSBuilding mobile apps using meteorJS
Building mobile apps using meteorJS
Entrepreneur / Startup
 
Building iOS app using meteor
Building iOS app using meteorBuilding iOS app using meteor
Building iOS app using meteor
Entrepreneur / Startup
 
Understanding angular meteor
Understanding angular meteorUnderstanding angular meteor
Understanding angular meteor
Entrepreneur / Startup
 
Introducing ElasticSearch - Ashish
Introducing ElasticSearch - AshishIntroducing ElasticSearch - Ashish
Introducing ElasticSearch - Ashish
Entrepreneur / Startup
 
Meteor Introduction - Ashish
Meteor Introduction - AshishMeteor Introduction - Ashish
Meteor Introduction - Ashish
Entrepreneur / Startup
 
R-FCN : object detection via region-based fully convolutional networks
R-FCN :  object detection via region-based fully convolutional networksR-FCN :  object detection via region-based fully convolutional networks
R-FCN : object detection via region-based fully convolutional networks
Entrepreneur / Startup
 
Machine Learning Algorithms in Enterprise Applications
Machine Learning Algorithms in Enterprise ApplicationsMachine Learning Algorithms in Enterprise Applications
Machine Learning Algorithms in Enterprise Applications
Entrepreneur / Startup
 
Build a Neural Network for ITSM with TensorFlow
Build a Neural Network for ITSM with TensorFlowBuild a Neural Network for ITSM with TensorFlow
Build a Neural Network for ITSM with TensorFlow
Entrepreneur / Startup
 
Understanding Autoencoder (Deep Learning Book, Chapter 14)
Understanding Autoencoder  (Deep Learning Book, Chapter 14)Understanding Autoencoder  (Deep Learning Book, Chapter 14)
Understanding Autoencoder (Deep Learning Book, Chapter 14)
Entrepreneur / Startup
 
Building chat bots using ai platforms (wit.ai or api.ai) in nodejs
Building chat bots using ai platforms (wit.ai or api.ai) in nodejsBuilding chat bots using ai platforms (wit.ai or api.ai) in nodejs
Building chat bots using ai platforms (wit.ai or api.ai) in nodejs
Entrepreneur / Startup
 
Ad

Recently uploaded (20)

2.3 Genetically Modified Organisms (1).ppt
2.3 Genetically Modified Organisms (1).ppt2.3 Genetically Modified Organisms (1).ppt
2.3 Genetically Modified Organisms (1).ppt
rakshaiya16
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
Evonik Overview Visiomer Specialty Methacrylates.pdf
Evonik Overview Visiomer Specialty Methacrylates.pdfEvonik Overview Visiomer Specialty Methacrylates.pdf
Evonik Overview Visiomer Specialty Methacrylates.pdf
szhang13
 
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software ApplicationsJacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia
 
Slide share PPT of NOx control technologies.pptx
Slide share PPT of  NOx control technologies.pptxSlide share PPT of  NOx control technologies.pptx
Slide share PPT of NOx control technologies.pptx
vvsasane
 
Generative AI & Large Language Models Agents
Generative AI & Large Language Models AgentsGenerative AI & Large Language Models Agents
Generative AI & Large Language Models Agents
aasgharbee22seecs
 
Working with USDOT UTCs: From Conception to Implementation
Working with USDOT UTCs: From Conception to ImplementationWorking with USDOT UTCs: From Conception to Implementation
Working with USDOT UTCs: From Conception to Implementation
Alabama Transportation Assistance Program
 
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdfDavid Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry
 
Empowering Electric Vehicle Charging Infrastructure with Renewable Energy Int...
Empowering Electric Vehicle Charging Infrastructure with Renewable Energy Int...Empowering Electric Vehicle Charging Infrastructure with Renewable Energy Int...
Empowering Electric Vehicle Charging Infrastructure with Renewable Energy Int...
AI Publications
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Modeling the Influence of Environmental Factors on Concrete Evaporation Rate
Modeling the Influence of Environmental Factors on Concrete Evaporation RateModeling the Influence of Environmental Factors on Concrete Evaporation Rate
Modeling the Influence of Environmental Factors on Concrete Evaporation Rate
Journal of Soft Computing in Civil Engineering
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdfATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ssuserda39791
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
Guru Nanak Technical Institutions
 
Machine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATIONMachine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATION
DarrinBright1
 
2.3 Genetically Modified Organisms (1).ppt
2.3 Genetically Modified Organisms (1).ppt2.3 Genetically Modified Organisms (1).ppt
2.3 Genetically Modified Organisms (1).ppt
rakshaiya16
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
Evonik Overview Visiomer Specialty Methacrylates.pdf
Evonik Overview Visiomer Specialty Methacrylates.pdfEvonik Overview Visiomer Specialty Methacrylates.pdf
Evonik Overview Visiomer Specialty Methacrylates.pdf
szhang13
 
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software ApplicationsJacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia - Excels In Optimizing Software Applications
Jacob Murphy Australia
 
Slide share PPT of NOx control technologies.pptx
Slide share PPT of  NOx control technologies.pptxSlide share PPT of  NOx control technologies.pptx
Slide share PPT of NOx control technologies.pptx
vvsasane
 
Generative AI & Large Language Models Agents
Generative AI & Large Language Models AgentsGenerative AI & Large Language Models Agents
Generative AI & Large Language Models Agents
aasgharbee22seecs
 
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdfDavid Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry
 
Empowering Electric Vehicle Charging Infrastructure with Renewable Energy Int...
Empowering Electric Vehicle Charging Infrastructure with Renewable Energy Int...Empowering Electric Vehicle Charging Infrastructure with Renewable Energy Int...
Empowering Electric Vehicle Charging Infrastructure with Renewable Energy Int...
AI Publications
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdfATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ATAL 6 Days Online FDP Scheme Document 2025-26.pdf
ssuserda39791
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Machine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATIONMachine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATION
DarrinBright1
 

You only look once (YOLO) : unified real time object detection

  • 1. You Only Look Once (YOLO): Unified Real-Time Object Detection Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi University of Washington, Allen Institute for AI, Facebook AI Research ~ Ashish
  • 2. Previously : Object Detection by Classifiers ● DPM (Deformable Parts Model) ○ Sliding window → classifier (evenly spaced locations) ● R-CNN ○ Region proposal --> potential BB ○ Run classifiers on BB ○ Post processing (refinement, eliminate, rescore) ● YOLO ○ Resize image, run convolutional network, non-max suppression
  • 3. YOLO : Object Detection as Regression Problem ● output: Bounding box coordinates and Class Probabilities ● Single Neural Network ● Benefits: ○ Extremely Fast (one NN + 45 frames per sec), twice more mAP. ○ Global Reasoning (knows context, less background errors) ○ Generalizable Representations (train natural images, test art-work, applicable new domain)
  • 4. Unified Detection ● Feature Extraction ○ Predict all class BB simultaneously ● SxS Grid ○ Each cell predicts B bounding boxes + Confidence Score ● Confidence Score ○ Confidence is IOU between predicted box and any ground truth box = ● Class Probability ● Tensor
  • 5. Detection Process (YOLO) Grid SXS S = 7
  • 6. Confidence Score Each grid cell predicts B bounding boxes and confidence scores for those boxes. If a cell has an object , then confidence score = Intersection over union (IOU) between the predicted box and the ground truth.
  • 7. Detection Process (YOLO) Each cell predicts B boxes(x,y,w,h) and confidences of each box: P(Object) .(x,y) w h B = 2 Prob. that box contains an object P1, P2 No Object
  • 8. Each cell predicts Bounding Boxes and Confidence .(x,y)
  • 9. Each cell also predicts class probability Bicycle Dog Car E.g. Dog : 0.8 Car : 0 Bicycle : 0 E.g. Dog : 0 Car : 0 Bicycle : 0.7 E.g. Dog : 0 Car : 0.7 Bicycle : 0
  • 10. Bounding Boxes + Class Prediction .(x,y) P (class) = P (class|object) x P(object) Thresholding
  • 11. Model These predictions are encoded as Tensor of dimension (SxSx(Bx5+C)) SxS grid, C = class probability, B= no of bounding boxes.
  • 12. Network Design ● Inspired by the GoogLeNet (image classification) ● 24 convolutional layers followed by 2 fully connected layers ● Fast YOLO uses 9 convolutional layers (instead of 24)
  • 13. Training 1. Pretrain on ImageNet 1000 dataset 2. 20 convolutional layers + an average pooling layer + a fully connected layer 3. Trained for 1 week, accuracy 88% (ImageNet 2012 validation dataset) 4. Convert model to perform detection 5. Added 4 convolutional layer + 2 fully connected layer + increased input resolution from 224 x 224 to 448 x 448. 6. Final layer predicts class probabilities + BB. 7. Linear activation function (final layer), Relu (all other layers) 8. Sum of squared error as loss function (easy to optimise)
  • 15. Training - Validation 1. Train network for 135 epochs on the training and validation data sets from PASCAL VOC 2007 AND 2012 2. Testing data VOC 2007 & 2012 3. Batch size = 64, momentum = 0.9, decay = 0.0005 4. Learning rate : a. First few epochs , raise LR 10^-3 to 10^-2 b. Model diverges if starting LR is high due to unstable gradient c. first 75 epoch, LR 10^-2 d. next 30 epochs, LR 10^-3 e. next 30 epochs, LR 10^-4 5. To avoid overfitting: a. Dropout layer with rate 0.5 b. For Data Augmentation, scaling and translation up to 20% of original image size
  • 16. Inference ● On PASCAL VOC YOLO predicts 98 BB per image and class probability for each box. ● Objects near border are localised by multiple cells ○ Non Maximal suppression can be used to fix these multiple detections (Non-max suppression is a way to eliminate points that do not lie in important edges. ) ■ Adds 2 to 3% to mAP
  • 17. Limitation of YOLO ● Struggle with small objects ● Struggles with difference aspects and ratio of objects ● Loss function treats error in different size of boxes same
  • 18. Comparison with other Real time Systems: ● DPM : disjoint pipeline (sliding window, features, classify, predict BB) - YOLO concurrently ● R-CNN : region proposal , complex pipeline ( predict bb, extract features, non-max suppression) - 40 sec per image (2000 BB) : YOLO : 98 BB ● Deep Multibox : cnn, cannot do general detection ● OverFeat : cnn, disjoint system, no global context ● MultiGrasp : similar in design (YOLO) , only find a region
  • 19. Experiments ● PASCAL VOC 2007 ● Realtime : ○ YOLO VS DPM 30 Hz
  • 20. VOC 2007 Error Analysis
  • 21. Combining Fast R-CNN and YOLO ● YOLO makes fewer background mistakes than Fast R-CNN ● This combination doesn’t benefit from the speed of YOLO since each model is run separately and then combine the results.
  • 22. VOC 2012 Results ● YOLO struggles with small objects (bottle, sheep, tv/monitor) ● Fast R-CNN + YOLO : Highest performing detection methods
  • 23. Generalizability: Person Detection in Artwork ● YOLO has good performance on VOC 2007 ● Its AP degrades less than other methods when applied to artwork. ● Artwork / Natural Images are very different on a pixel level but very similar in terms of size and shape, so YOLO predicts good bounding boxes and detections.
  • 25. Darknet (YOLO) Results on random images
  翻译: