CONFIDENTIAL
Quantization and Training of
Neural Networks for Efficient
Integer-Arithmetic-Only Inference
[Jacob et al. from Google 2017]
Ryo Takahashi
2
Motivation
Let’s dig deeper into
the optimized arithmetic
inside neural networks!
3
Approaches to CNN deployment on mobile platforms
● Approach 1: computation/memory-efficient network architecture
• e.g. MobileNet [arXiv:1704.04861], SqueezeNet [arXiv:1602.07360]
● Approach 2: quantization (Today’s topic)
• definition: quantize weights and activations from float into a lower-bit-depth format
• benefit: saves memory and power, speeds up inference
Existing works
• Ternary weight networks [arXiv:1605.04711]
• Binary neural networks [arXiv:1602.02505]
(these works can approximate convolution by bit-shifts/bit-counts)
Issues
• Their baseline architectures are over-parameterized
- fat architectures (e.g. VGG) are easy to compress
- it is still unclear whether their schemes apply to modern lightweight architectures (e.g. MobileNet)
- they are verified only on classification tasks, which tolerate quantization errors better than regression tasks
• NOT efficient on common hardware (e.g. CPU)
- convolution based on bit-shifts/bit-counts pays off only on custom hardware (e.g. FPGA, ASIC)
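To make the "bit-shifts/counts" remark concrete: when weights and activations are restricted to {−1, +1}, a dot product reduces to an XNOR followed by a bit count. The C++ sketch below assumes one ±1 value per bit in a `std::bitset`; `PackSigns` and `BinaryDot` are hypothetical helper names, and the packing layout is an illustration, not the cited papers' implementation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Pack a {-1,+1} vector into bits: 1 encodes +1, 0 encodes -1.
template <std::size_t N>
std::bitset<N> PackSigns(const std::vector<int>& v) {
  assert(v.size() == N);
  std::bitset<N> bits;
  for (std::size_t i = 0; i < N; ++i) bits[i] = (v[i] > 0);
  return bits;
}

// Dot product of two {-1,+1}^N vectors with XNOR + popcount:
// equal signs contribute +1, different signs -1, so dot = matches - (N - matches).
template <std::size_t N>
int BinaryDot(const std::bitset<N>& a, const std::bitset<N>& b) {
  const int matches = static_cast<int>((~(a ^ b)).count());  // XNOR, then bit count
  return 2 * matches - static_cast<int>(N);
}
```

On a CPU, `std::bitset::count` typically maps to a popcount instruction where one exists, but as the slide notes, kernels like this pay off mainly on custom hardware.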
4
Proposal: Integer-arithmetic-only quantization
● Improve the latency-vs-accuracy tradeoff of MobileNets on common hardware
a) Integer-arithmetic-only inference
- why convert weights and activations to uint8 rather than int8?
- why keep biases at 32-bit depth?
b) Quantization-aware training
- quantize weights and activations during training, unlike post-training calibration
c) Evaluation on ImageNet classification and COCO object detection
5
OSS Contribution
● This work is included in Google’s ML software stack:
• TensorFlow (Model optimization)
• TensorFlow Lite (Case studies)
• Android NN
[Figure: architectures from lightweight to fat plotted against accuracy drop from big to small, with "this work" marked]
6
Quantization scheme
● Equation: $r = S(q - Z)$
• where:
- r : real value
- q : quantized value
- S : scale, a positive real number (learned in training)
- Z : zero-point, the quantized value that represents real 0 (learned in training)
● Data structure in C++
• create a struct QuantizedBuffer for each weight and activation tensor
• each buffer has its own S and Z, e.g. QType=uint8
Why can we call this integer-arithmetic-only
in spite of the float S?
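The scheme and the per-buffer data structure can be sketched in C++ as follows. The struct mirrors the slide's description (QType as a template parameter, one S and Z per buffer); `ChooseQuantParams`, `Quantize` and `Dequantize` are hypothetical helpers, and deriving S and Z from a real range [rmin, rmax] is an assumption made for illustration (the slide says they are learned in training).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// One buffer per weight/activation tensor, each with its own S and Z.
template <typename QType>  // e.g. QType = std::uint8_t
struct QuantizedBuffer {
  std::vector<QType> q;  // the quantized values
  float S = 1.0f;        // the scale (a positive real number)
  QType Z = 0;           // the zero-point: the q that represents real 0
};

// Hypothetical helper: choose uint8 parameters for a real range [rmin, rmax].
// The range is widened to contain 0 and Z is rounded, so real 0 maps exactly to Z.
inline void ChooseQuantParams(float rmin, float rmax, float* S, std::uint8_t* Z) {
  assert(rmin <= rmax);
  rmin = std::min(rmin, 0.0f);
  rmax = std::max(rmax, 0.0f);
  *S = (rmax > rmin) ? (rmax - rmin) / 255.0f : 1.0f;
  *Z = static_cast<std::uint8_t>(std::clamp(std::round(-rmin / *S), 0.0f, 255.0f));
}

// Quantize: q = clamp(round(r / S) + Z, 0, 255).
inline QuantizedBuffer<std::uint8_t> Quantize(const std::vector<float>& r,
                                              float rmin, float rmax) {
  QuantizedBuffer<std::uint8_t> buf;
  ChooseQuantParams(rmin, rmax, &buf.S, &buf.Z);
  buf.q.reserve(r.size());
  for (float x : r) {
    const float q = std::round(x / buf.S) + static_cast<float>(buf.Z);
    buf.q.push_back(static_cast<std::uint8_t>(std::clamp(q, 0.0f, 255.0f)));
  }
  return buf;
}

// Dequantize: r = S * (q - Z).
inline std::vector<float> Dequantize(const QuantizedBuffer<std::uint8_t>& buf) {
  std::vector<float> r;
  r.reserve(buf.q.size());
  for (std::uint8_t q : buf.q)
    r.push_back(buf.S * static_cast<float>(static_cast<int>(q) - static_cast<int>(buf.Z)));
  return r;
}
```

Rounding Z to an integer is what makes real 0 exactly representable (q = Z), which matters for operations such as zero padding; the price is that the representable range is shifted by at most S/2.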
7
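Integer-arithmetic-only matrix multiplication
● Consider $X_3 = X_1 X_2$, where
$$X_\alpha = \begin{pmatrix} r_\alpha^{(1,1)} & \cdots & r_\alpha^{(1,N)} \\ \vdots & r_\alpha^{(i,j)} & \vdots \\ r_\alpha^{(N,1)} & \cdots & r_\alpha^{(N,N)} \end{pmatrix}, \quad \alpha \in \{1, 2, 3\}$$
Substituting $r = S(q - Z)$ into $r_3^{(i,k)} = \sum_{j=1}^{N} r_1^{(i,j)} r_2^{(j,k)}$ gives
$$S_3 (q_3^{(i,k)} - Z_3) = \sum_{j=1}^{N} S_1 (q_1^{(i,j)} - Z_1) \, S_2 (q_2^{(j,k)} - Z_2)$$
which can be rewritten as:
$$q_3^{(i,k)} = Z_3 + M \sum_{j=1}^{N} (q_1^{(i,j)} - Z_1)(q_2^{(j,k)} - Z_2)$$
where:
$M := S_1 S_2 / S_3$ is empirically in (0, 1), so it can be written as
$$M = 2^{-n} M_0$$
where:
$n$ is a non-negative integer
$M_0 \in [0.5, 1)$ is a fixed-point value of type q31_t (typedef int32_t q31_t; // Q-format)
● Conv. & Affine layers become free of float arithmetic by approximating $M$ with an int32_t fixed-point value
● Expanding the product:
$$q_3^{(i,k)} = Z_3 + M \Big( N Z_1 Z_2 - Z_1 a_2^{(k)} - Z_2 \bar{a}_1^{(i)} + \sum_{j=1}^{N} q_1^{(i,j)} q_2^{(j,k)} \Big), \quad a_2^{(k)} := \sum_{j=1}^{N} q_2^{(j,k)}, \quad \bar{a}_1^{(i)} := \sum_{j=1}^{N} q_1^{(i,j)}$$
- the sums $a_2^{(k)}$ and $\bar{a}_1^{(i)}$ take only $N$ additions each and can be factored out of the calculation for each $q_3^{(i,k)}$
- the core accumulation $\sum_j q_1^{(i,j)} q_2^{(j,k)}$, $2N^3$ arithmetic operations in total, stays in the inner loop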
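The zero-point factoring can be checked with a small C++ sketch. Expanding $(q_1 - Z_1)(q_2 - Z_2)$ and summing over $j$ gives $N Z_1 Z_2 - Z_1 a_2^{(k)} - Z_2 \bar{a}_1^{(i)} + \sum_j q_1^{(i,j)} q_2^{(j,k)}$, where $a_2^{(k)}$ are column sums of $q_2$ and $\bar{a}_1^{(i)}$ are row sums of $q_1$. The sketch computes only the int32 accumulator, both naively and in this factored form; the multiplication by $M$ and the addition of $Z_3$ are left out. `AccNaive`, `AccFactored` and the row-major `Mat` layout are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major N x N matrix of uint8 quantized values (layout chosen for this sketch).
using Mat = std::vector<std::uint8_t>;

// Naive int32 accumulator: zero-points are subtracted inside the inner loop.
std::vector<std::int32_t> AccNaive(const Mat& q1, const Mat& q2, int N, int Z1, int Z2) {
  assert(q1.size() == static_cast<std::size_t>(N) * N && q2.size() == q1.size());
  std::vector<std::int32_t> acc(static_cast<std::size_t>(N) * N, 0);
  for (int i = 0; i < N; ++i)
    for (int k = 0; k < N; ++k)
      for (int j = 0; j < N; ++j)
        acc[i * N + k] += (q1[i * N + j] - Z1) * (q2[j * N + k] - Z2);
  return acc;
}

// Factored accumulator:
// N*Z1*Z2 - Z1*a2[k] - Z2*abar1[i] + sum_j q1[i][j] * q2[j][k].
std::vector<std::int32_t> AccFactored(const Mat& q1, const Mat& q2, int N, int Z1, int Z2) {
  assert(q1.size() == static_cast<std::size_t>(N) * N && q2.size() == q1.size());
  std::vector<std::int32_t> abar1(N, 0), a2(N, 0);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) abar1[i] += q1[i * N + j];  // row sums of q1: N additions each
  for (int k = 0; k < N; ++k)
    for (int j = 0; j < N; ++j) a2[k] += q2[j * N + k];     // column sums of q2: N additions each
  std::vector<std::int32_t> acc(static_cast<std::size_t>(N) * N, 0);
  for (int i = 0; i < N; ++i)
    for (int k = 0; k < N; ++k) {
      std::int32_t core = 0;  // inner loop: only uint8 x uint8 products summed in int32
      for (int j = 0; j < N; ++j)
        core += static_cast<std::int32_t>(q1[i * N + j]) * q2[j * N + k];
      acc[i * N + k] = N * Z1 * Z2 - Z1 * a2[k] - Z2 * abar1[i] + core;
    }
  return acc;
}
```

Both functions return identical accumulators. In the factored version the inner loop touches only the raw uint8 values, so it is the part worth optimizing, while the zero-point corrections cost only $O(N^2)$ extra work.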
8
Implementation of a typical fused layer
(1) Accumulate products in int32_t
• quantize bias vectors to int32 rather than uint8
- reason: quantization errors in the bias act as overall errors, because each bias element is added to many output activations
(2) Scale down int32_t to uint8
a. multiply by the fixed-point value $M_0$
b. right bit-shift by $n$
c. saturating cast to [0, 255]
(3) Apply activation functions
• a mere clamp on uint8_t suffices, because MobileNets use only ReLU and ReLU6
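Steps (1) to (3) can be sketched in C++ as below. `QuantizeMultiplier` splits $M$ into $M_0$ (stored as a Q31 int32) and $n$; `FixedPointMul` and `RoundingShiftRight` implement steps (2a) and (2b) with round-to-nearest; `Requantize` adds the output zero-point, saturates to [0, 255] and applies the activation as a clamp. All names are hypothetical, the rounding details may differ from production kernels such as gemmlowp, and `QuantizeBias` assumes the convention $S_\text{bias} = S_1 S_2$, $Z_\text{bias} = 0$, which lets the bias be added directly to the int32 accumulator.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Split M in (0, 1) into M = 2^-n * M0 with M0 in [0.5, 1); M0 is stored in Q31,
// i.e. as the int32 value round(M0 * 2^31).
inline void QuantizeMultiplier(double M, std::int32_t* M0_q31, int* n) {
  assert(M > 0.0 && M < 1.0);
  int exponent = 0;
  const double frac = std::frexp(M, &exponent);  // M = frac * 2^exponent, frac in [0.5, 1)
  std::int64_t q = std::llround(frac * 2147483648.0);  // frac * 2^31
  *n = -exponent;
  if (q == (std::int64_t{1} << 31)) {  // frac rounded up to 1.0: use 0.5 * 2^1 instead
    q /= 2;
    --*n;
  }
  assert(*n >= 0);
  *M0_q31 = static_cast<std::int32_t>(q);
}

// Step (2a): multiply by the fixed-point M0 and keep the high bits, rounded to nearest.
inline std::int32_t FixedPointMul(std::int32_t x, std::int32_t M0_q31) {
  const std::int64_t prod = static_cast<std::int64_t>(x) * M0_q31;
  return static_cast<std::int32_t>((prod + (std::int64_t{1} << 30)) >> 31);
}

// Step (2b): rounding arithmetic right shift by n
// (>> on negative values is arithmetic since C++20 and on common compilers before that).
inline std::int32_t RoundingShiftRight(std::int32_t x, int n) {
  assert(n >= 0 && n < 63);
  if (n == 0) return x;
  const std::int64_t rounding = std::int64_t{1} << (n - 1);
  return static_cast<std::int32_t>((static_cast<std::int64_t>(x) + rounding) >> n);
}

// Bias in int32 with scale S1 * S2 and zero-point 0 (assumed convention).
inline std::int32_t QuantizeBias(float b, float S1, float S2) {
  return static_cast<std::int32_t>(std::lround(b / (S1 * S2)));
}

// Steps (2)-(3): int32 accumulator (bias already added) -> uint8 output.
inline std::uint8_t Requantize(std::int32_t acc, std::int32_t M0_q31, int n, int Z3,
                               int act_min = 0, int act_max = 255) {
  std::int64_t v = RoundingShiftRight(FixedPointMul(acc, M0_q31), n);
  v += Z3;                                            // add the output zero-point
  v = std::clamp<std::int64_t>(v, 0, 255);            // (2c) saturating cast to [0, 255]
  v = std::clamp<std::int64_t>(v, act_min, act_max);  // (3) ReLU / ReLU6 as a clamp
  return static_cast<std::uint8_t>(v);
}
```

For ReLU6 in the quantized domain, `act_min` would be $Z_3$ (real 0) and `act_max` would be $\min(255, Z_3 + \mathrm{round}(6 / S_3))$, where $S_3$ is the output scale; plain ReLU only raises `act_min` to $Z_3$.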
9
Quantization-aware training
● Motivation: post-training quantization has difficulty handling:
• large differences (> 100×) in weight ranges across output channels
• outlier weight values
● Approach: simulate the effects of integer quantization during training
Step-1: Create a floating-point
graph as usual
Step-2: Insert fake quantization operations, which
round tensors to fewer bits while keeping them in float
After training
proceeds…
For activations, in addition to
simulating quantization, aggregate their ranges via
exponential moving averages (EMA)
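A minimal sketch of the two training-time pieces, written in C++ for consistency with the rest of the deck (real training would run in a framework with automatic differentiation). `FakeQuant` quantizes and immediately dequantizes, so the forward pass sees the rounding and clamping error while values stay float; `EmaRange` tracks activation ranges with an exponential moving average. Both names, the default decay and the range handling are assumptions, not the paper's exact procedure.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Fake quantization: quantize to num_bits on the grid defined by [rmin, rmax],
// then dequantize, so the float forward pass sees the rounding and clamping error.
inline float FakeQuant(float x, float rmin, float rmax, int num_bits = 8) {
  assert(rmin <= rmax && num_bits >= 2 && num_bits <= 16);
  rmin = std::min(rmin, 0.0f);  // widen so that real 0 stays representable
  rmax = std::max(rmax, 0.0f);
  if (rmax == rmin) return 0.0f;  // degenerate range [0, 0]
  const float levels = static_cast<float>((1 << num_bits) - 1);
  const float S = (rmax - rmin) / levels;
  const float Z = std::round(-rmin / S);  // integer zero-point; shifts the range slightly
  const float q = std::clamp(std::round(x / S) + Z, 0.0f, levels);
  return S * (q - Z);
}

// Exponential moving average of observed activation ranges.
struct EmaRange {
  float decay = 0.999f;  // assumed value for illustration, not taken from the slides
  float min_val = 0.0f;
  float max_val = 0.0f;
  bool initialized = false;

  void Update(float batch_min, float batch_max) {
    if (!initialized) {  // first batch: take its range as-is
      min_val = batch_min;
      max_val = batch_max;
      initialized = true;
      return;
    }
    min_val = decay * min_val + (1.0f - decay) * batch_min;
    max_val = decay * max_val + (1.0f - decay) * batch_max;
  }
};
```

In training, the gradient of a fake-quantization op is commonly approximated with a straight-through estimator (identity inside the range); that part is not shown here.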
10
Experiments with MobileNets
● CPU: Snapdragon 835
• microarchitecture: ARM big.LITTLE [Cortex-A73 (big) / Cortex-A53 (LITTLE)]
• optimized with ARM NEON
ImageNet:
• on the LITTLE core, the accuracy gap between integer-only and float models at a 33 ms (30 FPS) latency budget is quite substantial (~10%)
COCO:
• MobileNet SSD was used
• up to a 50% reduction in inference time with a minimal loss in accuracy (−1.8% relative)
• INT8 quantization handles regression tasks well
[Figure: latency-vs-accuracy plots for the big and LITTLE cores]
11
Summary & My perspective
● Summary
• integer quantization pays off on common hardware such as CPUs
• quantization-aware training is crucial for quantizing modern lightweight architectures (e.g. MobileNets) and for error-sensitive tasks such as regression
● My perspective
• integer quantization is the most balanced and generic scheme so far
• extreme quantization such as BNNs is achievable only if hardware and software are developed in parallel
- Google
- NVIDIA
- Apple