This document proposes an approach for quantizing neural networks so that inference uses integer arithmetic only, enabling efficient execution on commonly available hardware such as CPUs. It involves: (1) an inference scheme that quantizes weights and activations to unsigned 8-bit integers and biases to signed 32-bit integers, so that each layer can be computed entirely with integer operations; (2) "quantization-aware training," in which quantization effects are simulated during the floating-point forward pass so the model learns weights that remain accurate at reduced precision, including in the presence of outlier values that would otherwise stretch the quantization range. Experiments on MobileNets for ImageNet classification and COCO object detection showed up to 50% faster inference with minimal accuracy loss from this integer-only quantization.
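To make the quantization scheme concrete, below is a minimal sketch of the affine mapping r ≈ S·(q − Z) that such uint8 schemes typically use, where S is a real-valued scale and Z an integer zero-point. The function names and the simple min/max range calibration are illustrative assumptions, not the document's exact procedure.

```python
import numpy as np

def choose_qparams(x, num_bits=8):
    """Pick a scale S and zero-point Z mapping real values in [min(x), max(x)]
    onto the unsigned integer range [0, 2^b - 1]. The range is widened to
    include 0.0 so that real zero is exactly representable."""
    qmin, qmax = 0, 2**num_bits - 1
    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:  # degenerate all-zero tensor
        scale = 1.0
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x, scale, zero_point, num_bits=8):
    """Map real values to uint8 via r ≈ S * (q - Z)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2**num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate real values from quantized ones."""
    return scale * (q.astype(np.float32) - zero_point)
```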
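Building on those quantized parameters, a layer can then be evaluated with integer arithmetic alone: multiply-accumulate in int32, add the int32 bias (quantized with scale S_w·S_x and zero-point 0 so it adds directly into the accumulator), and rescale back to uint8. The sketch below, continuing from the helpers above, assumes per-tensor scales and uses a floating-point rescale for readability; on-device implementations typically replace the multiplier M = S_x·S_w / S_y with a fixed-point multiply and bit shift.

```python
def int_dense(q_x, q_w, q_bias, zp_x, zp_w, zp_y, s_x, s_w, s_y):
    """Integer-only dense layer (illustrative sketch):
    accumulate in int32, add the int32 bias, rescale to uint8."""
    acc = (q_x.astype(np.int32) - zp_x) @ (q_w.astype(np.int32) - zp_w)
    acc += q_bias  # bias pre-quantized to int32 with scale s_x * s_w
    M = (s_x * s_w) / s_y  # requantization multiplier
    q_y = np.round(M * acc) + zp_y
    return np.clip(q_y, 0, 255).astype(np.uint8)
```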
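On the training side, "simulated" (fake) quantization rounds values to the integer grid and immediately dequantizes them, so the forward pass sees quantization error while all tensors stay in floating point; in a real framework the rounding op is given a straight-through (identity) gradient so backpropagation proceeds as usual. A minimal sketch, reusing the helpers above:

```python
def fake_quantize(x, scale, zero_point, num_bits=8):
    """Simulated quantization for training: quantize then dequantize,
    so downstream computation experiences the rounding and clamping
    error that real integer inference would introduce."""
    q = quantize(x, scale, zero_point, num_bits)
    return dequantize(q, scale, zero_point)
```

Training against this error is what lets the model adapt its weights to the quantized range, rather than discovering the mismatch only at inference time.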