Deep Learning Model Compression Using Nvidia SoCs and Intel Atom CPUs
Deep Learning model compression flow -- an iterative process with multiple test points

Signalogic is building a "continuous integration, continuous deployment" (CI/CD) model compression solution that provides continuously available cloud training, compression, and testing for IoT and Edge deep learning products. Example product areas include smart cameras, personal security assistants, and industrial robotics. An example target application is a smart camera observing a crowded location, tracking persons carrying a briefcase or wearing a backpack.

Model compression and training require multiple concurrent test runs that implement device-specific operations (such as fixed-point calculations, weight quantization, matrix math, and on-chip memory tradeoffs) on the actual target device. The process is iterative, with multiple test points, and requires extensive cloud computing resources, possibly with multiple compressed models tested in parallel. Overall the process can be described as "HPC intensive", and prohibitive to perform directly on embedded targets.
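
To make one of these device-specific operations concrete, here is a minimal sketch of symmetric 8-bit weight quantization, one of the operations named above. This is an illustrative NumPy example, not Signalogic's implementation; the function names and the per-tensor scaling scheme are assumptions for the sketch.

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    """Symmetric per-tensor quantization of float weights to signed integers.

    Illustrative sketch only -- real flows may use per-channel scales,
    asymmetric ranges, or calibration data."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax          # map the largest magnitude to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for accuracy testing."""
    return q.astype(np.float32) * scale

# Quantize a random 4x4 weight tile and measure the reconstruction error,
# which is bounded by roughly half the quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_weights(w)
print(np.max(np.abs(w - dequantize(q, scale))) <= scale / 2 + 1e-6)
```

Running this kind of check on the actual target device, rather than only in float emulation, is what makes the testing "device accurate".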

The primary solution objectives are (i) to train and compress optimized models in the cloud, verifying real-time performance and accuracy with "device accurate" testing, and producing hardware-ready models that can be instantly downloaded to run real-time inference on embedded targets, and (ii) to provide this functionality as always available and always on. Achieving these objectives will enable automated, continuous training, compression, and testing for embedded targets, avoiding hand optimization and other "manual touching" on embedded targets, which is notoriously time-consuming to debug.

As a secondary benefit of this approach, embedded targets may acquire new data as needed, submit acquired data to the cloud, and repeat the training and compression process flow. This "deployment feedback loop" is essential, as IoT and Edge deep learning products are expected to continuously adapt and improve their learning and thus their performance.
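
One iteration of this feedback loop can be sketched as follows. All names here (`EdgeDevice`, `CloudTrainer`, and their methods) are hypothetical placeholders for illustration, not Signalogic APIs; the retraining step is a stub standing in for the full train, compress, and test cycle.

```python
class EdgeDevice:
    """Hypothetical edge-side participant in the deployment feedback loop."""
    def __init__(self):
        self.buffer = []          # newly acquired samples awaiting upload
        self.model_version = 0    # version of the currently deployed model

    def acquire(self, sample):
        self.buffer.append(sample)

    def upload(self, cloud):
        cloud.ingest(self.buffer)
        self.buffer = []

    def download(self, cloud):
        self.model_version = cloud.latest_version

class CloudTrainer:
    """Hypothetical cloud-side trainer; retraining here is a stub."""
    def __init__(self):
        self.dataset = []
        self.latest_version = 0

    def ingest(self, samples):
        self.dataset.extend(samples)

    def retrain_and_compress(self):
        # placeholder for the train -> compress -> device-accurate test cycle
        if self.dataset:
            self.latest_version += 1

# One loop iteration: acquire new data, upload, retrain, redeploy.
cloud, device = CloudTrainer(), EdgeDevice()
device.acquire({"frame": "...", "label": None})
device.upload(cloud)
cloud.retrain_and_compress()
device.download(cloud)
print(device.model_version)  # 1
```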

Currently the solution supports Nvidia Tegra "Parker" SoCs and Intel Atom x5-E3940 CPUs. OpenCV algorithms identify video frame "regions of interest", which are then classified in real time by the deep learning model. From an IoT and Edge product perspective, the power consumption objective is between 5 and 10 W, and the performance objective is to classify up to four (4) 224 x 224 regions per 720p frame at 30 fps. Multiple models and compression methods / algorithms are supported, with varying tradeoffs between performance (frame rate, number of regions), accuracy, and power consumption. Multiple video inputs are possible, with additional tradeoffs between higher SoC / CPU core count and product size and weight.
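
The per-frame region pipeline can be sketched as below: take at most four detected boxes from a 720p frame and produce 224 x 224 crops for the classifier. In the real flow, OpenCV routines (for example background subtraction followed by `cv2.findContours` and `cv2.boundingRect`) supply the boxes; here the boxes are hard-coded placeholders and resizing is simple nearest-neighbor, so the sketch runs with NumPy alone.

```python
import numpy as np

MAX_REGIONS = 4   # classify up to four regions per frame
ROI_SIZE = 224    # classifier input resolution

def resize_nearest(img, size):
    """Nearest-neighbor resize to size x size (stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def extract_regions(frame, boxes):
    """Crop and resize at most MAX_REGIONS boxes (x, y, w, h) from the frame."""
    crops = []
    for x, y, w, h in boxes[:MAX_REGIONS]:
        crops.append(resize_nearest(frame[y:y + h, x:x + w], ROI_SIZE))
    return crops

frame = np.zeros((720, 1280, 3), dtype=np.uint8)       # one 720p frame
boxes = [(100, 50, 160, 320), (600, 200, 128, 256)]    # placeholder detections
regions = extract_regions(frame, boxes)
print(len(regions), regions[0].shape)  # 2 (224, 224, 3)
```

At 30 fps this stage must finish well under 33 ms per frame, which is why the detector and the resize both run on the target SoC / CPU rather than in the cloud.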

The process flow diagram in the picture above gives an idea of the CI/CD nature of the model compression process. This GitHub page has more information.

More articles by Jeff Brower
