What’s New in Deep Learning Research: Mobile Deep Learning with Google MnasNet

Building deep learning models that can execute on mobile runtimes is a very active area of research in the artificial intelligence (AI) space. After all, mobile devices are a significant source of information and host of computation in the modern technology ecosystem. Among the deep learning techniques trying to adapt to the mobile world, none is more relevant than convolutional neural networks (CNNs), given that they are a foundational block of image analysis methods and can unlock the door to many new scenarios for mobile apps. Google has been among the players leading the charge in the mobile deep learning space with research like Federated Learning and frameworks like TensorFlow Lite. Recently, researchers from the Google Brain team published a paper introducing MnasNet, a new method for designing CNN models that can execute effectively on mobile devices.

Mobile CNNs

Convolutional neural networks (CNNs) play a key role in many deep learning areas, such as image classification and object detection, and are also used as a dimensionality reduction step in many neural network architectures. Given the nature of the problems they solve and the complexity of convolutional operations, CNNs are notorious for being computationally expensive and relatively slow. As a result, it is very challenging to deploy state-of-the-art CNN models in resource-constrained environments such as mobile devices.

The obvious approach to designing mobile CNNs is to create neural network architectures with limited depth that utilize less expensive computations. This is the core principle of methods such as depthwise convolution or group convolution. However, both approaches have proven incredibly difficult to implement in practice, as they require upfront knowledge of the resource constraints of the environment, which is almost impossible to model accurately for mobile devices. To design effective mobile CNNs, we need an approach that dynamically evaluates and adapts to resource changes in its underlying mobile runtime. There is another area of deep learning that is pretty good at dealing with that type of problem: reinforcement learning.
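To see why operations like depthwise convolution matter on mobile, here is an illustrative sketch (not from the paper; the layer sizes are made up) comparing the parameter cost of a standard convolution against a depthwise separable convolution, one of the cheaper building blocks mobile CNNs rely on:

```python
# Illustrative parameter-count comparison; the kernel and channel sizes
# below are arbitrary example values, not MnasNet's actual configuration.

def standard_conv_params(k, c_in, c_out):
    # A standard k x k convolution mixes channels and space in one step.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel,
    # followed by a 1x1 pointwise convolution to mix channels.
    return k * k * c_in + c_in * c_out

# Example: 3x3 kernel, 128 input channels, 128 output channels.
std = standard_conv_params(3, 128, 128)        # 147456 parameters
sep = depthwise_separable_params(3, 128, 128)  # 17536 parameters
print(std, sep, round(std / sep, 1))           # roughly an 8.4x reduction
```

The catch, as the paragraph above notes, is that picking which cheap operations to use, and where, depends on the device's resource budget, which is hard to know upfront.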

Enter Google MNasNet

The Google Brain research team working on MnasNet essentially reformulated the problem of designing mobile CNNs as a reinforcement learning problem. Conceptually, MnasNet uses an automated neural architecture search based on reinforcement learning to design mobile CNNs. That’s a lot of buzzwords in a single sentence 😉. Google MnasNet is based on two fundamental ideas:

1) The problem of designing a mobile CNN is formulated as a multi-objective optimization problem that considers both accuracy and inference latency of CNN models.

2) MnasNet uses architecture search with reinforcement learning to find the model that achieves the best trade-off between accuracy and latency.
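The two ideas above meet in the search's reward signal. The paper's objective takes the form ACC(m) · (LAT(m)/T)^w, where T is a target latency and w is a negative exponent that softly penalizes slow models. A minimal sketch, with illustrative constants (the target latency and exponent here are assumptions for the example, not the paper's tuned values):

```python
# Latency-aware reward of the kind MnasNet optimizes: accuracy scaled by
# a soft latency penalty. target_ms and w are example values.

def reward(accuracy, latency_ms, target_ms=75.0, w=-0.07):
    # Models faster than the target get a mild bonus; slower ones a penalty,
    # so the search can trade a little accuracy for a lot of speed.
    return accuracy * (latency_ms / target_ms) ** w

fast = reward(0.74, 60.0)   # slightly less accurate, under the latency budget
slow = reward(0.76, 120.0)  # more accurate, but well over budget
print(fast > slow)          # True: the faster model wins the trade-off
```

This is what makes the optimization multi-objective: neither raw accuracy nor raw speed alone determines which architecture the controller prefers.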

The basic architecture of MnasNet consists of three main components: an RNN-based controller for learning and sampling model architectures, a trainer that builds and trains models to obtain their accuracy, and an inference engine that measures model speed on real mobile phones using TensorFlow Lite. As mentioned before, MnasNet formulates a multi-objective optimization problem that aims to achieve both high accuracy and high speed, and utilizes a reinforcement learning algorithm with a customized reward function to find Pareto-optimal solutions.

The key contribution of MnasNet is to reformulate the task of finding the right CNN model for a mobile architecture as a reinforcement learning problem. Unlike previous approaches that optimize for indirect metrics such as the number of parameters, MnasNet runs the CNN models on real mobile devices and incorporates real-world inference latency into the objective function. By doing so, MnasNet is able to rapidly optimize a CNN model for a specific mobile environment.
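Measuring latency directly, rather than estimating it from parameter counts, is conceptually simple. A minimal sketch of a wall-clock measurement harness, assuming a stand-in inference function (in MnasNet the measurement runs on actual phones through TensorFlow Lite, not on the search machine):

```python
# Measure average wall-clock time per inference call. The "model" here is
# just a placeholder callable; any real measurement would run the compiled
# model on the target device.

import time

def measure_latency_ms(run_inference, warmup=3, runs=10):
    # Warm-up iterations avoid counting one-time setup costs (caches, JIT).
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(runs):
        run_inference()
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in "model": a small fixed computation.
latency = measure_latency_ms(lambda: sum(i * i for i in range(10_000)))
print(f"{latency:.3f} ms per run")
```

The important design point is where the number comes from: two models with identical parameter counts can have very different latencies on a given phone, which is exactly the gap the real-device measurement closes.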

In order to strike the right balance between search flexibility and search space size, Google MnasNet uses a novel factorized hierarchical search space, which factorizes a convolutional neural network into a sequence of blocks and then uses a hierarchical search space to determine the layer architecture for each block. This approach allows different layers to use different operations and connections. However, the model also forces all layers within each block to share the same structure, reducing the search space size by orders of magnitude compared to a flat per-layer search space.
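A back-of-the-envelope count shows why block-level sharing shrinks the space so dramatically. The option counts below are assumptions chosen for illustration, not the paper's actual search space sizes:

```python
# Illustrative search-space size comparison; all counts here are example
# values, not MnasNet's real configuration.

layer_options = 4     # e.g. {standard, depthwise-separable} x {3x3, 5x5}
layers_per_block = 4
num_blocks = 5

# Factorized hierarchical space: each block picks one shared layer structure,
# so layers differ across blocks but repeat within a block.
factorized_size = layer_options ** num_blocks

# Flat per-layer space: every layer picks its structure independently.
flat_size = layer_options ** (layers_per_block * num_blocks)

print(factorized_size)               # 1024
print(flat_size // factorized_size)  # 4**15: orders of magnitude larger
```

Even at these toy sizes, the flat space is about a billion times larger than the factorized one, which is what makes the reinforcement learning search tractable while still letting different blocks specialize.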

The Results

The Google Brain team tested MnasNet against state-of-the-art mobile CNN architectures, and the results were very encouraging. On datasets such as ImageNet for image classification and COCO for object detection, MnasNet was able to achieve higher levels of accuracy than its competitors while maintaining competitive latency.

Google MnasNet is certainly a very creative approach to designing mobile CNNs. By incorporating real-world latency information into the model’s objective function and using a hierarchical search space, MnasNet is able to find the best trade-off between accuracy and latency. It will be interesting to see how Google leverages MnasNet in its next generation of mobile devices.
