Deep Learning in a Nutshell (Part-2)

In the previous article (Deep Learning in a Nutshell, Part-1), we discussed the basics of Machine Learning and Deep Learning and the types of Neural Networks. In this article, we will briefly touch on some of the internals of neural networks, such as the Loss Function, the Optimizer, and Activation Functions, so we can quickly decide the best fit for a specific use-case. Having said that, always remember that there is no perfect model for any use-case.

"Essentially, all models are wrong, but some are useful." (Box, George E. P.; Norman R. Draper (1987))

Before we go through the internals, here is a quick recap of the mapping between different types of inputs and appropriate network architectures (a short Keras sketch of two of these mappings follows the list):

  • Vector data: Densely connected network (dense layers).
  • Image data: 2D convnets.
  • Sound data (for example, waveform): Either 1D convnets (preferred) or RNNs.
  • Text data: Either 1D convnets (preferred) or RNNs.
  • Timeseries data: Either RNNs (preferred) or 1D convnets.
  • Other types of sequence data: Either RNNs or 1D convnets. Prefer RNNs if data ordering is strongly meaningful (for example, for timeseries, but not for text).
  • Video data: Either 3D convnets (if you need to capture motion effects) or a combination of a frame-level convnet for feature extraction followed by either an RNN or a 1D convnet to process the resulting sequences.
  • Volumetric data: 3D convnets.
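
To make the mapping concrete, here is a minimal, purely illustrative Keras sketch of the first two entries. All input shapes, layer sizes, and class counts below are made-up placeholder values, not recommendations.

```python
# Illustrative only: how two of the mappings above translate into Keras models.
from tensorflow import keras
from tensorflow.keras import layers

# Vector data -> densely connected network (dense layers)
vector_model = keras.Sequential([
    keras.Input(shape=(20,)),                 # 20 input features (assumed)
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # e.g. a binary-classification head
])

# Image data -> 2D convnet
image_model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),           # 64x64 RGB images (assumed)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # e.g. 10 classes
])
```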

Here are some basic definitions to help understand the building blocks of a neural network:

  • Model Learning: Learning means finding a combination of model parameters that minimizes a loss function for a given set of training data samples and their corresponding targets. Learning happens by drawing random batches of data samples and their targets, and computing the gradient of the loss on the batch with respect to the network parameters. The network parameters are then moved a bit (the magnitude of the move is defined by the learning rate) in the opposite direction from the gradient (see the sketch after this list).
  • Loss: The loss is the quantity you'll attempt to minimize during training, so it should represent a measure of success for the task you are trying to solve.
  • Optimizer: The optimizer specifies the exact way in which the gradient of the loss will be used to update the parameters: for instance, it could be the RMSProp optimizer (which can be used as a default), SGD with momentum, and so on.
  • Last-Layer Activation Function: An activation function is used in every layer of a Deep Neural Network (except the last layer in regression, which is typically left linear) to project its input onto a different hypothesis space. The last-layer activation plays an important role in projecting the network's output into a consumable form for the specific problem statement. For instance, in a classification problem, the output should be a probability for each class.
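
To tie the first three definitions together, here is a minimal sketch (with toy data and assumed shapes) of a single learning step written out by hand in TensorFlow: compute the loss on a batch, take the gradient of the loss with respect to the parameters, and move the parameters a small step against the gradient.

```python
import tensorflow as tf

W = tf.Variable(tf.random.normal((4, 1)))     # model parameters (assumed shapes)
b = tf.Variable(tf.zeros((1,)))
learning_rate = 0.01                           # magnitude of each parameter move

x_batch = tf.random.normal((32, 4))            # a random batch of 32 samples (toy data)
y_batch = tf.random.normal((32, 1))            # their corresponding targets (toy data)

with tf.GradientTape() as tape:
    predictions = tf.matmul(x_batch, W) + b                     # forward pass
    loss = tf.reduce_mean(tf.square(y_batch - predictions))     # mean squared error

grads = tape.gradient(loss, [W, b])            # gradient of the loss w.r.t. parameters
W.assign_sub(learning_rate * grads[0])         # step in the opposite direction
b.assign_sub(learning_rate * grads[1])         # from the gradient
```

An optimizer such as RMSProp simply replaces the last two lines with a more sophisticated update rule.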

The diagram below can be used as a cheat-sheet for quickly putting together a Deep Learning model in Keras and TensorFlow for various problem statements.
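
As one assumed example of how such cheat-sheet choices fit together, the sketch below wires up a last-layer activation, loss, and optimizer for a binary-classification problem in Keras; the input shape and layer size are placeholder values.

```python
# Illustrative only: a binary classifier with cheat-sheet-style choices.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(100,)),                # 100 input features (assumed)
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # last-layer activation: sigmoid -> probability
])

model.compile(
    optimizer="rmsprop",                      # a reasonable default optimizer
    loss="binary_crossentropy",               # loss matched to binary classification
    metrics=["accuracy"],
)
```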

In the next article (which should be the last one in the series), we will look at some techniques for optimizing Deep Learning models, along with some interesting use-cases.

Enjoy!

Other articles in this series -

Deep Learning in a Nutshell (Part-1)

Deep Learning in a Nutshell (Part-3)

