Supervised machine learning - a simplistic perspective on inference and programming trade-offs
Need for supervised machine learning models: The real world is analog and non-linear. Creating mathematical algorithms to work in a non-linear world is feasible but not easy. Supervised machine learning helps in dealing with problems that are not easily described using conventional/classical algorithms. The best example is images: computer vision has greatly improved thanks to models created using machine learning.
What exactly are these models and how do they work?
Machine learning models as trainable probabilistic lookup tables: Supervised machine learning is a resuscitated algorithm of the 20th century. If we give a supervised machine learning algorithm lots of data and the associated answers (labels), it can magically come up with a set of rules. This is a trivial description of the algorithm, but it does sound magical! The question is: how does it work? To understand more, let us use a low-cost digital weather station as an example. This standalone device (no internet connectivity) gets temperature and humidity data from an outdoor sensor and then makes a weather forecast.
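As a minimal sketch of the "data plus answers in, rules out" idea, here is what that training step could look like with scikit-learn. The sensor readings and weather labels below are made up for illustration, not real manufacturer data:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: each row is (temperature in C, relative humidity in %)
readings = [[30, 20], [28, 35], [22, 70], [18, 85], [15, 95], [25, 50]]
# Labels: the weather observed under those conditions
weather = ["sunny", "sunny", "cloudy", "rain", "rain", "cloudy"]

# "Training" is just fitting the model to data and answers; the learned
# rules live inside the tree instead of being written by hand.
model = DecisionTreeClassifier(max_depth=3)
model.fit(readings, weather)

print(model.predict([[20, 80]]))  # e.g. ['rain']
```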
In machine learning terminology, the weather station is running inference on a machine learning weather model created by the manufacturer. The manufacturer uses a big dataset of temperature and humidity conditions as data and the prevailing weather conditions as answers or labels. The machine learning algorithm comes up with a set of rules that associates temperature/humidity with weather conditions. These rules can be thought of as a mapping or lookup table. A micro-controller is programmed to execute this mapping (or inference) to predict the weather using temperature and humidity as inputs.
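To make the lookup-table view concrete, here is a sketch of the kind of fixed mapping the micro-controller might execute at inference time. The thresholds are illustrative stand-ins for whatever rules a trained model actually distilled from the dataset:

```python
def forecast(temperature_c: float, humidity_pct: float) -> str:
    """Inference as a fixed mapping: no learning happens here.

    The thresholds below are made up for illustration; in a real
    device they would come from the manufacturer's trained model.
    """
    if humidity_pct > 80:
        return "rain"
    if humidity_pct > 55:
        return "cloudy"
    if temperature_c > 24:
        return "sunny"
    return "partly cloudy"

print(forecast(18.0, 90.0))  # rain
print(forecast(27.0, 30.0))  # sunny
```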
As you can imagine, the weather station is not always accurate, but it is not bad either. It can become better if we give it more data points to generate a better model. So does more data make a better model? Yes, but not always! Providing a wide variety of data is extremely important. The model learns by mapping input to output and creates an internal lookup table (stored in layers and nodes). Data provided to the model during training acts as scaffolding that strengthens the model's inference capability. In the example below, a Google API is used to identify an image of an alpaca.
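A minimal sketch of such a call, using Google's Cloud Vision API as one plausible choice. It assumes the google-cloud-vision Python client is installed, credentials are configured, and a hypothetical local file alpaca.jpg exists:

```python
from google.cloud import vision

# Assumes GOOGLE_APPLICATION_CREDENTIALS is set; "alpaca.jpg" is a
# hypothetical local image file.
client = vision.ImageAnnotatorClient()

with open("alpaca.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Label detection runs inference on Google's pre-trained vision model;
# all the training (data gathering, labeling) happened on Google's side.
response = client.label_detection(image=image)

for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")  # e.g. "Alpaca: 0.97"
```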
Trade-off: Most engineering designs involve some kind of trade-off. One of the ways I like to discuss trade-offs is to ask a group (who have driven a manual transmission car),
"How many can get a fully loaded car moving (from standstill) using 4/5th gear ?"
or
"How many can drive a car at 60 miles per hour in the first gear ?"
Usually no one claims to have accomplished either of the above. This provides the background to discuss the power equation of a motor. Power is the product of torque and speed. To get a standstill car moving we need high torque, while at higher speeds we need less torque. But for a given engine power output, the product of torque and speed is fixed. In first gear we sacrifice speed in favor of torque, and vice versa in higher gears.
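As a quick worked example of that equation, with a made-up power figure: for a fixed power output, the available torque falls as rotational speed rises.

```python
import math

POWER_W = 75_000  # assumed constant engine power output, 75 kW

def torque_nm(rpm: float) -> float:
    """Torque available at a given shaft speed for fixed power.

    P = torque * angular_velocity, so torque = P / omega,
    where omega = 2 * pi * rpm / 60 (rad/s).
    """
    omega = 2 * math.pi * rpm / 60
    return POWER_W / omega

# Low speed (first-gear territory): plenty of torque to get moving.
print(f"{torque_nm(1000):.0f} N.m at 1000 rpm")  # ~716 N.m
# High speed (top-gear territory): the same power yields far less torque.
print(f"{torque_nm(6000):.0f} N.m at 6000 rpm")  # ~119 N.m
```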
Creating robust machine learning models: At a recent event (Tesla Autonomy Day, 22 April 2019 - a recommended watch for machine learning enthusiasts), a detailed explanation was given of how supervised machine learning models assist autonomous cars. Supervised machine learning based computer vision is used to perceive the surroundings and subsequently to maneuver and control the car. This one frame from the event aptly summarizes the effort needed to build a good model that works in the real world.
What is the programming trade-off, you may ask? In a classical program, creating rules is a cumbersome task, while in a supervised machine learning model the rules are created automatically. Shouldn't this make life easier for the programmer?
Not really! As they say, there is no free lunch. To create a good machine learning model, we need a large variety and quantity of data as fodder. Supervised machine learning takes away the drudgery of writing rules (code), but creating datasets for the model is a new requirement. Now the programmer is tasked with creating large quantities of labeled data. Continuously gathering corner-case data from the real world and retraining the machine learning algorithm does not leave the programmer a lot of free time!
---------------------------------------------------------------------------------------------------------------