Chapter 2.2 : Self-Driving Car [Intro to TensorFlow & Deep Neural Network]
You can find the whole code, with 86% accuracy, here: TensorFlow Exercise
We continue our fascinating adventure, going deeper and deeper to discover the concepts essential to the development of our autonomous car. Before starting our project, we still need a few notions, which we will discover in this chapter. So let's do it ^^
What's Deep Learning?
Deep Learning is an exciting branch of Machine Learning that uses data to teach computers how to do things that only humans were capable of before.
You can find more information here: Deep Learning - Wikipedia
Installing TensorFlow
Throughout this Chapter, we will apply our knowledge of neural networks on real datasets using TensorFlow, an open source Deep Learning library created by Google.
We will use TensorFlow to classify images from the notMNIST dataset - a dataset of images of English letters from A to J. You can see a few example images below. Our goal is to automatically detect the letter based on the image in the dataset.
Install: OS X, Linux
Prerequisites
You'll need Python 3.4 or higher and Anaconda. If you don't meet all of these requirements, please install the appropriate package(s).
Install TensorFlow
You're going to use an Anaconda environment. If you're unfamiliar with Anaconda environments, check out the official documentation. More information, tips, and troubleshooting for installing TensorFlow on Windows can be found here.
Run the following commands to set up your environment:

conda create --name=IntroToTensorFlow python=3 anaconda
source activate IntroToTensorFlow
conda install -c conda-forge tensorflow
That's it! You have a working environment with TensorFlow.
Hello, world!
Try running the following code in your Python console to make sure you have TensorFlow properly installed. The console will print "Hello World!" if TensorFlow is installed. Don't worry about understanding what it does yet; we'll go through it next.
import tensorflow as tf

# Create TensorFlow object called tensor
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
Let’s analyze the Hello World script you ran.
Tensor
In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. In the case of hello_constant = tf.constant('Hello World!'), hello_constant is a 0-dimensional string tensor, but tensors come in a variety of sizes as shown below:
# A is a 0-dimensional int32 tensor
A = tf.constant(1234)

# B is a 1-dimensional int32 tensor
B = tf.constant([123, 456, 789])

# C is a 2-dimensional int32 tensor
C = tf.constant([[123, 456, 789], [222, 333, 444]])
tf.constant() is one of many TensorFlow operations you will use. The tensor returned by tf.constant() is called a constant tensor, because the value of the tensor never changes.
A "TensorFlow Session", is an environment for running a graph. The session is in charge of allocating the operations to GPU(s) and/or CPU(s), including remote machines. Let’s see how you use it.
with tf.Session() as sess:
    output = sess.run(hello_constant)
    print(output)
The code has already created the tensor, hello_constant, from the previous lines. The next step is to evaluate the tensor in a session.
The code creates a session instance, sess, using tf.Session. The sess.run() function then evaluates the tensor and returns the results.
After you run the above, you will see the following printed out: b'Hello World!' (in Python 3, TensorFlow returns the string as bytes).
Input
In the code above, we passed a tensor into a session and it returned the result. What if you want to use a non-constant value? This is where tf.placeholder() and feed_dict come into play. We'll go over the basics of feeding data into TensorFlow.
tf.placeholder()
Sadly you can’t just set x to your dataset and put it in TensorFlow, because over time you'll want your TensorFlow model to take in different datasets with different parameters. You need tf.placeholder()!
tf.placeholder() returns a tensor that gets its value from data passed to the sess.run() function, allowing you to set the input right before the session runs.
Session’s feed_dict
x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})
Use the feed_dict parameter of sess.run() to set the placeholder tensor. The above example shows the tensor x being set to the string 'Hello World'. It's also possible to set more than one tensor using feed_dict, as shown below.
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Test String', y: 123, z: 45.67})
Note: If the data passed to the feed_dict doesn’t match the tensor type and can’t be cast into the tensor type, you’ll get the error “ValueError: invalid literal for...”.
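For example, the following hypothetical feed triggers that error, because the string can't be cast to int32:

x = tf.placeholder(tf.int32)

with tf.Session() as sess:
    # Fails with "ValueError: invalid literal for int() with base 10: 'abc'"
    output = sess.run(x, feed_dict={x: 'abc'})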
Let's see how well you understand tf.placeholder() and feed_dict. Below is a small exercise that returns the number 123.
import tensorflow as tf

def run():
    output = None
    x = tf.placeholder(tf.int32)

    with tf.Session() as sess:
        output = sess.run(x, feed_dict={x: 123})

    return output

run()
TensorFlow Math
Getting the input is great, but now you need to use it. We're going to use basic math functions that everyone knows and loves - add, subtract, multiply, and divide - with tensors. (There are many more math functions you can check out in the documentation.)
Addition
We will start with the add function. The tf.add() function does exactly what you expect it to do: it takes in two numbers, two tensors, or one of each, and returns their sum as a tensor.

x = tf.add(5, 2)  # 7
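Subtraction and multiplication follow the same pattern. A quick sketch:

x = tf.subtract(10, 4)  # 6
y = tf.multiply(2, 5)   # 10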
Converting types
It may be necessary to convert between types to make certain operators work together. For example, if you tried the following, it would fail with an exception:
tf.subtract(tf.constant(2.0), tf.constant(1))
# Fails with ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int32
That's because the constant 1 is an integer but the constant 2.0 is a floating point value and subtract expects them to match.
In cases like these, you can either make sure your data is all of the same type, or you can cast a value to another type. In this case, converting the 2.0 to an integer before subtracting, like so, will give the correct result:
tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1)) # 1
Classification
Good job! We've accomplished a lot. In particular, we did the following:
- Ran operations in a session with sess.run().
- Created constant tensors with tf.constant().
- Fed data in at run time with tf.placeholder() and feed_dict.
- Applied math operations such as tf.add() and tf.cast() to tensors.
We know the basics of TensorFlow, so let's take a break and get back to the theory of neural networks. In the next few lines, we're going to learn about one of the most popular applications of neural networks - classification.
So, classification is the task of taking an input, like a letter, and giving it a label that says, for example, "this is an A". The typical setting is that you have a lot of examples, called the training set, that have already been sorted.
Then, when you have a completely new example, your goal is to figure out which of these classes it belongs to.
Classification is the central building block of machine learning. Once you know how to classify things, it's very easy, for example, to learn how to detect them or to rank them.
Logistic classifier
So, let's get started training a logistic classifier.
A logistic classifier is what's called a linear classifier. It takes the input, for example the pixels of an image, and applies a linear function to them to generate its predictions.
A linear function is just a giant matrix multiply: it takes all the inputs as a big vector X and multiplies them with a matrix to generate its predictions, one per output class.
Throughout, we'll denote the input by X, the weights by W, and the bias by b. The weights of that matrix and the bias are where the machine learning comes in: we're going to train that model, which means we're going to try to find the values for the weights and the bias that are good at performing those predictions.
How are we going to use these scores to perform the classification? Well, let's recap our task: each image that we have as an input can have one and only one possible label. So we're going to turn those scores into probabilities. We want the probability of the correct class to be very close to 1, and the probability of every other class to be close to 0.
The way to turn scores into probabilities is to use a softmax function. Beyond the formula, what's important to know about it is that it can take any kind of scores and turn them into proper probabilities. Scores, in the context of logistic regression, are often also called logits.
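For reference, the formula itself is short. Given scores y_1, ..., y_n, the softmax probability of class i is:

S(y_i) = e^(y_i) / (e^(y_1) + ... + e^(y_n))

Exponentiating makes every score positive, and dividing by the sum makes all the outputs add up to 1, which is exactly what proper probabilities require.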
TensorFlow Linear Function
Let’s derive the function y = Wx + b. We want to translate our input, x, to labels, y.
For example, imagine we want to classify images as digits. x would be our list of pixel values, and y would be the logits, one for each digit. Let's take a look at y = Wx, where the weights, W, determine the influence of x at predicting each y.
y = Wx allows us to segment the data into their respective labels using a line.
However, this line has to pass through the origin, because whenever x equals 0, then y is also going to equal 0.
We want the ability to shift the line away from the origin to fit more complex data. The simplest solution is to add a number to the function, which we call “bias”.
Our new function becomes Wx + b, allowing us to create predictions on linearly separable data. Let’s use a concrete example and calculate the logits.
Transposition
We've been using y = Wx + b for our linear function.
But there's another form that does the same thing: y = xW + b. The two are interchangeable, except for the dimensions of the matrices involved.
To shift from one form to the other, you simply have to swap the row and column dimensions of each matrix. This is called transposition.
For the rest of this chapter, we'll use xW + b, because this is what TensorFlow uses.
x now has the dimensions 1x3, W now has the dimensions 3x2, and b now has the dimensions 1x2. Calculating this will produce a matrix with the dimension of 1x2.
We now have our logits! The columns represent the logits for our two labels.
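As an illustration, here is a minimal NumPy sketch of that xW + b calculation; the numbers are made up, only the shapes matter:

import numpy as np

x = np.array([[1., 2., 3.]])    # 1x3 input
W = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])        # 3x2 weights
b = np.array([[0.5, 0.5]])      # 1x2 bias

logits = x @ W + b              # 1x2 logits, one per label
print(logits)                   # [[22.5 28.5]]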
Now you can learn how to train this function in TensorFlow.
Weights and Bias in TensorFlow
The goal of training a neural network is to modify weights and biases to best predict the labels. In order to use weights and biases, you'll need a tensor that can be modified. This leaves out tf.placeholder() and tf.constant(), since those tensors can't be modified. This is where the tf.Variable class comes in.
x = tf.Variable(5)
The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You'll use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
The tf.global_variables_initializer() call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above. Using the tf.Variable class allows us to change the weights and bias, but an initial value needs to be chosen.
Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights keeps the model from becoming stuck in the same place every time you train it.
Similarly, choosing weights from a normal distribution prevents any one weight from overwhelming other weights. You'll use the tf.truncated_normal() function to generate random numbers from a normal distribution.
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
The tf.truncated_normal() function returns a tensor with random values from a normal distribution whose magnitude is no more than 2 standard deviations from the mean.
Since the weights are already helping prevent the model from getting stuck, you don't need to randomize the bias. Let's use the simplest solution, setting the bias to 0.
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))
The tf.zeros() function returns a tensor with all zeros.
Softmax
We will implement a softmax(x) function that takes in x, a one- or two-dimensional array of logits.
In the one dimensional case, the array is just a single set of logits. In the two dimensional case, each column in the array is a set of logits. The softmax(x) function should return a NumPy array of the same shape as x.
For example, given a one-dimensional array:
# logits is a one-dimensional array with 3 elements
logits = [1.0, 2.0, 3.0]
# softmax will return a one-dimensional array with 3 elements
print(softmax(logits))

$ [ 0.09003057  0.24472847  0.66524096]
Given a two-dimensional array where each column represents a set of logits:
# logits is a two-dimensional array
logits = np.array([
    [1, 2, 3, 6],
    [2, 4, 5, 6],
    [3, 8, 7, 6]])
# softmax will return a two-dimensional array with the same shape
print(softmax(logits))

$ [[ 0.09003057  0.00242826  0.01587624  0.33333333]
   [ 0.24472847  0.01794253  0.11731043  0.33333333]
   [ 0.66524096  0.97962921  0.86681333  0.33333333]]
import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

logits = [3.0, 1.0, 0.2]
print(softmax(logits))
TensorFlow Softmax
Now that you've built a softmax function from scratch, let's see how softmax is done in TensorFlow.
x = tf.nn.softmax([2.0, 1.0, 0.2])
Easy as that! tf.nn.softmax() implements the softmax function for you. It takes in logits and returns softmax activations.
import tensorflow as tf

def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)
    softmax = tf.nn.softmax(logits)

    with tf.Session() as sess:
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output
One-Hot Encoding
We need a way to represent our labels mathematically. We just said: let's have the probability of the correct class be close to 1 and the probability of all the others be close to 0. So each label will be represented by a vector that is as long as there are classes, with the value 1.0 for the correct class and 0.0 everywhere else.
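Here is a minimal sketch of that encoding, assuming three classes A, B, and C where the correct class is B:

import numpy as np

classes = ['A', 'B', 'C']
correct_class = 'B'

# 1.0 for the correct class, 0.0 everywhere else
one_hot = np.array([1.0 if c == correct_class else 0.0 for c in classes])
print(one_hot)  # [0. 1. 0.]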
So let's recap what we have so far!
We have an input; it's going to be turned into logits using a linear model, which is basically a matrix multiply plus a bias. We then feed the logits, which are scores, into a softmax to turn them into probabilities. And we compare those probabilities to the one-hot encoded labels using the cross-entropy function. This entire setting is often called multinomial logistic classification: D(S(WX + b), L)
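For reference, D here is the standard cross-entropy between the softmax probabilities S and the one-hot labels L:

D(S, L) = -Σ_i L_i * log(S_i)

Because L is one-hot, the sum picks out just the negative log-probability that the classifier assigned to the correct class.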
Minimizing Cross Entropy
So the question here is: how are we going to find the weights W and the biases b that will get our classifier to do what we want it to do? That is, have a low distance for the correct class and a high distance for the incorrect classes.
One thing we can do is measure that distance averaged over the entire training set, for all the inputs and all the labels you have available; that's called the training loss. This loss, which is the average cross-entropy over your entire training set, is one humongous function: every example in your training set gets multiplied by this one big matrix W, and then they all get added up in one big sum. We want all the distances to be small, which would mean we're doing a good job at classifying every example in the training data, so we want the loss to be small. The loss is a function of the weights and the biases, so we are simply going to try to minimize that function -> Gradient Descent.
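As a minimal NumPy sketch of that training loss, assuming probs and one_hot_labels are arrays with one row per training example:

import numpy as np

def training_loss(probs, one_hot_labels):
    # Cross-entropy of each example, then averaged over the training set
    per_example = -np.sum(one_hot_labels * np.log(probs), axis=1)
    return np.mean(per_example)

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
one_hot_labels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(training_loss(probs, one_hot_labels))  # ~0.29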
Mini-batching
In this section, we'll go over what mini-batching is and how to apply it in TensorFlow.
Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.
Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.
It's also quite useful combined with SGD. The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
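One possible way to do that shuffling, assuming train_features and train_labels are NumPy arrays of the same length (as in the MNIST snippets below):

import numpy as np

# Shuffle features and labels together at the start of each epoch,
# then slice the shuffled arrays into mini-batches
permutation = np.random.permutation(len(train_features))
train_features = train_features[permutation]
train_labels = train_labels[permutation]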
Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

n_input = 784    # MNIST data input (img shape: 28*28)
n_classes = 10   # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
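As a rough back-of-the-envelope check of whether your machine can hold this, assuming float32 values (4 bytes each) and the standard 55,000-image MNIST training split:

train_features_bytes = 55000 * 784 * 4   # ~172 MB
weights_bytes = 784 * 10 * 4             # ~31 KB
bias_bytes = 10 * 4                      # 40 bytes
print(train_features_bytes + weights_bytes + bias_bytes)  # 172511400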
TensorFlow Mini-batching
In order to use mini-batching, you must first divide your data into batches.
Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you'd wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7*128 + 1*104 = 1000)
In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's tf.placeholder() function to receive the varying batch sizes.
Continuing the example, if each sample had n_input = 784 features and n_classes = 10 possible labels, the dimensions for features would be [None, n_input] and labels would be [None, n_classes].
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
What does None do here?
The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.
Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.
Now that you know the basics, let's learn how to implement mini-batching.
Let's use mini-batching to feed batches of MNIST features and labels into a linear model.
Set the batch size and run the optimizer over all the batches with the batches function. The recommended batch size is 128. If you have memory restrictions, feel free to make it smaller.
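The batches helper comes from the course's helper module and isn't shown here; the following is a minimal sketch of one possible implementation that matches how it's used below:

def batches(batch_size, features, labels):
    """Split features and labels into batches of at most batch_size samples."""
    assert len(features) == len(labels)
    output_batches = []
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        output_batches.append([features[start:end], labels[start:end]])
    return output_batches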
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches

learning_rate = 0.001
n_input = 784    # MNIST data input (img shape: 28*28)
n_classes = 10   # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))
Test Accuracy: 0.10819999873638153
The accuracy is low, but it can be improved: nothing stops you from training on the same dataset more than once. Going over the dataset multiple times is exactly what the next section is about.
Epochs
An epoch is a single forward and backward pass of the whole dataset. This is used to increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches  # Helper function created in Mini-batching section


def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))


n_input = 784    # MNIST data input (img shape: 28*28)
n_classes = 10   # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):
        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

    print('Test Accuracy: {}'.format(test_accuracy))
Each epoch attempts to move to a lower cost, leading to better accuracy.
The accuracy only reached 0.86, but that could be because the learning rate was too high. Lowering the learning rate would require more epochs, but could ultimately achieve better accuracy.
You can find the whole code, with 86% accuracy, here: TensorFlow Exercise
Good job! You built a one-layer TensorFlow network! However, you'll want to build more than one layer. This is deep learning, after all! In the next chapter, you will start to satisfy your need for more layers.