Reinforcement Learning and the Rise of the Machines
© 2016 Hussain Abbas

Author: Hussain Abbas, MSc

We have heard a great deal recently about Deep Learning and its ability to perform automatic feature selection. That is impressive given the difficulty of the problem, but today I want to talk about another type of learning that is fundamentally different from the standard machine learning brew most data scientists are familiar with.

In Unsupervised Learning, you give a computer some data and it discovers some interesting patterns. When you don’t know what you are looking for and are just investigating the data, you use Unsupervised Learning.

In Supervised Learning, you have a hypothesis that some set of variables can be used to predict something, and the task is either regression or classification. The key in Supervised Learning is that you must have labeled input/output data. The label is either continuous (regression) or categorical (classification). Deep Learning falls under this construct.

Enter the New Age

In Reinforcement Learning, you have a goal-seeking agent that learns by interacting with its environment. To achieve its objective, it must strike a proper balance between exploration (trying new actions to gather information) and exploitation (choosing the best action it currently knows). How to strike that balance depends on the environment the agent is operating in. Furthermore, the need to balance exploration and exploitation is unique to Reinforcement Learning; it does not occur in either unsupervised or supervised learning.
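A common (though by no means the only) way to strike this balance is an epsilon-greedy rule: with small probability the agent explores a random action, otherwise it exploits the best action it has found so far. A minimal sketch in Python – the function name and the value table are illustrative, not from any particular library:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# With epsilon = 0 the agent always exploits the best-known action.
best = epsilon_greedy([1.0, 5.0, 2.0], epsilon=0.0)  # index 1
```

Larger epsilon means more exploration early on; many implementations decay epsilon over time as the agent's estimates improve.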

In the Reinforcement Learning paradigm, you have a set of states, a set of actions, and a set of rewards. In each state, you have a set of actions that can be performed and a set of rewards that correspond to each of the actions, with the goal being to maximize the long term cumulative reward. Actions determine future states and rewards determine future actions. Rewards can be positive (pleasure) or negative (punishment).
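One concrete way to learn these long-term values is tabular Q-learning, which maintains an estimate Q(s, a) of the cumulative reward for taking action a in state s and nudges it toward the observed reward plus the discounted value of the best next action. A minimal sketch – the state and action names are made up for illustration:

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One temporal-difference update: move Q(s, a) toward
    reward + gamma * max over a' of Q(s', a')."""
    best_next = max(Q[next_state].values(), default=0.0)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

Q = defaultdict(dict)
Q["s0"] = {"fire": 0.0, "evade": 0.0}
Q["s1"] = {"fire": 1.0, "evade": 3.0}

# Taking "fire" in s0 earned reward 1.0 and led to s1:
q_update(Q, "s0", "fire", reward=1.0, next_state="s1")
# Q["s0"]["fire"] is now 0.5 * (1.0 + 0.9 * 3.0) = 1.85
```

The discount factor gamma controls how heavily future rewards count toward the long-term cumulative reward; the learning rate alpha controls how fast old estimates are overwritten.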

Let’s think about what that means in plain English – we have an autonomous agent, able to learn in real time, that has a goal it is actively working towards. It is both a participant in and an observer of the environment. It measures what it changes and it changes what it measures. Sound scary? Well it shouldn’t, since this is basically how humans learn from experience.

Let’s go to a real-world example: We’re playing an online naval PvP game in which we are competing against an AI player powered by a reinforcement learning algorithm. The goal for each player is to either sink the enemy’s ship or land more shots on target than the opponent does. We’ll start off by assuming the algorithm doesn’t know anything except the rules and the goal. Initially, the algorithm plays badly because it doesn’t know what I will do. But then I notice something interesting – with each round the algorithm gets better and better – it is learning from experience. Whenever I make a move, the algorithm counters – it’s almost as if it “knows” what I’m going to do next before I do it. Before long, the algorithm is an expert in gunnery, maneuvering, and defense and is consistently beating me in almost every round. Even when I do something new that the algorithm has never seen before, in the next round it comes back with a vengeance, having “remembered” that move. Like the Borg in Star Trek, the algorithm continuously adapts to its environment. But by continuously adapting, it is not just reacting to the environment, but fundamentally changing it. In Reinforcement Learning, we measure what we change and we change what we measure.

Remember, the algorithm’s goal is to maximize its long-term cumulative reward. Thus, if need be, it can lure me into a trap, incurring a short-term loss (getting hit) in order to maximize the long-term reward (sinking my ship). Furthermore, the trap may be so complex that I don’t even realize it is a trap – it could trick me into thinking that I am winning while it constructs some ultra-complicated setup from which it estimates it will win with 98% probability. Unsurprisingly, this is a technique used by some of the greatest chess masters.
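This trade-off falls straight out of the arithmetic of discounted returns: a reward t steps in the future is worth gamma^t times its face value, so a large enough payoff can outweigh an immediate loss. A toy calculation – the reward numbers are invented for illustration:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over a sequence of per-step rewards."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Cautious line of play: small steady gains, no hits taken.
cautious = discounted_return([1, 1, 1, 1])      # about 3.44
# Trap line of play: take a hit now (-5) to set up sinking the ship (+20).
trap = discounted_return([-5, 0, 0, 20])        # about 9.58
```

Even though the trap starts with a loss, its discounted return is higher, so a reward-maximizing agent prefers it.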

The more economically inclined may have noticed that this sounds a lot like Game Theory; however, there is a major difference. A classical game-theoretic player assumes a worst-case opponent and would never allow itself to get into a state in which it could lose. The Reinforcement Learning player relaxes this assumption, allowing it to undertake strategies that carry some small probability of losing.

If it sounds impossible to beat such an algorithm, then you’re on the right track. It is the ultimate Bayesian, since it was never given an explicit model of the world – rather, it learned solely by interacting with its environment. In doing so, it both changes its environment and responds to changes in the environment. It can play thousands of games (against humans, or against itself in simulation), learning more from each one and thereby getting better and better. This is exactly why AlphaGo was recently able to beat one of the world’s best Go players, and how DeepMind was able to learn to play Atari games better than human experts. Go is significantly harder than Chess due to the size of its state space, so for an AI player to beat a top human player at Go is a major achievement.

Thus, we see that Reinforcement Learning is closer to the idea of Artificial Intelligence than any other field of Machine Learning: it does not use an explicit model, but rather builds a model from what it sees, and what it does impacts what it sees – a flywheel effect, if you will.

With the rise of Reinforcement Learning, the lines between Machine Learning (The Applied) and Artificial Intelligence (The Theoretical) become increasingly blurred. Furthermore, the gains achieved by Reinforcement Learning can be easily extended via the integration of Unsupervised and Supervised Learning methods.

When combined with big data technologies such as Hadoop, Reinforcement Learning can be deployed to its fullest potential.
