Perishable Product Discounting with Reinforcement Learning

Pranab Ghosh

AI Consultant || MIT Alumni || Entrepreneur || Open Source Project Owner || Blogger

Published Oct 19, 2019

Some time ago a retailer brought up this problem with me: a perishable product is discounted in the hope that the remaining inventory sells out by the expiration date, and the retailer has to decide how to discount it. I implemented a solution using a variation of Reinforcement Learning called Multi Arm Bandit.

A decision making dilemma

The retailer has to decide the discount rate and the lead time for the discount. There is a dilemma in making the decision.

If the discount rate is high and / or the discounting starts too early, the product sells out well before the expiry date at a lower price than necessary, and there is a loss of revenue. On the other hand, if the discount rate is low and / or the discounting starts too late, the remaining inventory may not sell out by the expiry date, again resulting in a loss of revenue.
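To make the dilemma concrete, here is a minimal sketch with a toy demand model. Everything in it is an assumption for illustration and not from the retailer's data: the `revenue` function, the linear demand `lift`, the 10 day `horizon`, and all the numbers.

```python
def revenue(inventory, price, base_demand, discount, lead_days,
            horizon=10, lift=4.0):
    """Total revenue over `horizon` days before expiry (toy model).

    Full price applies until `lead_days` before expiry, then the discount.
    Assumed demand model: daily demand grows linearly with the discount,
    base_demand * (1 + lift * discount).
    """
    full_price_days = horizon - lead_days
    sold_full = min(inventory, base_demand * full_price_days)
    remaining = inventory - sold_full
    discounted_demand = base_demand * (1 + lift * discount) * lead_days
    sold_discounted = min(remaining, discounted_demand)
    return sold_full * price + sold_discounted * price * (1 - discount)


# Hypothetical product: 100 units, price 10, 5 units sold per day at full price.
aggressive = revenue(100, 10.0, 5, discount=0.5, lead_days=8)  # deep and early
timid = revenue(100, 10.0, 5, discount=0.1, lead_days=2)       # shallow and late
moderate = revenue(100, 10.0, 5, discount=0.3, lead_days=5)    # in between
print(aggressive, timid, moderate)
```

Under these made-up numbers the moderate option earns the most. The aggressive option sells everything but at too low a price, and the timid option leaves inventory unsold at expiry.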

Multi Arm Bandit solution

So far the retailer has made the decision regarding the discount rate and lead time based on past experience. But now the retailer is interested in automating the decision making process with Machine Learning. This is where Reinforcement Learning comes into the picture.

Reinforcement Learning helps you learn the optimal decision, which in this case is the combination of discount rate and discount lead time that will maximize the revenue.

Reinforcement Learning is used for decision making tasks in an uncertain environment. It involves state, action and reward. For a given state, the algorithm takes a certain action and gets a reward for the action at some future time. As a result of taking the action, the state changes.

Game playing is the quintessential example of a Reinforcement Learning application. Unlike games, however, many decision making tasks don't involve states.

The algorithms used for such decision making tasks without states are collectively known as Multi Arm Bandit.

Our price discounting problem has no state. There is only action and reward. The action is the (discount rate, discount lead time) tuple. The reward is the revenue from the product sales.
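One way to represent the actions is to enumerate every candidate combination, so that each tuple becomes one arm of the bandit. This is a sketch only: the candidate discount rates and lead times below are hypothetical, and in practice the retailer would choose them.

```python
from itertools import product

# Hypothetical candidate values, chosen only for illustration.
discount_rates = [0.1, 0.2, 0.3, 0.4, 0.5]   # fraction off the list price
lead_times_days = [1, 2, 3, 5]               # days before expiry the discount starts

# Each (discount rate, discount lead time) tuple is one action, i.e. one arm.
arms = list(product(discount_rates, lead_times_days))
print(len(arms), arms[:3])
```

Keeping the grid small matters, because every arm has to be tried several times before its average revenue can be trusted.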

The Multi Arm Bandit model is a continuously learning system, learning from past experience. The algorithm makes decisions based on what has been learnt so far. All Multi Arm Bandit algorithms strive to maximize the reward over some time horizon.
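The learning loop can be sketched with epsilon-greedy, one of the simplest Multi Arm Bandit algorithms. The article does not say which algorithm was used for the retailer, so treat this as an illustrative stand-in. The class name, the three arms, the `true_mean` revenues and the noise level are all assumptions.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over a fixed list of arms (illustrative sketch)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in self.arms}
        self.mean_reward = {arm: 0.0 for arm in self.arms}
        self.rng = random.Random(seed)

    def select(self):
        # Try every arm once before trusting any average.
        untried = [a for a in self.arms if self.counts[a] == 0]
        if untried:
            return self.rng.choice(untried)
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.mean_reward[a])

    def update(self, arm, reward):
        # Incremental update of the running average reward for this arm.
        self.counts[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.counts[arm]


# Simulated environment: hypothetical average revenue per arm, plus noise.
true_mean = {(0.5, 8): 550.0, (0.1, 2): 526.0, (0.3, 5): 635.0}
env_rng = random.Random(7)
bandit = EpsilonGreedyBandit(true_mean.keys(), epsilon=0.1, seed=42)

for _ in range(2000):  # one round = one discounting cycle
    arm = bandit.select()
    observed_revenue = env_rng.gauss(true_mean[arm], 50.0)
    bandit.update(arm, observed_revenue)

best = max(bandit.arms, key=lambda a: bandit.mean_reward[a])
print(best, bandit.counts)
```

In a real deployment the simulated revenue would be replaced by the revenue actually observed after each discounting cycle. The `epsilon` parameter controls how often the system tries an arm other than the current best, which is how it keeps learning.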
