Introduction to the Monte Carlo Sampling Method: A Probabilistic Approach
There are many problems where calculating a probability is straightforward, but computing a desired quantity from that probability is very difficult, either because of the nature of the problem or because the number of random variables grows exponentially.
Instead, we can draw random samples from the distribution to obtain an approximate value of the desired quantity. This process of random sampling is referred to as a Monte Carlo method.
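As a quick illustration (a toy example added here, not part of the original walkthrough), suppose we want the probability that a standard normal random variable exceeds 1.5. Rather than solving the integral analytically, we can simply draw samples and count how many fall above the threshold:
*******************************************************************
# minimal sketch: estimate P(X > 1.5) for a standard normal
# by random sampling instead of an analytical calculation
from numpy.random import normal

# draw a large number of samples from N(0, 1)
samples = normal(0, 1, 100000)
# the fraction of samples above 1.5 approximates the true probability
estimate = (samples > 1.5).mean()
print('Estimated P(X > 1.5): %.4f' % estimate)  # true value is about 0.0668
*******************************************************************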
Need for Sampling:
There are many problems in probability and machine learning where we cannot compute the analytical (target) function directly. In such cases we have to take a probabilistic approach and estimate the target function from samples. In other words, where exact inference is intractable, approximate inference through sampling becomes the practical alternative.
What are Monte Carlo methods?
1- Estimate density: collect samples to approximate the distribution of the target function (see the sketch after this list).
2- Desired quantity: use samples to estimate a quantity such as the mean or variance of a distribution.
3- Optimize the function: locate the sample that maximizes or minimizes the target function.
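A minimal combined sketch of these three uses, assuming a normal distribution with mean 50 and standard deviation 5 and a toy quadratic target function (both are illustrative choices, not fixed by the method):
*******************************************************************
# illustrative sketch of the three uses of Monte Carlo methods
import numpy as np

# samples from an assumed N(50, 5) distribution
samples = np.random.normal(50, 5, 10000)

# 1- estimate density: a histogram of the samples approximates the pdf
counts, bin_edges = np.histogram(samples, bins=20, density=True)

# 2- desired quantity: sample mean and variance approximate the true ones
print('mean     ~ %.2f' % samples.mean())
print('variance ~ %.2f' % samples.var())

# 3- optimize a function: evaluate a target on random candidates, keep the best
def target(x):
    return -(x - 52.0) ** 2  # toy function with its maximum at x = 52

candidates = np.random.uniform(40, 60, 1000)
best = candidates[np.argmax(target(candidates))]
print('argmax   ~ %.2f' % best)
*******************************************************************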
Drawing a sample is like observing the outcome of a random event. Approximating a complex analytical mapping function this way, by running the random process many times, is known as Monte Carlo simulation. Multiple samples are drawn to build up an estimate of the desired quantity, and by the law of large numbers, the more trials we perform, the more accurate the approximation becomes.
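To see this effect directly, a small sketch (illustrative, using the same assumed normal distribution as the example later in this article) estimates the mean with an increasing number of samples; the error shrinks as the trial count grows:
*******************************************************************
# sketch: the approximation improves as the number of samples grows
from numpy.random import normal

mu, sigma = 50, 5
for n in [10, 100, 1000, 10000, 100000]:
    # estimate the mean from n random samples and compare to the true mean
    sample_mean = normal(mu, sigma, n).mean()
    print('n=%6d  estimated mean=%.3f  error=%.3f' % (n, sample_mean, abs(sample_mean - mu)))
*******************************************************************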
Examples of Monte Carlo Sampling Methods:
We use Monte Carlo methods more often than we realize. For instance, when we model a coin flip as a Bernoulli distribution and then simulate flips by drawing from that distribution, we are performing a Monte Carlo simulation. Likewise, when we sample from a uniform distribution over {1,2,3,4,5,6} to simulate the roll of a die, we are again performing a Monte Carlo simulation. The same idea applies when we gather a random sample of data from a problem domain and estimate its probability distribution with a histogram or a density-estimation method. In fact, Monte Carlo methods can be used to estimate the probability of an event in almost any complex domain. The same idea of Monte Carlo inference appears in Bayesian models, including recursive Bayesian inference, and many machine learning algorithms are built on the pervasive idea of drawing samples from a probability distribution to estimate a desired quantity. Monte Carlo methods also provide the basis for resampling methods such as the bootstrap, which are used to estimate a desired quantity and to assess the accuracy of a model on a given dataset.
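Both everyday examples from this paragraph take only a few lines of code; the sketch below assumes a fair coin and a fair six-sided die:
*******************************************************************
# simulate the coin-flip and dice-roll examples mentioned above
from numpy.random import binomial, randint

# coin flip: ten draws from a Bernoulli(0.5) distribution (1 = heads)
flips = binomial(1, 0.5, 10)
print('coin flips:', flips)

# dice roll: ten draws from a uniform distribution over {1,2,3,4,5,6}
rolls = randint(1, 7, 10)
print('dice rolls:', rolls)
*******************************************************************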
Working with Code:
In this example we will define the probability distribution of a random variable. Let us assume a normal distribution with mean 50 and standard deviation (sigma) 5, and draw samples from it.
For the time being, pretend that we do not know the probability distribution of this random variable; instead we want to use samples to derive an estimate of the density. We will draw samples of different sizes and plot a histogram for each.
*******************************************************************
# example of the effect of sample size on a Monte Carlo estimate
# import necessary libraries
from numpy.random import normal
from matplotlib import pyplot as plt

# define the distribution
mu = 50
sigma = 5
# define the Monte Carlo sample sizes to compare
sizes = [10, 50, 100, 1000]
# iterate through the sizes and plot a histogram for each
# here we will see four plots, one per entry in the sizes list
for i in range(len(sizes)):
    # draw a sample of the current size from the distribution
    sample = normal(mu, sigma, sizes[i])
    # plot the histogram of this sample in its own subplot
    plt.subplot(2, 2, i + 1)
    plt.hist(sample, bins=20)
    plt.title('%d samples' % sizes[i])
    plt.xticks([])
# show the plot
plt.show()
*************************************************************************
Results:
As the sample size grows, the effect is clear: the small sample sizes of 10 and 50 do not effectively capture the density of the target function. At a sample size of 100 we begin to see a rough estimate of the target function, and with 1000 or more samples the familiar bell curve of the probability distribution, the normal distribution in this case, emerges.
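To complement the plots, a small additional sketch (not part of the original example) prints the sample mean and standard deviation for each sample size; the estimates settle near the true values of 50 and 5 as the size grows:
*******************************************************************
# complementary sketch: numerical estimates for each sample size
from numpy.random import normal

mu, sigma = 50, 5
for n in [10, 50, 100, 1000]:
    # draw a sample of size n and summarize it
    sample = normal(mu, sigma, n)
    print('size=%4d  mean=%.2f  std=%.2f' % (n, sample.mean(), sample.std()))
*******************************************************************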
With Regards,
Debi Prasad Rath.
Data Scientist
Xebia