Day 2 - Statistics in Machine Learning

▶️ Range

The range is the simplest measure of dispersion (variability) in a dataset. It is calculated by subtracting the lowest value from the highest value. A wide range indicates high variability, and a narrow range indicates low variability in the distribution.

Range = Highest value - Lowest value
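As a quick illustration of the formula above, here is a minimal Python sketch. The function name value_range and the sample numbers are made up for this example.

```python
def value_range(data):
    """Return the range: highest value minus lowest value."""
    if not data:
        raise ValueError("data must contain at least one value")
    return max(data) - min(data)


# Made-up example data
scores = [4, 8, 15, 16, 23, 42]
print(value_range(scores))  # 42 - 4 = 38
```

Because the range uses only the two extreme values, a single outlier can change it a lot.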

▶️ Interquartile Range (IQR)

The interquartile range (IQR) is the amount of spread in the middle 50% of a dataset. In other words, it is the distance between the first quartile (Q1) and the third quartile (Q3).

👉 Here's how to find the IQR:

➡️ Step 1: Put the data in order from least to greatest.

➡️ Step 2: Find the median. If the number of data points is odd, the median is the middle data point. If the number of data points is even, the median is the average of the middle two data points.

➡️ Step 3: Find the first quartile (Q1). The first quartile is the median of the data points to the left of the median in the ordered list.

➡️ Step 4: Find the third quartile (Q3). The third quartile is the median of the data points to the right of the median in the ordered list.

➡️ Step 5: Calculate the IQR by subtracting Q1 from Q3: IQR = Q3 - Q1.
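The five steps above can be sketched in Python as follows. This is a minimal sketch that assumes the median-of-halves method described in the steps (the median itself is excluded from both halves when the number of data points is odd). The function name iqr_by_halves and the sample values are made up for illustration. Library functions such as NumPy's percentile use interpolation by default and may give slightly different quartiles.

```python
from statistics import median


def iqr_by_halves(data):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    if len(data) < 2:
        raise ValueError("need at least two data points")

    ordered = sorted(data)            # Step 1: order from least to greatest
    n = len(ordered)
    half = n // 2                     # Step 2: locate the median position

    lower = ordered[:half]            # points to the left of the median
    # For an odd count, skip the middle point itself.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]

    q1 = median(lower)                # Step 3: first quartile
    q3 = median(upper)                # Step 4: third quartile
    return q1, q3, q3 - q1            # Step 5: IQR = Q3 - Q1


# Made-up example data
values = [7, 1, 3, 9, 5, 11, 13]
q1, q3, iqr = iqr_by_halves(values)
print(q1, q3, iqr)  # 3 11 8
```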


▶️ Variance

Variance is a simple measure of dispersion. It measures how far, on average, each number in the dataset lies from the mean. To compute variance, first calculate the mean, then find the squared deviation of each value from the mean, and finally average those squared deviations.

Population Variance

σ² = ∑(x − μ)² / N, where μ is the population mean and N is the number of data points in the population.

Sample Variance

s² = ∑(x − x̄)² / (n − 1), where x̄ is the sample mean and n is the number of data points in the sample.
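To make the two variance definitions concrete, here is a minimal Python sketch. It assumes the standard formulas: the population variance divides the sum of squared deviations by N, and the sample variance divides by n − 1. The function names and the data are made up for illustration.

```python
def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(data)
    if n == 0:
        raise ValueError("need at least one data point")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


# Made-up example data: mean is 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 32 / 8 = 4.0
print(sample_variance(data))      # 32 / 7
```

Python's standard library offers the same calculations as statistics.pvariance and statistics.variance.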

▶️ Standard Deviation

Standard deviation measures the spread of a data distribution. The more spread out a data distribution is, the greater its standard deviation.

Overview of how to calculate Standard Deviation

The formula for standard deviation (SD) is

SD = √( ∑(x − μ)² / N )

where 

  • ∑ means "sum of",  
  • x is a value in the data set, 
  • μ is the mean of the data set, and
  • N is the number of data points in the population.

👉 Here's a quick preview of the steps we're about to follow:

➡️ Step 1: Find the mean.

➡️ Step 2: For each data point, find the square of its distance to the mean.

➡️ Step 3: Sum the values from Step 2.

➡️ Step 4: Divide by the number of data points.

➡️ Step 5: Take the square root.
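The five steps above map directly onto code. The following is a minimal Python sketch of the population standard deviation. The function name population_sd and the four data points are made up for illustration.

```python
import math


def population_sd(data):
    """Population standard deviation, following the five steps."""
    n = len(data)
    if n == 0:
        raise ValueError("need at least one data point")

    mean = sum(data) / n                               # Step 1: find the mean
    squared_dists = [(x - mean) ** 2 for x in data]    # Step 2: squared distances
    total = sum(squared_dists)                         # Step 3: sum them
    variance = total / n                               # Step 4: divide by N
    return math.sqrt(variance)                         # Step 5: square root


# Made-up example data: mean is 3, squared distances are 9, 1, 0, 4
data = [6, 2, 3, 1]
print(round(population_sd(data), 2))  # sqrt(14 / 4) is about 1.87
```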

👉 An Important Note

The formula above gives the standard deviation of a population. If you're dealing with a sample, use a slightly different formula (below), which divides by n−1 instead of N.

SD (sample) = √( ∑(x − x̄)² / (n − 1) ), where x̄ is the sample mean and n is the number of data points in the sample.

▶️ Population and Sample Standard Deviation

Standard deviation measures the typical distance between each data point and the mean.

The formula we use for standard deviation depends on whether the data is being considered a population of its own, or the data is a sample representing a larger population.

  • If the data is being considered a population on its own, we divide by the number of data points, N.
  • If the data is a sample from a larger population, we divide by one fewer than the number of data points in the sample, n−1.
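To see the difference between the two divisors, here is a short sketch using Python's standard statistics module, where pstdev divides by N and stdev divides by n − 1. The data and variable names are made up for illustration.

```python
from statistics import pstdev, stdev

# Made-up example data: squared deviations from the mean sum to 14
data = [6, 2, 3, 1]

sd_as_population = pstdev(data)  # divides by N = 4
sd_as_sample = stdev(data)       # divides by n - 1 = 3

print(round(sd_as_population, 2))  # sqrt(14 / 4) is about 1.87
print(round(sd_as_sample, 2))      # sqrt(14 / 3) is about 2.16
```

The sample version is always the larger of the two for the same data (unless every value is identical), and the gap shrinks as the number of data points grows.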
