Mathematics for Data Science


What Is Data Science, and Why Is Math Required?

Data science is a way of solving business problems using mathematics to reach faster and simpler solutions.

Tough definition! Let’s simplify it.

Remember: Data Science = Mathematics + Computer Science

That means: computing on data using mathematical methods.

To become a data scientist, you need a good understanding of programming languages and machine learning algorithms, and you need to follow a data-driven approach.

Let’s Dig Into the Mathematical Concepts

1. Statistics:

Statistics is the practice of collecting, analysing, presenting and interpreting data to support more effective decisions. It extracts information from data.

Statistics influences almost every domain, for example:

-        A 20% chance of rain

-        Batting averages

-        The chance of drawing the king of spades from a deck of cards

-        A rise or fall in share prices on the stock market

        Variables: any characteristic of an entity.

                Qualitative: values on which arithmetic operations are not meaningful. Also called categorical variables, they are subdivided into ordinal (ordered categories, such as ratings) and nominal (unordered categories, such as colours).

                Quantitative: values on which arithmetic operations are meaningful. Also called numeric variables.

  • Population: the complete set of sources from which data is to be collected.
  • Sample: a subset of the population.

Statistics is subdivided into:

  • Descriptive statistics: methods of organising, summarising and presenting data.
  • Inferential statistics: methods for estimating what the population characteristics might be, given what is known about the sample's characteristics.

2.    Probability Distribution: a statistical function that describes the possible values a random variable can take within a given range and how likely each value is.
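As a small illustration, here is a sketch of a discrete probability distribution in Python. The fair six-sided die is an assumed example, not something taken from the article; the names `die` and `p_2_to_4` are made up for the illustration.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, where each face 1..6 is equally likely.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible values add up to 1.
total = sum(die.values())

# Probability that the roll falls within a given range, here 2 to 4 inclusive.
p_2_to_4 = sum(p for face, p in die.items() if 2 <= face <= 4)

print(total)     # 1
print(p_2_to_4)  # 1/2
```

Exact fractions are used here so the total comes out as exactly 1 rather than a rounded float.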

Measures of Central Tendency:

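        1.    Mean: the average of all the values (their sum divided by their count).

        2.    Median: the middle value when the data is sorted (the average of the two middle values when the count is even).

        3.    Mode: the most frequently occurring value.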
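The three measures of central tendency can be tried out with Python's built-in `statistics` module. This is only a sketch; the list `scores` is hypothetical data chosen so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical data set, already in sorted order for readability.
scores = [2, 3, 3, 5, 7, 10]

# Mean: (2 + 3 + 3 + 5 + 7 + 10) / 6 = 30 / 6 = 5
mean_value = statistics.mean(scores)

# Median: six values, so it is the average of the two middle ones, (3 + 5) / 2 = 4
median_value = statistics.median(scores)

# Mode: 3 appears twice, every other value appears once.
mode_value = statistics.mode(scores)

print(mean_value, median_value, mode_value)
```

Notice that the three measures can differ on the same data: the single large value 10 pulls the mean above the median.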

Measures of the Spread

Just like the measures of centre, there are measures of spread:

        1.    Range: the difference between the largest and smallest values in a data set.

        2.    Interquartile Range (IQR): a measure of variability based on dividing a sorted data set into quartiles; it is the difference between the third quartile (Q3) and the first quartile (Q1).

        3.    Variance: describes how much a random variable differs from its expected value. It is computed from squared deviations.

                1.    Deviation is the difference between an element and the mean.

                2.    Population variance is the average of the squared deviations (their sum divided by N).

                3.    Sample variance is the sum of the squared deviations from the sample mean divided by n − 1.

        4.    Standard Deviation: the square root of the variance; it measures the dispersion of a data set around its mean.
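The measures of spread can be sketched with the same `statistics` module. The list `data` is hypothetical. Quartiles can be computed by several conventions; this sketch simply uses the default method of `statistics.quantiles`, so other tools may report a slightly different IQR.

```python
import statistics

# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# IQR: third quartile minus first quartile (default 'exclusive' method).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Population variance divides the sum of squared deviations by N (32 / 8).
population_variance = statistics.pvariance(data)

# Sample variance divides the same sum by n - 1 (32 / 7).
sample_variance = statistics.variance(data)

# Standard deviation is the square root of the variance.
population_std = statistics.pstdev(data)

print(data_range, iqr, population_variance, sample_variance, population_std)
```

The sample variance is a little larger than the population variance because it divides by n − 1 instead of N.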

3.    Bernoulli Distribution:

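It is a discrete probability distribution. It applies to a single trial that has only two possible outcomes, “success” or “failure”.

Repeating a Bernoulli trial n times independently, with the same probability of success each time, gives the binomial distribution; the Bernoulli distribution is the special case n = 1. The probability of exactly r successes in n trials is:

           P(X = r) = nCr × p^r × (1 − p)^(n − r)

           Where p : probability of success in a single trial

                       1 − p : probability of failure in a single trial

                       n : total number of trials

                       r : number of successes desired

                       nCr : number of ways to choose r trials out of n, that is n! / (r! (n − r)!)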
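The symbols p, n and r listed above are the ones used in the binomial probability P(X = r) = nCr × p^r × (1 − p)^(n − r), so the sketch below assumes that is the formula intended. The function name `binomial_pmf` and the coin-toss numbers are made up for the illustration.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials,
    each with success probability p (assumed binomial formula)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Hypothetical example: exactly 2 heads in 3 tosses of a fair coin.
# 3 ways * 0.5^2 * 0.5^1 = 0.375
two_heads = binomial_pmf(2, 3, 0.5)

# With n = 1 the formula reduces to a single Bernoulli trial.
single_success = binomial_pmf(1, 1, 0.3)

# The probabilities over all possible r (0..n) add up to 1.
total = sum(binomial_pmf(r, 3, 0.5) for r in range(4))

print(two_heads, single_success, total)
```

Setting n = 1 shows how a single success-or-failure trial fits into the same formula.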

4.    Normal Distribution:

It is a bell-shaped curve that is symmetric about the mean. The area under the curve over a range gives the probability of a value falling within that range, so the total area under the curve equals 1, because the probabilities of all possible outcomes sum to 1.

A special case of the normal distribution is when the mean is 0 and the standard deviation is 1. This is called the standard normal distribution.

Standard Normal Distribution


The standard normal distribution simplifies the integral computations involved in calculating probabilities for a normal distribution. To convert a value from any normal distribution to standard normal form, we calculate its z-score:

           z = (x − μ) / σ

where x is the value, μ is the mean and σ is the standard deviation of the distribution.
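Here is a minimal sketch of the z-score, z = (x − μ) / σ, using `NormalDist` from Python's `statistics` module. The exam-score numbers (mean 70, standard deviation 10, score 85) are hypothetical, and `z_score` is a helper written just for this example.

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / std_dev


# Hypothetical example: exam scores assumed normal with mean 70 and std dev 10.
z = z_score(85, 70, 10)  # (85 - 70) / 10 = 1.5

# Probability of scoring below 85, read from the standard normal distribution.
p_below = NormalDist().cdf(z)

# The same probability computed directly on the original distribution.
p_direct = NormalDist(mu=70, sigma=10).cdf(85)

print(z, round(p_below, 4), round(p_direct, 4))
```

Both routes give the same probability, which is the point of standardising: one standard normal table (or function) serves every normal distribution.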

Hypothesis Testing:

           A hypothesis is an assumption about a population parameter.

Hypothesis testing involving one population focuses on assessing claims such as “the population average is equal to a specific value”.

Through hypothesis testing, you can determine whether the sample provides enough evidence to reject the hypothesis about the population parameter.

Hypothesis testing starts with the formulation of two hypotheses:

Null hypothesis (H₀): represents the status quo. It states the belief that the population mean is equal to, greater than or equal to, or less than or equal to a specific value.

Alternate hypothesis (H₁): represents the opposite of the null hypothesis and is accepted when the evidence leads us to reject the null hypothesis. It is also called the researcher’s hypothesis, as it is usually the claim the researcher hopes to support.
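To make the two hypotheses concrete, here is a rough sketch of a one-sample z-test. The article does not name a specific test, so this choice is an assumption: it presumes the population standard deviation is known, tests H₀ "the mean equals mu0" against a two-sided H₁, and uses a 0.05 significance level. The sample values and the function name `one_sample_z_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, assuming sigma is known."""
    n = len(sample)
    # How many standard errors the sample mean lies from the claimed mean.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a result at least this extreme if H0 were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample of 9 measurements with sample mean 53.
sample = [50, 52, 53, 55, 51, 54, 56, 53, 53]

# H0: population mean is 50; assumed known population std dev of 4.
z, p_value = one_sample_z_test(sample, mu0=50, sigma=4)

alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha

print(round(z, 2), round(p_value, 4), reject_h0)
```

A small p-value means the sample would be unlikely if H₀ were true, so H₀ is rejected in favour of H₁. When the population standard deviation is not known, a t-test is the usual alternative.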

5.    Linear Algebra and Calculus:

Linear algebra uses vector and matrix operations to determine the properties of linear systems. It covers vectors, vector spaces and matrix theory, including the linear independence of vectors and the vector spaces spanned by sets of vectors.
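A tiny sketch of these ideas in plain Python: a dot product, a matrix-vector product, and a linear-independence check for two 2-dimensional vectors using the 2×2 determinant. The helper names (`dot`, `mat_vec`, `det2`) and the matrices are made up for the illustration; in practice a library such as NumPy would normally be used.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [dot(row, v) for row in matrix]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


A = [[2, 1],
     [1, 3]]
x = [1, 2]

print(mat_vec(A, x))  # [4, 7]

# A non-zero determinant means the two rows (and columns) are linearly independent.
print(det2(A))  # 5

# Here the second row is twice the first, so the rows are linearly dependent.
B = [[1, 2],
     [2, 4]]
print(det2(B))  # 0
```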

We'll continue the topic in detail at a later point✌.
