Why Statistics is Important

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. A significant portion of data analysis principles is rooted in statistics and, by extension, mathematics. You don't need to grasp every statistical nuance or transform into a mathematician to excel in this field. Here are some key concepts to understand:

Measures of Central Tendency and Variability

1. Mean: The average of a set of numbers.

2. Median: The middle number in a sorted list. With an even count of values, it is the average of the two middle numbers.

3. Mode: The number that occurs most frequently in a set.

4. Range: The difference between the highest and lowest values.

Example

Given the following numbers: 2, 3, 7, 3, 2, 1, 2

- Mean: Sum of all numbers / Count of numbers = (2 + 3 + 7 + 3 + 2 + 1 + 2) / 7 = 20 / 7 = 2.86

- Median: Sorted list (ascending): 1, 2, 2, 2, 3, 3, 7. The median is 2.

- Mode: The number that occurs most frequently is 2.

- Range: 7 (highest) - 1 (lowest) = 6
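
The same four measures can be checked in a few lines of Python. This is a minimal sketch using only the standard library `statistics` module; the variable names are chosen here for illustration.

```python
import statistics

numbers = [2, 3, 7, 3, 2, 1, 2]

mean = statistics.mean(numbers)            # sum of values / count of values
median = statistics.median(numbers)        # middle value of the sorted list
mode = statistics.mode(numbers)            # most frequent value
value_range = max(numbers) - min(numbers)  # highest minus lowest

print(f"Mean:   {mean:.2f}")    # 2.86
print(f"Median: {median}")      # 2
print(f"Mode:   {mode}")        # 2
print(f"Range:  {value_range}") # 6
```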

Case Study

Consider this scenario for a clear understanding of the measure of central tendency:

Imagine tracking your weekly living expenses. Most weeks you spend about $100. In one exceptional week, however, an unexpected emergency arises and you end up spending $700.

- Expenses: $100, $100, $100, $700

- Median = $100; Mean = $250

Analyzing this data with the mean (average) can mislead you. The mean of $250 does not reflect your regular spending, because it is pulled upward by the single outlier week of $700. Here the median of $100 is the more representative measure.
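
A short sketch of the same comparison in Python, using the four weekly expenses from the case study (the variable names are illustrative):

```python
import statistics

weekly_expenses = [100, 100, 100, 700]  # dollars per week, from the case study

mean_expense = statistics.mean(weekly_expenses)      # pulled up by the $700 week
median_expense = statistics.median(weekly_expenses)  # unaffected by the outlier

print(f"Mean:   ${mean_expense:.0f}")    # $250
print(f"Median: ${median_expense:.0f}")  # $100
```

One outlier is enough to move the mean far from a typical week, while the median stays where it was.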

Other Statistical Measures

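1. Standard Deviation: Measures how spread out the values are around the mean. A low standard deviation means the values cluster close to the mean; a high one means they are widely scattered. Whether a low or high value is preferable depends on the context.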

- Example: 10, 20, 30, 40, 50 vs. 40, 10, 5, 5, 90. Both have a mean of 30, but the second set has a higher standard deviation.

2. Degrees of Freedom: The number of values in the final calculation of a statistic that are free to vary. For a statistic computed from one sample, such as the sample variance, it is usually n-1, where n is the sample size. A larger sample gives more degrees of freedom and more reliable estimates.

- Example: The degrees of freedom for a sample size of 100,000 is 99,999.

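3. Normally Distributed: Data that follow a symmetric, bell-shaped curve centered on the mean. In a normal distribution the mean, median, and mode are equal, but equal values alone do not prove that data are normal.

- Example: For the numbers 1, 2, 3, 4, 5, 6, 1, 2, 3, the mean and median are both 3. The values 1, 2, and 3 each occur twice, so the set has three modes rather than a single mode of 3, and it lacks the single central peak of a bell curve.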

4. Standard Error: Indicates how different the population mean is likely to be from a sample mean. It tells you how much the sample mean would vary if you were to repeat a study using new samples from within a single population. Usually, a low standard error is desired.

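- Example: The standard error of the mean is the sample standard deviation divided by the square root of the sample size. If a sample of 100 values has a standard deviation of 20, the standard error is 20 / √100 = 2. The gap between a sample mean of 23 and a population mean of 25 is the error of that one sample, not the standard error.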

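5. Confidence Interval: A range around the sample mean that is likely to contain the population mean. It is built by adding and subtracting a multiple of the standard error: about 1 for roughly 68% confidence and about 1.96 for 95% confidence, assuming the sample mean is approximately normally distributed.

- Example: If the sample mean is 23 and the standard error is 2, then 21-25 is roughly a 68% confidence interval, and the 95% confidence interval is about 19.1-26.9.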

6. Statistically Significant: Indicates that a result at least as extreme as the one observed would be unlikely if there were no real effect (the null hypothesis). The p-value is that probability, and a result is conventionally called significant when the p-value is below 0.05. Some fields, including parts of medicine, use stricter thresholds such as 0.01.

- Basis for rejecting or failing to reject the null hypothesis.
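
The measures in this list can be computed with Python's standard library. The sketch below reuses the two data sets from the standard deviation example, plus the sample mean of 23 and standard error of 2 from the confidence interval example. It rests on two assumptions that are not in the text above: the interval and p-value use a normal approximation, and 25 is treated as a hypothesized population mean purely for illustration.

```python
import math
import statistics
from statistics import NormalDist

set_a = [10, 20, 30, 40, 50]
set_b = [40, 10, 5, 5, 90]

# Standard deviation: both sets have a mean of 30, but set_b is more spread out.
sd_a = statistics.stdev(set_a)  # sample standard deviation (divides by n - 1)
sd_b = statistics.stdev(set_b)
print(f"SD of set A: {sd_a:.2f}")  # 15.81
print(f"SD of set B: {sd_b:.2f}")  # 36.57

# Degrees of freedom: the sample variance divides by n - 1, not n.
n = len(set_a)
degrees_of_freedom = n - 1
squared_deviations = sum((x - statistics.mean(set_a)) ** 2 for x in set_a)
sample_variance = squared_deviations / degrees_of_freedom
print(f"Sample variance of set A: {sample_variance}")  # 250.0

# Standard error of the mean: standard deviation / sqrt(sample size).
se_a = sd_a / math.sqrt(n)
print(f"Standard error of set A: {se_a:.2f}")  # 7.07

# 95% confidence interval for a sample mean of 23 with a standard error of 2.
sample_mean, standard_error = 23, 2
z_95 = NormalDist().inv_cdf(0.975)  # about 1.96 (normal approximation)
ci_low = sample_mean - z_95 * standard_error
ci_high = sample_mean + z_95 * standard_error
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")  # 19.08 to 26.92

# Two-sided p-value for the hypothesis that the population mean is 25
# (the value 25 is an assumption made for this illustration).
hypothesized_mean = 25
z = (sample_mean - hypothesized_mean) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"p-value: {p_value:.3f}")  # 0.317, not significant at the 0.05 level
```

Adding and subtracting exactly one standard error (21 to 25) gives only about a 68% interval under the same normal approximation, which is why the 95% interval is wider. With small samples, a t-distribution with n-1 degrees of freedom is usually a better choice than the normal curve used here.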

Additional Concepts

1. Maximum: The largest number in a distribution.

- Example: In the set 2, 3, 7, 3, 2, 1, 2, the maximum is 7.

2. Minimum: The lowest number in a distribution.

- Example: In the set 2, 3, 7, 3, 2, 1, 2, the minimum is 1.

3. Probability: The likelihood of an event occurring, ranging from 0 to 1.

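4. Odds Ratio: Quantifies the strength of the association between two events by dividing the odds of an event in one group by the odds in another. A ratio of 1 means no association.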

- Example: Older people have higher odds of experiencing a stroke compared to younger people.

5. Percentile: Used to compare an individual to others within a distribution, expressed as a percentage.

- Example: If you are in the 90th percentile in your class, your grade is higher than 90% of your classmates.

6. Quartiles: Divide a distribution into four equal groups.

- Example: Q1 (25th percentile), Q2 (median), Q3 (75th percentile).
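
A brief sketch of the odds ratio, percentile, and quartile calculations in Python. The stroke counts are hypothetical numbers invented for this illustration, not real data, and `percentile_rank` is a helper defined here rather than a library function. The quartiles use the default method of `statistics.quantiles`; other tools may interpolate differently and return slightly different cut points.

```python
import statistics

# Hypothetical counts, for illustration only (not real data).
older_stroke, older_no_stroke = 30, 70
younger_stroke, younger_no_stroke = 10, 90

# Odds = (count with the event) / (count without the event).
odds_older = older_stroke / older_no_stroke
odds_younger = younger_stroke / younger_no_stroke
odds_ratio = odds_older / odds_younger
print(f"Odds ratio: {odds_ratio:.2f}")  # 3.86, i.e. higher odds in the older group


def percentile_rank(values, score):
    """Percentage of values that are strictly below the given score."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)


numbers = [2, 3, 7, 3, 2, 1, 2]
print(f"Percentile rank of 7: {percentile_rank(numbers, 7):.1f}")

# Quartiles: three cut points that split the sorted data into four groups.
q1, q2, q3 = statistics.quantiles(numbers, n=4)
print(f"Q1={q1}, Q2={q2}, Q3={q3}")  # Q2 equals the median
```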

Conclusion

Statistics is a powerful tool for analyzing and interpreting data. By understanding these fundamental concepts, you can gain valuable insights and make informed decisions. Be curious and continue to learn more about statistical methods to enhance your analytical skills.
