Why Statistics is Important
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. A significant portion of data analysis principles is rooted in statistics and, by extension, mathematics. You don't need to grasp every statistical nuance or transform into a mathematician to excel in this field. Here are some key concepts to understand:
Measures of Central Tendency and Variability
1. Mean: The average of a set of numbers.
2. Median: The middle number in a sorted list. With an even count of values, it is the average of the two middle numbers.
3. Mode: The number that occurs most frequently in a set.
4. Range: The difference between the highest and lowest values.
Example
Given the following numbers: 2, 3, 7, 3, 2, 1, 2
- Mean: Sum of all numbers / Count of numbers = (2 + 3 + 7 + 3 + 2 + 1 + 2) / 7 = 20 / 7 = 2.86
- Median: Sorted list (ascending): 1, 2, 2, 2, 3, 3, 7. The median is 2.
- Mode: The number that occurs most frequently is 2.
- Range: 7 (highest) - 1 (lowest) = 6
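The same four measures can be checked with a few lines of Python. This is a minimal sketch using the standard library's statistics module; the variable names are chosen here for illustration.

```python
import statistics

data = [2, 3, 7, 3, 2, 1, 2]

mean = statistics.mean(data)          # sum of values / count of values
median = statistics.median(data)      # middle value of the sorted list
mode = statistics.mode(data)          # most frequently occurring value
value_range = max(data) - min(data)   # highest minus lowest

print(round(mean, 2), median, mode, value_range)  # 2.86 2 2 6
```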
Case Study
Consider this scenario to see how the measures of central tendency can disagree:
Imagine tracking your weekly living expenses. Most weeks you spend about $100. In one exceptional week, however, an unexpected emergency arises and you end up spending $700.
- Expenses: $100, $100, $100, $700
- Median = $100; Mean = $250
If you use the mean (average) to summarize this data, it can mislead you. The $250 figure doesn't represent your regular spending pattern because it is heavily influenced by the one outlier week with the $700 expense. In this case, the median of $100 is more representative of a typical week.
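To see the outlier's pull directly, the sketch below computes both measures for the four weeks and then recomputes the mean without the emergency week. The variable names are illustrative.

```python
import statistics

weekly_expenses = [100, 100, 100, 700]

mean_expense = statistics.mean(weekly_expenses)      # pulled up by the $700 week
median_expense = statistics.median(weekly_expenses)  # average of the two middle values

# Drop the outlier week to see what a "typical" mean looks like.
typical_weeks = [x for x in weekly_expenses if x != 700]
mean_without_outlier = statistics.mean(typical_weeks)

print(f"mean: ${mean_expense:.0f}, median: ${median_expense:.0f}")
print(f"mean without the outlier: ${mean_without_outlier:.0f}")
```

The median barely notices the $700 week, while the mean moves from $100 to $250.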
Other Statistical Measures
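1. Standard Deviation: Measures how spread out the values are around the mean. A lower standard deviation means the values cluster more tightly around the mean; a higher one means they are more dispersed. Neither is inherently better; it depends on what you are measuring.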
- Example: 10, 20, 30, 40, 50 vs. 40, 10, 5, 5, 90. Both have a mean of 30, but the second set has a higher standard deviation.
2. Degrees of Freedom: The number of values in the final calculation of a statistic that are free to vary. For a single-sample estimate such as the sample variance, it is n - 1, where n is your sample size. A larger sample size is generally preferred because it gives more stable estimates.
- Example: The degrees of freedom for a sample size of 100,000 is 99,999.
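3. Normally Distributed: Data that follows a symmetric, bell-shaped curve centered on the mean. In a normal distribution the mean, median, and mode are equal; in real data they are approximately equal. Matching values alone do not prove normality, since other symmetric distributions share this property.
- Example: For the numbers 1, 2, 3, 3, 3, 4, 5, the mean, median, and mode are all 3.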
4. Standard Error: Estimates how far a sample mean is likely to be from the population mean. It tells you how much the sample mean would vary if you were to repeat a study using new samples from the same population. It is calculated as the sample standard deviation divided by the square root of the sample size. A lower standard error means a more precise estimate.
- Example: If a sample of 100 values has a standard deviation of 20, then the standard error is 20 / √100 = 2.
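5. Confidence Interval: A range around the sample mean that is likely to contain the population mean. A 95% confidence interval is approximately the sample mean plus and minus 1.96 standard errors; adding and subtracting a single standard error gives only about a 68% interval.
- Example: If the sample mean is 23 and the standard error is 2, then the 95% confidence interval is roughly 23 ± 3.92, or 19.08 to 26.92. The narrower 21-25 range (one standard error either side) corresponds to about 68% confidence.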
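6. Statistically Significant: A result is called statistically significant when its p-value falls below a chosen threshold, most commonly 0.05. The p-value is the probability of seeing a result at least as extreme as yours if there were no real effect, so a value below 0.05 means such a result would occur by chance less than 5% of the time. It is not the probability that the result is accurate. Some fields, including parts of medicine, use stricter thresholds such as 0.01.
- Basis for rejecting, or failing to reject, the null hypothesis.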
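The spread, standard error, and confidence interval ideas above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive recipe: it uses the sample standard deviation (n - 1 in the denominator) and a normal approximation for the interval, which is a simplification for small samples where a t-distribution is usually preferred. The helper names standard_error and confidence_interval are made up for this example.

```python
import math
import statistics

set_a = [10, 20, 30, 40, 50]
set_b = [40, 10, 5, 5, 90]

# Both sets have a mean of 30, but set_b is far more spread out.
sd_a = statistics.stdev(set_a)  # sample standard deviation (divides by n - 1)
sd_b = statistics.stdev(set_b)


def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def confidence_interval(mean, se, level=0.95):
    """Normal-approximation interval: mean plus/minus z standard errors."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se


print(f"sd_a = {sd_a:.2f}, sd_b = {sd_b:.2f}")
print(f"standard error of set_a = {standard_error(set_a):.2f}")

# Sample mean of 23 with a standard error of 2, as in the example above.
low, high = confidence_interval(23, 2)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

With a mean of 23 and a standard error of 2, the 95% interval comes out at roughly 19.08 to 26.92, wider than the 21-25 band obtained from a single standard error.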
Additional Concepts
1. Maximum: The largest number in a distribution.
- Example: In the set 2, 3, 7, 3, 2, 1, 2, the maximum is 7.
2. Minimum: The lowest number in a distribution.
- Example: In the set 2, 3, 7, 3, 2, 1, 2, the minimum is 1.
3. Probability: The likelihood of an event occurring, ranging from 0 to 1.
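4. Odds Ratio: Quantifies the strength of the association between two events by dividing the odds of an outcome in one group by the odds of the same outcome in another group. A value of 1 means no association; a value above 1 means higher odds in the first group.
- Example: Older people have higher odds of experiencing a stroke compared to younger people, so the odds ratio for stroke (older versus younger) is greater than 1.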
5. Percentile: Used to compare an individual to others within a distribution, expressed as a percentage.
- Example: If you are in the 90th percentile in your class, your grade is higher than 90% of your classmates.
6. Quartiles: Three cut points that divide a sorted distribution into four equal-sized groups.
- Example: Q1 (25th percentile), Q2 (median), Q3 (75th percentile).
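Quartiles and the odds ratio can also be sketched in Python. The quartile cut points come from the standard library's statistics.quantiles, whose default method may give slightly different values from other tools on small data sets. The odds_ratio helper and the stroke counts are hypothetical, invented purely for illustration.

```python
import statistics

data = [2, 3, 7, 3, 2, 1, 2]

# With n=4, quantiles returns the three cut points Q1, Q2 (median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")


def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A divided by the odds in group B."""
    return (events_a / non_events_a) / (events_b / non_events_b)


# Hypothetical counts, not real data:
# older group: 30 strokes, 70 without; younger group: 10 strokes, 90 without.
stroke_or = odds_ratio(30, 70, 10, 90)
print(f"odds ratio (older vs. younger) = {stroke_or:.2f}")
```

An odds ratio above 1 here means the odds of a stroke are higher in the older group than in the younger one.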
Conclusion
Statistics is a powerful tool for analyzing and interpreting data. By understanding these fundamental concepts, you can gain valuable insights and make informed decisions. Be curious and continue to learn more about statistical methods to enhance your analytical skills.