Understanding Statistics: Central Tendency, Data Types, and Dispersion
1. Measures of Central Tendency
Let’s start with the basics: measures of central tendency. The name might sound complicated, but it is really just fifth-grade math: mean, median, and mode. Each one summarizes a dataset with a single representative value.
- Mean: The sum of all values divided by the number of values.
Example: For data {1, 2, 3, 4, 5}, the mean is (1+2+3+4+5)/5 = 3.
- Median: The middle value once the numbers are sorted. When the count of values is even, it is the average of the two middle values.
Example: For data {1, 2, 3, 4, 5}, the median is 3. For data {1, 2, 3, 4, 5, 100}, the median is (3+4)/2 = 3.5.
- Mode: The value that appears most frequently.
Example: For data {1, 2, 2, 3, 4}, the mode is 2.
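The three examples above can be checked with a short sketch. It assumes Python and its standard-library statistics module, neither of which the article itself uses; the variable names are only illustrative.

```python
import statistics

# The same datasets used in the examples above.
data_odd = [1, 2, 3, 4, 5]
data_even = [1, 2, 3, 4, 5, 100]
data_with_repeat = [1, 2, 2, 3, 4]

mean_value = statistics.mean(data_odd)          # (1+2+3+4+5)/5 = 3
median_odd = statistics.median(data_odd)        # middle value: 3
median_even = statistics.median(data_even)      # average of 3 and 4: 3.5
mode_value = statistics.mode(data_with_repeat)  # most frequent value: 2

print(mean_value, median_odd, median_even, mode_value)
```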
2. Sample and Population Data
- Population Data (N): The complete set of observations you want to study. N denotes its size.
- Sample Data (n): A subset of the population, of size n, that you analyze in order to draw conclusions about the whole population.
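As a rough illustration of the distinction, the sketch below draws a random sample from a made-up population and compares the two means. The population (the integers 1 to 1000), the sample size of 50, and the seed are all assumptions chosen for the example, not values from the article.

```python
import random
import statistics

random.seed(42)  # arbitrary seed, only so that repeated runs pick the same sample

# Hypothetical population: N = 1000 values.
population = list(range(1, 1001))

# Sample: n = 50 values drawn without replacement.
sample = random.sample(population, k=50)

population_mean = statistics.mean(population)  # exact: 500.5
sample_mean = statistics.mean(sample)          # an estimate of the population mean

print(f"N = {len(population)}, population mean = {population_mean}")
print(f"n = {len(sample)}, sample mean = {sample_mean}")
```

The sample mean will usually land near the population mean without matching it exactly, which is the whole point of working with a sample.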
3. Why Do We Need Median?
Consider this data: {1, 2, 3, 4, 5}. Here, the mean and the median are both 3. Now, look at this data: {1, 2, 3, 4, 5, 100}. In this case, the mean is (1+2+3+4+5+100)/6 = 115/6 ≈ 19.17, while the median is 3.5. A single extreme value pulled the mean far away from most of the data, but the median barely moved. That is why the median is the better summary when the data contains outliers.
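To see how differently the two measures react, here is a minimal sketch (again assuming Python's standard statistics module) that adds the value 100 to the data and measures how far each summary shifts.

```python
import statistics

without_outlier = [1, 2, 3, 4, 5]
with_outlier = without_outlier + [100]  # add one extreme value

# How far does each measure move when the outlier is added?
mean_shift = statistics.mean(with_outlier) - statistics.mean(without_outlier)
median_shift = statistics.median(with_outlier) - statistics.median(without_outlier)

print(f"mean moved by {mean_shift:.2f}")      # 115/6 - 3, roughly 16.17
print(f"median moved by {median_shift:.2f}")  # 3.5 - 3 = 0.50
```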
4. Levels of Measurement
5. Measures of Dispersion
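Measures of dispersion describe how spread out the data is. The two main measures are the variance and the standard deviation, which is the square root of the variance. The population variance is the average squared distance of the data points from the mean:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
where x_i are the data points, μ is the population mean, and N is the number of data points.
The formula above gives the population variance. For the sample variance, use the sample mean in place of μ and divide by n - 1 instead of N.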
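The difference between the two denominators can be sketched in a few lines. This assumes Python's standard statistics module, where pvariance divides by N and variance divides by n - 1; the dataset is the same {1, 2, 3, 4, 5} used earlier, and the manual calculation is included only to mirror the formula.

```python
import statistics

data = [1, 2, 3, 4, 5]
count = len(data)
mean_value = sum(data) / count  # 3.0

# Sum of squared deviations from the mean: 4 + 1 + 0 + 1 + 4 = 10
squared_deviations = sum((x - mean_value) ** 2 for x in data)

manual_population_variance = squared_deviations / count    # 10 / 5 = 2.0
manual_sample_variance = squared_deviations / (count - 1)  # 10 / 4 = 2.5

# The library functions should agree with the manual calculation.
population_variance = statistics.pvariance(data)  # divides by N
sample_variance = statistics.variance(data)       # divides by n - 1

print(manual_population_variance, population_variance)
print(manual_sample_variance, sample_variance)
```

The sample variance comes out slightly larger because dividing by n - 1 compensates for measuring spread around the sample's own mean rather than the true population mean.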
By understanding these fundamental concepts, you can begin to analyze and interpret data more effectively. Stay tuned for more insights as we continue to explore the fascinating world of data science!