How to Calculate Test Statistic
Last Updated: 21 Aug, 2024
In statistical hypothesis testing, a test statistic is the value used to decide whether sample data support or contradict a hypothesis about a population parameter. This article explains how to calculate test statistics, why they matter in hypothesis testing, and how they are applied in real-world scenarios. Understanding how to compute and interpret test statistics is essential for students and professionals in fields such as data analysis, research, and quality control.
Test Statistic
A test statistic is a value calculated from sample data during a hypothesis test. It is used to decide whether to reject the null hypothesis. The test statistic measures how far the sample data is from what we would expect under the null hypothesis. Depending on the type of test (e.g., t-test, chi-square test, etc.), the test statistic is compared to a critical value or used to calculate a p-value, which helps in determining the statistical significance of the results.
In simpler terms, think of a test statistic as a number that tells us how much the sample data stands out from what we expect if there's no real effect or difference. If this number is big enough, we might conclude that something interesting is happening in the data.
Types of Test Statistic
The most commonly used test statistics are:
- Z-Statistic
- T-Statistic
- Chi-Square Statistic
- F-Statistic
Z-Statistic
When the sample size is large and the population variance is known, we can use the z-statistic.
The formula for the z-statistic is:
Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}
Where,
- \bar{X} = Sample mean
- \mu = Population mean
- \sigma = Population standard deviation
- n = Sample size
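The formula translates directly into a few lines of Python. The sketch below is illustrative only: the function name `z_statistic`, its parameter names, and the sample values are assumptions made for this example and do not come from any library.

```python
import math


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Return Z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))."""
    if n <= 0 or pop_sd <= 0:
        raise ValueError("n and pop_sd must be positive")
    standard_error = pop_sd / math.sqrt(n)  # sigma / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


# Hypothetical values: sample mean 52, hypothesized mean 50,
# sigma = 10, n = 100, so the standard error is 1 and Z = 2.0
print(z_statistic(52, 50, 10, 100))
```

A positive result means the sample mean lies above the hypothesized mean, and a negative result means it lies below.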
T-Statistic
When the sample size is small (n \leq 30) or the population variance is unknown, we can use the t-statistic.
The formula for the t-statistic is:
T = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}
Where,
- \bar{X}= Sample mean
- \mu = Population mean
- s = Sample standard deviation
- n = Sample size
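When only raw observations are available, the sample mean and sample standard deviation have to be computed first. The following is a minimal sketch using Python's standard `statistics` module. The function name `t_statistic` and the scores are hypothetical, chosen only to illustrate the formula.

```python
import math
import statistics


def t_statistic(sample, pop_mean):
    """Return (T, df) for a one-sample t-test of `sample` against pop_mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t = (x_bar - pop_mean) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical scores, made up for illustration only
scores = [78, 82, 75, 80, 85]
t, df = t_statistic(scores, 75)
print(f"T = {t:.3f}, df = {df}")
```

The function also returns the degrees of freedom (n - 1), because they are needed to look up the critical value in a t-table.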
Chi-Square Statistic
For categorical data, we can use the chi-square statistic to test the independence of two variables or the goodness of fit.
The formula for the chi-square statistic is:
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
Where,
- O_i = Observed frequency
- E_i = Expected frequency
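The summation can be sketched in Python as a single pass over paired observed and expected counts. The function name `chi_square_statistic` and the counts below are assumptions made for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for a three-category goodness-of-fit check
observed = [18, 22, 20]
expected = [20, 20, 20]
print(chi_square_statistic(observed, expected))  # (4 + 4 + 0) / 20 = 0.4
```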
F-Statistic
To compare variances between two or more groups, as is often done in ANOVA, we can use the F-statistic.
The formula for the F-statistic is:
F = \frac{\text{Variance between groups}}{\text{Variance within groups}}
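In one-way ANOVA, the two variances in this ratio are usually computed as mean squares: the between-group sum of squares divided by k - 1, and the within-group sum of squares divided by N - k, where k is the number of groups and N the total number of observations. The sketch below assumes that interpretation. The function name `anova_f_statistic` and the data are hypothetical.

```python
import statistics


def anova_f_statistic(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


# Hypothetical measurements for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = anova_f_statistic(groups)
print(f"F = {f_stat:.2f}, df = ({df1}, {df2})")
```

A large F value indicates that the group means differ more than the variation inside the groups would explain.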
Examples with Solutions
Example for Z-Statistic
Problem: A manufacturer claims that the mean weight of their product is 200 grams. A sample of 30 products has a mean weight of 198 grams, and the population standard deviation is known to be 5 grams. Test the claim at a 0.05 significance level.
Solution:
Hypotheses:
- Null Hypothesis H_0: \mu = 200
- Alternative Hypothesis H_1: \mu \neq 200
Test Statistic:
Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} = \frac{198 - 200}{\frac{5}{\sqrt{30}}} \approx -2.19
Critical Value: For a two-tailed test at \alpha = 0.05, the critical values are \pm 1.96.
Decision: Since -2.19 < -1.96, reject the null hypothesis.
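The same decision can be reached through a p-value instead of a critical value. The sketch below uses `NormalDist` from Python's standard `statistics` module. The variable names are assumptions of this example, and the numbers are those given in the problem.

```python
import math
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 198, 200, 5, 30, 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))
std_normal = NormalDist()                    # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-tailed p-value

print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}, p = {p_value:.4f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```

Rejecting when |Z| exceeds the critical value is equivalent to rejecting when the p-value is below \alpha, so both checks lead to the same conclusion.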
Example for T-Statistic
Problem: A researcher wants to test whether the average test score of a class differs from 75. A sample of 15 students has an average score of 78 with a sample standard deviation of 10. Test the hypothesis at the 0.01 significance level.
Solution:
Hypotheses:
- Null Hypothesis H_0: \mu = 75
- Alternative Hypothesis H_1: \mu \neq 75
Test Statistic:
T = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}} = \frac{78 - 75}{\frac{10}{\sqrt{15}}} \approx \frac{3}{2.582} \approx 1.16
Critical Value: For a two-tailed test with df = 14 and \alpha = 0.01, the critical values are \pm 2.977.
Decision: Since 1.16 < 2.977, do not reject the null hypothesis.
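The arithmetic can be checked by recomputing the statistic from the values in the problem statement. This is a minimal sketch: the critical value is copied from the solution rather than computed, because the Python standard library has no t-distribution.

```python
import math

x_bar, mu0, s, n = 78, 75, 10, 15
t_crit = 2.977  # two-tailed critical value for df = 14, alpha = 0.01 (from the solution)

standard_error = s / math.sqrt(n)  # 10 / sqrt(15), about 2.582
t = (x_bar - mu0) / standard_error
df = n - 1

print(f"SE = {standard_error:.3f}, T = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > t_crit else "Do not reject H0")
```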
Example for Chi-Square Statistic
Problem: A survey of 100 people found the following movie preferences: Action (30), Comedy (20), Drama (25) and Horror (25). Test whether the preferences are equally distributed at the 0.05 significance level.
Solution:
Hypotheses:
- Null Hypothesis H_0: Preferences are equally distributed.
- Alternative Hypothesis H_1: Preferences are not equally distributed.
Expected Frequencies: Under the null hypothesis, each of the 4 categories has an expected frequency of 100 / 4 = 25.
Test Statistic:
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(25 - 25)^2}{25} + \frac{(25 - 25)^2}{25} = 1 + 1 + 0 + 0 = 2
Critical Value: For df = 3 and \alpha = 0.05, the critical value is 7.815.
Decision: Since 2 < 7.815, do not reject the null hypothesis.
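As a check on the arithmetic, the sketch below derives the expected frequency from the survey total and sums the four terms. The dictionary layout and variable names are assumptions of this example, and the critical value is copied from the solution rather than computed.

```python
observed = {"Action": 30, "Comedy": 20, "Drama": 25, "Horror": 25}
chi2_crit = 7.815  # critical value for df = 3, alpha = 0.05 (from the solution)

total = sum(observed.values())
expected = total / len(observed)  # equal split: 100 / 4 = 25 per category
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1            # number of categories minus one

print(f"chi-square = {chi2}, df = {df}")
print("Reject H0" if chi2 > chi2_crit else "Do not reject H0")
```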
Example for F-Statistic
Problem: Two different types of fertilizer were tested to compare their effects on plant growth. The variance in plant height is 16 for Fertilizer A and 25 for Fertilizer B. Test whether the variances are equal at the 0.05 significance level.
Solution:
Hypotheses:
- Null Hypothesis H_0: \sigma_1^2 = \sigma_2^2
- Alternative Hypothesis H_1: \sigma_1^2 \neq \sigma_2^2
Test Statistic:
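F = \frac{\text{Variance of Fertilizer B}}{\text{Variance of Fertilizer A}} = \frac{25}{16} \approx 1.56
Critical Value: The degrees of freedom depend on the sample sizes (df = n - 1 for each group), which the problem does not state. Assuming df_1 = 1 and df_2 = 1, and placing the larger variance in the numerator, the two-tailed test at \alpha = 0.05 uses the upper 0.025 point of the F distribution, which is approximately 647.79.
Decision: Since 1.56 < 647.79, do not reject the null hypothesis.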
Practice Questions
Question 1: A sample of 50 students has an average height of 165 cm. The population standard deviation is 8 cm. Test whether the sample mean is significantly different from 170 cm at a 0.01 significance level.
Question 2: An online retailer claims that 40% of their customers are repeat buyers. A survey of 200 customers shows that 85 are repeat buyers. Test this claim at a 0.05 significance level.
Question 3: A factory claims that the average lifespan of its light bulbs is 1200 hours. A sample of 20 bulbs has an average lifespan of 1180 hours with a standard deviation of 50 hours. Test the factory's claim at a 0.05 significance level.
Question 4: A researcher wants to test whether there is a significant difference in the mean scores of two different teaching methods. Method A has a mean score of 85 with a standard deviation of 10, and Method B has a mean score of 80 with a standard deviation of 12. Assume the sample size for both methods is 25. Test the hypothesis at the 0.05 significance level.
Question 5: A company wants to test if their new product's defect rate is less than 5%. A sample of 150 products shows that 6 are defective. Test the claim at a 0.01 significance level.
Question 6: We have two independent samples with the following statistics: Sample 1 (n=15, mean=25, variance=9) and Sample 2 (n=20, mean=22, variance=16). Test whether the variances are equal at a 0.05 significance level.
Question 7: A drug manufacturer wants to test whether the average recovery time with their new drug is less than the historical average of 30 days. A sample of 12 patients has an average recovery time of 28 days with a standard deviation of 4 days. Test the claim at a 0.05 significance level.
Question 8: In a study of customer satisfaction, the variances of the satisfaction scores in two different regions are compared. Region 1 has a variance of 25 and Region 2 has a variance of 36. Test whether the variances are equal at a 0.05 significance level.
Question 9: An agricultural experiment compares the effects of two fertilizers on crop yield. Fertilizer A yields a mean of 50 kg/acre with a standard deviation of 5 kg/acre, and Fertilizer B yields a mean of 55 kg/acre with a standard deviation of 6 kg/acre. If the sample sizes are both 20, test whether the mean yields are significantly different at a 0.05 significance level.
Question 10: A company tests whether the average time to assemble a product is different from the expected 45 minutes. A sample of 25 assembly times has a mean of 47 minutes with a standard deviation of 3 minutes. Test the company's claim at a 0.05 significance level.