A hypothesis is an assumption that we make about a population parameter.
That definition can feel abstract, so let me take the opportunity to explain it in a clearer and more comprehensive manner.
Imagine you have a friend named Raj who claims that a new diet plan they’ve been following has made them lose weight. You, being curious, want to check if the diet plan is really effective or if Raj’s weight loss was just due to chance.
Hypothesis testing is a way to scientifically investigate such claims. It involves two hypotheses:
- Null Hypothesis (H0): This is like the default assumption or the status quo. It says that there is no significant difference or effect, and any observed change is just due to random chance or coincidence. In this case, the null hypothesis would be that the diet plan doesn’t actually lead to weight loss.
- Alternative Hypothesis (Ha): This is what Raj is claiming — that the diet plan does make a difference and leads to weight loss.
Now, to test these hypotheses, you need some data. So, you gather some more people who are willing to try the same diet plan, and after a few weeks, you measure their weight changes.
Next, you use statistical analysis to determine if there is enough evidence to reject the null hypothesis. If the evidence is strong, you reject the idea that the diet plan has no effect (H0) in favor of the alternative (Ha), which supports Raj’s claim. But if there’s not enough evidence, you fail to reject the null hypothesis. That does not prove the diet plan is ineffective; it only means the data do not show a convincing effect.
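To make the diet example concrete, here is a minimal sketch of how the test could look in code. The weight changes below are made-up numbers for eight hypothetical participants (they are not real measurements), and the choice of a one-sided, one-sample t-test on the changes is an assumption made for illustration.

```python
from scipy import stats

# Hypothetical data: weight change in kg (after minus before) for 8 participants.
# Negative values mean the person lost weight. These numbers are invented.
weight_changes = [-2.1, -0.5, -1.8, 0.4, -3.0, -1.2, -0.9, -2.4]

# H0: the mean weight change is 0 (the diet has no effect).
# Ha: the mean weight change is below 0 (the diet leads to weight loss).
t_statistic, p_value = stats.ttest_1samp(weight_changes, popmean=0, alternative="less")

print("T-Statistic:", t_statistic)
print("P-Value:", p_value)

if p_value < 0.05:
    print("Reject H0: the data support Raj's claim.")
else:
    print("Fail to reject H0: the data do not show a convincing effect.")
```

The `alternative="less"` argument makes the test one-sided, because Raj's claim is specifically about losing weight rather than about any change at all.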
Now, let’s use Python for hypothesis testing.
Import Libraries
import numpy as np
from scipy import stats
Define your data
# Sample data (exam scores)
sample_data = [68, 72, 75, 78, 80, 82, 84, 86, 88, 90]
Set up the hypothesis
- Null Hypothesis (H0): The mean score of the population is equal to 75.
- Alternative Hypothesis (Ha): The mean score of the population is different from 75.
Perform the hypothesis test
- We are using a two-tailed one-sample t-test, a hypothesis test that checks whether the mean of a single sample differs significantly from an expected (hypothesized) value. It is called “two-tailed” because it considers differences in both directions (i.e., greater than or less than) from the expected value.
- The two-tailed version is commonly used when we want to investigate whether the mean differs from the expected value, but we do not have specific expectations about the direction of that difference.
# Set the expected mean
expected_mean = 75
# Perform a two-tailed t-test (since we want to check for any difference from the expected mean)
t_statistic, p_value = stats.ttest_1samp(sample_data, expected_mean)
# Print the results
print("T-Statistic:", t_statistic)
print("P-Value:", p_value)
- The t-statistic represents how many standard errors the sample mean is away from the expected mean under the null hypothesis. The p-value is the probability of observing a sample mean at least as far from the expected mean as ours if the null hypothesis is true. If the p-value is below a significance level (e.g., 0.05), you can reject the null hypothesis and conclude that there is significant evidence to support the alternative hypothesis.
# Compare the p-value with the 0.05 significance level
if p_value < 0.05:
    print("Reject the null hypothesis. There is a significant difference from the expected mean.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference from the expected mean.")
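The t-statistic and p-value described above can also be reproduced by hand, which may help show what `stats.ttest_1samp` is doing. This is a sketch for illustration only; the variable names (`standard_error`, `t_manual`, `p_manual`) are my own and not part of SciPy.

```python
import numpy as np
from scipy import stats

sample_data = [68, 72, 75, 78, 80, 82, 84, 86, 88, 90]
expected_mean = 75

n = len(sample_data)
sample_mean = np.mean(sample_data)
# ddof=1 gives the sample standard deviation (divides by n - 1)
sample_std = np.std(sample_data, ddof=1)
standard_error = sample_std / np.sqrt(n)

# t = (sample mean - expected mean) / standard error
t_manual = (sample_mean - expected_mean) / standard_error

# Two-tailed p-value: probability in both tails of a t-distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Compare with SciPy's built-in test
t_scipy, p_scipy = stats.ttest_1samp(sample_data, expected_mean)

print("Manual:", t_manual, p_manual)
print("SciPy: ", t_scipy, p_scipy)
```

Both lines should print the same pair of numbers up to floating-point rounding.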
If the p-value is smaller than the chosen significance level (commonly 0.05), we reject the null hypothesis in favor of the alternative hypothesis. Here, that means we have evidence that the population mean score is different from 75.
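To see what “two-tailed” means in practice, the sketch below runs the same test three ways using the `alternative` argument of `stats.ttest_1samp` (available in SciPy 1.6 and later). The one-sided variants are not part of the walkthrough above; they are included only for comparison.

```python
from scipy import stats

sample_data = [68, 72, 75, 78, 80, 82, 84, 86, 88, 90]
expected_mean = 75

# Two-tailed: is the mean different from 75 in either direction?
t_stat, p_two_sided = stats.ttest_1samp(sample_data, expected_mean, alternative="two-sided")

# One-tailed: is the mean greater than 75?
_, p_greater = stats.ttest_1samp(sample_data, expected_mean, alternative="greater")

# One-tailed: is the mean less than 75?
_, p_less = stats.ttest_1samp(sample_data, expected_mean, alternative="less")

print("Two-sided p-value:", p_two_sided)
print("Greater p-value:  ", p_greater)
print("Less p-value:     ", p_less)
```

Because the sample mean is above 75, the "greater" p-value is half of the two-sided one, and the "less" p-value is close to 1. The direction of a one-sided test should be chosen before looking at the data, not after.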