13 Large sample test
The sample for which sample size \(n\) is greater than 30 (\(n\) ≥ 30) is known as large sample. The study of sampling distribution of statistic for large sample is known as large sample theory .
13.1 Large Sample Tests
In this section we shall discuss the application of \(Z\)-test. The test statistic \(Z\) calculated from the large sample follows a standard normal distribution. \(Z\) \(\sim N(0,1)\)
Test for a single proportion
Test for the equality of two proportions
Test for a single mean
Test for equality of two means
Test for equality of two standard deviations
Test for significance of correlation coefficient
13.1.1 Decision rule for Z test
Let \(Z\) be the calculated value and α be the level of significance, then we reject the null hypothesis if
\(|Z|\) > \(Z\)α/2 ; for two tailed test
\(Z\) > \(Z\)α ; for right tailed test
\(Z\) < - \(Z\)α ; for left tailed test
Table values (critical values) of Z for specified level of significance is shown below
\(α\) | \(Z-value\) |
---|---|
0.10 | 1.645 |
0.05 | 1.960 |
0.025 | 2.241 |
0.01 | 2.576 |
0.005 | 2.807 |
13.2 Test for a single population proportion
Consider there is a population with proportion value, say \(P\); \(P\) is unknown, we will take a random sample of size \(n\) from the population and calculate a sample proportion \(p\). We want to test whether the population proportion \(P\), which is unknown is equal to \(P\)0, based on the sample proportion \(p\).
The null hypothesis to be tested is
H0 : \(P\) = \(P\)0
The alternative hypothesis may be either
H1 : \(P\) < \(P\)0 (called left tailed alternative)
Or
H1 : \(P\) > \(P\)0 (called right tailed alternative)
Or
H1 : \(P\) ≠ \(P\)0 (called two tailed alternative)
For this test, we will calculate test statistic, \(Z\) using the following formula.
\[Z = \frac{p - P_{0}}{\sqrt{\text{pq/n}}}\]
Where \(q\) = 1 - \(p\)
When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) \(\sim N(0,1)\)
This calculated of \(Z\) is then compared with critical value of \(Z\) from the table of standard normal distribution. If the calculated value is higher than the table value, we reject the null hypothesis. If the calculated value is less than table value, we conclude that we don’t have enough evidence in our sample to reject the null hypothesis. See section 13.1.1
Example 1
A sample of 500 apples were taken from a large consignment and 60 were found to be bad. Test whether the proportion of bad apples in the consignment is 0.1 or not at 5% level of significance.
Solution:
Sample size (\(n\)) = 500
Proportion of bad apples in the sample (\(p\)) = 60/500 = 0.12
Proportion of good apples in the sample (\(q\)) = 1 - 0.12 = 0.88
We want to test whether the proportion of bad apple in the population = 0.1 ; Here \(P\)0 = 0.1
The null hypothesis to be tested is
H0 : \(P\) = 0.1
If we want to prove that proportion of bad apples is 0.1 or not we can use a two tailed test with alternative hypothesis as below
H1 : \(P\) ≠ 0.1
So that if null hypothesis is rejected, we accept this alternative hypothesis
Level of significance is given at 5% , so α=0.05
Now we calculate \(Z\)
\(Z = \frac{p - P_{0}}{\sqrt{\text{pq/n}}}\)
\(Z = \frac{0.12 - 0.1}{\sqrt{0.12 \times 0.88/500}}\)
\(= 1.428\)
Here in our case since it is a two tailed test, we have to look for the critical value of \(Z\) at α/2. So we have to look for the value of \(Z\) at α/2 = 0.05/2 = 0.025, which is 2.241 (see section 13.1.1). The calculated value of Z is less than the table value. So, we conclude that, we don’t have enough evidence to reject the null hypothesis. So, it can be stated that proportion of bad apples in the population is 0.1.
13.3 Test for equality of two proportions
Consider there are two populations with proportion values, say \(P\)1 and \(P\)2 ; both are unknown, we will take a random sample of sizes \(n\)1 and \(n\)2 from the both populations respectively and calculate sample proportions \(p\)1 and \(p\)2. We want to test whether the population proportions \(P\)1 and \(P\)2 are equal, based on the sample proportions \(p\)1 and \(p\)2.
The null hypothesis to be tested is
H0 : \(P\)1 = \(P\)2
The alternative hypothesis may be either
H1 : \(P\)1 < \(P\)2 (called left tailed alternative)
Or
H1 : \(P\)1> \(P\)2 (called right tailed alternative)
Or
H1 : \(P\)1≠ \(P\)2 (called two tailed alternative)
For this test, we will calculate test statistic, \(Z\) using the following formula.
\[Z = \frac{p_{1} - p_{2}}{\sqrt{\widehat{P}\widehat{Q}\left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}\]
Where \(\widehat{P} = \frac{n_{1}p_{1} + n_{2}p_{2}}{n_{1} + n_{2}}\) and \(\widehat{Q} = 1 - \widehat{P}\)
When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1).
This calculated value of \(z\) is then compared with critical value of \(Z\) from the table of standard normal distribution. See section 13.1.1. If the calculated value is higher than the table value, we reject the null hypothesis. If the calculated value is less than table value, we conclude that we don’t have enough evidence in our sample to reject the null hypothesis.
Example 2
In order to assess the adoption of a new variety of paddy by farmers, a survey was conducted in a locality. The survey covered 80 farmers with large land holding and 250 farmers with small land holding. It was observed that 50 out of big farmers and 78 out of small farmers adopted the new paddy variety. Test whether there is any significant difference in the adoption behaviour of the two groups of farmers (Take α = 0.01)
Solution:
Sample size of first population, framers with large land holding (\(n\)1) = 80
Sample size of first population, framers with small land holding (\(n\)2) = 250
Proportion of framers with large land holding who adopted paddy variety (\(p\)1) = 50/80 = 0.625
Proportion of framers with large land holding who adopted paddy variety (\(p\)2) =78/250 = 0.312
We want to test whether the proportion is significantly different between both populations so,
H0 : \(P\)1= \(P\)2
Here the alternate hypothesis is
H1 : \(P\)1 ≠ \(P\)2
So, it is a two tailed test
\[\widehat{P} = \frac{n_{1}p_{1} + n_{1}p_{1}}{n_{1} + n_{2}}\]
\[\widehat{P} = \frac{80 \times 0.625 + 250 \times 0.312}{80 + 250}\]
\[= 0.3879\]
\[\widehat{Q} = 1 - 0.3879 = 0.6121\]
So that if null hypothesis is rejected, we accept this alternative hypothesis
Level of significance is given at 1% , so α=0.01
Now we calculate \(Z\) using the formula
\[Z = \frac{p_{1} - p_{2}}{\sqrt{\widehat{P}\widehat{Q}\left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}\]
\[= \frac{0.625 - 0.312}{\sqrt{0.3879 \times 0.6121\left( \frac{1}{80} + \frac{1}{250} \right)}}\]
\[= \frac{0.313}{\sqrt{0.2374 \times 0.0165}}\]
\[= 5.008\]
Since it is a two tailed test, we have to look for the critical value of \(Z\) at α/2. Here in this case so we have to look for the value of \(Z\) at α/2 = 0.01/2 = 0.005, which is 2.807 (see section 13.1.1). Since the calculated value (5.008) is greater than the table value (2.807) we reject the null hypothesis and conclude that there is a significant difference between two population proportions.
13.4 Test for a single population mean
Consider there is a population with mean, say \(μ\); where \(μ\) is unknown, we will take a random sample of size n from the population and calculate a sample mean, denoted as \(\overline{x}\). We want to test whether the population mean \(μ\), which is unknown is equal to some known constant \(μ\)0, based on the sample mean \(\overline{x}\).
The null hypothesis to be tested is
H0 : \(μ\) = \(μ\)0
The alternative hypothesis may be either
H1 : \(μ\) < \(μ\)0 (called left tailed alternative)
Or
H1 : \(μ\) > \(μ\)0 (called right tailed alternative)
Or
H1 : \(μ\) ≠ \(μ\)0 (called two tailed alternative)
This particular test has two cases
Case when the population standard deviation σ is known
Case when the population standard deviation σ is unknown
We will discuss each case below
Population standard deviation σ is known
Population standard deviation σ is known and for this test, we will calculate test statistic, \(Z\) using the following formula.
\[Z = \frac{\overline{x} - \mu_{0}}{\frac{\sigma}{\sqrt{n}}}\]
When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1). Decision rule is given in section 13.1.1
Population standard deviation σ is unknown
Population standard deviation σ is unknown and for this test, 𝜎 is replaced by its sample estimate s, we will calculate test statistic, \(Z\) using the following formula.
\[Z = \frac{\overline{x} - \mu_{0}}{\frac{s}{\sqrt{n}}}\]
When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1). Decision rule is given in section 13.1.1
Example 3
A sample of 900 items has a mean 3.4 cm and standard deviation 2.61cm. Test whether the population mean is 3.25cm at 5% significance level.
Solution:
Null hypothesis, H0 : \(μ\) = 3.25
Alternate hypothesis, H1 : \(μ\) ≠ 3.25; two tailed test
Sample size (\(n\)) = 900
Sample mean, \(\overline{x}\) = 3.4
Sample standard deviation (\(s\)) = 2.61
Population mean (\(μ\)0) = 3.25
Level of significance, α = 0.05
\(Z = \frac{\overline{x} - \mu_{0}}{\frac{s}{\sqrt{n}}}\)
\(Z = \frac{3.4 - 3.25}{\frac{2.61}{\sqrt{900}}} = 1.724\)
Since it is a two tailed test, we have to look for the critical value of \(Z\) at α/2. Here in this case so we have to look for the value of Z at α/2=0.05/2 =0.025, which is 2.241 (Decision rule see section 13.1.1). Since the calculated value (1.724) is less than the table value (2.241), we conclude that, we don’t have enough evidence to reject the null hypothesis. So, it can be stated that mean is 3.25 cm.
13.5 Test for equality of two means
Let there be two normally distributed populations with means \(µ\)1 and \(µ\)2 and standard deviations be \(σ\)1 and \(σ\)2 respectively. Let samples of sizes \(n\)1 and \(n\)2 were taken from these populations. Let the sample means were \(\overline{x}_{1}\) and \(\overline{x}_{2}\) respectively. We want to test whether these population means are significantly different or not based on the sample means.
The null hypothesis to be tested is
H0 : \(µ\)1 = \(µ\)2
The alternative hypothesis may be either
H1 : \(µ\)1 < \(µ\)2 (called left tailed alternative)
Or
H1 : \(µ\)1> \(µ\)2 (called right tailed alternative)
Or
H1 : \(µ\)1≠ \(µ\)2 (called two tailed alternative)
There are two cases under this test
Case when the population standard deviations are not equal \(σ\)1 ≠ \(σ\)2
Case when the population standard deviations are equal \(σ\)1 = \(σ\)2 = \(σ\)
When the population standard deviations are equal
Under this case, we will calculate test statistic, \(Z\) using the following formula.
\[Z = \frac{{\overline{x}}_{1} - {\overline{x}}_{2}}{\sigma\sqrt{\left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}\]
When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1). Decision rule is same as that of above tests (section 13.1.1).
Note: If \(σ\) is unknown then it is replaced by its estimate \(\sqrt{\frac{n_{1}s_{1}^{2} + n_{2}s_{2}^{2}}{n_{1} + n_{2}}}\) where \(n\)1 and \(n\)2 are the sample sizes, \(s\)1 and \(s\)2 are the sample standard deviations.
13.5.1 When the population standard deviations are not equal
Under this case, we will calculate test statistic, \(Z\) using the following formula.
\[Z = \frac{{\overline{x}}_{1} - {\overline{x}}_{2}}{\sqrt{\left( \frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}} \right)}}\]
When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1). Decision rule is same as that of above tests (section 13.1.1).
Example 4
The means of two single large samples of 1000 and 2000 members are 67.5 inches and 68 inches respectively. Can the samples be regarded as drawn from the same population of standard deviation 2.5 inches. (Test at 5% significance level).
Solution:
Null hypothesis, \(H\)0 : \(μ\)1 = \(μ\)2
Alternate hypothesis, \(H\)1 : \(μ\)1≠ \(μ\)2; two tailed test
Sample size (\(n\)1) = 2000
Sample size (\(n\)2) = 1000
Sample mean of first group, \({\overline{x}}_{1}\) = 68
Sample mean of second group, \({\overline{x}}_{2}\) = 67.5
Population standard deviation (\(σ\)) = 2.5
Level of significance, α = 0.05
we will calculate test statistic, \(Z\) using the following formula.
\(Z = \frac{{\overline{x}}_{1} - {\overline{x}}_{2}}{\sigma\sqrt{\left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}\)
\(Z = \frac{68 - 67.5}{2.5\sqrt{\left( \frac{1}{2000} + \frac{1}{1000} \right)}} = \frac{0.5}{0.0968} = 5.20\)
Since it is a two tailed test, we have to look for the critical value of \(Z\) at α/2. Here in this case so we have to look for the value of \(Z\) at α/2=0.05/2 =0.025, which is 2.241 (see section 13.1.1). Calculated value (5.20) is greater than table value, so we reject the null hypothesis. We may conclude that samples are not drawn from the same population.