13 Large sample test

The sample for which sample size \(n\) is greater than 30 (\(n\) ≥ 30) is known as large sample. The study of sampling distribution of statistic for large sample is known as large sample theory .

13.1 Large Sample Tests

In this section we shall discuss the application of \(Z\)-test. The test statistic \(Z\) calculated from the large sample follows a standard normal distribution. \(Z\) \(\sim N(0,1)\)

Test for a single proportion
Test for the equality of two proportions
Test for a single mean
Test for equality of two means
Test for equality of two standard deviations
Test for significance of correlation coefficient

13.1.1 Decision rule for Z test

Let \(Z\) be the calculated value and α be the level of significance, then we reject the null hypothesis if

\(|Z|\) > \(Z\)_α/2 ; for two tailed test
\(Z\) > \(Z\)_α ; for right tailed test
\(Z\) < - \(Z\)_α ; for left tailed test

Table values (critical values) of Z for specified level of significance is shown below

\(α\)	\(Z-value\)
0.10	1.645
0.05	1.960
0.025	2.241
0.01	2.576
0.005	2.807

13.2 Test for a single population proportion

Consider there is a population with proportion value, say \(P\); \(P\) is unknown, we will take a random sample of size \(n\) from the population and calculate a sample proportion \(p\). We want to test whether the population proportion \(P\), which is unknown is equal to \(P\)₀, based on the sample proportion \(p\).

The null hypothesis to be tested is

H₀ : \(P\) = \(P\)₀

The alternative hypothesis may be either

H₁ : \(P\) < \(P\)₀ (called left tailed alternative)

H₁ : \(P\) > \(P\)₀ (called right tailed alternative)

H₁ : \(P\) ≠ \(P\)₀ (called two tailed alternative)

For this test, we will calculate test statistic, \(Z\) using the following formula.

\[Z = \frac{p - P_{0}}{\sqrt{\text{pq/n}}}\]

Where \(q\) = 1 - \(p\)

When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) \(\sim N(0,1)\)

This calculated of \(Z\) is then compared with critical value of \(Z\) from the table of standard normal distribution. If the calculated value is higher than the table value, we reject the null hypothesis. If the calculated value is less than table value, we conclude that we don’t have enough evidence in our sample to reject the null hypothesis. See section 13.1.1

Example 1

A sample of 500 apples were taken from a large consignment and 60 were found to be bad. Test whether the proportion of bad apples in the consignment is 0.1 or not at 5% level of significance.

Solution:

Sample size (\(n\)) = 500

Proportion of bad apples in the sample (\(p\)) = 60/500 = 0.12

Proportion of good apples in the sample (\(q\)) = 1 - 0.12 = 0.88

We want to test whether the proportion of bad apple in the population = 0.1 ; Here \(P\)₀ = 0.1

The null hypothesis to be tested is

H₀ : \(P\) = 0.1

If we want to prove that proportion of bad apples is 0.1 or not we can use a two tailed test with alternative hypothesis as below

H₁ : \(P\) ≠ 0.1

So that if null hypothesis is rejected, we accept this alternative hypothesis

Level of significance is given at 5% , so α=0.05

Now we calculate \(Z\)

\(Z = \frac{p - P_{0}}{\sqrt{\text{pq/n}}}\)

\(Z = \frac{0.12 - 0.1}{\sqrt{0.12 \times 0.88/500}}\)

\(= 1.428\)

Here in our case since it is a two tailed test, we have to look for the critical value of \(Z\) at α/2. So we have to look for the value of \(Z\) at α/2 = 0.05/2 = 0.025, which is 2.241 (see section 13.1.1). The calculated value of Z is less than the table value. So, we conclude that, we don’t have enough evidence to reject the null hypothesis. So, it can be stated that proportion of bad apples in the population is 0.1.

Self excercise

A random sample of 500 plants was taken from a large experimental field and 65 plants were found to be affected by yellowing disease. Is the disease rate significant? (α = 0.05)

13.3 Test for equality of two proportions

Consider there are two populations with proportion values, say \(P\)₁ and \(P\)₂ ; both are unknown, we will take a random sample of sizes \(n\)₁ and \(n\)₂ from the both populations respectively and calculate sample proportions \(p\)₁ and \(p\)₂. We want to test whether the population proportions \(P\)₁ and \(P\)₂ are equal, based on the sample proportions \(p\)₁ and \(p\)₂.

The null hypothesis to be tested is

H₀ : \(P\)₁ = \(P\)₂

The alternative hypothesis may be either

H₁ : \(P\)₁ < \(P\)₂ (called left tailed alternative)

H₁ : \(P\)₁> \(P\)₂ (called right tailed alternative)

H₁ : \(P\)₁≠ \(P\)₂ (called two tailed alternative)

For this test, we will calculate test statistic, \(Z\) using the following formula.

\[Z = \frac{p_{1} - p_{2}}{\sqrt{\widehat{P}\widehat{Q}\left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}\]

Where \(\widehat{P} = \frac{n_{1}p_{1} + n_{2}p_{2}}{n_{1} + n_{2}}\) and \(\widehat{Q} = 1 - \widehat{P}\)

When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1).

This calculated value of \(z\) is then compared with critical value of \(Z\) from the table of standard normal distribution. See section 13.1.1. If the calculated value is higher than the table value, we reject the null hypothesis. If the calculated value is less than table value, we conclude that we don’t have enough evidence in our sample to reject the null hypothesis.

Example 2

In order to assess the adoption of a new variety of paddy by farmers, a survey was conducted in a locality. The survey covered 80 farmers with large land holding and 250 farmers with small land holding. It was observed that 50 out of big farmers and 78 out of small farmers adopted the new paddy variety. Test whether there is any significant difference in the adoption behaviour of the two groups of farmers (Take α = 0.01)

Solution:

Sample size of first population, framers with large land holding (\(n\)₁) = 80

Sample size of first population, framers with small land holding (\(n\)₂) = 250

Proportion of framers with large land holding who adopted paddy variety (\(p\)₁) = 50/80 = 0.625

Proportion of framers with large land holding who adopted paddy variety (\(p\)₂) =78/250 = 0.312

We want to test whether the proportion is significantly different between both populations so,

H₀ : \(P\)₁= \(P\)₂

Here the alternate hypothesis is

H₁ : \(P\)₁ ≠ \(P\)₂

So, it is a two tailed test

\[\widehat{P} = \frac{n_{1}p_{1} + n_{1}p_{1}}{n_{1} + n_{2}}\]

\[\widehat{P} = \frac{80 \times 0.625 + 250 \times 0.312}{80 + 250}\]

\[= 0.3879\]

\[\widehat{Q} = 1 - 0.3879 = 0.6121\]

So that if null hypothesis is rejected, we accept this alternative hypothesis

Level of significance is given at 1% , so α=0.01

Now we calculate \(Z\) using the formula

\[Z = \frac{p_{1} - p_{2}}{\sqrt{\widehat{P}\widehat{Q}\left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}\]

\[= \frac{0.625 - 0.312}{\sqrt{0.3879 \times 0.6121\left( \frac{1}{80} + \frac{1}{250} \right)}}\]

\[= \frac{0.313}{\sqrt{0.2374 \times 0.0165}}\]

\[= 5.008\]

Since it is a two tailed test, we have to look for the critical value of \(Z\) at α/2. Here in this case so we have to look for the value of \(Z\) at α/2 = 0.01/2 = 0.005, which is 2.807 (see section 13.1.1). Since the calculated value (5.008) is greater than the table value (2.807) we reject the null hypothesis and conclude that there is a significant difference between two population proportions.

13.4 Test for a single population mean

Consider there is a population with mean, say \(μ\); where \(μ\) is unknown, we will take a random sample of size n from the population and calculate a sample mean, denoted as \(\overline{x}\). We want to test whether the population mean \(μ\), which is unknown is equal to some known constant \(μ\)₀, based on the sample mean \(\overline{x}\).

The null hypothesis to be tested is

H₀ : \(μ\) = \(μ\)₀

The alternative hypothesis may be either

H₁ : \(μ\) < \(μ\)₀ (called left tailed alternative)

H₁ : \(μ\) > \(μ\)₀ (called right tailed alternative)

H₁ : \(μ\) ≠ \(μ\)₀ (called two tailed alternative)

This particular test has two cases

Case when the population standard deviation σ is known
Case when the population standard deviation σ is unknown

We will discuss each case below

Population standard deviation σ is known

Population standard deviation σ is known and for this test, we will calculate test statistic, \(Z\) using the following formula.

\[Z = \frac{\overline{x} - \mu_{0}}{\frac{\sigma}{\sqrt{n}}}\]

When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1). Decision rule is given in section 13.1.1

Population standard deviation σ is unknown

Population standard deviation σ is unknown and for this test, 𝜎 is replaced by its sample estimate s, we will calculate test statistic, \(Z\) using the following formula.

\[Z = \frac{\overline{x} - \mu_{0}}{\frac{s}{\sqrt{n}}}\]

When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1). Decision rule is given in section 13.1.1

Example 3

A sample of 900 items has a mean 3.4 cm and standard deviation 2.61cm. Test whether the population mean is 3.25cm at 5% significance level.

Solution:

Null hypothesis, H₀ : \(μ\) = 3.25

Alternate hypothesis, H₁ : \(μ\) ≠ 3.25; two tailed test

Sample size (\(n\)) = 900

Sample mean, \(\overline{x}\) = 3.4

Sample standard deviation (\(s\)) = 2.61

Population mean (\(μ\)₀) = 3.25

Level of significance, α = 0.05

\(Z = \frac{\overline{x} - \mu_{0}}{\frac{s}{\sqrt{n}}}\)

\(Z = \frac{3.4 - 3.25}{\frac{2.61}{\sqrt{900}}} = 1.724\)

Since it is a two tailed test, we have to look for the critical value of \(Z\) at α/2. Here in this case so we have to look for the value of Z at α/2=0.05/2 =0.025, which is 2.241 (Decision rule see section 13.1.1). Since the calculated value (1.724) is less than the table value (2.241), we conclude that, we don’t have enough evidence to reject the null hypothesis. So, it can be stated that mean is 3.25 cm.

Self exercise

From a paddy field a sample of 36 plants were selected at random. From these plants, the panicle lengths were observed. The mean and standard deviation of these measurements were 18.7cm and 1.25cm respectively. Test whether the mean length of panicle of paddy is 19cm. (α = 0.05)

13.5 Test for equality of two means

Let there be two normally distributed populations with means \(µ\)₁ and \(µ\)₂ and standard deviations be \(σ\)₁ and \(σ\)₂ respectively. Let samples of sizes \(n\)₁ and \(n\)₂ were taken from these populations. Let the sample means were \(\overline{x}_{1}\) and \(\overline{x}_{2}\) respectively. We want to test whether these population means are significantly different or not based on the sample means.

The null hypothesis to be tested is

H₀ : \(µ\)₁ = \(µ\)₂

The alternative hypothesis may be either

H₁ : \(µ\)₁ < \(µ\)₂ (called left tailed alternative)

H₁ : \(µ\)₁> \(µ\)₂ (called right tailed alternative)

H₁ : \(µ\)₁≠ \(µ\)₂ (called two tailed alternative)

There are two cases under this test

Case when the population standard deviations are not equal \(σ\)₁ ≠ \(σ\)₂
Case when the population standard deviations are equal \(σ\)₁ = \(σ\)₂ = \(σ\)

When the population standard deviations are equal

Under this case, we will calculate test statistic, \(Z\) using the following formula.

\[Z = \frac{{\overline{x}}_{1} - {\overline{x}}_{2}}{\sigma\sqrt{\left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}\]

When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1). Decision rule is same as that of above tests (section 13.1.1).

Note: If \(σ\) is unknown then it is replaced by its estimate \(\sqrt{\frac{n_{1}s_{1}^{2} + n_{2}s_{2}^{2}}{n_{1} + n_{2}}}\) where \(n\)₁ and \(n\)₂ are the sample sizes, \(s\)₁ and \(s\)₂ are the sample standard deviations.

13.5.1 When the population standard deviations are not equal

Under this case, we will calculate test statistic, \(Z\) using the following formula.

\[Z = \frac{{\overline{x}}_{1} - {\overline{x}}_{2}}{\sqrt{\left( \frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}} \right)}}\]

When the null hypothesis is true this \(Z\) will follow a standard normal distribution with mean 0 and variance 1. \(Z\) ~ N(0,1). Decision rule is same as that of above tests (section 13.1.1).

Example 4

The means of two single large samples of 1000 and 2000 members are 67.5 inches and 68 inches respectively. Can the samples be regarded as drawn from the same population of standard deviation 2.5 inches. (Test at 5% significance level).

Solution:

Null hypothesis, \(H\)₀ : \(μ\)₁ = \(μ\)₂

Alternate hypothesis, \(H\)₁ : \(μ\)₁≠ \(μ\)₂; two tailed test

Sample size (\(n\)₁) = 2000

Sample size (\(n\)₂) = 1000

Sample mean of first group, \({\overline{x}}_{1}\) = 68

Sample mean of second group, \({\overline{x}}_{2}\) = 67.5

Population standard deviation (\(σ\)) = 2.5

Level of significance, α = 0.05

we will calculate test statistic, \(Z\) using the following formula.

\(Z = \frac{{\overline{x}}_{1} - {\overline{x}}_{2}}{\sigma\sqrt{\left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}}\)

\(Z = \frac{68 - 67.5}{2.5\sqrt{\left( \frac{1}{2000} + \frac{1}{1000} \right)}} = \frac{0.5}{0.0968} = 5.20\)

Since it is a two tailed test, we have to look for the critical value of \(Z\) at α/2. Here in this case so we have to look for the value of \(Z\) at α/2=0.05/2 =0.025, which is 2.241 (see section 13.1.1). Calculated value (5.20) is greater than table value, so we reject the null hypothesis. We may conclude that samples are not drawn from the same population.

Self exercise

Two random samples were drawn from two populations and the following data were obtained. Test whether the population means are equal. \(n\)₁ = 400, \(n\)₂ = 400, \(\overline{x}_{1}\) = 250, \(\overline{x}_{2}\) = 220, \(s\)₁ = 40, \(s\)₂ = 55

“It is a capital mistake to theorize before one has data”:- Sir Arthur Conan Doyle

12 Sample Survey

14 Small sample tests