### 4.11 Hypothesis testing

**What is the Chi-square test of independence?**

**When can I use the test?**

**Can I use the test if I have frequency counts in a table?**

## Using the Chi-square test of independence

### What do we need?

## Chi-square test of independence example

#### Table 1: Contingency table for movie snacks data

### Finding expected counts

#### Table 2: Contingency table for movie snacks data with row and column totals

#### Table 3: Contingency table for movie snacks data showing actual count vs. expected count

### Performing the test

#### Table 4: Preparing to calculate our test statistic

### Understanding results

### Statistical details

### Understanding p-values


**Content-specific conceptual understandings**

**Formulation of null and alternative hypotheses, H0 and H1.** **Significance levels.** **p-values.**

Students should express H0 and H1 as an equation or inequality, or in words as appropriate.

**Expected and observed frequencies.** **The χ² test for independence: contingency tables, degrees of freedom, critical value.** **The χ² goodness of fit test.**

In examinations:

• the maximum number of rows or columns in a contingency table will be 4

• the degrees of freedom will always be greater than one. At SL the degrees of freedom for the goodness of fit test will always be n − 1

• the χ² critical value will be given if appropriate

• students will be expected to use technology to find a p-value and the χ² statistic

• only questions on upper tail tests with commonly-used significance levels (1%, 5%, 10%) will be set

• students will be expected to either compare a p-value to the given significance level or compare the χ² statistic to a given critical value

• expected frequencies will be greater than 5.

Hand calculations of the expected values or the χ² statistic may enhance understanding.

If using χ² tests in the IA, students should be aware of the limitations of the test for expected frequencies of 5 or less.

**The t-test.** **Use of the p-value to compare the means of two populations.** **Using one-tailed and two-tailed tests.**

In examinations calculations will be made using technology.

At SL, samples will be unpaired, and population variance will always be unknown.

Students will be asked to interpret the results of a test.

Students should know that the underlying distribution of the variables must be normal for the t-test to be applied. In examinations, students should assume that the variance of the two groups is equal and therefore the pooled two-sample t-test should be used.

The Chi-square test of independence is a statistical hypothesis test used to determine whether two categorical or nominal variables are likely to be related or not.

You can use the test when you have counts of values for two categorical variables.

You can also use the test if you have only a table of frequency counts rather than the raw observations.

The Chi-square test of independence checks whether two variables are likely to be related or not. We have counts for two categorical or nominal variables. We also have an idea that the two variables are not related. The test gives us a way to decide if our idea is plausible or not.

The sections below discuss what we need for the test, how to do the test, understanding results, statistical details and understanding p-values.

For the Chi-square test of independence, we need two variables. Our idea is that the variables are not related. Here are a couple of examples:

- We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theater. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theater wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
- A veterinary clinic has a list of dog breeds they see as patients. The second variable is whether owners feed dry food, canned food or a mixture. Our idea is that the dog breed and types of food are unrelated. If this is true, then the clinic can order food based only on the total number of dogs, without consideration for the breeds.

For a valid test, we need:

- Data values that are a simple random sample from the population of interest.
- Two categorical or nominal variables. Don’t use the independence test with continuous variables that define the category combinations. The counts for the combinations of the two categorical variables, however, are numeric.
- For each combination of the levels of the two variables, we need an expected count of at least five. When the expected count is below five for any one combination, the test results are not reliable.

Let’s take a closer look at the movie snacks example. Suppose we collect data for 600 people at our theater. For each person, we know the type of movie they saw and whether or not they bought snacks.

Let’s start by answering: Is the Chi-square test of independence an appropriate method to evaluate the relationship between movie type and snack purchases?

- We have a simple random sample of 600 people who saw a movie at our theater. We meet this requirement.
- Our variables are the movie type and whether or not snacks were purchased. Both variables are categorical. We meet this requirement.
- The last requirement is for more than five expected values for each combination of the two variables. To confirm this, we need to know the total counts for each type of movie and the total counts for whether snacks were bought or not. For now, we assume we meet this requirement and will check it later.

It appears we have indeed selected a valid method. (We still need to check that more than five values are expected for each combination.)

Here is our data summarized in a contingency table:

| Type of Movie | Snacks | No Snacks |
|---|---|---|
| Action | 50 | 75 |
| Comedy | 125 | 175 |
| Family | 90 | 30 |
| Horror | 45 | 10 |

Before we go any further, let’s check the assumption of five expected values in each category. The data has more than five counts in each combination of Movie Type and Snacks. But what are the expected counts if movie type and snack purchases are independent?

To find expected counts for each Movie-Snack combination, we first need the row and column totals, which are shown below:

| Type of Movie | Snacks | No Snacks | Row totals |
|---|---|---|---|
| Action | 50 | 75 | 125 |
| Comedy | 125 | 175 | 300 |
| Family | 90 | 30 | 120 |
| Horror | 45 | 10 | 55 |
| Column totals | 310 | 290 | GRAND TOTAL = 600 |

The expected counts for each Movie-Snack combination are based on the row and column totals. We multiply the row total by the column total and then divide by the grand total. This gives us the expected count for each cell in the table. For example, for the Action-Snacks cell, we have:

$$\frac{125 \times 310}{600} = \frac{38{,}750}{600} \approx 65$$

We rounded the answer to the nearest whole number. If there is not a relationship between movie type and snack purchasing we would expect 65 people to have watched an action film with snacks.
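The same rule (row total times column total, divided by the grand total) can be applied to every cell at once. The following is a minimal Python sketch using the counts from the movie snacks table; the names `observed` and `expected_counts` are illustrative choices, not part of the original example.

```python
# Observed counts from the movie snacks contingency table.
# Each row is [Snacks, No Snacks].
observed = {
    "Action": [50, 75],
    "Comedy": [125, 175],
    "Family": [90, 30],
    "Horror": [45, 10],
}


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = {name: sum(counts) for name, counts in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(row_totals.values())
    return {
        name: [row_totals[name] * col / grand_total for col in col_totals]
        for name in table
    }


expected = expected_counts(observed)
for movie, counts in expected.items():
    # Show two decimal places; the text rounds these to whole numbers.
    print(movie, [round(c, 2) for c in counts])
```

For the Action row this prints 64.58 and 60.42, which round to the 65 and 60 used in the text.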

Here are the actual and expected counts for each Movie-Snack combination. In each cell of Table 3 below, the expected count appears in **bold** in parentheses after the actual count. The expected counts are rounded to the nearest whole number.

| Type of Movie | Snacks | No Snacks | Row totals |
|---|---|---|---|
| Action | 50 (**65**) | 75 (**60**) | 125 |
| Comedy | 125 (**155**) | 175 (**145**) | 300 |
| Family | 90 (**62**) | 30 (**58**) | 120 |
| Horror | 45 (**28**) | 10 (**27**) | 55 |
| Column totals | 310 | 290 | GRAND TOTAL = 600 |

When using software, these calculated values will be labeled as “expected values,” “expected cell counts” or some similar term.

All of the expected counts for our data are larger than five, so we meet the requirement for applying the independence test.

Before calculating the test statistic, let’s look at the contingency table again. The expected counts use the row and column totals. If we look at each of the cells, we can see that some expected counts are close to the actual counts but most are not. If there is no relationship between the movie type and snack purchases, the actual and expected counts will be similar. If there is a relationship, the actual and expected counts will be different.

A common mistake with expected counts is to simply divide the grand total by the number of cells. For our movie data, this is 600 / 8 = 75. This is not correct. We know the row totals and column totals. These are fixed and cannot change for our data. The expected values are based on the row and column totals, not just on the grand total.
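One way to see why the even split is wrong is to check the row totals it implies. The sketch below, with illustrative variable names and the same movie snacks counts, compares the row totals produced by the even split with those produced by the row-and-column rule.

```python
observed = [
    [50, 75],    # Action: Snacks, No Snacks
    [125, 175],  # Comedy
    [90, 30],    # Family
    [45, 10],    # Horror
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
n_cells = len(observed) * len(observed[0])

# Common mistake: spread the grand total evenly over all cells.
uniform = [[grand_total / n_cells for _ in row] for row in observed]

# Row-and-column rule: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

print("observed row totals:", row_totals)
print("even-split row totals:", [sum(row) for row in uniform])
print("expected row totals:", [round(sum(row), 6) for row in expected])
```

The even split gives every movie type a row total of 150, which contradicts the data. The expected counts reproduce the observed row totals of 125, 300, 120 and 55.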

The basic idea in calculating the test statistic is to compare actual and expected values, given the row and column totals that we have in the data. First, we calculate the difference between the actual and expected counts for each Movie-Snacks combination. Next, we square that difference. Squaring gives the same importance to combinations with fewer actual values than expected and combinations with more actual values than expected. Next, we divide by the expected value for the combination. Finally, we add up these values across all Movie-Snacks combinations. This gives us our test statistic.

This is much easier to follow using the data from our example. Table 4 below shows the calculations for each Movie-Snacks combination carried out to two decimal places.

| Type of Movie | Purchase | Actual | Expected | Difference | Squared difference | Divided by expected |
|---|---|---|---|---|---|---|
| Action | Snacks | 50 | 64.58 | 50 − 64.58 = −14.58 | 212.67 | 212.67/64.58 = 3.29 |
| Action | No Snacks | 75 | 60.42 | 75 − 60.42 = 14.58 | 212.67 | 212.67/60.42 = 3.52 |
| Comedy | Snacks | 125 | 155 | 125 − 155 = −30 | 900 | 900/155 = 5.81 |
| Comedy | No Snacks | 175 | 145 | 175 − 145 = 30 | 900 | 900/145 = 6.21 |
| Family | Snacks | 90 | 62 | 90 − 62 = 28 | 784 | 784/62 = 12.65 |
| Family | No Snacks | 30 | 58 | 30 − 58 = −28 | 784 | 784/58 = 13.52 |
| Horror | Snacks | 45 | 28.42 | 45 − 28.42 = 16.58 | 275.01 | 275.01/28.42 = 9.68 |
| Horror | No Snacks | 10 | 26.58 | 10 − 26.58 = −16.58 | 275.01 | 275.01/26.58 = 10.35 |

The squared differences are computed from the unrounded expected counts, which is why 212.67 differs slightly from 14.58 squared.

Lastly, to get our test statistic, we add the values in the last column:

$$3.29 + 3.52 + 5.81 + 6.21 + 12.65 + 13.52 + 9.68 + 10.35 = 65.03$$
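The calculation above can be reproduced in a few lines of Python. This is a minimal sketch with an illustrative function name (`chi_square_statistic`), not a reference implementation. It keeps full precision throughout, so the result is about 65.01 rather than the 65.03 obtained by summing the rounded per-cell values.

```python
observed = [
    [50, 75],    # Action: Snacks, No Snacks
    [125, 175],  # Comedy
    [90, 30],    # Family
    [45, 10],    # Horror
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for actual, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (actual - expected) ** 2 / expected
    return statistic


statistic = chi_square_statistic(observed)
print(round(statistic, 2))
```

The small gap between 65.01 and 65.03 comes only from rounding each term to two decimal places before adding; it does not affect the conclusion of the test.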

To make our decision, we compare the test statistic to a value from the Chi-square distribution. This activity involves five steps:

- We decide on the risk we are willing to take of concluding that the two variables are not independent when in fact they are. For the movie data, we had decided prior to our data collection that we are willing to take a 5% risk of saying that the two variables – Movie Type and Snack Purchase – are not independent when they really are independent. In statistics-speak, we set the significance level, α, to 0.05.
- We calculate a test statistic. As shown above, our test statistic is 65.03.
- We find the critical value from the Chi-square distribution based on our degrees of freedom and our significance level. If the two variables are independent, the test statistic exceeds this value only with probability α.
- The degrees of freedom depend on how many rows and how many columns we have. The degrees of freedom (df) are calculated as:

$$df = (r - 1) \times (c - 1)$$

In the formula, *r* is the number of rows and *c* is the number of columns in our contingency table. From our example, with Movie Type as the rows and Snack Purchase as the columns, we have:

$$df = (4 - 1) \times (2 - 1) = 3 \times 1 = 3$$

The Chi-square value with α = 0.05 and three degrees of freedom is 7.815.

- We compare the value of our test statistic (65.03) to the Chi-square value. Since 65.03 > 7.815, we reject the idea that movie type and snack purchases are independent.
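The decision steps above can be sketched in Python as follows. The function names are illustrative, and the critical value 7.815 is taken from the text rather than computed.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a contingency table: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def reject_independence(statistic, critical_value):
    """Reject the idea of independence when the statistic exceeds the critical value."""
    return statistic > critical_value


df = degrees_of_freedom(n_rows=4, n_cols=2)

# Critical value for alpha = 0.05 and 3 degrees of freedom, as quoted in the text.
# If SciPy is installed, scipy.stats.chi2.ppf(0.95, 3) should give the same value.
critical_value = 7.815

print(df)                                          # 3
print(reject_independence(65.03, critical_value))  # True
```

Keeping the comparison in its own function makes it easy to reuse the same rule with a different table size or significance level, provided the matching critical value is supplied.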

We conclude that there *is* some relationship between movie type and snack purchases. The owner of the movie theater cannot estimate how many snacks to buy regardless of the type of movies being shown. Instead, the owner must think about the type of movies being shown when estimating snack purchases.

It’s important to note that we cannot conclude that the type of movie *causes* a snack purchase. The independence test tells us only whether there is a relationship or not; it does not tell us that one variable causes the other.

Let’s use graphs to understand the test and the results.

The side-by-side chart below shows the actual counts in blue, and the expected counts in orange. The counts appear at the top of the bars. The yellow box shows the movie type and snack purchase totals. These totals are needed to find the expected counts.

Figure 1: Bar chart showing the expected and actual counts for the different movie types

Compare the expected and actual counts for the Horror movies. You can see that more people than expected bought snacks and fewer people than expected chose not to buy snacks.

If you look across all four of the movie types and whether or not people bought snacks, you can see that there is a fairly large difference between actual and expected counts for most combinations. The independence test checks to see if the actual data is “close enough” to the expected counts that would occur if the two variables are independent. Even without a statistical test, most people would say that the two variables are not independent. The statistical test provides a common way to make the decision, so that everyone makes the same decision on the data.

The chart below shows another possible set of data. This set has the exact same row and column totals for movie type and snack purchase, but the yes/no splits in the snack purchase data are different.

Figure 2: Bar chart showing the expected and actual counts using different sample data

The purple bars show the actual counts in this data. The orange bars show the expected counts, which are the same as in our original data set. The expected counts are the same because the row totals and column totals are the same. Looking at the graph above, most people would think that the type of movie and snack purchases are independent. If you perform the Chi-square test of independence using this new data, the test statistic is 0.903. The Chi-square value is still 7.815 because the degrees of freedom are still three. You would fail to reject the idea of independence because 0.903 < 7.815. The owner of the movie theater can estimate how many snacks to buy regardless of the type of movies being shown.

Let’s look at the movie-snack data and the Chi-square test of independence using statistical terms.

Our null hypothesis is that the type of movie and snack purchases are independent. The null hypothesis is written as:

$$H_0: \text{Movie Type and Snack purchases are independent}$$

The alternative hypothesis is the opposite.

$$H_1: \text{Movie Type and Snack purchases are not independent}$$

Before we calculate the test statistic, we find the expected counts. This is written as:

$$E_{ij} = \frac{R_i \times C_j}{N}$$

The formula is for an *i* x *j* contingency table. That is a table with *i* rows and *j* columns. For example, $E_{11}$ is the expected count for the cell in the first row and first column. The formula shows $R_i$ as the total for row *i*, $C_j$ as the total for column *j*, and $N$ as the grand total.

We calculate the test statistic using the formula below:

$$\chi^2 = \sum_{i,j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

In the formula above, we have *n* combinations of rows and columns. The Σ symbol means to add up the calculations for each combination. (We performed these same steps in the Movie-Snack example, beginning in Table 4.) The formula shows $O_{ij}$ as the observed count for the cell in row *i* and column *j*, and $E_{ij}$ as the expected count for that cell.

We then compare the test statistic to the critical Chi-square value corresponding to our chosen alpha value and the degrees of freedom for our data. Using the Movie-Snack data as an example, we had set α = 0.05 and had three degrees of freedom. For the Movie-Snack data, the Chi-square value is written as:

$$\chi^2_{0.05,\,3}$$

There are two possible results from our comparison:

- The test statistic is lower than the Chi-square value. You fail to reject the hypothesis of independence. In the movie-snack example, the theater owner can go ahead with the assumption that the type of movie a person sees has no relationship with whether or not they buy snacks.
- The test statistic is higher than the Chi-square value. You reject the hypothesis of independence. In the movie-snack example, the theater owner cannot assume that there is no relationship between the type of movie a person sees and whether or not they buy snacks.

Let’s use a graph of the Chi-square distribution to better understand the p-values. You are checking to see if your test statistic is a more extreme value in the distribution than the critical value. The graph below shows a Chi-square distribution with three degrees of freedom. It shows how the value of 7.815 “cuts off” 95% of the data. Only 5% of the data from a Chi-square distribution with three degrees of freedom is greater than 7.815.

Figure 3: Graph of Chi-square distribution for three degrees of freedom

The next distribution graph shows our results. You can see how far out “in the tail” our test statistic is. In fact, with this scale, it looks like the distribution curve is at zero at the point at which it intersects with our test statistic. It isn’t, but it is very, very close to zero. We conclude that it is very unlikely for this situation to happen by chance. The results that we collected from our movie goers would be extremely unlikely if there were truly no relationship between types of movies and snack purchases.

Figure 4: Graph of Chi-square distribution for three degrees of freedom with test statistic plotted

Statistical software shows the p-value for a test. This is the likelihood of another sample of the same size resulting in a test statistic more extreme than the test statistic from our current sample, assuming that the null hypothesis is true. It’s difficult to calculate this by hand. For the distributions shown above, if the test statistic is exactly 7.815, then the p-value will be p = 0.05. With the test statistic of 65.03, the p-value is very, very small. In this example, most statistical software will report the p-value as “p < 0.0001.” This means that the likelihood of finding a more extreme value for the test statistic using another random sample (and assuming that the null hypothesis is correct) is less than one chance in 10,000.
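For readers who want to see where such a p-value comes from, here is a minimal sketch using only Python's standard library. It relies on a closed-form expression for the upper-tail area that holds only for three degrees of freedom, as in this example. The function name `chi2_upper_tail_df3` is an illustrative choice; for other degrees of freedom, statistical software (for instance `scipy.stats.chi2.sf`, assuming SciPy is available) would be the usual route.

```python
import math


def chi2_upper_tail_df3(x):
    """Upper-tail area of the Chi-square distribution with 3 degrees of freedom.

    Closed form valid only for df = 3:
    P(X > x) = erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# At the critical value from the text, the tail area is about 0.05.
print(round(chi2_upper_tail_df3(7.815), 3))

# For the movie snacks test statistic, the p-value is far below 0.0001.
print(chi2_upper_tail_df3(65.03) < 0.0001)
```

The first line confirms that 7.815 cuts off about 5% of the distribution, and the second agrees with software reporting “p < 0.0001” for a test statistic of 65.03.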