Chi-Square Goodness of Fit Test Tutorial
When to use this tool
Use the Chi-Square Goodness of Fit test to determine whether the observed data on a categorical variable differ significantly from a theoretical or expected distribution. The test compares the observed frequency in each category to the expected frequency under the hypothesized distribution, allowing you to assess whether the sample is likely to have come from a population with that distribution. In EngineRoom, you can test whether the proportions are equal across the categories (uniform), specify a different proportion for each category, or specify historical counts for each category.
As an example, a manufacturing company produces different colored lightbulbs, claiming that 10% of its production is red, 30% blue, 40% green, and 20% yellow. To test this claim, a quality inspector randomly samples 200 bulbs, records their colors, and uses a Chi-Square Goodness of Fit test to see whether the observed distribution matches the company's stated proportions.
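As a rough illustration of what the test computes for this example, the sketch below converts the claimed proportions into expected counts for a sample of 200 bulbs and sums (observed - expected)^2 / expected over the categories. The observed counts are hypothetical, made up for illustration only; they are not the tutorial's data set, and the function name is an assumption rather than anything EngineRoom exposes.

```python
# Claimed proportions come from the example; the observed counts are HYPOTHETICAL.
claimed = {"red": 0.10, "blue": 0.30, "green": 0.40, "yellow": 0.20}
observed = {"red": 30, "blue": 50, "green": 70, "yellow": 50}  # made-up sample of 200


def chi_square_statistic(observed, proportions):
    """Return (statistic, degrees of freedom, expected counts)."""
    n = sum(observed.values())
    # Expected count = sample size * hypothesized proportion.
    expected = {cat: n * p for cat, p in proportions.items()}
    # Sum of squared differences, each scaled by its expected count.
    stat = sum((observed[cat] - expected[cat]) ** 2 / expected[cat] for cat in expected)
    # Degrees of freedom = number of categories - 1.
    return stat, len(expected) - 1, expected


stat, df, expected = chi_square_statistic(observed, claimed)
print(expected)
print(f"chi-square = {stat:.3f}, df = {df}")
```

A larger statistic means the sample sits further from the claimed proportions; the p-value reported by the test is the upper-tail probability of this statistic under a chi-square distribution with the given degrees of freedom.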
Using EngineRoom
The data set consists of three variables: Lightbulb Color, the categorical variable listing the unique color categories; Observed Counts, the count of each color observed in the sample; and Expected Proportions, the expected proportion of each color.
Select the Chi-Square Goodness of Fit test from the Non-parametric menu under Analyze.
Example:
Note: This example has the Guided Mode disabled, so it combines some steps in one dialog box. You can enable or disable Guided Mode from the User menu on the top right of the EngineRoom workspace.
Steps:
- Select the Analyze menu > Non-parametric > Chi-Square Goodness of Fit
- Drag the Lightbulb Color variable onto the Categories Variable drop zone on the study:
- Drag the Observed Counts variable onto the Observed Values Variable drop zone on the study:
- Drag the Expected Proportions variable onto the Expected Values Variable drop zone on the study:
- Note that when you drag an expected proportion or count variable onto the study, the 'Expected Type: Equal Proportions' option is automatically disabled. This option is only enabled when no variable is attached to the Expected Values Variable drop zone, in which case the test compares the observed data against a distribution with equal proportions for all levels of the categorical variable.
- The option 'Choose whether your expected value is a proportion or a count value' allows you to specify either a proportion or a count for the expected values - in our data they're proportions, so leave this setting at the default:
- Finally, the alternative hypothesis is always the not-equal-to case (see the null and alternative hypotheses spelled out at the bottom of the study screen). You can enter an alpha value here; otherwise the test uses the default value of 0.05 (5%):
- Click Continue to see the Chi-Square Goodness of Fit test output:
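The three ways of specifying expected values in the steps above (equal proportions, proportions, or counts) can be sketched as a single helper. This is only an illustration under stated assumptions: the function name and the `kind` parameter are hypothetical, the observed counts are made up, and rescaling historical counts to the sample size is an assumption about how such counts would be used, not a description of EngineRoom's internals.

```python
def expected_counts(observed, expected=None, kind="proportion"):
    """Turn expected-value settings into expected counts (illustrative only).

    expected=None      -> equal proportions across all categories
    kind="proportion"  -> values are proportions and must sum to 1
    kind="count"       -> values are historical counts, rescaled to the sample size
    """
    n = sum(observed)
    if expected is None:
        return [n / len(observed)] * len(observed)
    if len(expected) != len(observed):
        raise ValueError("need one expected value per category")
    total = sum(expected)
    if kind == "proportion" and abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # Dividing by the total turns counts into proportions; proportions are unchanged.
    return [n * e / total for e in expected]


observed = [30, 50, 70, 50]  # hypothetical counts for red, blue, green, yellow

print(expected_counts(observed))                                       # equal proportions
print(expected_counts(observed, [0.10, 0.30, 0.40, 0.20]))             # proportions
print(expected_counts(observed, [100, 300, 400, 200], kind="count"))   # historical counts
```

In this sketch the proportion and count inputs lead to the same expected counts, because the hypothetical historical counts are in the same 10/30/40/20 ratio as the claimed proportions.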
The conclusion statement states the results of the test in plain English, followed by the null and alternative hypothesis statements.
The numeric output on the left contains the Test table, which lists the total sample size, test statistic value, degrees of freedom and p-value. In these results, the p-value is displayed as 0 (the true value is positive but too small to show at the displayed precision), which is smaller than the chosen alpha value of 0.05, so you reject the null hypothesis and conclude that the observed proportions are significantly different from the specified proportions.
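To make the decision rule concrete, here is a minimal sketch of how a p-value can be obtained from the test statistic and degrees of freedom, using only the Python standard library. The helper name `chi2_sf` and the statistic values are hypothetical; this shows one standard way to evaluate the chi-square upper tail for whole-number degrees of freedom and is not necessarily how EngineRoom computes it.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-half), 1.0              # closed form for df = 2
    else:
        q, a = math.erfc(math.sqrt(half)), 0.5   # closed form for df = 1
    # Step df up by 2 with Q(a + 1, x) = Q(a, x) + x^a * e^(-x) / Gamma(a + 1).
    while 2 * a < df:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1.0
    return q


alpha = 0.05
stat, df = 10.417, 3  # hypothetical statistic for four categories
p_value = chi2_sf(stat, df)
print(f"p = {p_value:.4f}, reject H0: {p_value < alpha}")

# A very large (hypothetical) statistic gives a p-value that rounds to zero
# at four decimals even though it is strictly positive.
tiny = chi2_sf(60.0, 3)
print(f"{tiny:.4f}", tiny > 0)
```

The second call shows why a results table can report a p-value of 0: the value is positive, but it vanishes at the displayed number of decimals.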
The Observed vs Expected Count table shows the actual number of observations in each category in the sample and the number of observations that you would expect to occur if the test proportions were true, along with the 95% confidence intervals around the observed counts. The Observed vs Expected Proportion table shows the same for the proportions.
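The tutorial does not say which interval method EngineRoom uses, so the following is only a sketch under an assumption: a normal-approximation (Wald) interval for each category's proportion, scaled by the sample size to give an interval for the count. The function name and the example count are hypothetical.

```python
from statistics import NormalDist


def wald_interval(count, n, level=0.95):
    """Normal-approximation interval for one category (ASSUMED method).

    Returns ((low, high) for the proportion, (low, high) for the count).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    p = count / n
    half_width = z * (p * (1 - p) / n) ** 0.5
    low, high = max(0.0, p - half_width), min(1.0, p + half_width)
    return (low, high), (n * low, n * high)


# Hypothetical category: 30 red bulbs observed in a sample of 200.
prop_ci, count_ci = wald_interval(30, 200)
print(prop_ci)
print(count_ci)
```

If the interval for a category excludes its expected value, that category is a likely contributor to a significant overall result. The approximation is poor for small counts, which is consistent with the usual advice to have at least 5 observations per category.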
The graphical output on the right shows the diagnostic plots. The chart on the right is a bar chart of each category's observed and expected values - you can use this chart to determine whether there is a difference in a particular category.
The chart on the left takes the difference between the observed and expected counts (observed minus expected) and displays it as a proportion of the expected count. The direction of each bar indicates whether the difference is positive (bar to the right of the vertical line) or negative (bar to the left of the vertical line). The length of each bar indicates the relative distance of the observed value from the expected value.
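The quantity plotted in that chart can be reproduced in a few lines. In this sketch the observed counts are hypothetical, the expected counts follow from the example's claimed proportions for 200 bulbs, and the function name is an assumption.

```python
def relative_differences(observed, expected):
    """(observed - expected) / expected for each category."""
    return [(o - e) / e for o, e in zip(observed, expected)]


colors = ["red", "blue", "green", "yellow"]
observed = [30, 50, 70, 50]   # hypothetical sample of 200
expected = [20, 60, 80, 40]   # 200 * (0.10, 0.30, 0.40, 0.20)

for color, diff in zip(colors, relative_differences(observed, expected)):
    side = "right of the line" if diff > 0 else "left of the line"
    print(f"{color:>6}: {diff:+.3f} ({side})")
```

Scaling by the expected count puts small and large categories on the same footing, so a shortfall of 10 bulbs counts for more in a category expecting 20 than in one expecting 80.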
The p-value chart can be seen by clicking the link next to the diagnostic plots link. It shows that the p-value falls within the blue rejection region for the test:
Notes:
- It is important to consider the sample size and ensure that the observed and expected counts in each category are sufficiently large for a reliable Chi-Square test. The expected count should be at least 5 in each category for a valid test result. The observed count should be at least 5 in each category for valid confidence intervals.
- If you have raw data, that is, data listing each category as it occurs, you can use this test to evaluate whether the categories have equal proportions (a uniform distribution).
- You can change the test settings by clicking on the Test setup button:
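The sample-size and raw-data notes above can be checked before running the test. The sketch below tallies a hypothetical raw column into counts, derives the equal-proportion expected counts, and flags categories that fall below the minimum of 5; all data and variable names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical raw data: one entry per sampled bulb, in the order recorded.
raw = ["red"] * 4 + ["blue"] * 9 + ["green"] * 12 + ["yellow"] * 7

observed = Counter(raw)                      # tally each category
n, k = sum(observed.values()), len(observed)
expected = {cat: n / k for cat in observed}  # uniform: equal share per category

MIN_COUNT = 5
low_expected = [cat for cat, e in expected.items() if e < MIN_COUNT]
low_observed = [cat for cat, o in observed.items() if o < MIN_COUNT]

print(dict(observed))
print(expected)
print("expected count below 5 (test may be unreliable):", low_expected)
print("observed count below 5 (intervals may be unreliable):", low_observed)
```

Here every expected count is 8, so the test itself meets the guideline, while the single category with only 4 observations is flagged for its confidence interval.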
Chi-Square Goodness of Fit Test Video Tutorial
Coming Soon