An Exploratory Technique for Investigating Large Quantities of Categorical Data

Authors: G. V. Kass

Published: 1980 (Journal Paper)

Source: Applied Statistics

Algorithm: CHAID

DOI: 10.2307/2986296

Summary

Introduces CHAID (Chi-square Automatic Interaction Detection), a decision tree algorithm for categorical outcomes that selects split variables with chi-square tests and allows more than two branches per node. CHAID pioneered statistical significance testing as the split criterion: it splits on the most significant predictor rather than the most explanatory one, which addresses AID's bias toward predictors with many categories.
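The split-selection idea in the summary can be illustrated with a minimal sketch. It is not the paper's procedure: it only cross-tabulates each candidate predictor against the outcome, runs a Pearson chi-square test of independence, and picks the predictor with the smallest p-value. The category merging and Bonferroni adjustment that the full algorithm applies are omitted. The function names (`chi2_sf`, `chi2_test`, `most_significant_predictor`) and the toy survey rows are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite series of half-integer terms
    total = math.erfc(math.sqrt(half))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_test(xs, ys):
    """Pearson chi-square test of independence for two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed cell counts
    row, col = Counter(xs), Counter(ys)  # marginal counts
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    if dof == 0:
        return stat, 1.0  # a constant variable carries no information
    return stat, chi2_sf(stat, dof)


def most_significant_predictor(rows, predictors, target):
    """Return the predictor with the smallest p-value, plus all p-values."""
    ys = [row[target] for row in rows]
    p_values = {}
    for name in predictors:
        xs = [row[name] for row in rows]
        p_values[name] = chi2_test(xs, ys)[1]
    return min(p_values, key=p_values.get), p_values


# Toy data (hypothetical): region is associated with the response, gender is not.
rows = [
    {"region": region, "gender": gender, "response": response}
    for region, gender, response in [
        ("north", "f", "yes"), ("north", "m", "yes"),
        ("north", "f", "yes"), ("north", "m", "yes"),
        ("south", "f", "no"), ("south", "m", "no"),
        ("south", "f", "no"), ("south", "m", "no"),
        ("east", "f", "yes"), ("east", "m", "yes"),
        ("east", "f", "no"), ("east", "m", "no"),
    ]
]

best, p_values = most_significant_predictor(rows, ["region", "gender"], "response")
print(best, p_values)
```

Because the chosen predictor keeps all of its categories, the resulting split here would have three branches (one per region), which is the multi-way behaviour the summary refers to. Comparing raw p-values across predictors with different numbers of categories is exactly where an adjustment for multiple comparisons matters, so this unadjusted sketch should not be read as a faithful implementation.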

Abstract

The technique set out in the paper, CHAID, is an offshoot of AID (Automatic Interaction Detection) designed for a categorized dependent variable. Some important modifications which are relevant to standard AID include: built-in significance testing with the consequence of using the most significant predictor (rather than the most explanatory), multi-way splits (in contrast to binary) and a new type of predictor which is especially useful in handling missing information.

Tags

  • Decision trees

  • CHAID

  • Categorical data

  • Chi-square tests

  • Survey analysis

  • Statistical learning