An Exploratory Technique for Investigating Large Quantities of Categorical Data
Authors: G. V. Kass
Published: 1980 (Journal Paper)
Source: Applied Statistics
Algorithm: CHAID
DOI: 10.2307/2986296
Summary
Introduces CHAID (Chi-square Automatic Interaction Detection), a decision tree algorithm for categorical outcomes that uses chi-square tests to select the split variable at each node and permits multi-way splits, that is, more than two branches per node. CHAID made statistical significance the split criterion, which addresses AID's bias toward predictors with many categories.
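The selection rule described above can be illustrated with a minimal sketch. It is not Kass's full procedure: it only picks the predictor whose Pearson chi-square test against the outcome gives the smallest p-value, and it leaves out CHAID's category-merging step and any multiplicity adjustment. The function names (`chi2_sf`, `chi2_test`, `most_significant_predictor`) and the toy data are hypothetical, introduced here for illustration, and the code assumes every expected cell count is positive.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{j < m} y^j / j!
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)), then apply the
    # recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    tail = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        tail += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return tail


def chi2_test(xs, ys):
    """Pearson chi-square test of independence between two categorical lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    df = (len(row) - 1) * (len(col) - 1)
    if df == 0:
        # A constant variable carries no information about the other one.
        return 0.0, 1.0
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    return stat, chi2_sf(stat, df)


def most_significant_predictor(predictors, outcome):
    """Return (name, p_value) for the predictor with the smallest p-value."""
    pvals = {name: chi2_test(values, outcome)[1] for name, values in predictors.items()}
    best = min(pvals, key=pvals.get)
    return best, pvals[best]


# Toy data (made up for illustration): "region" tracks the outcome exactly,
# while "gender" is balanced across both outcome classes.
outcome = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
predictors = {
    "region": ["n", "n", "n", "s", "s", "s", "n", "s"],
    "gender": ["m", "f", "m", "f", "m", "f", "f", "m"],
}
best_name, best_p = most_significant_predictor(predictors, outcome)
```

On this toy data the sketch selects `region`, with a p-value of about 0.005, while `gender` gets a p-value of 1. A real CHAID implementation would first merge predictor categories that do not differ significantly and would adjust the p-values before comparing predictors.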
Abstract
The technique set out in the paper, CHAID, is an offshoot of AID (Automatic Interaction Detection) designed for a categorized dependent variable. Some important modifications which are relevant to standard AID include: built-in significance testing with the consequence of using the most significant predictor (rather than the most explanatory), multi-way splits (in contrast to binary) and a new type of predictor which is especially useful in handling missing information.
Links
Primary
Standard
Alternate
Tags
- Decision trees
- CHAID
- Categorical data
- Chi-square tests
- Survey analysis
- Statistical learning