Akaike Information Criterion (AIC) for Comparing Models
Equations
\[AIC = -2\ln(\hat{L}) + 2k \qquad \text{(general definition)}\]

\[AIC = n\ln\left(\frac{SSE}{n}\right) + 2k \qquad \text{(least-squares regression form)}\]

where

\[n = \text{number of observations}\]

\[SSE = \text{sum of squared errors} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2\]

\[k = \text{number of estimated parameters}\]

Evaluation
Models are compared by the difference in their scores, ΔAIC = AIC(A) − AIC(B); the model with the lower AIC is preferred.
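The least-squares form above can be computed directly from observed and fitted values. A minimal sketch in Python (the function name `aic_sse` is my own, not from the source):

```python
import numpy as np

def aic_sse(y, y_hat, k):
    """AIC via the least-squares form: n * ln(SSE / n) + 2k.

    y     : observed values
    y_hat : model-fitted values
    k     : number of estimated parameters
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    sse = np.sum((y - y_hat) ** 2)  # sum of squared errors
    return n * np.log(sse / n) + 2 * k
```

Note that a single AIC value carries no meaning on its own; only differences between models fitted to the same data are interpretable.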
Burnham-Anderson criteria
| ΔAIC | Interpretation |
|---|---|
| 0–2 | Essentially equivalent |
| 4–7 | Considerably less support for the higher-AIC model |
| >10 | Very strong evidence for lower AIC model |
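The comparison procedure and the table above can be sketched as a small helper (the function name `compare_aic` and the fallback wording for the gaps the table leaves between bands are my own assumptions):

```python
def compare_aic(aic_a, aic_b):
    """Interpret ΔAIC = |AIC(A) - AIC(B)| using the Burnham-Anderson bands.

    Returns (label of lower-AIC model, delta, interpretation).
    The published bands leave gaps (2-4 and 7-10); those fall back
    to a generic "intermediate support" label here.
    """
    delta = abs(aic_a - aic_b)
    lower = "A" if aic_a <= aic_b else "B"
    if delta <= 2:
        verdict = "essentially equivalent"
    elif delta > 10:
        verdict = "very strong evidence for the lower-AIC model"
    elif 4 <= delta <= 7:
        verdict = "some support for the lower-AIC model"
    else:
        verdict = "intermediate support"
    return lower, delta, verdict
```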
Link to Claude Shannon’s Information Entropy
Akaike described his approach as an “entropy maximization principle”. Compare with Shannon’s information entropy equation (one of several equivalent forms):
\[H = -\sum_{i=1}^{n} p_i \ln p_i\]

Anthropic’s Claude is named in honor of Shannon.
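The entropy formula is a one-liner in code; a minimal sketch (function name `shannon_entropy` is my own), using natural log so the result is in nats, matching the equation above:

```python
import math

def shannon_entropy(probs):
    """H = -sum(p_i * ln p_i) in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A uniform distribution maximizes H, which connects to the “entropy maximization” framing: among distributions consistent with the data, prefer the one with the most entropy.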