Principal Component Analysis - A Geometric Approach
Overview

Principal Component Analysis is a neat statistical trick, with a simple bit of linear algebra backing it up. Imagine you've got a dependent (or response) variable $Y$ and a large number of independent (a.k.a. explanatory) variables $X_1, \ldots, X_p$. You also have $n$ measurements of each, giving you a $p \times n$ matrix with one row per variable:

$$
\begin{aligned}
X &= \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{pmatrix} \\
  &= \begin{pmatrix} -\mathbf{x}_1- \\ \vdots \\ -\mathbf{x}_p- \end{pmatrix}
\end{aligned}
$$

Here row $\mathbf{x}_i$ holds the $n$ measurements of $X_i$.

You want to build a model explaining how $Y$ depends on the $X_i$, perhaps a linear model like

$$ Y = \beta_0 + \sum_{i=1}^p \beta_i X_i $$

but first you need to check that the $X_i$ are more or less independent of each other. Otherwise there's no way of uniquely setting the $\beta_i$ values, and the stats package is likely to produce unreliable output. In general the $X_i$ are not inde