Principal Component Analysis - A Geometric Approach

Overview

Principal Component Analysis is a neat statistical trick with a simple bit of linear algebra backing it up. Imagine you've got a dependent (or response) variable Y and a large number of independent (a.k.a. explanatory) variables X_1, \ldots, X_p. You also have n measurements of each, giving you a p \times n matrix whose i-th row \mathbf{x}_i holds the n measurements of X_i:
\begin{align} X = & \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{pmatrix} \\ = & \begin{pmatrix} -\mathbf{x}_1- \\ \vdots \\ -\mathbf{x}_p- \end{pmatrix} \end{align}
You want to build a model explaining how Y depends on the X_i, perhaps a linear model like Y = \beta_0 + \sum_{i=1}^p \beta_i X_i, but first you need to check that the X_i are more or less independent of each other. If one of them is (nearly) a linear combination of the others, there's no way of uniquely setting the \beta_i values, and the stats package is likely to produce unreliable output. In general the X_i a...
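Here is a minimal pure-Python sketch that makes the independence requirement concrete. It uses only the standard library, although in practice `numpy.linalg.matrix_rank` does the rank computation for you. X is stored as in the matrix above, with one row per variable and one column per measurement. The made-up data has X_3 = X_1 + X_2, and the sketch shows two consequences. First, the rank of X drops below p. Second, two different sets of \beta_i give identical predictions. The helper names `predict` and `matrix_rank` and the tolerance `tol` are hypothetical choices for this illustration. The sketch only detects exact (up to `tol`) linear dependence. Near-dependence is a matter of degree, and the sketch does not measure it.

```python
# Hypothetical helpers for illustration; standard library only.


def predict(beta0, betas, X):
    """Evaluate Y = beta0 + sum_i beta_i * X_i for each measurement (column) of X.

    X is a p x n list of rows, one row per explanatory variable.
    """
    p, n = len(X), len(X[0])
    return [beta0 + sum(betas[i] * X[i][j] for i in range(p)) for j in range(n)]


def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]  # work on a float copy
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column as pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is (numerically) zero below the current rank
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Made-up data: p = 3 variables (rows), n = 5 measurements (columns).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 0, 1, 3]
x3 = [a + b for a, b in zip(x1, x2)]  # X_3 = X_1 + X_2 exactly
X = [x1, x2, x3]

print(matrix_rank(X))  # 2, fewer than p = 3, so the rows are linearly dependent

# Because X_3 = X_1 + X_2, adding t * (1, 1, -1) to the betas changes nothing.
y_a = predict(0.5, [1, 2, 3], X)
y_b = predict(0.5, [4, 5, 0], X)  # (1, 2, 3) + 3 * (1, 1, -1)
print(y_a == y_b)  # True: two different models give identical predictions

# Replacing X_3 with a row that is not a combination of X_1 and X_2 restores full rank.
print(matrix_rank([x1, x2, [1, 0, 0, 0, 1]]))  # 3
```

Gaussian elimination appears here only because it fits in a few dependency-free lines. A rank below p is exactly the situation in which the \beta_i cannot be set uniquely.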