## Calculating principal components of data with VERY HIGH dimensionality

Suppose we have a set of data X with dimensionality D and total number of data is N. You might encounter insufficient memory problem when D is a large number or simply waste too much time calculating those redundant components when D >> N. Using singular value decomposition can prevent those problems.

From singular value decomposition, we know that a D-by-N matrix X can be decomposed as $X = USV^T$, where U is composed of the eigenvectors of $XX^T$ and of the size D-by-D and V is composed of the eigenvectors of $X^TX$ and of size N-by-N. Because $V^TV$ is an identity matrix(column vectors in V are orthonormal), the first equation can be written as $XV = US$. From the question, the first m (m<=N) columns of matrix U are actually the principle components of X. We can actually get the first effect m columns of U without actually calculating $XX^T$. Here is how it works.

1. Compute $X^TX$.
2. Compute the eigevectors $V$ and eigenvalues $\mathbf {\gamma}$ of $X^TX$.
3. Because $X^TX =VS^TU^TUSV^T=VS^TSV^T=V\Lambda V^T$, where $\Lambda$ is the diagonal matrix of eigenvalues of $X^TX$,we can conclude, from the last two terms of the euqation, $\sigma_i^2 = \lambda_i$ where $\sigma_i$ is the i-th singular value in S, and $\lambda_i$ is the i-th eigenvalues of $X^TX$.
4. The i-th column of U can be computed from $X\cdot V_i = \sigma_i\cdot U_i$.

Based on the derivations above, the matlab code is as follows

meanV=mean(x, 2);
x=x-repmat(meanV, 1, ndata);
% eigenvectors and eigenvalues
[v eval] =  eig(x'*x);
% get eigenvalues
eval = diag(eval);
% calculate sigma and
sigma = repmat(sqrt(eval)',dim,1);
UU = (x*v)./sigma;
[dd ind] = sort(eval,'descent');
UU = UU(:,ind);

Just found out that the above algorithm can be replaced by a single matlab command.

[U S V] = svd(X,0);

svd(X,0) produce the economy size decomposition. That is only the first N columns of U is returned, given that size of X is D-by-N. Please note that the first parameter of svd is X not X’X.Please leave a comment if this article helps! 🙂