Calculating principal components of data with VERY HIGH dimensionality

Suppose we have a set of data X with dimensionality D and total number of data is N. You might encounter insufficient memory problem when D is a large number or simply waste too much time calculating those redundant components when D >> N. Using singular value decomposition can prevent those problems.

From singular value decomposition, we know that a D-by-N matrix X can be decomposed as X = USV^T, where U is composed of the eigenvectors of XX^T and of the size D-by-D and V is composed of the eigenvectors of X^TX and of size N-by-N. Because V^TV is an identity matrix(column vectors in V are orthonormal), the first equation can be written as XV = US. From the question, the first m (m<=N) columns of matrix U are actually the principle components of X. We can actually get the first effect m columns of U without actually calculating XX^T. Here is how it works.

  1. Compute X^TX.
  2. Compute the eigevectors V and eigenvalues \mathbf {\gamma} of X^TX.
  3. Because X^TX =VS^TU^TUSV^T=VS^TSV^T=V\Lambda V^T, where \Lambda is the diagonal matrix of eigenvalues of X^TX,we can conclude, from the last two terms of the euqation, \sigma_i^2 = \lambda_i where \sigma_i is the i-th singular value in S, and \lambda_i is the i-th eigenvalues of X^TX.
  4. The i-th column of U can be computed from X\cdot V_i = \sigma_i\cdot U_i.

Based on the derivations above, the matlab code is as follows

meanV=mean(x, 2);
x=x-repmat(meanV, 1, ndata);
% eigenvectors and eigenvalues
[v eval] =  eig(x'*x);
% get eigenvalues
eval = diag(eval);
% calculate sigma and
sigma = repmat(sqrt(eval)',dim,1);
UU = (x*v)./sigma;
[dd ind] = sort(eval,'descent');
UU = UU(:,ind);

Just found out that the above algorithm can be replaced by a single matlab command.

[U S V] = svd(X,0);

svd(X,0) produce the economy size decomposition. That is only the first N columns of U is returned, given that size of X is D-by-N. Please note that the first parameter of svd is X not X’X.Please leave a comment if this article helps! 🙂


One Response to Calculating principal components of data with VERY HIGH dimensionality

  1. Lalit says:

    This decomposition of SVD helps me lot..

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: