Suppose we have a set of data X with dimensionality D and total number of data is N. You might encounter insufficient memory problem when D is a large number or simply waste too much time calculating those redundant components when D >> N. Using singular value decomposition can prevent those problems.
From singular value decomposition, we know that a D-by-N matrix X can be decomposed as , where U is composed of the eigenvectors of and of the size D-by-D and V is composed of the eigenvectors of and of size N-by-N. Because is an identity matrix(column vectors in V are orthonormal), the first equation can be written as . From the question, the first m (m<=N) columns of matrix U are actually the principle components of X. We can actually get the first effect m columns of U without actually calculating . Here is how it works.
- Compute .
- Compute the eigevectors and eigenvalues of .
- Because , where is the diagonal matrix of eigenvalues of ,we can conclude, from the last two terms of the euqation, where is the i-th singular value in S, and is the i-th eigenvalues of .
- The i-th column of U can be computed from .
Based on the derivations above, the matlab code is as follows
meanV=mean(x, 2); x=x-repmat(meanV, 1, ndata); % eigenvectors and eigenvalues [v eval] = eig(x'*x); % get eigenvalues eval = diag(eval); % calculate sigma and sigma = repmat(sqrt(eval)',dim,1); UU = (x*v)./sigma; [dd ind] = sort(eval,'descent'); UU = UU(:,ind);
Just found out that the above algorithm can be replaced by a single matlab command.
[U S V] = svd(X,0);
svd(X,0) produce the economy size decomposition. That is only the first N columns of U is returned, given that size of X is D-by-N. Please note that the first parameter of svd is X not X’X.Please leave a comment if this article helps! 🙂