[Insight-users] PCAShapeModelEstimator -- seems way too slow?

Zachary Pincus zpincus@stanford.edu
Sat May 15 00:17:30 EDT 2004


Sayan,

I am quite familiar with how PCA works and with the particular 
strategies used to speed up the calculation, both in general and within 
ITK specifically. Moreover, I understand that a data set such as mine 
defeats the ITK optimization, as I mentioned earlier:

>> (Though I do realize that a dataset such as mine with more images 
>> than pixels per image does defeat some of the optimizations rolled 
>> in...)

Nevertheless, calculating the inner product A'A (a ~1100x1100 matrix) 
and its eigenvectors *SHOULD* take a minute at most. To verify this, I 
performed the same calculations in ITK on my machine, in Matlab on a 
450 MHz UltraSPARC-II, and in Numeric Python on my machine.

ITK takes about 10-15 minutes to compute the PCA decomposition for 1100 
images of 300 px each. Now my test results:

First: Matlab calculates eig(A'*A) in about 30 seconds, even when A is 
300 x 1100. This is the "particularly slow way" of getting the results, 
as I mentioned earlier. So there is a real and significant difference 
between Matlab and the ITK/VNL code, *regardless* of the fact that my 
data set is an unusual shape.

Second: For a better comparison than Matlab running on another machine, 
I ran some Numeric Python code to compute the eigenvectors of A'A on my 
personal machine. This calls BLAS and LAPACK, and even for the 300x1100 
case the calculation completes within a minute. So, again, the ITK PCA 
routine is hitting a profound slowdown somewhere.
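For the record, the test was essentially the following (a minimal 
sketch, written here against modern NumPy rather than the old Numeric 
module; the random data just stands in for the real training images):

```python
import time
import numpy as np

# Stand-in for 1100 training images of 300 pixels each:
# A has shape (p, t) = (300, 1100), one image per column.
rng = np.random.default_rng(0)
A = rng.standard_normal((300, 1100))

# Eigendecomposition of the inner product matrix A'A (1100 x 1100).
start = time.perf_counter()
evals, evecs = np.linalg.eigh(A.T @ A)  # A'A is symmetric, so eigh applies
elapsed = time.perf_counter() - start

print(f"eig(A'A) for a 1100x1100 matrix took {elapsed:.2f} s")
```

On any machine with a working BLAS/LAPACK, this finishes in seconds, 
not minutes.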

Conclusion: There is a slowdown somewhere -- either in how I compiled 
the code, in how I call the code, or in how the code is implemented -- 
that is independent of the fact that I'm passing in suboptimally shaped 
data. Perhaps the slowness isn't in the actual VNL linear algebra code 
but in the wrappers that pack the matrices. I'll try profiling the 
whole run to find out.

Zach

Department of Biochemistry and Program in Biomedical Informatics
Stanford University School of Medicine


On May 14, 2004, at 2:19 PM, Sayan Pathak wrote:

> Hi Zach,
> Classically, PCA is calculated by identifying the eigenvectors
> corresponding to the k largest eigenvalues of the covariance matrix
> (generated from the data). Because images in image processing
> typically have a large number of pixels, the implementation in ITK is
> a bit different, and the logic behind choosing such an approach is
> explained in the documentation of the itkImagePCAShapeModelEstimator
> class. Here is an excerpt from the class documentation explaining the
> logic:
>
>
>  * To speed the computation, instead of performing the eigen analysis
>  * of the covariance matrix A*A', where A is a p x t matrix, p = number
>  * of pixels or voxels in each image and t = number of training images,
>  * we calculate the eigenvectors of the inner product matrix A'*A. The
>  * resulting eigenvectors (E) are then multiplied with the matrix A to
>  * get the principal components. The covariance matrix has a dimension
>  * of p x p. Since the number of pixels in any image is typically very
>  * high, its eigen decomposition becomes computationally expensive. The
>  * inner product matrix, on the other hand, has a dimension of t x t,
>  * where t is typically much smaller than p. Hence the eigen
>  * decomposition (the most compute-intensive part) is orders of
>  * magnitude faster.
>
> Let's consider a typical data set of 100 images, each of size
> 256 x 256 pixels. The A matrix will have a dimension of 256^2 x 100.
> As per the classical approach, the covariance matrix will have a
> dimension of 256^2 x 256^2. Using our approach, the inner product
> matrix dimension reduces to 100 x 100. This leads to a huge reduction
> in both computation and memory requirements.
>
> In your case, the A matrix will have a size of 300 x 1100. The
> covariance matrix will have a dimension of 300 x 300, while the inner
> product matrix dimension will be 1100 x 1100. This would explain the
> large computation time you are seeing.
>
> One approach could be to compute the covariance matrix followed by
> identification of the k largest eigenvectors using the existing
> functions in ITK. Ideally, both methods would be implemented, with a
> switch between one and the other depending on the size of the training
> set: the two scenarios are large images with few training images (use
> the inner product matrix) and small images with a large number of
> training images (use the covariance matrix).
>
> Hope this explanation helps. Thanks,
>
> Sayan
>
>
>
>
>
>> Hello,
>
>> Recently, I've been doing some PCA work, and I noticed that itk's
>> ImagePCAShapeModelEstimator seems to be surprisingly slow.
>
>> I have, for example, 1100 images of size 20x15 -- a pretty modest data
>> set to do PCA on. Matlab (running on a 450 MHz ultrasparc-II) can
>> compute the principal components on such a dataset in a few seconds,
>> even when I intentionally do things in a particularly slow way.
>
>> Using the ITK PCA estimator, this same operation takes 15+ minutes on
>> my personal machine (867 MHz G4). It's not a RAM or VM issue, since
>> the process never seems to grab more than 100 megs of RAM and doesn't
>> hit the disk at all.
>
>> This seems especially strange given the effort in the PCA class's
>> implementation toward efficiency. (Though I do realize that a dataset
>> such as mine with more images than pixels per image does defeat some
>> of the optimizations rolled in...)
>
>> What could I possibly be doing wrong? I profiled the itk PCA code, and
>> nothing looks overtly wrong -- it just seems to take way too long!
>
>> Zach Pincus
>




More information about the Insight-users mailing list