[Insight-users] Performance regression ImageSeriesReader? (with test)

Bradley Lowekamp blowekamp at mail.nih.gov
Tue Mar 23 14:45:02 EDT 2010


Bill,

After my tests I agree that reading the headers of DICOM files is a surprisingly expensive operation, and as such it should be minimized. The copying of the MDAs is insignificant performance-wise. I believe that the best solution would be to have a dedicated DICOM series reader, which would also remove the extra header reads needed for the name generation as well as the extra one in UpdateOutputInformation.

If we assume that the usual way to use the reader is to just Update, or stream Update, then the additional read of the headers appears unnecessary.

I believe a solution would be to make the GetMDDA method smarter, and by default update the MDDA in UpdateData. A time stamp would need to be kept for the MDDA so that the UpdateData methods can check when it needs to be updated. For streaming, the first pass would require reading all of the headers for the MDDA, which should bring the time stamp up to date. The GetMDDA methods could also check this time stamp and read the headers if it is out of date. This is my best current idea on how to maintain the 1) and 2) I previously mentioned.
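
To make the time stamp idea concrete, here is a rough sketch. It is not ITK code and does not use the ITK pipeline: the class SeriesReaderSketch, its members, and the fake ReadHeader are all hypothetical stand-ins, and a simple counter plays the role of the modified time. It only illustrates that both UpdateData and the getter can share one "reread the headers if stale" check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for one file's meta data dictionary (hypothetical type).
using Dictionary = std::map<std::string, std::string>;

// Hypothetical reader that caches the per-file dictionaries and rereads
// the headers only when the cache is older than the last modification.
class SeriesReaderSketch
{
public:
  void SetFileNames(const std::vector<std::string>& names)
  {
    m_FileNames = names;
    m_ModifiedTime = ++m_Clock; // the reader changed, so the cache is stale
  }

  // Getter: reads the headers only if the cache is out of date.
  const std::vector<Dictionary>& GetMetaDataDictionaryArray()
  {
    UpdateDictionariesIfStale();
    return m_Dictionaries;
  }

  // Stand-in for UpdateData: brings the cache up to date as a side effect.
  void UpdateData()
  {
    UpdateDictionariesIfStale();
    // ... the pixel data would be read here ...
  }

  std::size_t GetHeaderReadCount() const { return m_HeaderReads; }

private:
  void UpdateDictionariesIfStale()
  {
    if (m_DictionaryTime >= m_ModifiedTime)
    {
      return; // cache is current, no header reads needed
    }
    m_Dictionaries.clear();
    for (const std::string& name : m_FileNames)
    {
      m_Dictionaries.push_back(ReadHeader(name));
    }
    m_DictionaryTime = ++m_Clock; // time stamp the refreshed cache
  }

  // Fake header read: counts the call instead of touching the disk.
  Dictionary ReadHeader(const std::string& name)
  {
    ++m_HeaderReads;
    return Dictionary{{"FileName", name}};
  }

  std::vector<std::string> m_FileNames;
  std::vector<Dictionary>  m_Dictionaries;
  unsigned long            m_Clock = 0;
  unsigned long            m_ModifiedTime = 0;
  unsigned long            m_DictionaryTime = 0;
  std::size_t              m_HeaderReads = 0;
};

// Typical use: one Update, then a query, then another Update.
std::size_t CountHeaderReadsForTypicalUse()
{
  SeriesReaderSketch reader;
  reader.SetFileNames({"a.dcm", "b.dcm", "c.dcm"});
  reader.UpdateData();                 // first pass reads every header once
  reader.GetMetaDataDictionaryArray(); // cache is current, nothing is reread
  reader.UpdateData();                 // still current, nothing is reread
  assert(reader.GetHeaderReadCount() == reader.GetMetaDataDictionaryArray().size());
  return reader.GetHeaderReadCount();
}
```

In this sketch each header is read once per change to the file list, whether the user calls the getter first or just updates. A real implementation would have to fit this into the existing modified time handling of the pipeline, which I have not tried.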

Brad

On Mar 23, 2010, at 12:33 PM, Bill Lorensen wrote:

> Brad,
> 
> I have an ITK 2.8 checkout. The difference is due to the processing of
> all files in the GenerateOutputInformation method. In the past, only
> two files were processed. If I restrict the number of files to 2
> rather than the number of files in the series, I get pretty reasonable speeds.
> 
> Roger,
> 
> As an experiment (and definitely not a fix!), can you in the method
> void ImageSeriesReader<TOutputImage>
> ::GenerateOutputInformation(void)
> 
> change the line:
> for ( int i = 0; i != numberOfFiles; ++i )
> to
> for ( int i = 0; i != 2; ++i )
> 
> and rerun your tests.
> 
> Bill
> 
> 
> On Tue, Mar 23, 2010 at 8:59 AM, Bradley Lowekamp
> <blowekamp at mail.nih.gov> wrote:
>> Bill,
>> That is only the half of it. Every time an ImageFileReader is used, 3 MDDs
>> (meta data dictionaries) are created: one in the ImageIO, one in the
>> ImageFileReader, and one in the output Image. This is in addition to the two
>> copies you pointed out in ImageSeriesReader. Clearly, reading the MDD with an
>> ImageFileReader scales very poorly as its size increases. I
>> still have the remaining performance questions:
>> How much time is spent copying the MDD vs. reading it? (I am leaning towards
>> reading as very expensive.)
>> As pointed out in Roger's most recent performance tests, there appear to be
>> some additional performance problems in the UpdateData part. This is
>> independent of the additional MDD read in UpdateOutputInformation. This
>> is definitely another problem, perhaps inside the DICOM library.
>> The change of moving (apparently duplicating) the copying of MDDs to the MDD
>> array was added over a year ago, when streaming support was added. If I
>> recall correctly, the two motivating factors were 1) the MDD array is output
>> information and logically should be updated during the
>> UpdateOutputInformation part of the pipeline, and 2) when streaming, each file
>> should not need to be read to create the MDD array. I don't recall where
>> this discussion took place right now.
>> I will run some performance tests to try to figure out where the time is
>> being spent. Without changing 1) from above, I am not sure how much could be
>> gained.
>> 
>> Looking at the performance numbers of the Read Directory part, I would guess
>> that the meta data is also read there. I believe that an ideal solution would
>> only read this information once. But that is beyond the scope here.
>> Brad
>> 
>> On Mar 22, 2010, at 11:20 PM, Bill Lorensen wrote:
>> 
>> Brad,
>> 
>> It looks like the meta data array is populated in both the
>> GenerateOutputInformation and GenerateData. Also all slices are
>> processed in GenerateOutputInformation. In 2.8, only 2 slices were
>> processed.
>> 
>> Why were these changes made? We are also seeing bad DICOM performance
>> in Slicer3.
>> 
>> Bill
>> 
>> On Mon, Mar 22, 2010 at 6:24 AM, Bradley Lowekamp
>> <blowekamp at mail.nih.gov> wrote:
>> 
>> Hello,
>> 
>> Can you please tell us a little more about your test data and computer? What
>> kind of file system is the data on (local or network)? How much memory
>> does the computer have? What is the size of the data? What is the native
>> pixel type of the data? What are the actual timings? Does the execution seem
>> to be CPU or IO bound?
>> 
>> One of the changes that was made to the class was to populate the
>> MetaDataArray in the UpdateOutputInformation phase instead of the
>> UpdateOutputData part. This should be just reading the headers of the files
>> in the series. There were several reasons this change was made. To help
>> determine the cause of your slowness, let's break up the timing a little
>> further.
>> 
>> Could you please call:
>> 
>> start timer
>> reader->UpdateOutputInformation();
>> lap timer
>> reader->UpdateLargestPossibleRegion();
>> stop timer
>> 
>> and post the timing results.
>> 
>> Thanks,
>> Brad
>> 
>> On Mar 21, 2010, at 2:52 PM, Roger Bramon Feixas wrote:
>> 
>> This week we updated our ITK version from 2.8 to 3.16 and we noticed that
>> medical models load 2x slower with ITK 3.16. We use
>> itk::ImageSeriesReader and the problem is concentrated in its Update() method.
>> I attached a simple test program which reproduces the problem and shows
>> that the Update() method is 2 times slower using ITK 3.16 vs. ITK
>> 2.8.
>> 
>> We compiled both versions using Visual Studio 2008 on Windows XP 32-bit and
>> we don't know if this problem also occurs on other platforms.
>> 
>> I wonder if other ITK users have this same performance problem and if
>> anybody can help us solve it.
>> 
>> Thanks!
>> Roger
>> 
>> <test.zip><ATT00001..txt>
>> 
>> ========================================================
>> Bradley Lowekamp
>> Lockheed Martin Contractor for
>> Office of High Performance Computing and Communications
>> National Library of Medicine
>> blowekamp at mail.nih.gov
>> 

========================================================
Bradley Lowekamp  
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov

