[Insight-users] Performance regression ImageSeriesReader? (with test)

Tue Mar 23 15:39:53 EDT 2010

Bill:

My proposed solution is to use the old behavior but use a time stamp to avoid the extra updates of the MDDA when streaming.

The requested region is set after UpdateOutputInformation is executed, so it can't be toggled during the this phase of execution. 

On Mar 23, 2010, at 3:28 PM, Bill Lorensen wrote:

> Can we check if streaming is on and revert to the old behavior if it is off?
> 
> This is a huge performance penalty to support streaming which is
> important but no the usual use case.
> 
> 
> On Tue, Mar 23, 2010 at 12:12 PM, Bradley Lowekamp
> <blowekamp at mail.nih.gov> wrote:
>> Bill,
>> Going back would have horrible effects for streaming. It would make slice by
>> slice streaming an n^2 algorithm, which is far worse then the current order
>> of N hindrance for normals Updates. We must make some improvements from 2.8.
>> If we declare the the MetaDataDictionary is suppose to be updated in the
>> update data phase. ( the ImageFileReader does it in the
>> UpdateOutputInformation phase ) Then the prior stated point 1 design
>> requirement is gone. And the following solution come to mind:
>> 1) Modify the GetMMDA methods to produce a warning if the update output data
>> has not been called. This is to be nice if some users now expect
>> UpdateOutputInformation to produce the MDDA.
>> 2) Add a time stamp for the MMDA, so that when streaming the MMDA is only
>> updated once and not every time a region is requested.
>> Additionally I believe that we need better DICOM test data which include
>> more tags similar to real world data.
>> Brad
>> On Mar 23, 2010, at 2:54 PM, Bill Lorensen wrote:
>> 
>> The UpdateInformation is supposed to update origin, spacing,
>> direction, pixel type, etc. I don't think it is supposed to completely
>> populate the meta data dictionary. At least until itk 2.8 it did not.
>> Why not revert back to the old behavior as a sort term fix.
>> 
>> I think this performance hit needs to be repaired before we release
>> 3.16. This has been causing major pain for Slicer3 users who
>> frequently use dicom. Fortunately for us, Roger brought it to light.
>> We missed it because our performance testing is weak.
>> 
>> There are other issues for sure.
>> 
>> Bill
>> 
>> On Tue, Mar 23, 2010 at 11:45 AM, Bradley Lowekamp
>> <blowekamp at mail.nih.gov> wrote:
>> 
>> Bill,
>> 
>> After my tests I agree that reading the headers in DICOM files is a
>> 
>> surprisingly expensive operation as such it should be minimized. The coping
>> 
>> of the MDAs is insignificant performance wise.  I believe that the best
>> 
>> solution would be to have a dedicated DICOM series readers, which also
>> 
>> removes the extra header reads needed for the name generation as well as the
>> 
>> extra one in the UpdateOutputInformation.
>> 
>> If we assume that the usually way to utilize the reader is to just Update,
>> 
>> or stream Update, then the additional read of the headers appears
>> 
>> unnecessary.
>> 
>> I believe a solution would be to make the GetMDDA method smarter, and by
>> 
>> default update this MDDA in the UpdateData. A time stamp would need to be
>> 
>> used for the MDDA to check when it needs to be updated in the UpdateData
>> 
>> methods. For streaming, the first time through would require reading all of
>> 
>> the headers for the MDDA, this should bring the time stamp up to date. The
>> 
>> GetMDDA methods could also check this timestamp and perform the reading of
>> 
>> the headers if it's out of date. This is my best current idea on how to
>> 
>> maintain the 1) and 2) I previously mentioned.
>> 
>> Brad
>> 
>> On Mar 23, 2010, at 12:33 PM, Bill Lorensen wrote:
>> 
>> Brad,
>> 
>> I have an itk 2.8 checkout. The difference is due to the processing of
>> 
>> all files in the GenerateOutputInformation method. In the past, only
>> 
>> two files were processed. If I restrict the number of files to 2
>> 
>> rather that number of files, I get pretty reasonable speeds.
>> 
>> Roger,
>> 
>> As an experiment (and definitely not a fix!), can you in the method
>> 
>> void ImageSeriesReader<TOutputImage>
>> 
>> ::GenerateOutputInformation(void)
>> 
>> change the line:
>> 
>> for ( int i = 0; i != numberOfFiles; ++i )
>> 
>> to
>> 
>> for ( int i = 0; i != 2; ++i )
>> 
>> and rerun your tests.
>> 
>> Bill
>> 
>> 
>> On Tue, Mar 23, 2010 at 8:59 AM, Bradley Lowekamp
>> 
>> <blowekamp at mail.nih.gov> wrote:
>> 
>> Bill,
>> 
>> That is only the half of it. Every time an ImageFileReader is used 3 MDDs
>> 
>> (meta data dictionaries) are created, one in the ImageIO, one in the
>> 
>> ImageFileReader, and one in the output Image. This is in addition to the two
>> 
>> copies, you pointed out in ImageSeriesReader. Clearly reading with an
>> 
>> ImageFileReader the MDD scales very poorly as the it's size increases. I
>> 
>> still have the remaining performance questions:
>> 
>> How much time is spent coping the MDD vs reading? (leaning towards reading
>> 
>> as very expensive)
>> 
>> As pointed out in Roger's most recent performance tests, there appears to be
>> 
>> some additional performance problems in the UpdateData, part. This is
>> 
>> independent of the additional MDD read in the UpdateOutputInformation. This
>> 
>> is definitely another problem, perhaps inside the DICOM library.
>> 
>> The change of moving (apparently duplicating) the copying to MDDs to the MDD
>> 
>> array was added over a year ago, when streaming support was added. If I
>> 
>> recall correctly the two motivating factors were 1) the MDD array is output
>> 
>> information and logically should be updating during the
>> 
>> UpdateOutputInformation part of the pipeline 2) when streaming each file
>> 
>> should not need to be read to create the MMD array. I don't recall where
>> 
>> this discussion took place right now.
>> 
>> I will run some performance test to try to figure out where the time is
>> 
>> being spent. Without changing 1 from above, I am not sure how much could be
>> 
>> gained.
>> 
>> Looking at the performance numbers of the Read Directory part, I would guess
>> 
>> that the meta data is also read there. I believe that an idea solution would
>> 
>> only read this information once. But that is beyond this scope.
>> 
>> Brad
>> 
>> On Mar 22, 2010, at 11:20 PM, Bill Lorensen wrote:
>> 
>> Brad,
>> 
>> It looks like the meta data array is populated in both the
>> 
>> GenerateOutputInformation and GenerateData. Also all slices are
>> 
>> processed in GenerateOutputInformation. In 2.8, only 2 slices were
>> 
>> processed.
>> 
>> Why were these changes made? We are also seeing bad dicom performance
>> 
>> in Slicer3.
>> 
>> Bill
>> 
>> On Mon, Mar 22, 2010 at 6:24 AM, Bradley Lowekamp
>> 
>> <blowekamp at mail.nih.gov> wrote:
>> 
>> Hello,
>> 
>> Can you please tell us a little more about your test data and computer. What
>> 
>> kind of file system is the data on ( locale or network)? How much memory
>> 
>> does the computer have? What is the size of the data? What is the native
>> 
>> pixel type of the data? What are the actual timings? Does the execution seem
>> 
>> to be CPU or IO bound?
>> 
>> One of the changes that was made to the class was to populate the
>> 
>> MetaDataArray in the UpdataOutputInformation phase of the instead of the
>> 
>> UpdateOutputData part. This should be just reading the headers of the files
>> 
>> in the series. There were several reasons this change was made. To help
>> 
>> determine the cause of your slowness, lets break up the timing a little
>> 
>> further.
>> 
>> Could you please call:
>> 
>> start timer
>> 
>> reader->UpdateOutputInformation();
>> 
>> lap timer
>> 
>> reader->UpdateLargestPossibleRegion();
>> 
>> stop timer
>> 
>> And post the timing results.
>> 
>> Thanks,
>> 
>> Brad
>> 
>> On Mar 21, 2010, at 2:52 PM, Roger Bramon Feixas wrote:
>> 
>> This week we updated our ITK version from 2.8 to 3.16  and we noticed the
>> 
>> medical models are loading 2x slower using the 3.16 ITK version. We use
>> 
>> itk::ImageSeriesReader and the problem is focused in its Update() method.
>> 
>> I attached a simple test program which reproduces the problem and where we
>> 
>> can see that the Update() method is 2 times slower using ITK 3.16 vs. ITK
>> 
>> 2.8.
>> 
>> We compiled both versions using Visual Studio 2008 on Windows XP 32bits and
>> 
>>  we don't known if this problem also occurs in other platforms.
>> 
>> I wonder if other itk users have this same performance problem and if there
>> 
>> is anybody can help us in order to solve it.
>> 
>> Thanks!
>> 
>> Roger
>> 
>> 
>> ========================================================
>> 
>> Bradley Lowekamp
>> 
>> Lockheed Martin Contractor for
>> 
>> Office of High Performance Computing and Communications
>> 
>> National Library of Medicine
>> 
>> blowekamp at mail.nih.gov
>> 
>> 

========================================================
Bradley Lowekamp  
Lockheed Martin Contractor for
Office of High Performance Computing and Communications
National Library of Medicine 
blowekamp at mail.nih.gov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.itk.org/pipermail/insight-users/attachments/20100323/32bf481f/attachment-0001.htm>