Bill,<div><br></div><div>Now I'm at home. You will have the results of the experiment tomorrow morning.</div><div><br></div><div>Thanks,</div><div><br></div><div>Roger<br><br><div class="gmail_quote">On Tue, Mar 23, 2010 at 8:44 PM, Bill Lorensen <span dir="ltr"><<a href="mailto:bill.lorensen@gmail.com">bill.lorensen@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Sounds good. So you are saying that in the<br>
ImageSeriesReader<TOutputImage><br>
::GenerateOutputInformation<br>
<br>
that the number of files that are processed will be only 2.<br>
<br>
Bill<br>
<br>
On Tue, Mar 23, 2010 at 12:39 PM, Bradley Lowekamp<br>
<div><div></div><div class="h5"><<a href="mailto:blowekamp@mail.nih.gov">blowekamp@mail.nih.gov</a>> wrote:<br>
> Bill:<br>
> My proposed solution is to use the old behavior but use a time stamp to<br>
> avoid the extra updates of the MDDA when streaming.<br>
> The requested region is set after UpdateOutputInformation is executed, so it<br>
> can't be toggled during the this phase of execution.<br>
><br>
> On Mar 23, 2010, at 3:28 PM, Bill Lorensen wrote:<br>
><br>
> Can we check if streaming is on and revert to the old behavior if it is off?<br>
><br>
> This is a huge performance penalty to support streaming which is<br>
> important but no the usual use case.<br>
><br>
><br>
> On Tue, Mar 23, 2010 at 12:12 PM, Bradley Lowekamp<br>
> <<a href="mailto:blowekamp@mail.nih.gov">blowekamp@mail.nih.gov</a>> wrote:<br>
><br>
> Bill,<br>
><br>
> Going back would have horrible effects for streaming. It would make slice by<br>
><br>
> slice streaming an n^2 algorithm, which is far worse then the current order<br>
><br>
> of N hindrance for normals Updates. We must make some improvements from 2.8.<br>
><br>
> If we declare the the MetaDataDictionary is suppose to be updated in the<br>
><br>
> update data phase. ( the ImageFileReader does it in the<br>
><br>
> UpdateOutputInformation phase ) Then the prior stated point 1 design<br>
><br>
> requirement is gone. And the following solution come to mind:<br>
><br>
> 1) Modify the GetMMDA methods to produce a warning if the update output data<br>
><br>
> has not been called. This is to be nice if some users now expect<br>
><br>
> UpdateOutputInformation to produce the MDDA.<br>
><br>
> 2) Add a time stamp for the MMDA, so that when streaming the MMDA is only<br>
><br>
> updated once and not every time a region is requested.<br>
><br>
> Additionally I believe that we need better DICOM test data which include<br>
><br>
> more tags similar to real world data.<br>
><br>
> Brad<br>
><br>
> On Mar 23, 2010, at 2:54 PM, Bill Lorensen wrote:<br>
><br>
> The UpdateInformation is supposed to update origin, spacing,<br>
><br>
> direction, pixel type, etc. I don't think it is supposed to completely<br>
><br>
> populate the meta data dictionary. At least until itk 2.8 it did not.<br>
><br>
> Why not revert back to the old behavior as a sort term fix.<br>
><br>
> I think this performance hit needs to be repaired before we release<br>
><br>
> 3.16. This has been causing major pain for Slicer3 users who<br>
><br>
> frequently use dicom. Fortunately for us, Roger brought it to light.<br>
><br>
> We missed it because our performance testing is weak.<br>
><br>
> There are other issues for sure.<br>
><br>
> Bill<br>
><br>
> On Tue, Mar 23, 2010 at 11:45 AM, Bradley Lowekamp<br>
><br>
> <<a href="mailto:blowekamp@mail.nih.gov">blowekamp@mail.nih.gov</a>> wrote:<br>
><br>
> Bill,<br>
><br>
> After my tests I agree that reading the headers in DICOM files is a<br>
><br>
> surprisingly expensive operation as such it should be minimized. The coping<br>
><br>
> of the MDAs is insignificant performance wise. I believe that the best<br>
><br>
> solution would be to have a dedicated DICOM series readers, which also<br>
><br>
> removes the extra header reads needed for the name generation as well as the<br>
><br>
> extra one in the UpdateOutputInformation.<br>
><br>
> If we assume that the usually way to utilize the reader is to just Update,<br>
><br>
> or stream Update, then the additional read of the headers appears<br>
><br>
> unnecessary.<br>
><br>
> I believe a solution would be to make the GetMDDA method smarter, and by<br>
><br>
> default update this MDDA in the UpdateData. A time stamp would need to be<br>
><br>
> used for the MDDA to check when it needs to be updated in the UpdateData<br>
><br>
> methods. For streaming, the first time through would require reading all of<br>
><br>
> the headers for the MDDA, this should bring the time stamp up to date. The<br>
><br>
> GetMDDA methods could also check this timestamp and perform the reading of<br>
><br>
> the headers if it's out of date. This is my best current idea on how to<br>
><br>
> maintain the 1) and 2) I previously mentioned.<br>
><br>
> Brad<br>
><br>
> On Mar 23, 2010, at 12:33 PM, Bill Lorensen wrote:<br>
><br>
> Brad,<br>
><br>
> I have an itk 2.8 checkout. The difference is due to the processing of<br>
><br>
> all files in the GenerateOutputInformation method. In the past, only<br>
><br>
> two files were processed. If I restrict the number of files to 2<br>
><br>
> rather that number of files, I get pretty reasonable speeds.<br>
><br>
> Roger,<br>
><br>
> As an experiment (and definitely not a fix!), can you in the method<br>
><br>
> void ImageSeriesReader<TOutputImage><br>
><br>
> ::GenerateOutputInformation(void)<br>
><br>
> change the line:<br>
><br>
> for ( int i = 0; i != numberOfFiles; ++i )<br>
><br>
> to<br>
><br>
> for ( int i = 0; i != 2; ++i )<br>
><br>
> and rerun your tests.<br>
><br>
> Bill<br>
><br>
><br>
> On Tue, Mar 23, 2010 at 8:59 AM, Bradley Lowekamp<br>
><br>
> <<a href="mailto:blowekamp@mail.nih.gov">blowekamp@mail.nih.gov</a>> wrote:<br>
><br>
> Bill,<br>
><br>
> That is only the half of it. Every time an ImageFileReader is used 3 MDDs<br>
><br>
> (meta data dictionaries) are created, one in the ImageIO, one in the<br>
><br>
> ImageFileReader, and one in the output Image. This is in addition to the two<br>
><br>
> copies, you pointed out in ImageSeriesReader. Clearly reading with an<br>
><br>
> ImageFileReader the MDD scales very poorly as the it's size increases. I<br>
><br>
> still have the remaining performance questions:<br>
><br>
> How much time is spent coping the MDD vs reading? (leaning towards reading<br>
><br>
> as very expensive)<br>
><br>
> As pointed out in Roger's most recent performance tests, there appears to be<br>
><br>
> some additional performance problems in the UpdateData, part. This is<br>
><br>
> independent of the additional MDD read in the UpdateOutputInformation. This<br>
><br>
> is definitely another problem, perhaps inside the DICOM library.<br>
><br>
> The change of moving (apparently duplicating) the copying to MDDs to the MDD<br>
><br>
> array was added over a year ago, when streaming support was added. If I<br>
><br>
> recall correctly the two motivating factors were 1) the MDD array is output<br>
><br>
> information and logically should be updating during the<br>
><br>
> UpdateOutputInformation part of the pipeline 2) when streaming each file<br>
><br>
> should not need to be read to create the MMD array. I don't recall where<br>
><br>
> this discussion took place right now.<br>
><br>
> I will run some performance test to try to figure out where the time is<br>
><br>
> being spent. Without changing 1 from above, I am not sure how much could be<br>
><br>
> gained.<br>
><br>
> Looking at the performance numbers of the Read Directory part, I would guess<br>
><br>
> that the meta data is also read there. I believe that an idea solution would<br>
><br>
> only read this information once. But that is beyond this scope.<br>
><br>
> Brad<br>
><br>
> On Mar 22, 2010, at 11:20 PM, Bill Lorensen wrote:<br>
><br>
> Brad,<br>
><br>
> It looks like the meta data array is populated in both the<br>
><br>
> GenerateOutputInformation and GenerateData. Also all slices are<br>
><br>
> processed in GenerateOutputInformation. In 2.8, only 2 slices were<br>
><br>
> processed.<br>
><br>
> Why were these changes made? We are also seeing bad dicom performance<br>
><br>
> in Slicer3.<br>
><br>
> Bill<br>
><br>
> On Mon, Mar 22, 2010 at 6:24 AM, Bradley Lowekamp<br>
><br>
> <<a href="mailto:blowekamp@mail.nih.gov">blowekamp@mail.nih.gov</a>> wrote:<br>
><br>
> Hello,<br>
><br>
> Can you please tell us a little more about your test data and computer. What<br>
><br>
> kind of file system is the data on ( locale or network)? How much memory<br>
><br>
> does the computer have? What is the size of the data? What is the native<br>
><br>
> pixel type of the data? What are the actual timings? Does the execution seem<br>
><br>
> to be CPU or IO bound?<br>
><br>
> One of the changes that was made to the class was to populate the<br>
><br>
> MetaDataArray in the UpdataOutputInformation phase of the instead of the<br>
><br>
> UpdateOutputData part. This should be just reading the headers of the files<br>
><br>
> in the series. There were several reasons this change was made. To help<br>
><br>
> determine the cause of your slowness, lets break up the timing a little<br>
><br>
> further.<br>
><br>
> Could you please call:<br>
><br>
> start timer<br>
><br>
> reader->UpdateOutputInformation();<br>
><br>
> lap timer<br>
><br>
> reader->UpdateLargestPossibleRegion();<br>
><br>
> stop timer<br>
><br>
> And post the timing results.<br>
><br>
> Thanks,<br>
><br>
> Brad<br>
><br>
> On Mar 21, 2010, at 2:52 PM, Roger Bramon Feixas wrote:<br>
><br>
> This week we updated our ITK version from 2.8 to 3.16 and we noticed the<br>
><br>
> medical models are loading 2x slower using the 3.16 ITK version. We use<br>
><br>
> itk::ImageSeriesReader and the problem is focused in its Update() method.<br>
><br>
> I attached a simple test program which reproduces the problem and where we<br>
><br>
> can see that the Update() method is 2 times slower using ITK 3.16 vs. ITK<br>
><br>
> 2.8.<br>
><br>
> We compiled both versions using Visual Studio 2008 on Windows XP 32bits and<br>
><br>
> we don't known if this problem also occurs in other platforms.<br>
><br>
> I wonder if other itk users have this same performance problem and if there<br>
><br>
> is anybody can help us in order to solve it.<br>
><br>
> Thanks!<br>
><br>
> Roger<br>
><br>
><br>
> ========================================================<br>
><br>
> Bradley Lowekamp<br>
><br>
> Lockheed Martin Contractor for<br>
><br>
> Office of High Performance Computing and Communications<br>
><br>
> National Library of Medicine<br>
><br>
> <a href="mailto:blowekamp@mail.nih.gov">blowekamp@mail.nih.gov</a><br>
><br>
><br>
><br>
> ========================================================<br>
><br>
> Bradley Lowekamp<br>
><br>
> Lockheed Martin Contractor for<br>
><br>
> Office of High Performance Computing and Communications<br>
><br>
> National Library of Medicine<br>
><br>
> <a href="mailto:blowekamp@mail.nih.gov">blowekamp@mail.nih.gov</a><br>
><br>
><br>
</div></div></blockquote></div><br></div>