[Insight-developers] Need to add images for new tests to Data -- how to do that in a Gerrit topic?

Gaëtan Lehmann gaetan.lehmann at jouy.inra.fr
Thu Nov 4 13:46:36 EDT 2010


On 4 Nov 2010, at 17:11, Matthew McCormick (thewtex) wrote:

> Git submodules are a fantastic thing.  A git submodule is simply a
> pointer to the
>
> 1. URL of the sub-repository
> 2. The path in the tree where it is located
> 3. The commit that the submodule is at for the current commit.
>
> The first two are simply kept in the .gitmodules file of the
> repository.  The third has great integration with git, so it is
> efficient to work with.
>
> It means you can do the following
>
>  git checkout -b examine_an_old_version <some old commit hash>
>  git submodule update
>
> And all the submodules will be at the state they were at in the old  
> commit.
>
> Many of the projects in the Utilities/ directory should be submodules.
> With a submodule,
>
> 1.  The commit history of a separate logical unit is kept separate
> 2.  It is easier to bring in changes from upstream.
> 3.  It is easier to push changes upstream.
> 4.  Copied code is kept to a minimum.
>
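
As an illustration of the checkout-then-update sequence quoted above, here
is a minimal sketch that builds two throwaway repositories in a temporary
directory. The repository names, file name, and commit messages are invented
for the example, and the protocol.file.allow override is only there because
recent git versions restrict local-path submodule clones. None of this
reflects the actual ITK setup.

```shell
#!/bin/sh
# Sketch: a superproject records which submodule commit each of its commits used.
set -e
work=$(mktemp -d)

# Throwaway "data" repository standing in for a real submodule (hypothetical).
git init -q "$work/data"
cd "$work/data"
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "baseline v1" > image.txt
git add image.txt
git commit -q -m "data v1"

# Superproject that records the data repository under Testing/Data.
git init -q "$work/main"
cd "$work/main"
git config user.email "dev@example.com"
git config user.name "Example Dev"
# Recent git versions refuse local-path submodule clones unless allowed explicitly.
git -c protocol.file.allow=always submodule --quiet add "$work/data" Testing/Data
git commit -q -m "add Testing/Data at v1"
old_commit=$(git rev-parse HEAD)

# Advance the data repository, then record the new pointer in the superproject.
cd "$work/data"
echo "baseline v2" > image.txt
git commit -q -a -m "data v2"
cd "$work/main"
git -C Testing/Data pull -q
git add Testing/Data
git commit -q -m "move Testing/Data to v2"

# Go back in time: the submodule follows once 'git submodule update' is run.
git checkout -q -b examine_an_old_version "$old_commit"
git submodule --quiet update
old_content=$(cat Testing/Data/image.txt)
echo "Testing/Data/image.txt at the old commit: $old_content"
```

The point of the sketch is that the superproject commit stores only a
pointer, so the submodule working tree changes only when the update command
is run explicitly.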

Sounds nice for the Utilities.
IMO, we should put KWStyle there too, as a submodule.

But for the tests it seems quite artificial, as the baselines are part of
the test.
It also adds a lot of complexity when creating a new test with new test
files:

Previously we had to do

   cvs add Testing/Code/...
   cvs add Testing/Data/...
   cvs ci

and now

   cd Testing/Data
   git add ...
   git commit
   git push
   cd -
   git add Testing/Data
   git add Testing/Code/...
   git commit
   git config "hooks.Testing/Data.update" 085e657..9dc1292
   git commit
   git push

*ouch*

It also makes it more difficult for developers without write access to
ITKData.git to submit patches with new data files to Gerrit.
And it means that a file can't be reviewed together with its test before
being committed to ITKData.git: if the change is rejected, the data still
stays there forever.

As Bill said, we should make it simpler.

> For an example of a Kitware project that does a nice job of using
> submodules, see Paraview.
>
> However, having Testing/Data as a submodule is a workaround for a
> deficiency in distributed version control systems.  I, too, once had a
> Testing/Data repository that I tried to keep all the images for my
> personal projects.  But it quickly became too big.  It took too long
> to download and it took up too much disk space.  For a project like
> ITK, bringing the Testing/Data into the main repository is not
> feasible.

Really? Testing/Data takes 74 MB, including the history.
ITK takes 148 MB, including the history.
So the amount of data seems quite reasonable to me, and that's almost  
10 years of accumulated changes.
Linux takes 834 MB - way more than ITK + its data.


[glehmann at gbook ITK]$ du -sh Testing/Data/
  74M	Testing/Data/
[glehmann at gbook ITK]$ du -sh Testing/Data/.git
  38M	Testing/Data/.git
[glehmann at marvin tmp]$ du -sh ITK
148M	ITK
[glehmann at marvin tmp]$ du -sh ITK/.git
60M	ITK/.git
[glehmann at marvin tmp]$ du -sh linux-2.6.36.y/
834M	linux-2.6.36.y/
[glehmann at marvin tmp]$ du -sh linux-2.6.36.y/.git
388M	linux-2.6.36.y/.git


>  I think the MIDAS solution sounds interesting, and it could
> be of use to the general git community.  We need a way to grab a
> repository without downloading the entire Git history.  Yet, version
> correspondence should still be intact.

Can't we use a shallow clone for that?
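
For reference, a minimal sketch of what a shallow clone does, using a
throwaway local repository so it runs offline. The repository, file name,
and commit messages are made up for the example, and whether a shallow clone
fits a submodule workflow is a separate question that this does not answer.

```shell
#!/bin/sh
# Sketch: a shallow clone fetches only the most recent part of the history.
set -e
work=$(mktemp -d)

# Throwaway repository with three commits (hypothetical content).
git init -q "$work/origin"
cd "$work/origin"
git config user.email "dev@example.com"
git config user.name "Example Dev"
for i in 1 2 3; do
  echo "baseline $i" > image.txt
  git add image.txt
  git commit -q -m "baseline revision $i"
done

cd "$work"
# git ignores --depth for plain local-path clones, hence the file:// URL.
git clone -q --depth 1 "file://$work/origin" shallow

# Count the commits reachable from HEAD in each repository.
full_commits=$(git -C origin rev-list --count HEAD)
shallow_commits=$(git -C shallow rev-list --count HEAD)
echo "full history: $full_commits commits, shallow clone: $shallow_commits"
```

The working tree of the shallow clone is complete for the latest commit.
Only the older history is left out, which is where the download savings
would come from.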

Regards,

Gaëtan

>
> Regards,
> Matt
>
> On Thu, Nov 4, 2010 at 9:35 AM, Bill Lorensen  
> <bill.lorensen at gmail.com> wrote:
>> Please keep the process as simple as possible. We need to encourage  
>> testing.
>>
>> My preference would be to move the Testing baselines into the main
>> repository, just like it was before.  Adding a baseline should be as
>> painless as possible. The Testing data can reside somewhere else,  
>> but,
>> as Kent said, it should be downloadable all at once to allow testing
>> without an internet connection.
>>
>> I think the current setup is too complicated. Actually, I don't  
>> understand it.
>>
>> On Thu, Nov 4, 2010 at 10:18 AM, kent williams
>> <norman-k-williams at uiowa.edu> wrote:
>>> I don’t object to using one of those remote repository methods to  
>>> store test
>>> data, with these concerns:
>>>
>>> - The complexity of the process needs to be managed.  If it means I
>>>   have to learn a new API and write an extra hundred lines of code
>>>   just to pull an image down for testing, that’s breaking the current
>>>   process.
>>> - There should be a way to pull down the testing data corpus as a
>>>   whole. You should be able to run CTest without an Internet
>>>   connection.
>>> - I’m not sure anyone wants to go back and re-write every existing
>>>   regression test.  Even if it only takes 10 minutes a test that would
>>>   be nearly 7 weeks of full time work.
>>>
>>> Some of these concerns could be addressed with CMake --
>>>
>>> - Have a CMake macro that can grab a file based on URL or whatever,
>>>   and store it in the Data directory. Maybe roll this into add_test(),
>>>   something like
>>>   add_test(<current params> REQUIRED_DATA_FILES <list of URLs or whatever>)
>>> - Have a top-level CMake option that defaults to false:
>>>   FETCH_AND_CACHE_TEST_DATA
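
The fetch-and-cache idea quoted above could look roughly like this. It is a
sketch in shell rather than CMake, and everything in it is an assumption made
for illustration: the fetch_and_cache function name, the reuse of
FETCH_AND_CACHE_TEST_DATA as an environment switch, and the use of curl as
the downloader. No such helper exists in ITK or CMake under this name.

```shell
#!/bin/sh
# Hypothetical helper: download one test data file into a local cache.
# Usage: fetch_and_cache <url> <cache-dir>
fetch_and_cache() {
  url=$1
  cache_dir=$2
  dest="$cache_dir/$(basename "$url")"

  # Already cached: nothing to do, so tests can run without a connection.
  if [ -f "$dest" ]; then
    return 0
  fi

  # Downloads are opt-in, mirroring an option that defaults to false.
  if [ "${FETCH_AND_CACHE_TEST_DATA:-OFF}" != "ON" ]; then
    echo "missing test data: $dest (set FETCH_AND_CACHE_TEST_DATA=ON)" >&2
    return 1
  fi

  mkdir -p "$cache_dir"
  # Download under a temporary name so an interrupted transfer is never
  # mistaken for a valid cached file.
  curl -fsS -o "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

A test driver would call the function once per required file before running
the test. After the first successful download, later runs find the file in
the cache and need no network access.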
>>>
>>>
>>>
>>> On 11/4/10 7:58 AM, "Hans Johnson" <hans-johnson at uiowa.edu> wrote:
>>>
>>> We should revisit the earlier discussed topic of removing data  
>>> from the SCM
>>> repository all together, and make all data accessible through a  
>>> public
>>> repository outside the ITK development tree (MIDAS, http, XNAT,  
>>> something
>>> else, all of these....).
>>>
>>> The data should be dynamically downloaded during the running of  
>>> the tests
>>> and cached locally when needed.
>>>
>>> Hans
>>>
>>> _______________________________________________
>>> Powered by www.kitware.com
>>>
>>> Visit other Kitware open-source projects at
>>> http://www.kitware.com/opensource/opensource.html
>>>
>>> Kitware offers ITK Training Courses, for more information visit:
>>> http://kitware.com/products/protraining.html
>>>
>>> Please keep messages on-topic and check the ITK FAQ at:
>>> http://www.itk.org/Wiki/ITK_FAQ
>>>
>>> Follow this link to subscribe/unsubscribe:
>>> http://www.itk.org/mailman/listinfo/insight-developers
>>>
>>>

-- 
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr
