[Insight-developers] New workflow to add test data
Brad King
brad.king at kitware.com
Mon Jun 20 14:37:43 EDT 2011
On 06/20/2011 01:31 PM, Cory Quammen wrote:
> My biggest trouble was thinking I had all the scripts for doing this
> up to date.
[snip]
> shouldn't be a problem as people base branches on more recent commits
> to master.
Okay, thanks.
> The content link and real data object aren't interchangeable if you
> want to edit the real data object after generating the content link.
> Editing the data object may be a relatively rare thing to do compared
> to totally regenerating it and copying it into the tree, but I think
> it will happen.
The approach centers around copying test baselines and inputs from
outside into the source tree. It is not intended for incremental
in-place editing of a data file as if it were a source file. If a
test output is regenerated one needs to copy it back into the source
tree anyway.
What kind of files are you editing? If a file is editable text one
might argue that it is a source file and should not be treated like
data with this approach in the first place.
> If the process continues to remove the data object files, then it
> would be good to add a warning to the wiki page stating that the real
> data object will disappear after running CMake, so keeping a copy of
> the data object outside the source tree is advisable.
I added a note in the new instructions page in the extra-info column
on the right of that step:
http://www.itk.org/Wiki/ITK/Git/Develop/Data#Run_CMake
It links to the detailed discussion at the bottom of the page, which
says where the data go.
>>> It would be nice if the test data could be left in place
>>
>> I originally had this goal in mind. However, I later realized that it
>> is both hard to do and conceptually incorrect:
One more point:
- The real data object must go into a .ExternalData_${algo}_${hash} file.
If the original file were to stay around then we would need to make a
copy. Since the approach is meant to work equally well with large data
files we should keep as few copies as possible. Right now we make no
copies and just move/rename the original file a couple of times.
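As a rough illustration of the "move, don't copy" point, here is a minimal Python sketch. It is not the actual ExternalData implementation (which is a CMake module); the function name make_content_link and the exact capitalization of the object file name are assumptions made for the example.

```python
import hashlib
import os


def make_content_link(data_path, algo="md5"):
    """Replace a data file with a content link and move the real object aside.

    Hypothetical helper for illustration only; returns (link_path, object_path).
    """
    # Hash the file in chunks so large data files are never fully loaded.
    h = hashlib.new(algo)
    with open(data_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()

    directory = os.path.dirname(data_path) or "."
    # Assumed naming, following the .ExternalData_${algo}_${hash} pattern.
    object_path = os.path.join(
        directory, ".ExternalData_%s_%s" % (algo.upper(), digest))
    link_path = data_path + "." + algo

    # The content link is a small text file holding only the hash.
    with open(link_path, "w") as f:
        f.write(digest + "\n")

    # Rename rather than copy: only one copy of the data ever exists.
    os.replace(data_path, object_path)
    return link_path, object_path
```

After the call, the original file name no longer exists in the tree; only the content link and the renamed object remain.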
>> so this requires a separate "git add" for every content link rather
>> than "git add ." in the directory.
>
> That is acceptable to me.
The DATA{} syntax supports image series, as documented in the ExternalData
module, so some developers may add a large number of files at once. I
think it is too likely that a real file may leak through in this case.
> Not to get too off topic, but it seems like there may also be a
> problem if a developer moves source code files via "mv" into the tree.
> Is that the case?
Yes, but data files are much more likely to be moved, especially in the
case of test baselines that are originally written to the build tree.
> I was thinking that you could prevent the commit of a data file if
> there were a content link corresponding to the data file name (e.g., I
> accidentally commit "test.png" but its content link "test.png.md5" is
> also in the commit). If there is no .md5 file for a given image file,
> then it should be up to Gerrit reviews to determine whether that file
> gets in, as you say.
Great idea, thanks! I'll look at adding that because it is useful
independent of the above discussion.
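The check suggested above could be sketched roughly as follows. This is only an illustration in Python, not the hook that will actually be added; the function name leaked_data_files and the default list of content link extensions are assumptions.

```python
def leaked_data_files(staged_paths, link_exts=(".md5",)):
    """Return staged files whose content link is staged in the same commit.

    Hypothetical helper: "test.png" is reported when "test.png.md5" is
    also staged. Files without a content link are left for review.
    """
    staged = set(staged_paths)
    return sorted(
        path for path in staged
        if any(path + ext in staged for ext in link_exts)
    )
```

In a real hook the list of paths might come from "git diff --cached --name-only", and a non-empty result would reject the commit.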
-Brad