[Insight-users] how to use itk::RegularExpressionSeriesFileNames

Darren Weber darren.weber.lists at gmail.com
Wed Jun 3 16:11:53 EDT 2009


Interesting blog:
http://learningcppisfun.blogspot.com/2007/02/functors-with-state-3-print-contents-of.html

On Wed, Jun 3, 2009 at 12:54 PM, Darren Weber
<darren.weber.lists at gmail.com> wrote:

>
> Oh, the class prints the file names in its Print method:
>
> nameGenerator->Print(std::cout, 0);
>
> A function to print only the file names (based on the test code):
>
> void printFileNames(const std::vector<std::string>& fileNames)
>     {
>     // Take the vector by const reference to avoid copying it.
>     std::vector<std::string>::const_iterator nameIdx;
>     std::cout << "File names --------" << std::endl;
>     for (nameIdx = fileNames.begin();
>         nameIdx != fileNames.end();
>         ++nameIdx)
>         {
>         std::cout << "File: " << (*nameIdx).c_str() << std::endl;
>         }
>     }
>
> Called as a regular function like:
>
> printFileNames(nameGenerator->GetFileNames());
>
> (BTW, why is the c_str() format used instead of std::cout << "File: " <<
> *nameIdx << std::endl;).
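
On the c_str() question: a minimal sketch, assuming plain file names with no
embedded NUL characters, suggests the two forms write the same bytes. The
helper names and the std::ostream parameter are my own additions for
illustration, not part of the ITK test code.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stream each std::string directly (operator<< is overloaded for std::string).
void printFileNames(const std::vector<std::string>& fileNames, std::ostream& os)
{
    os << "File names --------\n";
    for (const std::string& name : fileNames)
    {
        os << "File: " << name << '\n';
    }
}

// Same loop, but going through c_str() as the test code does.
// c_str() stops at the first NUL character, which is the only case
// where the two functions could differ.
void printFileNamesCStr(const std::vector<std::string>& fileNames, std::ostream& os)
{
    os << "File names --------\n";
    for (const std::string& name : fileNames)
    {
        os << "File: " << name.c_str() << '\n';
    }
}
```

Passing std::cout as the second argument gives the behaviour of the quoted
function; a std::ostringstream makes the output easy to compare.
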
>
> As a method implementation, the call could simplify to something like:
>
> nameGenerator->PrintFileNames();
>
> Or it would have no input args and extract the file names within the method
> implementation (as in the Print method).
>
> Take care,
> Darren
>
>
>
>
> On Wed, Jun 3, 2009 at 11:59 AM, Darren Weber <
> darren.weber.lists at gmail.com> wrote:
>
>>
>> Thanks, Bill.
>>
>> It's not clear from the header what the regex engine is.  Is it a custom
>> regex or does it include a common regex library? e.g.:
>> http://www.gnu.org/s/libc/manual/html_node/Pattern-Matching.html
>> http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html
>> http://www.pcre.org/
>>
>> If the class uses a regex library, the documentation could point to online
>> resources that define the regex language.  Unfortunately, there are subtle
>> differences among regex libraries and it can be difficult to debug a regex
>> without extended documentation and examples.
>>
>> http://www.regular-expressions.info/reference.html
>> http://www.regular-expressions.info/refadv.html
>> http://www.regular-expressions.info/refext.html
>> http://www.regular-expressions.info/refflavors.html
>>
>> Now I understand that the subMatch is a component of the regex.  (Why
>> didn't I understand that from the description?)  The header comment makes it
>> clear that this is
>>
>> /** The index of the submatch that will be used to sort the matches. */
>>
>> May I suggest that this phrase be included in the description?
>>
>> So the sub-regex must be defined within the regex using the () notation.
>> For example, the test contains:
>>
>>   fit->SetRegularExpression("[^.]*.(.*)");
>>   fit->SetSubMatch(1);
>>
>> It appears that the regex engine doesn't require escapes for [] and ().
>> In this example, when the SetSubMatch method is called with the numeric
>> argument, it refers to the sub-regex pattern within "(.*)", which looks like
>> it might be the filename extension (bmp, gif, png, tif, etc.).  However, the
>> prior . is not escaped, so it's unclear whether it matches any character
>> ('.') or a period char ('\.' in most regex engines).
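
To see what the unescaped '.' does, here is a rough sketch that uses
std::regex (ECMAScript grammar) as a stand-in. That is an assumption: the ITK
class may use a different engine with different rules. firstSubMatch is a
hypothetical helper, not an ITK call.

```cpp
#include <regex>
#include <string>

// Return the text captured by sub-match 1 of the first match of `pattern`
// in `text`, or "<no match>" when the pattern is not found at all.
std::string firstSubMatch(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re))
    {
        return m[1].str();
    }
    return "<no match>";
}
```

With this stand-in, "[^.]*.(.*)" still matches a name with no period such as
"image01png" (the bare '.' consumes the last character and the sub-match is
empty), whereas "[^.]*\\.(.*)" does not match it. Both patterns capture "png"
from "image01.png".
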
>>
>> So the SetSubMatch method will always take a numeric argument, the index
>> of the sub-regex (starting at 1, not 0).  For example, the following should
>> be designed to exclude any files that begin with any number of '.' chars,
>> then the file name is split into two sub-regex patterns to capture the file
>> name and the file extension (assuming the file only has one '.' char in it
>> to separate these parts of the full file name).  The period char '.' is not
>> part of either sub-regex (unless the full file name has more than one).
>>
>>   // In C++ source the backslash must be doubled: "\\." yields the regex \.
>>   fit->SetRegularExpression("[^.]*(.*)\\.(.*)$");
>>   fit->SetSubMatch(2);
>>
>> - [^.]* matches zero or more characters other than '.' at the beginning of
>> the string (the '.' is a literal inside [ ], so it needs no escape).
>> - (.*)\.(.*)$ matches patterns like "abcdef.xyz" at the end of a string,
>> sub 1 is "abcdef", sub 2 is "xyz".
>>
>> In this pattern, the second subexpression should be the file extension,
>> without a '.' char.  (Although the effect of this regex may depend on how
>> greedy the .* pattern is.)  The '\.' prior to the second sub-expression is
>> used to escape the usual meaning of the '.' char to match any char, so that
>> the file name can be split into the file name and its extension (assuming
>> the full file name has only one '.' char in it).
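
The greediness question can be checked with a small sketch. It assumes
std::regex (ECMAScript grammar, backtracking) as a stand-in, so the ITK engine
may behave differently; splitName is a hypothetical helper.

```cpp
#include <regex>
#include <string>
#include <utility>

// Return sub-matches 1 and 2 of the first match of `pattern` in `text`.
// Both strings are empty when nothing matches.
std::pair<std::string, std::string> splitName(const std::string& pattern,
                                              const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re))
    {
        return {};
    }
    return {m[1].str(), m[2].str()};
}
```

Under that assumption, "(.*)\\.(.*)$" splits "abcdef.xyz" into "abcdef" and
"xyz", and splits "a.b.c" at the last period ("a.b" and "c") because the first
.* is greedy. With the "[^.]*" prefix added, the prefix itself consumes
"abcdef", so sub-match 1 comes back empty and only sub-match 2 ("xyz") is
useful.
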
>>
>> May I suggest a couple of features?
>>
>> First, the SetSubMatch method could take an array of arguments.  It could
>> be possible to sort on more than one sub-regex, with the sort precedence
>> based on the values in the array.  In the example above, a call like the
>> following would sort first by the file extension, then by the file name.
>>
>>   unsigned int sub[2] = {2, 1};
>>   fit->SetSubMatch(sub);
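
The array form of SetSubMatch above is only a proposal, not an existing call.
Outside the class, the same ordering could be sketched as follows, assuming
std::regex as a stand-in engine and the hypothetical helper name
sortByExtensionThenName.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <tuple>
#include <vector>

// Sort names by extension (sub-match 2) first, then by base name
// (sub-match 1). Names that do not match the pattern are dropped.
std::vector<std::string> sortByExtensionThenName(const std::vector<std::string>& names)
{
    const std::regex re("(.*)\\.(.*)$");
    // Each entry is (extension, base name, full name).
    std::vector<std::tuple<std::string, std::string, std::string>> keyed;
    for (const std::string& name : names)
    {
        std::smatch m;
        if (std::regex_search(name, m, re))
        {
            keyed.emplace_back(m[2].str(), m[1].str(), name);
        }
    }
    // Tuples compare element by element, which gives the sort precedence.
    std::sort(keyed.begin(), keyed.end());

    std::vector<std::string> sorted;
    for (const auto& entry : keyed)
    {
        sorted.push_back(std::get<2>(entry));
    }
    return sorted;
}
```

Putting the keys in a tuple in precedence order keeps the comparison logic
trivial; a longer list of sub-match indices would need a vector of keys
instead.
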
>>
>> Second, the class might include a convenience method for debugging, to
>> print the file names.  The method might adapt some of the code in the test.
>> Perhaps call it PrintFileNames, PrintFileNamesSortedAlpha,
>> PrintFileNamesSortedNumeric.
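
The difference between an alphabetic and a numeric sort on a sub-match is
easy to show in isolation. This is only a sketch: it uses std::regex as a
stand-in engine, and sortBySubMatch is a hypothetical free function, not one
of the methods proposed above.

```cpp
#include <algorithm>
#include <cstdlib>
#include <regex>
#include <string>
#include <vector>

// Return a copy of `names` ordered by sub-match 1 of `pattern`.
// When `numeric` is true the key is compared as a number, otherwise as text.
// Names that do not match get an empty key (0 in numeric mode).
std::vector<std::string> sortBySubMatch(std::vector<std::string> names,
                                        const std::string& pattern,
                                        bool numeric)
{
    const std::regex re(pattern);
    auto key = [&re](const std::string& name) -> std::string {
        std::smatch m;
        return std::regex_search(name, m, re) ? m[1].str() : std::string();
    };
    std::stable_sort(names.begin(), names.end(),
        [&](const std::string& a, const std::string& b) {
            if (numeric)
            {
                return std::atof(key(a).c_str()) < std::atof(key(b).c_str());
            }
            return key(a) < key(b);
        });
    return names;
}
```

For "img10.png", "img2.png", "img1.png" with the pattern "img([0-9]+)", the
text comparison places img10 before img2, while the numeric comparison gives
img1, img2, img10.
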
>>
>> Thanks again!
>>
>> Take care,
>> Darren
>>
>>
>>
>>
>>
>> On Wed, Jun 3, 2009 at 5:06 AM, Bill Lorensen <bill.lorensen at gmail.com> wrote:
>>
>>> The test Testing/Code/IO/itkRegularExpressionSeriesFileNamesTest.cxx
>>> shows how to sort and print the results.
>>>
>>> On Wed, Jun 3, 2009 at 12:59 AM, Darren Weber
>>> <darren.weber.lists at gmail.com> wrote:
>>> >
>>> > The software guide and Examples/IO/ImageSeriesReadWrite2.cxx provide some
>>> > explanation of how to work with itk::RegularExpressionSeriesFileNames.
>>> >
>>> > However, I have not been able to find information on how to specify the
>>> > regex and sort command line arguments for it.  The file name regex might
>>> > be compatible with grep or sed, or some other regex engine?  What is the
>>> > sort input, is it another regex?
>>> >
>>> > What is the best way to debug the regex and sort inputs?  What is the
>>> > easiest way to get a std::cout list of the files after they are found and
>>> > sorted?
>>> >
>>> > Thanks in advance,
>>> > Darren
>>> >
>>> >
>>> > PS,
>>> >
>>> > Detailed Description
>>> >
>>> > Generate an ordered sequence of filenames that match a regular expression.
>>> >
>>> > This class generates an ordered sequence of files whose filenames match a
>>> > regular expression.  [What is the regex library?]  The file names are
>>> > sorted using a sub expression match selected by SubMatch.  [What does this
>>> > mean?]  Regular expressions are a powerful, compact mechanism for parsing
>>> > strings.  Expressions consist of the following metacharacters:
>>> >
>>> > ^ Matches at beginning of a line
>>> >
>>> > $ Matches at end of a line
>>> >
>>> > . Matches any single character
>>> >
>>> > [ ] Matches any character(s) inside the brackets
>>> >
>>> > [^ ] Matches any character(s) not inside the brackets
>>> >
>>> > - Matches any character in range on either side of a dash
>>> >
>>> > * Matches preceding pattern zero or more times
>>> >
>>> > + Matches preceding pattern one or more times
>>> >
>>> > ? Matches preceding pattern zero or once only
>>> >
>>> > () Saves a matched expression and uses it in a later match
>>> >
>>> > Note that more than one of these metacharacters can be used in a single
>>> > regular expression in order to create complex search patterns. For
>>> > example, the pattern [^ab1-9] says to match any character sequence that
>>> > does not begin with the characters "ab" followed by numbers in the series
>>> > one through nine.
>>> >
>>> > Definition at line 72 of file itkRegularExpressionSeriesFileNames.h.
>>> >
>>> > _____________________________________
>>> > Powered by www.kitware.com
>>> >
>>> > Visit other Kitware open-source projects at
>>> > http://www.kitware.com/opensource/opensource.html
>>> >
>>> > Please keep messages on-topic and check the ITK FAQ at:
>>> > http://www.itk.org/Wiki/ITK_FAQ
>>> >
>>> > Follow this link to subscribe/unsubscribe:
>>> > http://www.itk.org/mailman/listinfo/insight-users
>>> >
>>> >
>>>
>>
>>
>