[Insight-users] how to use itk::RegularExpressionSeriesFileNames

Wed Jun 3 15:54:59 EDT 2009

Oh, the class prints the file names in its Print method:

nameGenerator->Print(std::cout, 0);

A function to print only the file names (based on the test code):

#include <iostream>
#include <string>
#include <vector>

// Takes the vector by const reference to avoid copying it.
void printFileNames(const std::vector<std::string> & fileNames)
    {
    std::vector<std::string>::const_iterator nameIdx;
    std::cout << "File names --------" << std::endl;
    for (nameIdx = fileNames.begin();
        nameIdx != fileNames.end();
        ++nameIdx)
        {
        std::cout << "File: " << (*nameIdx).c_str() << std::endl;
        }
    }

Called as a regular function like:

printFileNames(nameGenerator->GetFileNames());

(BTW, why does the test use c_str() instead of streaming the string
directly, as in std::cout << "File: " << *nameIdx << std::endl; ?)
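
As far as I can tell, the two forms print the same text for ordinary file names, because operator<< is overloaded for std::string. They only differ when a string contains an embedded NUL char, where c_str() output stops early. A minimal sketch to check this; formatDirect and formatViaCStr are hypothetical helper names of my own, not part of ITK or its test code:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: stream the std::string itself.
std::string formatDirect(const std::string& name)
{
  std::ostringstream os;
  os << "File: " << name;
  return os.str();
}

// Hypothetical helper: stream the C string, as the test code does.
// Output stops at the first NUL character, if the name contains one.
std::string formatViaCStr(const std::string& name)
{
  std::ostringstream os;
  os << "File: " << name.c_str();
  return os.str();
}
```

For a name like "a.png" both helpers return the same line, so streaming *nameIdx directly should be safe for file names.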

As a method implementation, the call could simplify to something like:

nameGenerator->PrintFileNames();

That is, it would take no input args and would extract the file names within
the method implementation (as the Print method does).

Take care,
Darren

On Wed, Jun 3, 2009 at 11:59 AM, Darren Weber
<darren.weber.lists at gmail.com> wrote:

>
> Thanks, Bill.
>
> It's not clear from the header what the regex engine is.  Is it a custom
> regex or does it include a common regex library? e.g.:
> http://www.gnu.org/s/libc/manual/html_node/Pattern-Matching.html
> http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html
> http://www.pcre.org/
>
> If the class uses a regex library, the documentation could point to online
> resources that define the regex language.  Unfortunately, there are subtle
> differences among regex libraries and it can be difficult to debug a regex
> without extended documentation and examples.
>
> http://www.regular-expressions.info/reference.html
> http://www.regular-expressions.info/refadv.html
> http://www.regular-expressions.info/refext.html
> http://www.regular-expressions.info/refflavors.html
>
> Now I understand that the subMatch is a component of the regex.  (Why
> didn't I understand that from the description?)  The header comment makes it
> clear that this is
>
> /** The index of the submatch that will be used to sort the matches. */
>
> May I suggest that this phrase be included in the description?
>
> So the sub-regex must be defined within the regex using the () notation.
> For example, the test contains:
>
>   fit->SetRegularExpression("[^.]*.(.*)");
>   fit->SetSubMatch(1);
>
> It appears that the regex engine doesn't require escapes for [] and ().  In
> this example, when the SetSubMatch method is called with the numeric
> argument, it refers to the sub-regex pattern within "(.*)", which looks like
> it might be the filename extension (bmp, gif, png, tif, etc.).  However, the
> prior . is not escaped, so it's unclear whether it matches any character
> ('.') or a period char ('\.' in most regex engines).
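
One way to settle the escaped versus unescaped '.' question is to try the pattern on a few strings. The sketch below uses std::regex from C++11 as a stand-in. That is an assumption on my part, since the engine inside ITK may differ in details, and subMatch here is a hypothetical helper, not ITK API:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: search `text` for `pattern` and return sub-match
// `index` (0 is the whole match), or an empty string when there is no
// match. Uses std::regex (ECMAScript grammar), NOT the ITK regex engine.
std::string subMatch(const std::string& pattern, const std::string& text,
                     std::size_t index)
{
  std::smatch m;
  const std::regex re(pattern);
  if (std::regex_search(text, m, re) && index < m.size())
  {
    return m[index].str();
  }
  return std::string();
}
```

With this stand-in, subMatch("[^.]*.(.*)", "image.png", 1) gives "png". The unescaped pattern also matches "imageXpng", which has no period at all, whereas "[^.]*\\.(.*)" does not, so in std::regex at least the bare '.' means any char.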
>
> So the SetSubMatch method always takes a numeric argument, the index of
> the sub-regex (starting at 1, not 0).  For example, the following is
> designed to exclude any files that begin with a '.' char and to split the
> file name into two sub-regex patterns that capture the file name and the
> file extension (assuming the file has only one '.' char in it to separate
> these parts of the full file name).  The period char '.' is not part of
> either sub-regex (unless the full file name has more than one).
>
>   fit->SetRegularExpression("^([^.]+)\\.(.*)$");
>   fit->SetSubMatch(2);
>
> - ^([^.]+) matches one or more chars other than '.' at the beginning of
> the string (inside [ ] a leading ^ negates the set, and the '.' is not
> escaped there), so a name that begins with '.' cannot match.
> - \\.(.*)$ matches a literal '.' and then the rest of the string; the
> backslash is doubled because the regex is written as a C++ string literal.
> For "abcdef.xyz", sub 1 is "abcdef" and sub 2 is "xyz".
> In this pattern, the second subexpression should be the file extension,
> without a '.' char.  (Although the effect of this regex may depend on how
> greedy the .* pattern is.)  The '\.' prior to the second sub-expression is
> used to escape the usual meaning of the '.' char to match any char, so that
> the file name can be split into the file name and its extension (assuming
> the full file name has only one '.' char in it).
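
To see how greediness and extra '.' chars change the split, here is a small sketch. It again assumes std::regex as a stand-in for the ITK engine, and splitName is a hypothetical helper I made up for illustration:

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: split `name` into (sub 1, sub 2) using `pattern`,
// which must contain two parenthesized sub-expressions. Returns a pair of
// empty strings when the pattern does not match. std::regex is a stand-in
// here; the ITK engine may behave differently.
std::pair<std::string, std::string> splitName(const std::string& pattern,
                                              const std::string& name)
{
  std::smatch m;
  const std::regex re(pattern);
  if (std::regex_search(name, m, re) && m.size() >= 3)
  {
    return std::make_pair(m[1].str(), m[2].str());
  }
  return std::make_pair(std::string(), std::string());
}
```

For "scan.001.png", the pattern "^(.*)\\.(.*)$" splits at the last '.' because the first .* is greedy (sub 1 "scan.001", sub 2 "png"), while "^([^.]+)\\.(.*)$" splits at the first '.' (sub 1 "scan", sub 2 "001.png"). With a single '.' in the name, both give the same result.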
>
> May I suggest a couple of features?
>
> First, the SetSubMatch method could take an array of arguments.  It could
> be possible to sort on more than one sub-regex, with the sort precedence
> based on the values in the array.  In the example above, a call like the
> following would sort first by the file extension, then by the file name.
>
>   unsigned int sub[2] = {2, 1};
>   fit->SetSubMatch(sub);
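
The array form of SetSubMatch above is only a proposal; the class does not offer it as far as I know. To make the intended sort order concrete, here is a sketch written with std::regex and std::sort. sortBySubMatches is a hypothetical free function, and std::regex is assumed as a stand-in for the ITK engine:

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical free function (not ITK API): keep the names that match
// `pattern`, then sort them by the sub-matches listed in `order`, in
// that precedence. Ties are broken by the full name.
std::vector<std::string> sortBySubMatches(
  const std::vector<std::string>& names, const std::string& pattern,
  const std::vector<unsigned int>& order)
{
  typedef std::pair<std::vector<std::string>, std::string> KeyedName;
  const std::regex re(pattern);
  std::vector<KeyedName> keyed;
  for (std::size_t i = 0; i < names.size(); ++i)
  {
    std::smatch m;
    if (!std::regex_search(names[i], m, re))
    {
      continue; // non-matching names are dropped
    }
    std::vector<std::string> key;
    for (std::size_t k = 0; k < order.size(); ++k)
    {
      // An out-of-range index contributes an empty key component.
      key.push_back(order[k] < m.size() ? m[order[k]].str() : std::string());
    }
    keyed.push_back(std::make_pair(key, names[i]));
  }
  // std::pair and std::vector compare lexicographically, so this sorts by
  // the first listed sub-match, then the second, and so on.
  std::sort(keyed.begin(), keyed.end());
  std::vector<std::string> sorted;
  for (std::size_t i = 0; i < keyed.size(); ++i)
  {
    sorted.push_back(keyed[i].second);
  }
  return sorted;
}
```

With the pattern "^([^.]+)\\.(.*)$" and order {2, 1}, the names b.png, a.tif, a.png, c.gif come back as c.gif, a.png, b.png, a.tif: grouped by extension first, then by file name.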
>
> Second, the class might include a convenience method for debugging, to
> print the file names.  The method might adapt some of the code in the test.
> Perhaps call it PrintFileNames, PrintFileNamesSortedAlpha,
> PrintFileNamesSortedNumeric.
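
The difference between the alpha and numeric variants matters when the sort key is a slice number. A minimal sketch of the two orderings on plain strings; sortedAlpha, sortedNumeric and numericLess are hypothetical names for illustration, not methods of the class:

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>

// Compare two keys by numeric value, so "2" sorts before "10".
// std::atof yields 0.0 for non-numeric text, so such keys compare equal.
bool numericLess(const std::string& a, const std::string& b)
{
  return std::atof(a.c_str()) < std::atof(b.c_str());
}

// Hypothetical helper: plain lexicographic (alphabetic) order.
std::vector<std::string> sortedAlpha(std::vector<std::string> keys)
{
  std::sort(keys.begin(), keys.end());
  return keys;
}

// Hypothetical helper: numeric order; stable_sort keeps the input order
// of keys that compare equal.
std::vector<std::string> sortedNumeric(std::vector<std::string> keys)
{
  std::stable_sort(keys.begin(), keys.end(), numericLess);
  return keys;
}
```

For the keys "10", "2", "1", the alpha order is 1, 10, 2 and the numeric order is 1, 2, 10, which is usually what is wanted for an image series.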
>
> Thanks again!
>
> Take care,
> Darren
>
> On Wed, Jun 3, 2009 at 5:06 AM, Bill Lorensen <bill.lorensen at gmail.com> wrote:
>
>> The test Testing/Code/IO/itkRegularExpressionSeriesFileNamesTest.cxx
>> shows how to sort and print the results.
>>
>> On Wed, Jun 3, 2009 at 12:59 AM, Darren Weber
>> <darren.weber.lists at gmail.com> wrote:
>> >
>> > The software guide and Examples/IO/ImageSeriesReadWrite2.cxx provide some
>> > explanation of how to work with itk::RegularExpressionSeriesFileNames.
>> >
>> > However, I have not been able to find information on how to specify the
>> > regex and sort command line arguments for it.  Is the file name regex
>> > compatible with grep or sed, or some other regex engine?  What is the sort
>> > input, is it another regex?
>> >
>> > What is the best way to debug the regex and sort inputs?  What is the
>> > easiest way to get a std::cout list of the files after they are found and
>> > sorted?
>> >
>> > Thanks in advance,
>> > Darren
>> >
>> >
>> > PS,
>> >
>> > Detailed Description
>> >
>> > Generate an ordered sequence of filenames that match a regular expression.
>> >
>> > This class generates an ordered sequence of files whose filenames match a
>> > regular expression.  [What is the regex library?]  The file names are sorted
>> > using a sub expression match selected by SubMatch.  [What does this mean?]
>> > Regular expressions are a powerful, compact mechanism for parsing strings.
>> > Expressions consist of the following metacharacters:
>> >
>> > ^ Matches at beginning of a line
>> >
>> > $ Matches at end of a line
>> >
>> > . Matches any single character
>> >
>> > [ ] Matches any character(s) inside the brackets
>> >
>> > [^ ] Matches any character(s) not inside the brackets
>> >
>> > - Matches any character in range on either side of a dash
>> >
>> > * Matches preceding pattern zero or more times
>> >
>> > + Matches preceding pattern one or more times
>> >
>> > ? Matches preceding pattern zero or once only
>> >
>> > () Saves a matched expression and uses it in a later match
>> >
>> > Note that more than one of these metacharacters can be used in a single
>> > regular expression in order to create complex search patterns. For example,
>> > the pattern [^ab1-9] says to match any character sequence that does not
>> > begin with the characters "ab" followed by numbers in the series one through
>> > nine.
>> >
>> > Definition at line 72 of file itkRegularExpressionSeriesFileNames.h.