[Insight-users] how to use itk::RegularExpressionSeriesFileNames

Darren Weber darren.weber.lists at gmail.com
Wed Jun 3 14:59:54 EDT 2009


Thanks, Bill.

It's not clear from the header what the regex engine is.  Is it a custom
implementation, or does the class wrap a common regex library?  e.g.:
http://www.gnu.org/s/libc/manual/html_node/Pattern-Matching.html
http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html
http://www.pcre.org/

If the class uses a regex library, the documentation could point to online
resources that define the regex language.  Unfortunately, there are subtle
differences among regex libraries and it can be difficult to debug a regex
without extended documentation and examples.

http://www.regular-expressions.info/reference.html
http://www.regular-expressions.info/refadv.html
http://www.regular-expressions.info/refext.html
http://www.regular-expressions.info/refflavors.html

Now I understand that the subMatch is a component of the regex.  (Why didn't
I understand that from the description?)  The header comment makes this
clear:

/** The index of the submatch that will be used to sort the matches. */

May I suggest that this phrase be included in the class description?

So the sub-regex must be defined within the regex using the () notation.
For example, the test contains:

  fit->SetRegularExpression("[^.]*.(.*)");
  fit->SetSubMatch(1);

It appears that the regex engine doesn't require escapes for [] and ().  In
this example, the numeric argument to SetSubMatch refers to the sub-regex
pattern within "(.*)", which looks like it might be the filename extension
(bmp, gif, png, tif, etc.).  However, the preceding '.' is not escaped, so
it's unclear whether it is meant to match any character (a bare '.') or
only a period char ('\.' in most regex engines).
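
One way to check the unescaped '.' question outside ITK is a small
stand-alone test.  This is only a sketch: it assumes std::regex (ECMAScript
grammar) as a stand-in, which may differ from the engine the class actually
uses, and the helper names Matches and SubMatchOf are made up for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helpers (not part of ITK).
// Assumption: std::regex with its default ECMAScript grammar, which may
// behave differently from the engine ITK uses.

// True if `pattern` matches somewhere in `name`.
bool Matches(const std::string& pattern, const std::string& name)
{
  return std::regex_search(name, std::regex(pattern));
}

// Return the text captured by sub-expression `index` when `pattern`
// matches somewhere in `name`, or an empty string if there is no match.
std::string SubMatchOf(const std::string& pattern, const std::string& name,
                       std::size_t index)
{
  const std::regex re(pattern);
  assert(index <= re.mark_count());  // index 0 is the whole match
  std::smatch m;
  if (!std::regex_search(name, m, re)) {
    return std::string();
  }
  return m[index].str();
}
```

Under these assumptions, SubMatchOf("[^.]*.(.*)", "slice01.png", 1) gives
"png".  Matches("[^.]*.(.*)", "slice01Xpng") is true because the bare '.'
accepts any char, while Matches("[^.]*\\.(.*)", "slice01Xpng") is false
because the escaped form insists on a literal period.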

So the SetSubMatch method always takes a numeric argument, the index of
the sub-regex (starting at 1, not 0).  For example, the following is
intended to skip any number of leading '.' chars and then split the file
name into two sub-regex patterns that capture the file name and the file
extension (assuming a single '.' char separates these parts of the full
file name).  The separating '.' is not part of either sub-regex.

  fit->SetRegularExpression("^[.]*(.*)\\.(.*)$");
  fit->SetSubMatch(2);

- ^[.]* matches zero or more '.' chars at the beginning of the string (the
'.' is not escaped within [ ]; note that [^.] would instead match any char
except '.').
- (.*)\\.(.*)$ matches patterns like "abcdef.xyz" at the end of a string, sub
1 is "abcdef", sub 2 is "xyz".  The backslash is doubled only because this
is a C++ string literal; the regex itself sees '\.'.

In this pattern, the second subexpression should be the file extension,
without a '.' char.  (The exact split may depend on how greedy the .*
pattern is: if the first .* is greedy and the full file name has more than
one '.' char, sub 1 would keep everything up to the last one.)  The '\.'
prior to the second sub-expression escapes the usual meaning of the '.'
char (match any char), so that the full file name is split at a literal
period into the file name and its extension.
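
The greediness question can be tried out with a pattern in the spirit of
the one above.  Again this is a sketch under assumptions: std::regex
(ECMAScript grammar, greedy .*) stands in for whatever engine the class
uses, and SplitFileName is a hypothetical helper, not an ITK function.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of ITK): split a file name into the part
// before the last '.' and the extension after it, skipping leading dots.
// Assumption: std::regex (ECMAScript grammar), where .* is greedy.
bool SplitFileName(const std::string& name, std::string& stem,
                   std::string& extension)
{
  // "\\." in the C++ literal reaches the regex engine as "\.".
  static const std::regex re("^[.]*(.*)\\.(.*)$");
  assert(re.mark_count() == 2);  // two capturing sub-expressions
  std::smatch m;
  if (!std::regex_search(name, m, re)) {
    return false;  // no '.' separator found
  }
  stem = m[1].str();       // sub 1: file name
  extension = m[2].str();  // sub 2: extension, without the '.'
  return true;
}
```

With a greedy first .*, "abcdef.xyz" splits into "abcdef" and "xyz", and
"a.b.c" splits into "a.b" and "c", so only the last '.' acts as the
separator.  A name without any '.' does not match at all.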

May I suggest a couple of features?

First, the SetSubMatch method could take an array of arguments.  It could be
possible to sort on more than one sub-regex, with the sort precedence based
on the values in the array.  In the example above, a call like the following
would sort first by the file extension, then by the file name.

  unsigned int sub[2] = {2, 1};
  fit->SetSubMatch(sub);
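
To make the suggestion concrete, here is a rough stand-alone sketch of
sorting on more than one sub-regex.  It is not ITK code: SortBySubMatches
is a hypothetical free function, std::regex (ECMAScript grammar) is assumed
in place of the class's own engine, and the keys are compared as plain
strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not an ITK method: sort `names` by several
// sub-expressions of `pattern`, in the precedence given by `subMatches`.
// Names that do not match the pattern are dropped.
std::vector<std::string> SortBySubMatches(
  std::vector<std::string> names, const std::string& pattern,
  const std::vector<std::size_t>& subMatches)
{
  const std::regex re(pattern);
  for (std::size_t index : subMatches) {
    assert(index <= re.mark_count());
  }

  // Pair each matching name with its sort key: one string per sub-match.
  using Keyed = std::pair<std::vector<std::string>, std::string>;
  std::vector<Keyed> keyed;
  for (const std::string& name : names) {
    std::smatch m;
    if (!std::regex_search(name, m, re)) {
      continue;
    }
    std::vector<std::string> key;
    for (std::size_t index : subMatches) {
      key.push_back(m[index].str());
    }
    keyed.emplace_back(key, name);
  }

  // Vectors compare lexicographically, so earlier entries take precedence.
  std::stable_sort(
    keyed.begin(), keyed.end(),
    [](const Keyed& a, const Keyed& b) { return a.first < b.first; });

  std::vector<std::string> sorted;
  for (const Keyed& k : keyed) {
    sorted.push_back(k.second);
  }
  return sorted;
}
```

For instance, SortBySubMatches({"b.png", "a.tif", "a.png"},
"^(.*)\\.(.*)$", {2, 1}) orders the names by extension first and then by
file name: a.png, b.png, a.tif.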

Second, the class might include a convenience method for debugging, to print
the file names.  The method might adapt some of the code in the test.
Perhaps call it PrintFileNames, PrintFileNamesSortedAlpha,
PrintFileNamesSortedNumeric.
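
As a rough idea of what such a convenience method might do, here is a
stand-alone sketch.  FormatFileNames is a hypothetical name, not an
existing ITK method, and it assumes the file names are available as a
std::vector<std::string>.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical debugging helper, not an existing ITK method: build a
// listing with one file name per line, prefixed with its position in the
// (already sorted) list.
std::string FormatFileNames(const std::vector<std::string>& names)
{
  std::ostringstream os;
  for (std::size_t i = 0; i < names.size(); ++i) {
    os << i << ": " << names[i] << '\n';
  }
  return os.str();
}
```

The result could then be sent to std::cout after the names are found and
sorted, which would make it easy to see the effect of a given regex and
sub-match index.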

Thanks again!

Take care,
Darren




On Wed, Jun 3, 2009 at 5:06 AM, Bill Lorensen <bill.lorensen at gmail.com> wrote:

> The test Testing/Code/IO/itkRegularExpressionSeriesFileNamesTest.cxx
> shows how to sort and print the results.
>
> On Wed, Jun 3, 2009 at 12:59 AM, Darren Weber
> <darren.weber.lists at gmail.com> wrote:
> >
> > The software guide and Examples/IO/ImageSeriesReadWrite2.cxx provide some
> > explanation of how to work with itk::RegularExpressionSeriesFileNames.
> >
> > However, I have not been able to find information on how to specify the
> > regex and sort command line arguments for it.  The file name regex might
> > be compatible with grep or sed, or some other regex engine?  What is the
> > sort input, is it another regex?
> >
> > What is the best way to debug the regex and sort inputs?  What is the
> > easiest way to get a std::cout list of the files after they are found and
> > sorted?
> >
> > Thanks in advance,
> > Darren
> >
> >
> > PS,
> >
> > Detailed Description
> >
> > Generate an ordered sequence of filenames that match a regular
> > expression.
> >
> > This class generates an ordered sequence of files whose filenames match a
> > regular expression.  [What is the regex library?]  The file names are
> > sorted using a sub expression match selected by SubMatch.  [What does
> > this mean?]  Regular expressions are a powerful, compact mechanism for
> > parsing strings.  Expressions consist of the following metacharacters:
> >
> > ^ Matches at beginning of a line
> >
> > $ Matches at end of a line
> >
> > . Matches any single character
> >
> > [ ] Matches any character(s) inside the brackets
> >
> > [^ ] Matches any character(s) not inside the brackets
> >
> > - Matches any character in range on either side of a dash
> >
> > * Matches preceding pattern zero or more times
> >
> > + Matches preceding pattern one or more times
> >
> > ? Matches preceding pattern zero or once only
> >
> > () Saves a matched expression and uses it in a later match
> >
> > Note that more than one of these metacharacters can be used in a single
> > regular expression in order to create complex search patterns. For
> > example, the pattern [^ab1-9] says to match any character sequence that
> > does not begin with the characters "ab" followed by numbers in the series
> > one through nine.
> >
> > Definition at line 72 of file itkRegularExpressionSeriesFileNames.h.