Hi Darren,<br><br>Thanks for pointing this out.<br><br>Yes, the documentation doesn't fully specify the behavior of this regular expression engine.<br><br>The actual functionality is provided by the kwsys library in<br>
<br> Insight/Utilities/kwsys<br><br>See the files<br><br> RegularExpression.cxx<br> RegularExpression.hxx.in<br>
<br>You will notice that the documentation for <br><br> itkRegularExpressionSeriesFileNames <br><br>was extracted from the documentation in<br><br> RegularExpression.hxx.in<br>
<br><br>This is a custom regular expression engine,<br>based on code that (at some point) was developed<br>at Texas Instruments.<br><br>If you want more details about the history of this code<br>we could track the sources of the files in kwsys.<br>
<br><br> Regards,<br><br><br> Luis<br><br><br>----------------------------------------<br><div class="gmail_quote">On Wed, Jun 3, 2009 at 2:59 PM, Darren Weber <span dir="ltr"><<a href="mailto:darren.weber.lists@gmail.com">darren.weber.lists@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>Thanks, Bill.<br><br>It's not clear from the header what the regex engine is. Is it a custom regex or does it include a common regex library? e.g.:<br>
<a href="http://www.gnu.org/s/libc/manual/html_node/Pattern-Matching.html" target="_blank">http://www.gnu.org/s/libc/manual/html_node/Pattern-Matching.html</a><br>
<a href="http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html" target="_blank">http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html</a><br><a href="http://www.pcre.org/" target="_blank">http://www.pcre.org/</a><br>
<br>If the class uses a regex library, the documentation could point to online resources that define the regex language. Unfortunately, there are subtle differences among regex libraries and it can be difficult to debug a regex without extended documentation and examples.<br>
<br><a href="http://www.regular-expressions.info/reference.html" target="_blank">http://www.regular-expressions.info/reference.html</a><br><a href="http://www.regular-expressions.info/refadv.html" target="_blank">http://www.regular-expressions.info/refadv.html</a><br>
<a href="http://www.regular-expressions.info/refext.html" target="_blank">http://www.regular-expressions.info/refext.html</a><br><a href="http://www.regular-expressions.info/refflavors.html" target="_blank">http://www.regular-expressions.info/refflavors.html</a><br>
<br>Now I understand that the subMatch is a component of the regex. (Why didn't I understand that from the description?) The header comment makes it clear that this is<br><br><span style="font-family: courier new,monospace;">/** The index of the submatch that will be used to sort the matches. */</span><br style="font-family: courier new,monospace;">
<br>May I suggest that this phrase be included in the description?<br><br>So the sub-regex must be defined within the regex using the () notation. For example, the test contains:<br><br><span style="font-family: courier new,monospace;"> fit->SetRegularExpression("[^.]*.(.*)");</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> fit->SetSubMatch(1);</span><br><br>It appears that the regex engine doesn't require escapes for [] and (). In this example, when the SetSubMatch method is called with the numeric argument, it refers to the sub-regex pattern within "(.*)", which looks like it might be the filename extension (bmp, gif, png, tif, etc.). However, the prior . is not escaped, so it's unclear whether it matches any character ('.') or a period char ('\.' in most regex engines).<br>
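<br>One way to check how an unescaped '.' behaves is to try both forms on a name with and without a period. The sketch below is only an approximation: it uses std::regex (ECMAScript grammar) as a stand-in, on the assumption that the kwsys engine treats '.', [^.] and '\.' the same way. FindSubMatch and MatchResult are hypothetical names, not part of ITK or kwsys.<br><br>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch (not an ITK or kwsys type).
struct MatchResult
{
  bool matched;     // true if the pattern was found anywhere in the name
  std::string sub;  // text of the requested sub-match (empty if none)
};

// Searches `name` for `pattern` and returns sub-match number `index`.
// Index 0 is the whole match; index 1 is the first parenthesised group.
// Assumption: std::regex stands in for the kwsys regular expression engine.
MatchResult FindSubMatch(const std::string& pattern,
                         const std::string& name,
                         std::size_t index)
{
  const std::regex re(pattern);
  std::smatch m;
  if (!std::regex_search(name, m, re))
  {
    return {false, ""};
  }
  return {true, index < m.size() ? m[index].str() : std::string()};
}
```

<br>Under these assumptions, FindSubMatch("[^.]*.(.*)", "img001", 1) still reports a match (the unescaped '.' consumes the '1'), while the escaped pattern "[^.]*\\.(.*)" does not match "img001" at all. Both patterns give "png" for "img_001.png".<br>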
<br>So the SetSubMatch method always takes a numeric argument, the index of the sub-regex (starting at 1, not 0). For example, the following is intended to exclude any files that begin with any number of '.' chars and then split the file name into two sub-regex patterns that capture the file name and the file extension (assuming the file has only one '.' char in it to separate these parts of the full file name). The period char '.' is not part of either sub-regex (unless the full file name has more than one).<br>
<br><span style="font-family: courier new,monospace;"> fit->SetRegularExpression("[^.]*(.*)\\.(.*)$");</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">
fit->SetSubMatch(2);</span><br>
<br><span style="font-family: courier new,monospace;">- [^.]*</span> matches zero or more chars other than '.' at the beginning of the string (the '.' is literal within [ ], and the leading ^ negates the set), so it does not skip leading '.' chars.<br><span style="font-family: courier new,monospace;">- (.*)\.(.*)$ </span>on its own, matches patterns like "abcdef.xyz" at the end of a string; sub 1 is "abcdef", sub 2 is "xyz".<br>
<br>In this pattern, the second subexpression should be the file extension, without a '.' char. (Although the effect of this regex may depend on how greedy the .* pattern is.) The '\.' prior to the second sub-expression is used to escape the usual meaning of the '.' char to match any char, so that the file name can be split into the file name and its extension (assuming the full file name has only one '.' char in it).<br>
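<br>The remark about greediness can be checked with a small experiment. This is a sketch using std::regex (ECMAScript grammar), assumed here to backtrack the same way as the kwsys engine; SplitName is a hypothetical helper, not an ITK function.<br><br>

```cpp
#include <regex>
#include <string>
#include <utility>

// Searches `name` for `pattern` and returns sub-matches 1 and 2.
// Returns a pair of empty strings if the pattern does not match or
// has fewer than two parenthesised groups.
// Assumption: std::regex stands in for the kwsys regular expression engine.
std::pair<std::string, std::string> SplitName(const std::string& pattern,
                                              const std::string& name)
{
  const std::regex re(pattern);
  std::smatch m;
  if (!std::regex_search(name, m, re) || m.size() < 3)
  {
    return {"", ""};
  }
  return {m[1].str(), m[2].str()};
}
```

<br>With std::regex, "(.*)\\.(.*)$" splits "abcdef.xyz" into "abcdef" and "xyz", and "scan.v2.png" into "scan.v2" and "png", because the first .* is greedy. Prefixing the pattern with [^.]* changes the result for "abcdef.xyz": the greedy [^.]* consumes "abcdef", so sub 1 comes back empty and only sub 2 ("xyz") is unchanged.<br>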
<br>May I suggest a couple of features?<br><br>First, the SetSubMatch method could take an array of arguments. It could be possible to sort on more than one sub-regex, with the sort precedence based on the values in the array. In the example above, a call like the following would sort first by the file extension, then by the file name.<br>
<br><span style="font-family: courier new,monospace;"> unsigned int sub[2] = {2, 1};</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> fit->SetSubMatch(sub);</span><br>
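<br>A multi-key sort along these lines could look roughly like the following. It is a standalone sketch, not ITK code: SortBySubMatches is a hypothetical helper, std::regex stands in for the kwsys engine, and the comparison is plain lexicographic string ordering (no numeric sort).<br><br>

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns `names` sorted by the listed sub-match
// indices, in order of precedence (first index is the primary key).
// Names that do not match the pattern get an empty key and sort first.
std::vector<std::string> SortBySubMatches(std::vector<std::string> names,
                                          const std::string& pattern,
                                          const std::vector<unsigned int>& subMatches)
{
  const std::regex re(pattern);

  // Build the sort key for one name: one string per requested sub-match.
  auto key = [&](const std::string& name) {
    std::vector<std::string> k;
    std::smatch m;
    if (std::regex_search(name, m, re))
    {
      for (unsigned int i : subMatches)
      {
        k.push_back(i < m.size() ? m[i].str() : std::string());
      }
    }
    return k;
  };

  // std::vector compares lexicographically, which gives the precedence order.
  std::stable_sort(names.begin(), names.end(),
                   [&](const std::string& a, const std::string& b) {
                     return key(a) < key(b);
                   });
  return names;
}
```

<br>For example, with the pattern "(.*)\\.(.*)$" and sub-matches {2, 1}, the names b.png, a.tif, a.png, c.bmp come out as c.bmp, a.png, b.png, a.tif: extension first, then file name. Recomputing the key on every comparison is wasteful, and a real implementation would probably cache the keys.<br>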
<br>Second, the class might include a convenience method for debugging, to print the file names. The method might adapt some of the code in the test. Perhaps call it PrintFileNames, PrintFileNamesSortedAlpha, PrintFileNamesSortedNumeric.<br>
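<br>Such a debugging helper could be quite small. The sketch below is hypothetical (FormatFileNames and PrintFileNames are not existing ITK methods) and works on any std::vector of std::string, which as far as I can tell is what GetFileNames() returns.<br><br>

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats one "index: name" line per file name.
std::string FormatFileNames(const std::vector<std::string>& names)
{
  std::ostringstream os;
  for (std::size_t i = 0; i < names.size(); ++i)
  {
    os << i << ": " << names[i] << '\n';
  }
  return os.str();
}

// Hypothetical helper: prints the formatted list to std::cout.
void PrintFileNames(const std::vector<std::string>& names)
{
  std::cout << FormatFileNames(names);
}
```

<br>Assuming fit is the object from the snippets above, a call such as PrintFileNames(fit->GetFileNames()); would list the files in whatever order the class has sorted them.<br>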
<br>Thanks again!<br><br>Take care,<br><font color="#888888">Darren</font><div><div></div><div class="h5"><br><br><br><br><br><div class="gmail_quote">On Wed, Jun 3, 2009 at 5:06 AM, Bill Lorensen <span dir="ltr"><<a href="mailto:bill.lorensen@gmail.com" target="_blank">bill.lorensen@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">The test Testing/Code/IO/itkRegularExpressionSeriesFileNamesTest.cxx<br>
shows how to sort and print the results.<br>
<div><div></div><div><br>
On Wed, Jun 3, 2009 at 12:59 AM, Darren Weber<br>
<<a href="mailto:darren.weber.lists@gmail.com" target="_blank">darren.weber.lists@gmail.com</a>> wrote:<br>
><br>
> The software guide and Examples/IO/ImageSeriesReadWrite2.cxx provide some<br>
> explanation of how to work with itk::RegularExpressionSeriesFileNames.<br>
><br>
> However, I have not been able to find information on how to specify the<br>
> regex and sort command line arguments for it. The file name regex might be<br>
> compatible with grep or sed, or some other regex engine? What is the sort<br>
> input, is it another regex?<br>
><br>
> What is the best way to debug the regex and sort inputs? What is the<br>
> easiest way to get a std::cout list of the files after they are found and<br>
> sorted?<br>
><br>
> Thanks in advance,<br>
> Darren<br>
><br>
><br>
> PS,<br>
><br>
> Detailed Description<br>
><br>
> Generate an ordered sequence of filenames that match a regular expression.<br>
><br>
> This class generates an ordered sequence of files whose filenames match a<br>
> regular expression. [What is the regex library?] The file names are sorted<br>
> using a sub expression match selected by SubMatch. [What does this mean?]<br>
> Regular expressions are a powerful, compact mechanism for parsing strings.<br>
> Expressions consist of the following metacharacters:<br>
><br>
> ^ Matches at beginning of a line<br>
><br>
> $ Matches at end of a line<br>
><br>
> . Matches any single character<br>
><br>
> [ ] Matches any character(s) inside the brackets<br>
><br>
> [^ ] Matches any character(s) not inside the brackets<br>
><br>
> - Matches any character in range on either side of a dash<br>
><br>
> * Matches preceding pattern zero or more times<br>
><br>
> + Matches preceding pattern one or more times<br>
><br>
> ? Matches preceding pattern zero or once only<br>
><br>
> () Saves a matched expression and uses it in a later match<br>
><br>
> Note that more than one of these metacharacters can be used in a single<br>
> regular expression in order to create complex search patterns. For example,<br>
> the pattern [^ab1-9] says to match any character sequence that does not<br>
> begin with the characters "ab" followed by numbers in the series one through<br>
> nine.<br>
><br>
> Definition at line 72 of file itkRegularExpressionSeriesFileNames.h.<br>
><br>
</div></div>> _____________________________________<br>
> Powered by <a href="http://www.kitware.com" target="_blank">www.kitware.com</a><br>
><br>
> Visit other Kitware open-source projects at<br>
> <a href="http://www.kitware.com/opensource/opensource.html" target="_blank">http://www.kitware.com/opensource/opensource.html</a><br>
><br>
> Please keep messages on-topic and check the ITK FAQ at:<br>
> <a href="http://www.itk.org/Wiki/ITK_FAQ" target="_blank">http://www.itk.org/Wiki/ITK_FAQ</a><br>
><br>
> Follow this link to subscribe/unsubscribe:<br>
> <a href="http://www.itk.org/mailman/listinfo/insight-users" target="_blank">http://www.itk.org/mailman/listinfo/insight-users</a><br>
><br>
><br>
</blockquote></div><br>
</div></div><br>
<br></blockquote></div><br>