Re: [Insight-users] Shape prior level sets: question about MAPCostFunction
Karl Fritscher
Karl.Fritscher at umit.at
Wed Feb 9 04:46:46 EST 2005
Zachary,
Thank you very much for your detailed explanation and the source code you have sent to me. I really appreciate that. Currently I am running some test examples with your classes, and the first results are very satisfying, although I am still "re-tuning" and adapting some of the segmentation parameters to your modified cost function.
Thank you very much again,
Karl Fritscher
------------------------------------------------------------------------------------------------------
Dr. Karl D. Fritscher, M.Sc.
Institute for Biomedical Image Analysis (IBIA)
University for Health Sciences, Medical Informatics and Technology (UMIT)
Eduard-Wallnöfer-Zentrum 1
A-6060 Hall in Tirol, Austria
e-mail: karl.fritscher at umit.at
http://imwv.umit.at/
Tel.: +43 (0)50 8648 - 3825
Cell phone: +43 (0) 676 - 3306221
>>> Zachary Pincus <zpincus at stanford.edu> 02/02 9:32 >>>
Karl,
I hope you don't mind, but I'm cc'ing the insight-users list in case
anyone else is interested in my very preliminary findings.
> Among others we are also using the ITK implementation of Michael
> Leventon's work. Therefore I have been very interested in your
> thoughts about the different itk classes, which are used for the shape
> prior level set segmentation. I am sharing your opinion especially
> about the itkShapePriorMAPCostFunction class and would really be very
> interested in your modifications, if you are willing to code a new
> class. Implementing the modified cost function in our framework and
> comparing the results/performance would definitely be very interesting
> for us.
I've been looking at the ShapePriorMAPCostFunction class to try to
determine what I could do to make the shape-model part of the level set
segmentation a bit more robust. (This class computes the term described
in section 3.2 of Leventon's paper.)
I had found that even in my toy examples, there were knife-edge thin
margins for success, depending on the initial level set and parameter
choices, so I wanted to see if I could beef that up. So I looked at
each term in isolation and tried to figure out how to make them
individually robust, so that in combination they would hopefully be
much less initialization-dependent. Note carefully that I have been
optimizing for the type of images I use in my work (microscopy) -- some
or all of these findings may not apply to your setting. However, I
think some of these observations are general enough to be useful.
Firstly, I found that the "Inside Term" as defined by Leventon and
calculated by ITK can be a potent inflationary factor. Basically, this
term tries to keep the shape model *outside* of the evolving curve.
This can lead to a nasty feed-forward loop: if the shape model starts
overlapping the current curve, then the optimization of the MAP
function will drive the shape to expand in that region (if possible).
But, the current curve is then influenced by the MAP shape model, so it
likewise expands in that region, setting off this feed-forward chain.
I decided that for my work, I would prefer to change the definition of
the inside term from Leventon's work. He states that P(current curve |
estimated shape) is inversely proportional to the volume of the current
curve outside the estimated shape. This is just one possible model for
that probability. I noticed that both the evolving curve and the
estimated shape are signed distance functions, so they can be directly
compared. As such, I use the L1-norm of the difference between these
functions (in the narrow band active region) as a similarity metric,
and use that as a proxy for the probability. L2 and L-infinity norms
(RMSD and maximum deviation) seem good too. In this way, I encourage
the shape model to "stay near" the evolving curve. This is bad when the
curve is very small compared to its desired final size, however -- in
these regimes Leventon's initial suggestion is better. I am examining
ways to switch between the two probability definitions based on the
size of the curve.
Also, I noted that the Inside Term was proportional to the number of
pixels in the active set, so when the curve expands, this term grows
proportionally more weighty. This made these feed-forward effects
worse. I took the step of normalizing the final value by the number of
pixels in the active set. This step may not precisely map onto a
probabilistic interpretation, but it seems pragmatically useful.
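To make that concrete, here is a rough sketch of the modified inside
term (the names and types are only illustrative; the attached code is
the real thing). Since both level sets are signed distance functions
sampled on the same grid, the term reduces to a mean absolute
difference over the active pixels:

#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the modified inside term: mean absolute (L1) difference between
// the evolving level set and the estimated shape, both signed distance
// functions, evaluated over the narrow-band (active) pixels only. Dividing
// by the number of active pixels keeps the term from gaining weight as the
// curve expands.
double InsideTerm(const std::vector<double>& curveSDF,
                  const std::vector<double>& shapeSDF)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < curveSDF.size(); ++i)
  {
    sum += std::fabs(curveSDF[i] - shapeSDF[i]);
  }
  return sum / static_cast<double>(curveSDF.size());
}

Swapping std::fabs for a squared difference (with a square root at the
end) gives the L2/RMSD variant mentioned above.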
Next, I looked at the "Gradient Term", which calculates how well a shape
model fits to an image. This term assumes that there is a gaussian
distribution of edge intensities (i.e. 1-FeatureImage) normal to image
edges, and tries to fit the shape model to this (i.e. make the
zero-level-set of the shape model line up with the tallest part of the
gaussian). Basically, this is supposed to encourage the shape model to
sit squarely on image edges.
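To sketch what that fit looks like (this is my paraphrase of the idea,
not the actual ITK implementation), think of comparing the observed
edge strength at each active pixel against a gaussian of the shape
model's signed distance:

#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative gradient-term cost: the expected edge strength at a pixel is
// a gaussian of the shape model's signed distance there (peaked at the zero
// level set); the cost is the squared mismatch against the observed
// (1 - FeatureImage) values, summed over the narrow band.
double GradientFitCost(const std::vector<double>& shapeSDF,      // shape model signed distances
                       const std::vector<double>& edgeStrength,  // (1 - FeatureImage) values
                       double sigma)                             // assumed gaussian width
{
  const double peak = 1.0 / (sigma * std::sqrt(2.0 * 3.14159265358979));
  double cost = 0.0;
  for (std::size_t i = 0; i < shapeSDF.size(); ++i)
  {
    const double z = shapeSDF[i] / sigma;
    const double expected = peak * std::exp(-0.5 * z * z);
    const double diff = edgeStrength[i] - expected;
    cost += diff * diff;
  }
  return cost;
}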
The most useful modification I did make was to rescale the
(1-FeatureImage) values so that they were in the range of heights of a
gaussian curve. Note that (1-FeatureImage) is in the range [0,1], but a
unit gaussian only spans [0, ~0.4]. Practically speaking, this means
there was an asymmetry in error when the shape model was misaligned
with the center of the gaussian: too little edge intensity (i.e.
1-FeatureImage is zero where the gaussian would expect it to be peaked)
would be penalized by at most 0.4 per pixel, while too much edge
intensity would be penalized by up to 0.6 per pixel. This asymmetry
effectively drove the shape model toward edge-poor regions of the
image, because the penalty for misalignment to an edge was higher than
the penalty for misalignment to a not-edge. Rescaling alleviated this.
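Concretely, the rescaling is just a multiplication by the peak height
of a unit gaussian, 1/sqrt(2*pi) ~= 0.3989 (again an illustrative
sketch rather than the attached code):

#include <cmath>

// Rescale (1 - FeatureImage), which lies in [0, 1], into the height range of
// a unit gaussian, [0, ~0.3989]. Without this, missing edge intensity costs
// at most ~0.4 per pixel while excess edge intensity costs up to
// |1 - 0.4| = 0.6, which biases the fit toward edge-poor regions.
double RescaleEdgeStrength(double oneMinusFeature)
{
  const double unitGaussianPeak = 1.0 / std::sqrt(2.0 * 3.14159265358979);
  return oneMinusFeature * unitGaussianPeak;
}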
Another problem was the assumption that the 1-FeatureImage values had
unit variance. With a GradientMagnitudeRecursiveGaussianImageFilter
sigma of 1 and using a bounded reciprocal to calculate the feature
image, the 1-FeatureImage values have a true variance of about 2.7. By
assuming that the variance was 1, the MAP cost function basically put a
floor on how well the gaussian could fit. This made the "proper fit"
much more unstable and sometimes unattainable: if the 1-FeatureImage
values had too wide a spatial spread (too large a sigma in the
derivative-of-gaussian filter), then it would be lower-error overall to
push the shape model to an edge-poor region where the tails of the
gaussian fit well, instead of having the shape model centered in the
middle of the edge, where the too-narrow gaussian peak fit well, but
the tails fit very poorly.
I thus modified the MAP cost function to properly estimate the
variance, and then correct for that. The variance estimate is a bit
funny, because you aren't really estimating variance from some sampled
population. Instead, you have input values (the values of the shape
model), and output values (the 1-FeatureImage pixel intensities at
those points). Plotted on a cartesian plane, these look a bit like a
non-normalized gaussian. (Cf. Leventon's Figure 7.) So the way to find
the variance is to treat the input values as samples *weighted by the
output values*, and take the variance of those weighted samples.
Think of it like this: Say the 1-FeatureImage had value 10 at shape
model value 1: then we would pretend that there were ten "1" samples
recorded. If the 1-FeatureImage had value 2 at shape model value -4, we
would record two "-4" samples. Then we would take the variance of these
samples. Now, there's a formula for weighted variance that does this
directly and in one pass, so that's what I really do.
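For reference, the one-pass weighted variance I mean is along these
lines (illustrative, not a copy of the attached code); the shape-model
values are the samples and the (1 - FeatureImage) intensities are the
weights:

#include <cstddef>
#include <vector>

// One-pass weighted variance: treat each shape-model value x[i] as a sample
// weighted by the edge strength w[i] = (1 - FeatureImage) at that pixel.
// Uses the identity var = E_w[x^2] - (E_w[x])^2 with weighted expectations.
double WeightedVariance(const std::vector<double>& x,
                        const std::vector<double>& w)
{
  double sumW = 0.0, sumWX = 0.0, sumWXX = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sumW   += w[i];
    sumWX  += w[i] * x[i];
    sumWXX += w[i] * x[i] * x[i];
  }
  const double mean = sumWX / sumW;
  return sumWXX / sumW - mean * mean;
}

For the toy numbers above, x = {1, -4} with w = {10, 2} gives exactly
the same result as writing out ten 1s and two -4s and taking their
(population) variance.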
I also noted that with too few "layers" in the active set (too narrow a
narrow band), the shape fit was rather unstable: by only seeing a
"truncated" gaussian under the narrow band, the optimizer has a much
harder time finding and staying at the true gaussian peak, and can
slide more easily to local optima. This is the case regardless of
whether you estimate true variance or assume a unit gaussian. Basically
the problem is that the metric is "fit-to-gaussian" but the image
provides a truncated gaussian. You could modify the fit term to try to
fit to truncated gaussians, or to be more nonparametric. I haven't tried
any of this -- I just widened the narrow band with a larger number of
layers, so that the optimizer is looking at the full gaussian.
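In ITK terms, widening the band is just a matter of setting a larger
number of sparse-field layers on the level set filter, something like
the snippet below (the filter type and the value are only an example;
the right number of layers will depend on your images):

#include "itkGeodesicActiveContourShapePriorLevelSetImageFilter.h"
#include "itkImage.h"

int main()
{
  typedef itk::Image<float, 2> InternalImageType;
  typedef itk::GeodesicActiveContourShapePriorLevelSetImageFilter<
    InternalImageType, InternalImageType> FilterType;

  FilterType::Pointer filter = FilterType::New();

  // More layers on each side of the zero level set means the gradient term
  // sees more of the gaussian instead of a tightly truncated one.
  filter->SetNumberOfLayers(6);  // example value only

  return 0;
}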
Finally, I also scale the gradient term by the size of the active
region, to keep it from blowing up as the region gets large.
I've attached my modifications, so you can look at them directly. The
code isn't perfectly clean, but it's a start.
Zach Pincus
Department of Biochemistry and Program in Biomedical Informatics
Stanford University School of Medicine