[IGSTK-Developers] "many simple specialized" components vs. "fewer, more complex and general components"
Luis Ibanez
luis.ibanez at kitware.com
Tue Jun 5 22:09:52 EDT 2007
Hi Kevin,
Thanks for your thoughtful comments.
About your questions:
1) The assumption of uniform probabilities in the transitions
was made for the simplicity of presenting the argument.
In practice we should estimate these probabilities by measuring
the frequency of usage of the transitions in realistic scenarios,
hopefully inserted into the testing framework.
The interesting aspect of using the probabilities is that they
provide a continuous gradation between transitions that always
happen (a sequential process) and transitions that never happen
(an impossible transition).
It is clear that in practice the probabilities of the transitions
are not going to be uniform. In fact we should be able to use the
logs of current applications such as the needle biopsy, to measure
the frequency of transitions for many IGSTK classes.
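As a rough illustration of point (1), the sketch below estimates transition probabilities from observed frequencies and computes the entropy K in bits. The log format (a list of (state, input) pairs), the state names, and the function name are assumptions made for this example; they are not the actual IGSTK logger output.

```python
import math
from collections import Counter


def transition_entropy(observed):
    """Entropy (in bits) of the empirical transition distribution.

    `observed` is an iterable of hashable transition labels, here
    hypothetical (state, input) pairs extracted from an application log.
    """
    counts = Counter(observed)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Transitions that never occur contribute nothing to the sum.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical log: four distinct transitions, one of them dominant.
skewed_log = [("Idle", "Open")] * 5 + [
    ("Open", "Track"),
    ("Track", "Stop"),
    ("Open", "Close"),
]
# Same four transitions, each observed equally often.
uniform_log = [
    ("Idle", "Open"),
    ("Open", "Track"),
    ("Track", "Stop"),
    ("Open", "Close"),
] * 2

print(transition_entropy(uniform_log))  # 2.0 bits: log2(4)
print(transition_entropy(skewed_log))   # lower than 2.0 bits
```

With real logs, the non-uniform case is the one that matters: the more the usage concentrates on a few transitions, the lower the measured K.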
2) When combining two independent components into a single one,
the combined state machine will have a number of states that
is the product (not the sum) of the numbers of states of the two
original independent components. The same goes for the number of
inputs. For example, if Sa had states {m,n}, and Sb had states
{x,y}, then the combined SaSb will have states {mx,my,nx,ny}.
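The product construction in point (2) can be sketched with `itertools.product`. The state names are the ones from the example above; the input names are made up for illustration.

```python
from itertools import product

# States from the example above; inputs are hypothetical.
sa_states, sa_inputs = ["m", "n"], ["i", "j", "k"]
sb_states, sb_inputs = ["x", "y"], ["p", "q"]

# Every combined state pairs one state of Sa with one state of Sb.
combined_states = [a + b for a, b in product(sa_states, sb_states)]
combined_inputs = [a + b for a, b in product(sa_inputs, sb_inputs)]

print(combined_states)  # ['mx', 'my', 'nx', 'ny']
print(len(combined_states), len(combined_inputs))  # 4 6

# Transition-table sizes multiply as well: (2*3) * (2*2) == 4 * 6
sa_table = len(sa_states) * len(sa_inputs)
sb_table = len(sb_states) * len(sb_inputs)
assert len(combined_states) * len(combined_inputs) == sa_table * sb_table
```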
3) The ambiguity in the usage of the term "exponential" or "logarithmic"
is my mistake for not fully stating the relationship. I should have
said that the measure of complexity is logarithmic with respect to
the number of states.
The intuition that complexity is exponential comes from item (2)
above, that is, the fact that when adding functionality the number
of states gets multiplied every time. That is, when we double the
number of features, complexity (if measured in number of states) goes
to the square.
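A small numeric check of point (3), under the simplifying assumption of equally probable transitions, so that K reduces to log2 of the number of transitions. The function name and the transition counts in the second check are only illustrative.

```python
import math


def uniform_complexity_bits(num_transitions):
    """K for a table of equally probable transitions: log2(N) bits."""
    return math.log2(num_transitions)


k_small = uniform_complexity_bits(10)    # about 3.32 bits
k_large = uniform_complexity_bits(100)   # about 6.64 bits

# Squaring the table size (10 -> 100) only doubles K.
print(round(k_small, 2), round(k_large, 2))  # 3.32 6.64
assert math.isclose(k_large, 2 * k_small)

# Fusing two independent components multiplies table sizes, so K adds:
# K(Ca) = K(Sa) + K(Sb).
n_sa, n_sb = 6, 4  # hypothetical transition counts
assert math.isclose(
    uniform_complexity_bits(n_sa * n_sb),
    uniform_complexity_bits(n_sa) + uniform_complexity_bits(n_sb),
)
```

So the two views agree: counted in states or transitions the growth is multiplicative, while measured in bits it is additive.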
I agree with you that we may have got sidetracked from Frank's original
topic, due to the loaded use of the "complexity" term. We may need input
from Frank in order to clarify if we are deviating from his original
intent.
Maybe we should adopt a different term, such as:
A) The "difficulty" of using IGSTK components or
B) The "developers effort" (in hours ? lines of code ?) needed
to use an IGSTK component.
In either case, I will insist that we should strive for finding
concepts that are measurable in an objective way, if we want to have
a productive discussion that doesn't focus on how much we like
or dislike different styles of programming.
From a pragmatic point of view, we probably should simply proceed
to implement the components, and to use them in realistic
applications. That usage will reveal whether we are aiming at
the appropriate level of aggregation of functionality.
The agile approach that we are using will make it easier to
refactor these components if we find that to be necessary.
Luis
-----------------------
Kevin Gary wrote:
> Hi all,
>
> I'm going to try and offer some thoughts as we are working on validating
> state machines out here (and we are either going crazy from the desert
> heat or from the complexity!). I was not part of the original tcon
> conversation on granularity, so I am having a little trouble trying to
> understand Frank's original email; I apologize if I misunderstand anything.
>
> Some conceptual points:
> 1. Components should have a "natural complexity" based on their
> cohesiveness. That is, they should intuitively encapsulate a set of
> behaviors and state, like a coarse-grained object. An agile approach
> here helps because there should be mindshare as to whether a given
> component is intuitively at the right level. I would also mention our
> community can help a lot, as original component developers may assume a
> component is easier to understand than it is to the novice.
> 2. Components and classes are not the same thing (or shouldn't be). In
> IGSTK they are though, which can add some confusion. A component may be
> implemented as a collection of classes aggregated in specific and
> meaningful ways. In fact I see no reason why IGSTK components could not
> be composed from multiple classes.
> 3. How a component should be used is very important and should be
> reified in the software. This can be a very tricky thing to do, as many
> toolkits such as IGSTK intend for their components to be used in ways
> that are unanticipated. As components at higher levels of aggregation
> encapsulate more specific use cases (#4), their interfaces (and state
> machines) should reflect that; this is commonly referred to as
> Intentional Programming.
> 4. I see no reason for components to be at the same level anyway. A
> component Ca may be implemented by a specific combination of components
> Cb and Cc. In fact, this composition style is often how one moves from
> application-independent and reusable components to more
> application-specific (and thus narrowly scoped) ones.
>
> Some pointed comments from reading the email trace:
> - I would say there is a 4th type of complexity, and that is the
> complexity a programmer faces when programming within the IGSTK coding model.
> We've had this discussion several times before, along the lines of "are
> we making the coding model so different from standard practice that it
> actually makes our code unsafe?". I don't want to revisit that argument,
> except to say that it is a form of complexity - even if you can
> guarantee the state machine will not get into an unsafe state, you can
> still have many frustrating situations contrary to the developer's
> expectations. The Event model within IGSTK is an example to me. The
> application workflow effort (is that underway?) is an example solution.
> - The example in Luis' email of 5/31 (the "swiss-army-knife") is fine
> though I don't think it is a representative case. One really shouldn't
> implement a component in this way, using conditional logic over
> polymorphism. Of course, accounting for dynamic binding in an
> application is a form of conditional logic that needs to be tested as well.
>
> On measures:
> The Markov Chain analysis is interesting, but I have a couple questions:
> 1. Assuming uniform probabilities on transitions is not appropriate for
> IGSTK. In fact these components are constructed with some pretty blatant
> assumptions about what input is most likely next. We intend to create
> stochastic testing simulations in our tools to account for expected
> outcomes and fault scenarios.
> 2. Why is the number of states (inputs) in Ca equal to the product of
> the number of states (inputs) in Sa and Sb? As your assumption is Sa and
> Sb are independent of one another, I would think it would be the sum
> (basically, once the Ca's machine is set ready to run, there would be 2
> independent graphs in the machine with no interaction between them)?
> 3. It is counterintuitive that "a component with 100 transitions has just
> double the complexity of a component with 10 transitions". In fact the
> literature on the state explosion problem suggests it is exponential,
> not logarithmic, and that is certainly the case for our coverage tools.
> Harel's seminal work on Statecharts and HSMs is founded on compactness of
> representation (in line with your Kolmogorov complexity measure) in order
> to reduce states and transitions. This is important because your measure
> would suggest that larger-grained components are better. I think for
> IGSTK smaller state machines are better because they are more
> human-understandable and testable. The complexity that comes from
> composing components in other more flexible toolkits doesn't really
> exist in IGSTK due to its strong reliance on static binding and lack of
> external configuration files.
>
> We focus a lot on algorithmic complexity, and I'm not sure that was the
> original motivation for Frank's email. It would seem proper usage of
> components by application developers is. We also tend to think that our
> design reduces complexity, but complexity is an inherent attribute of
> the problem space, not the solution space - all we can do is move it
> around and manage it so our solutions are easy to understand, test, and
> reuse. On the one hand IGSTK helps with some of this but then introduces
> its own set of issues as suggested above. In any event, I think Frank's
> question is more about factoring complexity into granular components
> (and application versus framework components) than it is about the
> complexity of the algorithms and state machines. (?)
>
> There are also a lot of metrics out there for component complexity that
> are not algorithmic. The more traditional ones in object-oriented
> programming include dependency analysis, fan-in/out, coupling, and
> cohesion. There are also complexity measures that can be applied to
> statecharts that evaluate their structure - for example average
> branching factor. These all can be readily included into a DART
> dashboard with threshold measures defined that suggest warnings on the
> dashboard much like kwStyle.
>
> Finally, I'll say (as I have said before) that IGSTK tends to rely
> solely on state machines and unit testing to achieve safety. There are a
> number of safety-oriented programming and engineering practices that we
> can also look at - error/fault/failure analysis, requirements
> management, and so forth. These may not sound Agile, but I think the
> application domain necessitates our considering them.
>
> Thanks,
> K2
>
>
>
>
> Luis Ibanez wrote:
>
>>
>> Hi David,
>>
>>
>> That sounds reasonable.
>>
>>
>> At this point, it is just a matter of defining an
>>
>>
>> "Objective Measure of Complexity"
>>
>>
>> and with it, we could proceed to define a Threshold of how much
>> "complexity" is acceptable in an IGSTK component.
>>
>>
>> The label "Too Complex" doesn't make any sense if we don't have an
>> objective metric that can tell us how much complexity is too much
>> complexity.
>>
>>
>> Without an objective measure we will end up engaging in pointless
>> discussions, because the degree of complexity will be left to the
>> subjective aesthetic perception of every developer.
>>
>>
>>
>> My suggestion for objectively measuring the complexity of an IGSTK
>> component is to use the notion of Markov Process / Chains:
>>
>> http://en.wikipedia.org/wiki/Markov_chain
>>
>> in the following way:
>>
>> In the State Machine of the component, take the transition table,
>> and evaluate the probabilities of every transition for being
>> invoked. Then compute the Entropy of that set of probabilities,
>> and use it as a measure of the "complexity" of the component.
>>
>>
>> In this context, a component with 5 states, and 7 inputs, will
>> have 35 transitions. In the plain case where all transitions are
>> equally likely to be triggered, their probabilities are 1/35.
>> Then the component will have a complexity of
>>
>>
>> K = - Sum (from 1 to 35) of (1/35) [ log( 1/35 ) / log(2) ]
>>
>> K = 5.13 bits.
>>
>>
>> A component with 20 equally probable transitions will have a
>> complexity K = 4.32 bits.
>>
>>
>> I will suggest that the acceptable threshold of complexity for IGSTK
>> components should be 5 bits. This corresponds to a state machine
>> table of 32 equally probable transitions.
>>
>>
>> If you look at the Wiki page that evaluates the completeness of
>> the transition tables in IGSTK state machines:
>>
>> http://public.kitware.com/IGSTKWIKI/index.php/State_Machine_Transition_Tables_Completeness
>>
>>
>> you will find that the components with the maximum number of
>> transitions are the ToolCalibration and the Tracker, with:
>>
>> ToolCalibration : 171 Transitions : 164 of which are undefined
>> Tracker : 90 Transitions : 80 of which are undefined
>>
>> If we assume that the undefined transitions will never happen,
>> (which is probably the reason why the developers never considered
>> these transitions in the table, in the first place), and we assume
>> that the defined transitions are equally probable, then we get:
>>
>>
>> K( Tracker ) = 3.32 bits
>> K( ToolCalibration ) = 2.8 bits
>>
>>
>> In the case where some of the transitions are more likely than
>> others, the Entropy of the transition table will diminish and
>> therefore the K measure of complexity will be lower.
>>
>>
>> This measure of complexity reflects the intuition that complex
>> components have more functionality ("transitions"), and that there
>> is more uncertainty about their current state. It also matches
>> the notion that more complex components will require more lines
>> of code to achieve 100% code coverage.
>>
>>
>> Note that this measure of complexity is logarithmic in nature:
>>
>>
>> a component with 100 transitions has just double the
>> complexity of a component with 10 transitions.
>> That is, 6.6 bits versus 3.3 bits.
>>
>>
>> We should keep this in mind when we compare the complexity
>> of two components, or the complexity of two implementations
>> of the same component.
>>
>>
>> One nice property of this suggested measure is that if
>> we take two components Sa and Sb, as Frank suggested earlier,
>> each one with complexity measures K(Sa) and K(Sb) respectively,
>> and we assume that their functionalities are completely orthogonal,
>> that is, they are not redundant, and we fuse them together in a
>> single "more complex" component, the transition table of the combined
>> state machine in Ca will have a number of states equal to the product
>> of the number of states in Sa times the number of states in Sb.
>> Similarly its number of inputs will be the product of the number of
>> inputs in Sa times the number of inputs in Sb. As a result the
>> measure of complexity of Ca will satisfy:
>>
>>
>> K( Ca ) = K( Sa ) + K( Sb )
>>
>>
>> If Sa and Sb are not orthogonal, then the joint probability of
>> their transitions will not be the product of the independent
>> probabilities, and we will find that Ca has a lower complexity
>> than the two independent Sa and Sb components.
>>
>> In this context we also can interpret the effect of factorizing
>> functionality of Sa, Sb into a C++ base class Sc.
>>
>>
>>
>> Luis
>>
>>
>>
>> -----------------
>> David Gobbi wrote:
>>
>>> Hi Luis,
>>>
>>> I'm with Frank on the idea that complex components are preferable to
>>> forcing the application programmer to write a complex app that has to
>>> connect many simple components into a complex web.
>>>
>>> As long as a component can be fully understood, code-covered, and
>>> tested, it is unfair to call that component "too complex". Splitting
>>> such a component in two "just because we can" is not a good enough
>>> reason; we must also justify our decision in terms of functionality.
>>>
>>> A problem with specialized components is that it means we have more
>>> components to test, and each component is likely to receive less
>>> testing (we don't have unlimited resources). Also, if the components
>>> are too constrained, then they will only be able to serve the needs of
>>> a very small audience.
>>>
>>> Our primary means of achieving safety should be through testing and
>>> code review. For the actual implementation of the code, we should
>>> focus on functionality.
>>>
>>> - David
>>>
>>>
>>> On 5/31/07, Luis Ibanez <luis.ibanez at kitware.com> wrote:
>>>
>>>>
>>>>
>>>> Hi Frank,
>>>>
>>>> I agree that we should strive to find the right balance
>>>> in the granularity of IGSTK components.
>>>>
>>>> From the Algorithmic Theory point of view, we will know
>>>> whether a component is attempting to do too much or not,
>>>> by counting the number of "if"-like statements in the code.
>>>>
>>>> That will include "if", "switch", and ternary "a?b:c"
>>>> statements. When we try to engulf in a single component
>>>> the functionalities that should be implemented in two or
>>>> more independent components, we will find ourselves
>>>> introducing:
>>>>
>>>> a) large numbers of states in the State Machine, or
>>>> b) large numbers of inputs in the State Machine, or
>>>> c) "if" conditions that split the different cases, or
>>>> d) "switch" statements that split different cases
>>>>
>>>> Some of them will presumably be driven by "enums" and "bool"
>>>> flags that set the components in "this mode" or "this other mode".
>>>> The presence of these elements will be an indication of a component
>>>> that has grown too complex and that should be refactored/split
>>>> into simpler components.
>>>>
>>>> Where we draw that line is what is open for discussion,
>>>> and we probably have to do it on a case by case basis.
>>>>
>>>> From the pragmatic point of view, we can simply follow the practice
>>>> of agile programming. Let's start by putting a prototype
>>>> implementation of the component in the sandbox, and as part
>>>> of its code review we can discuss if it should be split into
>>>> multiple components or not.
>>>>
>>>> A clear sign will be how many lines of code do you need in the
>>>> test in order to ensure 100% code coverage of the component.
>>>> So, just by following our normal development process, the
>>>> components that are too complex will clearly stand out during
>>>> code reviews and during continuous dashboard testing.
>>>>
>>>>
>>>>
>>>> --------
>>>>
>>>>
>>>> Regarding the specific example that you mention:
>>>>
>>>> Before engaging in a discussion related to "complexity" we must
>>>> define what it means and how to measure it objectively.
>>>>
>>>> There are multiple concepts of complexity that we may want to
>>>> consider here, some of them are listed in the Wikipedia entry:
>>>>
>>>> http://en.wikipedia.org/wiki/Complexity
>>>>
>>>> When it comes to software, there are at least three measures of
>>>> complexity that are relevant:
>>>>
>>>>
>>>> 1) How many lines of code it takes to write a program.
>>>> This complexity measure is equivalent to Kolmogorov Complexity:
>>>>
>>>> http://en.wikipedia.org/wiki/Kolmogorov_complexity
>>>>
>>>> where the string to be generated is the sequence of states of
>>>> the application. States, here being the full set of variables
>>>> that completely defines the application.
>>>>
>>>>
>>>> 2) How many different options there are available for using a
>>>> program (or a routine, or a component). And therefore how
>>>> many decisions should be made by the application developer
>>>> in order to configure the application for a particular
>>>> use case.
>>>>
>>>>
>>>> 3) How many steps are required from the user of the application
>>>> in order to perform a task. This is the "complexity" perceived
>>>> by a user.
>>>>
>>>>
>>>> In your suggested problem, you seem to be focused on (1) and (2),
>>>> rather than (3), and the underlying assumption seems to be that by
>>>> increasing the complexity of the components, we may be able to
>>>> reduce the complexity of an application.
>>>>
>>>>
>>>> Following your description of the problem, let's consider
>>>> the two cases:
>>>>
>>>> A) a component Ca
>>>> B) two components Sa and Sb
>>>>
>>>> where (Ca) offers the same functionality as (Sa+Sb)
>>>>
>>>> and the complexity of Ca, let's call it Comp(Ca) is larger than
>>>> the individual complexities of each Sa and Sb,
>>>>
>>>> That is
>>>>
>>>> Comp(Ca) >= Comp(Sa)
>>>> Comp(Ca) >= Comp(Sb)
>>>>
>>>>
>>>> From the application developer point of view, if we use the notion
>>>> of complexity (2), it comes down to how many decisions should
>>>> be made in order to use the component Ca, versus how many decisions
>>>> should be made in order to use Sa & Sb.
>>>>
>>>> For example, let's say that Ca is a "swiss-army-knife" image slicer,
>>>> that can do:
>>>>
>>>> a) 1 slice orthogonal to a needle, and touching the tip
>>>> b) 3 orthogonal slices parallel to image axes and passing
>>>> through the needle tip.
>>>>
>>>> and that Sa and Sb are respectively the independent components that
>>>> could do only (a) and only (b).
>>>>
>>>> From the point of view of the application developer, in the case
>>>> of using Ca, the application should have an "if" statement that
>>>> switches between the use of functionality (a) and functionality (b)
>>>> at compile time or at run time (or both). In the case of using Sa
>>>> and Sb, the application developers must also set an "if" statement
>>>> indicating when to display slices using Sa, and when to use Sb.
>>>>
>>>> In this context, from the point of view of the application developer,
>>>> and using the concept of complexity (2), there is no difference between
>>>> using Ca and using Sa+Sb.
>>>>
>>>> On the other hand, the testing scenario for Ca requires exercising
>>>> all the features of Sa plus all the features of Sb, with the
>>>> aggravation that some of the settings that make sense in the "Sb"
>>>> mode of Ca may not make sense in the "Sa" mode of Ca.
>>>>
>>>>
>>>> Note also that it is quite likely that common functionalities of Sa
>>>> and Sb may be factorized into a base class Sab from which both Sa
>>>> and Sb will derive.
>>>>
>>>>
>>>> Before proceeding further with this discussion, we must define the
>>>> measures of complexity that we consider relevant and we should
>>>> establish objective methods for measuring those complexity concepts.
>>>>
>>>> ---
>>>>
>>>>
>>>> Again, from the pragmatic point of view, I agree with Patrick, that
>>>> we should probably start writing prototypes in the sandbox, and base
>>>> our discussions on more concrete cases. We probably will need multiple
>>>> iterations of design/implementation/testing on every component before
>>>> we find the right balance between specialization and generality.
>>>> On the bright side, that is what agile programming is very good at.
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>>
>>>> Luis
>>>>
>>>>
>>>>
>>>> -----------------------
>>>> Frank Lindseth wrote:
>>>> > Luis (and others),
>>>> >
>>>> > We had a long discussion about "many simple specialized" components
>>>> > vs. "fewer, more complex and general components" after you had to
>>>> > leave the Tcon yesterday (we should probably have started with this
>>>> > topic).
>>>> > It seems like the common opinion is that in order to make it simpler
>>>> > for the app. developer to satisfy the clinical user requirements
>>>> > it's sensible to move a little bit in the more general direction for
>>>> > some of the components; at the same time the components should not
>>>> > become so complex that it's not possible to test them in the ordinary
>>>> > way. We have to find the right balance.
>>>> > I know you have strong feelings about this Luis, but do you (or
>>>> > anybody else for that matter) think that a compromise can be found
>>>> > somewhere along the simple comp./complex app - complex comp./simple
>>>> > app. line?
>>>> > As you know, this has been my main IGSTK concern from day one, and I
>>>> > really need some input as to what to expect as our "IGSTK practical
>>>> > trial period" is about to end and we have to take the big decision
>>>> > regarding what to base future IGS efforts on (it looks promising
>>>> > regarding other issues, e.g. the "coordinate system" challenge).
>>>> >
>>>> > If we need to think in terms of concrete scenarios I believe that the
>>>> > slicer-comp. should be used (could be specialized both in terms of
>>>> > modality and functionality).
>>>> > Some background information / discussion can be found here:
>>>> > http://public.kitware.com/IGSTKWIKI/index.php/Talk:DesignChallenges#Slicing
>>>> >
>>>> > A little scenario that can help to trigger some response to this
>>>> > e-mail:
>>>> > User/surgeon would like to have an IGS system with a certain
>>>> > complexity in terms of required functionality (will increase over
>>>> > the years, I know...).
>>>> > Such an app. could be realized in different ways depending on
>>>> > the way the components are made:
>>>> > A) Many, simple and specialized components -> Complex app. will be
>>>> > needed (many obj., switching, etc.) in order to satisfy the user
>>>> > above.
>>>> > B) Fewer, more complex and general components. -> Simple app. (to
>>>> > satisfy user).
>>>> > C) Balanced comp. -> Balanced app. (to satisfy user).
>>>> >
>>>> > List of points that can push the balance in one or the other
>>>> > direction:
>>>> > = User/surgeon
>>>> > -Overall safety (not the same as comp. safety):
>>>> > * It's easier to test a comp. than it is to test an app. (as long as
>>>> > the comp. is not too complex, i.e. up to a certain level)
>>>> > * A simple app. is safer and easier to test than a complex one.
>>>> > * A complex comp. is of course more difficult to test than a
>>>> > simple one, but we should think more like this: let's say that we
>>>> > have a complex comp. Ca that offers the same functionality as two
>>>> > simpler comp. Sa and Sb. As long as it's possible to test Ca, knowing
>>>> > that this comp. works properly has added more to the overall safety
>>>> > than testing Sa and Sb separately.
>>>> > * etc. (feel free to add points to this list)
>>>> >
>>>> > = App. developer:
>>>> > * In terms of building a user community, the easier it is to build an
>>>> > app. with a certain functionality, the better it is. The extreme case
>>>> > being that the app. dev. only connects the high level comp. needed
>>>> > and makes everything accessible to the user through a GUI.
>>>> > * etc. (feel free to add points to this list)
>>>> >
>>>> > = Comp. developer:
>>>> > * resources for dev. maintenance, doc. testing, etc.
>>>> > * etc. (feel free to add points to this list)
>>>> >
>>>> > Have a nice weekend everybody.
>>>> > Regards,
>>>> > Frank
>>>> >
>>>> > _______________________________________________
>>>> > IGSTK-Developers mailing list
>>>> > IGSTK-Developers at public.kitware.com
>>>> > http://public.kitware.com/cgi-bin/mailman/listinfo/igstk-developers
>>>> >
>>>>
>>>
>
>
>