[Insight-developers] gdcmUtil.cxx polling for a MAC Address

Mathieu Malaterre mathieu.malaterre at gmail.com
Sat May 3 15:23:07 EDT 2008


David,

On Sat, Apr 19, 2008 at 3:07 PM, David Clunie <dclunie at dclunie.com> wrote:
> kent williams wrote:
>
>  > The workaround you're using means you're not generating a unique UID for
>  > your DICOM files, and therefore aren't compliant with the DICOM spec. Not
>  > that it should stop you -- no one else complies completely with the DICOM
>  > spec, why should you?
>
>  This attitude makes trying to achieve a high level of interoperability and
>  compatibility harder.
>
>  Are you surprised when clinical IT staff are reluctant to allow any images
>  produced by research software onto the clinical network ?
>
>  Rest assured that there are plenty of people who achieve 100% compliance
>  with the DICOM standard on a routine basis, as well as plenty of tools
>  to test the compliance of one's own output.

100% ? I am not sure what it even means to say 100% compliance, as the
standard is a moving target: gray areas need fixing, plain mistakes
need CPs, and from one year to the next a bunch of supplements are
added. So the best one can say is: 100% compliance with DICOM 200X,
with CP X,Y, SUPP Z.

Because I have seen so many mistakes made in DICOM files, I am
guessing that there is still one thing those systems do not handle
properly: the person behind the console, the one entering a Person
Name with a space as separator, a street address with an improper
locale...

Anyway, I see your point, and rest assured that we strive to make GDCM as
compliant as possible.


>  Mathieu Malaterre wrote:
>
>  > As Kent says, it is not much of an issue. It will simply
>  > exponentially increase the chance of your customers producing
>  > duplicate DICOM files throughout the world. The real issue lies in the
>  > fact that the ITK root is used by potentially dozens of people at the
>  > same time, so producing unique identifiers (software implementation,
>  > not hardware) is a hard problem. To minimize the risk of collision I
>  > used a machine identifier (MAC address), a time identifier (up to
>  > the millisecond) and in the couple of bytes remaining I stuffed in a
>  > couple of random bytes.
>  > So the questions are:
>  >
>  > 1. Are your customers going to produce DICOM files ?
>  > 2. If 1) do they have their own root UID ?
>  > 3. Do they plan to produce DICOM from two machines at the same time ?
>  >
>  > If 2) and not 3) then it is safe to remove the machine identifier as
>  > collision cannot appear. In which case a simple #cmakedefine to
>  > replace the MAC address by an empty string is valid.
>
>  It is never safe nor appropriate to generate UIDs that are not expected
>  to be globally unique.
>
>  Even if one were totally convinced that the generated DICOM image files
>  were never going to escape from the lab, even within a single lab, two
>  instances of the same code may generate a conflict sooner or later,
>  as you point out.
>
>  It is not as hard as it used to be to find sources of uniqueness on most
>  platforms these days, even in the absence of a network interface with
>  a MAC address.
>
>  Regardless, I think it is the toolkit implementer's responsibility to make
>  sure that this works and that the programmer using the toolkit is not
>  bothered by this issue, and is not able to produce "bad" DICOM files that
>  have dubious UIDs.
>
>  In my earlier toolkits I made obtaining a root the
>  responsibility of the user; nowadays I know better and just take care of
>  it internally in my more recent toolkits. For example in Java I use the
>  java.rmi.dgc.VMID and fall back to trying to find a MAC address instead
>  only if VMID.isUnique() is false. I recall looking once at the JRE source
>  code to find out where Sun is getting the VMID, though I don't remember
>  the details, but one could use the same approach if one needed to replicate
>  this in C++ in the absence of library support for this sort of thing.

Before I forget: you can manually generate duplicate VMIDs on your
own machine:

http://groups.google.com/group/comp.lang.java.programmer/browse_thread/thread/b2abc855e5a49e2

You should switch to java.util.UUID

>  The hardest part is finding a way to fit everything into 64 bytes. I also
>  use a counter nowadays, rather than just "a few random bytes", since I
>  found that if I asked for UIDs fast enough I could generate them faster
>  than millisecond precision making the timer insufficient for
>  disambiguation; the counter value took up fewer bytes than a sufficiently
>  random number.
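
For what it is worth, the timestamp-plus-counter idea quoted above could be
sketched roughly as follows. This is only an illustration in Python, not
the code of any existing toolkit: the class name SuffixGenerator and the
"millis.serial" layout are made up for the example, and the result only
disambiguates UIDs within one process, so a root and a machine or process
identifier would still be needed in front of it.

```python
import itertools
import threading
import time


class SuffixGenerator:
    """Hypothetical helper: millisecond timestamp plus a per-process counter."""

    def __init__(self):
        self._counter = itertools.count()  # 0, 1, 2, ... never repeats
        self._lock = threading.Lock()      # keep timestamp/counter pairs consistent

    def next_suffix(self):
        with self._lock:
            millis = int(time.time() * 1000)
            serial = next(self._counter)
        # Two decimal components; neither has a leading zero
        # (a lone "0" is acceptable as a UID component).
        return f"{millis}.{serial}"


if __name__ == "__main__":
    gen = SuffixGenerator()
    # Asking faster than the clock ticks still yields distinct suffixes,
    # because the counter differs even when the timestamp does not.
    print(gen.next_suffix())
    print(gen.next_suffix())
```

The counter keeps growing for the lifetime of the process, so a real
implementation would have to budget for its width within the 64 bytes.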

Most implementations of UUID (*) will generate a 16-byte value, which
converted to a base 10 number will fit in 39 bytes (**). Since a '.'
is required between the root and the suffix, it means that the root
cannot be more than 24 bytes (64 - 39 - 1).
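
A minimal sketch of that layout, in Python for brevity (GDCM itself would
need the equivalent in C++ on top of whatever UUID library is chosen). The
root "1.2.3" used below is a placeholder, not a registered root, and the
function name make_uid is made up for the example.

```python
import uuid

MAX_UID_LENGTH = 64                                  # limit on the whole UID
LONGEST_SUFFIX = len(str(2 ** 128 - 1))              # digits of the largest 16-byte value
LONGEST_ROOT = MAX_UID_LENGTH - 1 - LONGEST_SUFFIX   # one byte goes to the '.'


def make_uid(root):
    """Return root + '.' + a random UUID written as a base 10 integer."""
    if len(root) > LONGEST_ROOT:
        raise ValueError(f"root longer than {LONGEST_ROOT} bytes: {root!r}")
    suffix = str(uuid.uuid4().int)   # decimal form, no leading zeros
    return f"{root}.{suffix}"


if __name__ == "__main__":
    print(LONGEST_SUFFIX, LONGEST_ROOT)
    print(make_uid("1.2.3"))         # "1.2.3" is a placeholder root
```

Rejecting an over-long root up front, rather than checking the final
length, keeps the failure deterministic: the decimal suffix is usually 39
digits long but can occasionally be shorter.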

Taking the p(n) formula from:

http://en.wikipedia.org/wiki/UUID#Random_UUID_probability_of_duplicates

Losing those two bytes will divide the exp() part of the
probability by a factor of exp( 1. / 2 ^16 )

For instance instead of a 0.0000000004 probability, I have now:

1 - (1 - 0.0000000004) / exp( 1. / 2 ^16 )  = 1.5259072641771176e-05

Thus losing those two bytes will multiply the probability by a factor of ~38147
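
A quick numerical cross-check, in Python. It assumes that the p(n) formula
referred to above is the usual birthday approximation
p(n) ~ 1 - exp(-n^2 / (2 * 2^x)) for an x-bit random identifier; the helper
name rescale_after_dropping_bits is made up for the example. The first
value evaluates the expression exactly as written above. The second
multiplies the exponent by 2^16, which is what removing 16 bits from x does
under that assumption.

```python
import math


def rescale_after_dropping_bits(p, dropped_bits):
    """Collision probability for the same n once the identifier is shorter.

    Assumes 1 - p = exp(-k); dropping bits multiplies k by 2**dropped_bits.
    """
    return -math.expm1(math.log1p(-p) * 2.0 ** dropped_bits)


p = 4e-10                                             # starting probability used above

# The expression from the text, evaluated as written.
email_value = 1 - (1 - p) / math.exp(1.0 / 2 ** 16)

# The same starting point under the birthday approximation.
rescaled = rescale_after_dropping_bits(p, 16)

if __name__ == "__main__":
    print(email_value, email_value / p)
    print(rescaled, rescaled / p)
```

Under that assumption the second ratio comes out close to 2^16 = 65536
rather than ~38147, so the two estimates agree on the order of magnitude
but not on the exact factor.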

AFAIK this is the best I can do. Unless anyone has further comments, this
is the implementation that I'll use instead. This will also get rid of
the MAC address issue, as the UUID will be provided by a third-party
library.

Thanks for your time,
-- 
Mathieu
(*) http://en.wikipedia.org/wiki/UUID
(**) I could not use the '.' as an extra digit for a base 11 number, since
the rules for the '.' byte are too constraining. Furthermore, even in
base 11 the number requires 38 bytes to be represented, which only
very slightly modifies the initial problem.

