How to generate a UUID? (was possible data format... Re: Re: dbms3.r 01)
[1/18] from: brett::codeconscious::com at: 16-Jan-2002 18:45
Hi Gregg and others!
> I thought the intent was to assign a UUID to each record, which it would
use
> forever, so location was not important.
This little line reminded me that I had this as an outstanding question in
my mind. How exactly does one generate a UUID?
And perhaps (useful to Gabriele too) has got a function that does this?
Brett.
[2/18] from: greggirwin::mindspring::com at: 16-Jan-2002 1:48
Re: How to generate a UUID? (was possible data format... Re: Re: dbms3.r
Hi Brett,
<< This little line reminded me that I had this as an outstanding question
in my mind. How exactly does one generate a UUID? And perhaps (useful to
Gabriele too) has got a function that does this? >>
Under Win32, you can use the CoCreateGUID function. It uses an OSF DCE
specified algorithm based on:
The current date and time.
A clock sequence and related persistent state to deal with retrograde
motion of clocks.
A forcibly incremented counter to deal with high-frequency allocations.
The globally unique IEEE machine identifier.
I've built custom object IDs using the timestamp, counter, and machine
identifier, along with a "type" ID which specified the kind of object, but I
still used CoCreateGUID to get the machine ID. I've also done it where the
machine ID was pulled from a file which was part of the configuration
process for the software.
I'm not sure what would be the best portable solution, or if there is a
single best solution, but you could probably concoct something reasonably
reliable by storing a timestamp, and maybe some other info, when the
make-guid script is run the first time, or if the info file has been
deleted. Basically, the machine ID part is replaced by the installation
timestamp and maybe something like the CHECKSUM(s) of some randomly selected
files on the system.
--Gregg
[3/18] from: brett:codeconscious at: 16-Jan-2002 22:16
Thanks for the info. Gregg.
It would be nice to have a native function for it. Probably something useful
in a distributed world.
Brett.
[4/18] from: petr:krenzelok:trz:cz at: 16-Jan-2002 12:36
Brett Handley wrote:
> Thanks for the info. Gregg.
>
> It would be nice to have a native function for it. Probably something useful
> in a distributed world.
checksum/secure .... :-)
-pekr-
[5/18] from: greggirwin:mindspring at: 16-Jan-2002 11:50
Hi Pekr,
<< checksum/secure .... :-) >>
I don't know if "cryptographically secure" equates to "universally unique",
but I'd be glad if it did. :)
--Gregg
[6/18] from: joel:neely:fedex at: 16-Jan-2002 19:48
Hi, Gregg,
Gregg Irwin wrote:
> Hi Pekr,
>
> << checksum/secure .... :-) >>
>
> I don't know if "cryptographically secure" equates to
> "universally unique", but I'd be glad if it did. :)
>
Absent documentation, who knows (?), but not likely. By the
pigeonhole principle, any function that returns fixed-length
output from variable-length input (where length of input can
be longer than length of output) MUST map multiple inputs to
the same output.
The phrase "cryptographically secure" normally implies that
the output effectively randomizes the input, and it is hard
to invert.
I just ran CHECKSUM/SECURE on REBOL source files that were
971 bytes, 158 bytes, and 1116 bytes long and got a 20-byte
result for each. Since there are only 2**160 possible
20-byte values, feeding large numbers of large inputs is
guaranteed to produce duplicate outputs. (Of course, you'll
need a fast CPU... ;-)
-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
[7/18] from: joel:neely:fedex at: 16-Jan-2002 20:16
Hi, again, Gregg,
Gregg Irwin wrote:
> I don't know if "cryptographically secure" equates to
> "universally unique", but I'd be glad if it did. :)
>
(Should have checked before the previous note...) CHECKSUM/SECURE
is represented as being SHA, which does produce a 160-bit "secure"
hash of its input. According to the _Federal_Register_ ["Proposed
Federal Information Processing Standard for Secure Hash Standard",
v. 57, n. 21, 31 Jan 1992, pp. 3747-3749]
The SHA is called secure because it is designed to be compu-
tationally infeasible to recover a message corresponding to
a given message digest, or to find two different messages
which will produce the same message digest. Any change to a
message in transit will, with a very high probability, result
in a different message digest, and the signature will fail
to verify.
According to Bruce Schneier in _Applied_Cryptography_ (2d edition,
John Wiley & Sons, 1996)
There are no known cryptographic attacks against SHA.
Because it produces a 160-bit hash, it is more resistant
to brute-force attacks (including birthday attacks) than
128-bit hash functions...
Needless to say, none of this implies that synonyms fail to exist
(see previous note), but simply that it should be infeasibly hard
to construct a synonym for a given input.
-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
[8/18] from: brett:codeconscious at: 17-Jan-2002 18:55
I had thought of using checksum/secure as an ID some time back. But the
synonym possibility made be nervous. Why?
(1) Such a chance is greater than zero
(2) I believe that Murphy's law is understated.
(3) I've programming so long I'm automatically suspicious of possibilities
:)
> Needless to say, none of this implies that synonyms fail to exist
> (see previous note), but simply that it should be infeasibly hard
> to construct a synonym for a given input.
Apply that quote to the idea of using checksum/secure as an ID and it sounds
comforting to me.
I wonder if it is "infeasibly hard" over a lifetime?
Brett.
[9/18] from: joel:neely:fedex at: 17-Jan-2002 6:03
Re: How to generate a UUID? (was possible data format...Re:Re: dbms3.r 0
Hi, Brett,
Brett Handley wrote:
> I had thought of using checksum/secure as an ID some time back.
> But the synonym possibility made be nervous. Why?
> (1) Such a chance is greater than zero
> (2) I believe that Murphy's law is understated.
> (3) I've programming so long I'm automatically suspicious of
> possibilities :)
>
I agree on all counts.
> > Needless to say, none of this implies that synonyms fail to exist
> > (see previous note), but simply that it should be infeasibly hard
> > to construct a synonym for a given input.
>
> Apply that quote to the idea of using checksum/secure as an ID and
> it sounds comforting to me.
> I wonder if it is "infeasibly hard" over a lifetime?
>
Well, "infeasibly hard" means the probability that a random message
would have the same checksum should be about 6.84227765783589E-49
and you shouldn't be able to do better crafting one yourself.
The issue is not that synonyms won't happen, but that they're
unlikely and you can't predict WHEN they'll happen. Therefore they
are "safe" for indicating whether a message has been tampered with.
-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
[10/18] from: brett::codeconscious::com at: 17-Jan-2002 23:27
> Well, "infeasibly hard" means the probability that a random message
> would have the same checksum should be about 6.84227765783589E-49
> and you shouldn't be able to do better crafting one yourself.
>
> The issue is not that synonyms won't happen, but that they're
> unlikely and you can't predict WHEN they'll happen. Therefore they
> are "safe" for indicating whether a message has been tampered with.
So um, that means using it to produce a UUID is ok as long as I'm willing to
accept the odds
of a clash (very unlikely).
I've more chance of being hit by a bus than having to deal with the
consequences of a little ID clash in my system. Therefore I should focus my
energies on watching out for buses than coding around clashes of IDs.
Maybe I should go to bed !
Brett.
[11/18] from: greggirwin:mindspring at: 17-Jan-2002 10:33
Re: How to generate a UUID? (was possible data format... Re:Re: dbms3.r
Hi Joel,
Thanks for all the info. Great stuff!
<< Needless to say, none of this implies that synonyms fail to exist
(see previous note), but simply that it should be infeasibly hard
to construct a synonym for a given input. >>
OK, so the only thing we have to do, if I understand things correctly, is to
come up with a unique string of data and checksum/secure will generate a
hash from that which is very likely *also* unique. :)
--Gregg
[12/18] from: sunandadh:aol at: 17-Jan-2002 14:05
Re: How to generate a UUID?
[greggirwin--mindspring--com] writes:
> OK, so the only thing we have to do, if I understand things correctly, is to
> come up with a unique string of data and checksum/secure will generate a
> hash from that which is very likely *also* unique. :)
Nice analysis! Which bounces us (if I'm following right) right back to the
original question.
My take on it is to get a guaranteed unique string (given there are
guarantees in this world) that will work across the world, it needs three
parts:
pppp-rrrr-ssss
where
pppp - PREFIX issued by a central and trusted server somewhere. You may only
need to get one prefix per installation of your application so we don't have
a great overhead in getting this.
rrrr - RANDOM. Is some random number we add to that prefix. This part is
optional, IF you believe the central server will never, ever, fail or be
restarted, or hacked so that it produces (very occasionally) duplicate unique
prefixes, THEN you can dispense with it. On the other hand, how long it is
indicates your faith in the central server.
So all the UUIDs in your application begin pppp-rrrr- That leaves
ssss SERIAL. The serial part that YOU assign in YOUR application. It's your
job to make sure that these are always and forever unique within the
application.
Put the three together, and you may have a unique ID.
(I used to work in the automotive industry. Every vehicle in the world, on
leaving its factory, has a unique VIN -- Vehicle Identification Number --
stamped on its chassis. Do you want to know how many vehicles share the same
VIN through manufacturing IT cock-ups? The answer is an embarrassingly large
number, even with short-run luxury car makers.)
I got code somewhere that generates ssss's. Maybe IOS could offer a pppp
service, and then we'd be ready to roll.
Sunanda.
[13/18] from: greggirwin:mindspring at: 17-Jan-2002 22:42
Hi Sunanda,
<< Nice analysis! Which bounces us (if I'm following right) right back to
the
original question. >>
Ex-actly. :)
Good thoughts on the elements needed to assemble a UID. My preference would
be not to require an external server if it can be avoided. I think a
standalone machine needs to be able to generate UIDs as well.
What you call PREFIX, I would probably call "Creation Location". To me, this
equates to the unique machine identifier as used in the OSF DCE spec. There
could be varying implementations for this, one of which *could* be an
external server. If RT could provide a native 'machine-uid function, that
would be ideal. Otherwise we have to synthesize it somehow, or make it a
configuration option.
The SERIAL part should be easy enough, though I see this just as a counter
combined with a timestamp, which you may not given your experience. The
biggest risk I see would be two instances of the script running at the same
time and each being called inside the resolution of what now/precise gives
us. If that's a risk worth considering, you could use a file as your shared
memory space, though that wouldn't be my first choice.
Clock resets also have to be considered. I like having a timestamp segment
because it can be useful information in and of itself.
This leaves the RANDOM part. You could let the application provide this
(e.g. the make-uid function has a /with <random-segment-seed> refinement) to
make it easy, and maybe just have a very simple implementation (using
random/seed + random/secure) beyond that.
--Gregg
[14/18] from: sunandadh:aol at: 18-Jan-2002 8:54
Hi Gregg,
> What you call PREFIX, I would probably call "Creation Location". To me,
this
> equates to the unique machine identifier as used in the OSF DCE spec. There
> could be varying implementations for this, one of which *could* be an
> external server. If RT could provide a native 'machine-uid function, that
> would be ideal. Otherwise we have to synthesize it somehow, or make it a
> configuration option.
I think we have slightly different models in mind.
You, I think, are looking for a "prefix" unique to a machine. I was chipping
in an approach to a slightly different problem: to get a unique "prefix" for
an "application node" (to coin a phrase). That's a bit of an application
running in one machine; but it might migrate to another; and that machine may
be running more than one application, each of which needs to generate its own
UUIDs.
But yes, a machine-uid function (that gets the CPU model and serial for
machines that support that; or the ethernet address for machines with a card;
or _something_ unique) would be a big step in the right direction for both of
us.
Sunanda.
[15/18] from: greggirwin:mindspring at: 18-Jan-2002 11:10
Hi Sunanda,
<< I think we have slightly different models in mind. >>
I think you're right. :)
<< You, I think, are looking for a "prefix" unique to a machine. I was
chipping
in an approach to a slightly different problem: to get a unique "prefix" for
an "application node" (to coin a phrase). That's a bit of an application
running in one machine; but it might migrate to another; and that machine
may
be running more than one application, each of which needs to generate its
own
UUIDs. >>
Right. I've used an approach somewhere between the two we're discussing, to
generate OIDs (Object IDs) for an object-centric DB design. To do that, I
used the GUID created by CoCreateGUID (under Windows) and re-defined one
segment to use as your PREFIX segment.
One use I've been thinking of, and why I think generic global uniqueness is
important (beyond application specific uniqueness requirements), is that of
messages and agents which may "travel".
I'll be doing some more research on this, and maybe even hack up a little
test. Maybe, together, we can come up with something that will work for both
of us.
--Gregg
[16/18] from: joel:neely:fedex at: 18-Jan-2002 13:16
Hi, Gregg, Sunanda, et al,
Gregg Irwin wrote:
> One use I've been thinking of, and why I think generic global
> uniqueness is important (beyond application specific uniqueness
> requirements), is that of messages and agents which may "travel".
>
> I'll be doing some more research on this, and maybe even hack up
> a little test. Maybe, together, we can come up with something
> that will work for both of us.
>
Just to connect this thread with the well-deserved resurgence of
interest in Rugby...
Imagine a Rugby-based registry that works as follows:
- A developer submits a well-defined set of parameters (e.g.
developer's name, company (if relevant), project, version,
date) to a Rugby service.
- The reponse is an ID guaranteed to be unique because only
(the single instance of!) that service distributes them.
- Anyone can subsequently supply the ID to the service and get
back the identification data from the initial request.
Thoughts?
-jn-
[17/18] from: greggirwin:mindspring at: 18-Jan-2002 14:16
Hi Joel,
<< - A developer submits a well-defined set of parameters (e.g.
developer's name, company (if relevant), project, version,
date) to a Rugby service.
- The reponse is an ID guaranteed to be unique because only
(the single instance of!) that service distributes them.
- Anyone can subsequently supply the ID to the service and get
back the identification data from the initial request. >>
I can think of lots of variations on that theme which would be useful,
though not necessarily related to uniqueness.
Having a central server certainly helps, but then you are dependent on it.
In some cases you may want at least some element of control over the
structure and content of the UID (Suananda's approach) and, in that
scenario, it's very helpful.
I've been thinking that the UIDs are plain, structured data. Now, you've got
me thinking that an encrypted UID could also be useful. Hmmm. More to think
about.
--Gregg
[18/18] from: sunandadh:aol at: 18-Jan-2002 19:06
Hi Joel,
> Imagine a Rugby-based registry that works as follows:
> - A developer submits a well-defined set of parameters (e.g.
<<quoted lines omitted: 4>>
> - Anyone can subsequently supply the ID to the service and get
> back the identification data from the initial request.
I see this as potentially problematic as the service must either:
-- Have a hashing algorithm that is guaranteed never to clash on different
strings. This takes us back to the "how unique is checksum/secure" question;
or
-- Keep a database of the parameters and the UUID issued for it. I wouldn't
trust a database to keep its data clean enough for this over the potential
decades of life cycle. Also, keeping a database means someone can't use the
service anonymously
My thoughts for the equivalent service was that it could just issue an
integer.
One way that could be done:
Issue an integer based on the exact UTC time as got from a trusted atomic
clock service. That way, no database needed--just some instore memory to
check you didn't just issue that number a microsecond ago. (And a little bit
of code to check that the clock hasn't gone backwards during a leap-second
removal).
If the service returned a signed 64 bit integer which was (in effect) number
of microseconds from 01/Jan/2002, it could issue a million unique numbers a
second for nearly 300,000 years....It might then be safe to start again from
zero as many of the early applications will have been retired. If not, we
could always use that 64th bit and gain another 300,000 years before
upsetting our users.
This isn't really a full-blown UUID process because it couldn't handle more
than 1,000,000 requests a second.
But remember, my model was that we are producing a unique prefix. It's up to
the application to generate its own serial number. The serial number alone
makes the thing unique within the application. Appending it to the number
from the UUID service produces something that should be unique across the
earth.
More complications ensure if you want a UUID that is unique across the solar
system.
Sunanda.
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted