Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

How to generate a UUID? (was possible data format... Re: Re: dbms3.r 01)

 [1/18] from: brett::codeconscious::com at: 16-Jan-2002 18:45


Hi Gregg and others!
> I thought the intent was to assign a UUID to each record, which it would
use
> forever, so location was not important.
This little line reminded me that I had this as an outstanding question in my mind. How exactly does one generate a UUID? And perhaps (useful to Gabriele too) has got a function that does this? Brett.

 [2/18] from: greggirwin::mindspring::com at: 16-Jan-2002 1:48

Re: How to generate a UUID? (was possible data format... Re: Re: dbms3.r


Hi Brett, << This little line reminded me that I had this as an outstanding question in my mind. How exactly does one generate a UUID? And perhaps (useful to Gabriele too) has got a function that does this? >> Under Win32, you can use the CoCreateGUID function. It uses an OSF DCE specified algorithm based on: The current date and time. A clock sequence and related persistent state to deal with retrograde motion of clocks. A forcibly incremented counter to deal with high-frequency allocations. The globally unique IEEE machine identifier. I've built custom object IDs using the timestamp, counter, and machine identifier, along with a "type" ID which specified the kind of object, but I still used CoCreateGUID to get the machine ID. I've also done it where the machine ID was pulled from a file which was part of the configuration process for the software. I'm not sure what would be the best portable solution, or if there is a single best solution, but you could probably concoct something reasonably reliable by storing a timestamp, and maybe some other info, when the make-guid script is run the first time, or if the info file has been deleted. Basically, the machine ID part is replaced by the installation timestamp and maybe something like the CHECKSUM(s) of some randomly selected files on the system. --Gregg

 [3/18] from: brett:codeconscious at: 16-Jan-2002 22:16


Thanks for the info. Gregg. It would be nice to have a native function for it. Probably something useful in a distributed world. Brett.

 [4/18] from: petr:krenzelok:trz:cz at: 16-Jan-2002 12:36


Brett Handley wrote:
> Thanks for the info. Gregg. > > It would be nice to have a native function for it. Probably something useful > in a distributed world.
checksum/secure .... :-) -pekr-

 [5/18] from: greggirwin:mindspring at: 16-Jan-2002 11:50


Hi Pekr, << checksum/secure .... :-) >> I don't know if "cryptographically secure" equates to "universally unique", but I'd be glad if it did. :) --Gregg

 [6/18] from: joel:neely:fedex at: 16-Jan-2002 19:48


Hi, Gregg, Gregg Irwin wrote:
> Hi Pekr, > > << checksum/secure .... :-) >> > > I don't know if "cryptographically secure" equates to > "universally unique", but I'd be glad if it did. :) >
Absent documentation, who knows (?), but not likely. By the pigeonhole principle, any function that returns fixed-length output from variable-length input (where length of input can be longer than length of output) MUST map multiple inputs to the same output. The phrase "cryptographically secure" normally implies that the output effectively randomizes the input, and it is hard to invert. I just ran CHECKSUM/SECURE on REBOL source files that were 971 bytes, 158 bytes, and 1116 bytes long and got a 20-byte result for each. Since there are only 2**160 possible 20-byte values, feeding large numbers of large inputs is guaranteed to produce duplicate outputs. (Of course, you'll need a fast CPU... ;-) -jn- -- ; sub REBOL {}; sub head ($) {@_[0]} REBOL [] # despam: func [e] [replace replace/all e ":" "." "#" "@"] ; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"} print head reverse despam "moc:xedef#yleen:leoj" ;

 [7/18] from: joel:neely:fedex at: 16-Jan-2002 20:16


Hi, again, Gregg, Gregg Irwin wrote:
> I don't know if "cryptographically secure" equates to > "universally unique", but I'd be glad if it did. :) >
(Should have checked before the previous note...) CHECKSUM/SECURE is represented as being SHA, which does produce a 160-bit "secure" hash of its input. According to the _Federal_Register_ ["Proposed Federal Information Processing Standard for Secure Hash Standard", v. 57, n. 21, 31 Jan 1992, pp. 3747-3749] The SHA is called secure because it is designed to be compu- tationally infeasible to recover a message corresponding to a given message digest, or to find two different messages which will produce the same message digest. Any change to a message in transit will, with a very high probability, result in a different message digest, and the signature will fail to verify. According to Bruce Schneier in _Applied_Cryptography_ (2d edition, John Wiley & Sons, 1996) There are no known cryptographic attacks against SHA. Because it produces a 160-bit hash, it is more resistant to brute-force attacks (including birthday attacks) than 128-bit hash functions... Needless to say, none of this implies that synonyms fail to exist (see previous note), but simply that it should be infeasibly hard to construct a synonym for a given input. -jn- -- ; sub REBOL {}; sub head ($) {@_[0]} REBOL [] # despam: func [e] [replace replace/all e ":" "." "#" "@"] ; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"} print head reverse despam "moc:xedef#yleen:leoj" ;

 [8/18] from: brett:codeconscious at: 17-Jan-2002 18:55


I had thought of using checksum/secure as an ID some time back. But the synonym possibility made be nervous. Why? (1) Such a chance is greater than zero (2) I believe that Murphy's law is understated. (3) I've programming so long I'm automatically suspicious of possibilities :)
> Needless to say, none of this implies that synonyms fail to exist > (see previous note), but simply that it should be infeasibly hard > to construct a synonym for a given input.
Apply that quote to the idea of using checksum/secure as an ID and it sounds comforting to me. I wonder if it is "infeasibly hard" over a lifetime? Brett.

 [9/18] from: joel:neely:fedex at: 17-Jan-2002 6:03

Re: How to generate a UUID? (was possible data format...Re:Re: dbms3.r 0


Hi, Brett, Brett Handley wrote:
> I had thought of using checksum/secure as an ID some time back. > But the synonym possibility made be nervous. Why? > (1) Such a chance is greater than zero > (2) I believe that Murphy's law is understated. > (3) I've programming so long I'm automatically suspicious of > possibilities :) >
I agree on all counts.
> > Needless to say, none of this implies that synonyms fail to exist > > (see previous note), but simply that it should be infeasibly hard > > to construct a synonym for a given input. > > Apply that quote to the idea of using checksum/secure as an ID and > it sounds comforting to me. > I wonder if it is "infeasibly hard" over a lifetime? >
Well, "infeasibly hard" means the probability that a random message would have the same checksum should be about 6.84227765783589E-49 and you shouldn't be able to do better crafting one yourself. The issue is not that synonyms won't happen, but that they're unlikely and you can't predict WHEN they'll happen. Therefore they are "safe" for indicating whether a message has been tampered with. -jn- -- ; sub REBOL {}; sub head ($) {@_[0]} REBOL [] # despam: func [e] [replace replace/all e ":" "." "#" "@"] ; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"} print head reverse despam "moc:xedef#yleen:leoj" ;

 [10/18] from: brett::codeconscious::com at: 17-Jan-2002 23:27


> Well, "infeasibly hard" means the probability that a random message > would have the same checksum should be about 6.84227765783589E-49 > and you shouldn't be able to do better crafting one yourself. > > The issue is not that synonyms won't happen, but that they're > unlikely and you can't predict WHEN they'll happen. Therefore they > are "safe" for indicating whether a message has been tampered with.
So um, that means using it to produce a UUID is ok as long as I'm willing to accept the odds of a clash (very unlikely). I've more chance of being hit by a bus than having to deal with the consequences of a little ID clash in my system. Therefore I should focus my energies on watching out for buses than coding around clashes of IDs. Maybe I should go to bed ! Brett.

 [11/18] from: greggirwin:mindspring at: 17-Jan-2002 10:33

Re: How to generate a UUID? (was possible data format... Re:Re: dbms3.r


Hi Joel, Thanks for all the info. Great stuff! << Needless to say, none of this implies that synonyms fail to exist (see previous note), but simply that it should be infeasibly hard to construct a synonym for a given input. >> OK, so the only thing we have to do, if I understand things correctly, is to come up with a unique string of data and checksum/secure will generate a hash from that which is very likely *also* unique. :) --Gregg

 [12/18] from: sunandadh:aol at: 17-Jan-2002 14:05

Re: How to generate a UUID?


[greggirwin--mindspring--com] writes:
> OK, so the only thing we have to do, if I understand things correctly, is to > come up with a unique string of data and checksum/secure will generate a > hash from that which is very likely *also* unique. :)
Nice analysis! Which bounces us (if I'm following right) right back to the original question. My take on it is to get a guaranteed unique string (given there are guarantees in this world) that will work across the world, it needs three parts: pppp-rrrr-ssss where pppp - PREFIX issued by a central and trusted server somewhere. You may only need to get one prefix per installation of your application so we don't have a great overhead in getting this. rrrr - RANDOM. Is some random number we add to that prefix. This part is optional, IF you believe the central server will never, ever, fail or be restarted, or hacked so that it produces (very occasionally) duplicate unique prefixes, THEN you can dispense with it. On the other hand, how long it is indicates your faith in the central server. So all the UUIDs in your application begin pppp-rrrr- That leaves ssss SERIAL. The serial part that YOU assign in YOUR application. It's your job to make sure that these are always and forever unique within the application. Put the three together, and you may have a unique ID. (I used to work in the automotive industry. Every vehicle in the world, on leaving its factory, has a unique VIN -- Vehicle Identification Number -- stamped on its chassis. Do you want to know how many vehicles share the same VIN through manufacturing IT cock-ups? The answer is an embarrassingly large number, even with short-run luxury car makers.) I got code somewhere that generates ssss's. Maybe IOS could offer a pppp service, and then we'd be ready to roll. Sunanda.

 [13/18] from: greggirwin:mindspring at: 17-Jan-2002 22:42


Hi Sunanda, << Nice analysis! Which bounces us (if I'm following right) right back to the original question. >> Ex-actly. :) Good thoughts on the elements needed to assemble a UID. My preference would be not to require an external server if it can be avoided. I think a standalone machine needs to be able to generate UIDs as well. What you call PREFIX, I would probably call "Creation Location". To me, this equates to the unique machine identifier as used in the OSF DCE spec. There could be varying implementations for this, one of which *could* be an external server. If RT could provide a native 'machine-uid function, that would be ideal. Otherwise we have to synthesize it somehow, or make it a configuration option. The SERIAL part should be easy enough, though I see this just as a counter combined with a timestamp, which you may not given your experience. The biggest risk I see would be two instances of the script running at the same time and each being called inside the resolution of what now/precise gives us. If that's a risk worth considering, you could use a file as your shared memory space, though that wouldn't be my first choice. Clock resets also have to be considered. I like having a timestamp segment because it can be useful information in and of itself. This leaves the RANDOM part. You could let the application provide this (e.g. the make-uid function has a /with <random-segment-seed> refinement) to make it easy, and maybe just have a very simple implementation (using random/seed + random/secure) beyond that. --Gregg

 [14/18] from: sunandadh:aol at: 18-Jan-2002 8:54


Hi Gregg,
> What you call PREFIX, I would probably call "Creation Location". To me,
this
> equates to the unique machine identifier as used in the OSF DCE spec. There > could be varying implementations for this, one of which *could* be an > external server. If RT could provide a native 'machine-uid function, that > would be ideal. Otherwise we have to synthesize it somehow, or make it a > configuration option.
I think we have slightly different models in mind. You, I think, are looking for a "prefix" unique to a machine. I was chipping in an approach to a slightly different problem: to get a unique "prefix" for an "application node" (to coin a phrase). That's a bit of an application running in one machine; but it might migrate to another; and that machine may be running more than one application, each of which needs to generate its own UUIDs. But yes, a machine-uid function (that gets the CPU model and serial for machines that support that; or the ethernet address for machines with a card; or _something_ unique) would be a big step in the right direction for both of us. Sunanda.

 [15/18] from: greggirwin:mindspring at: 18-Jan-2002 11:10


Hi Sunanda, << I think we have slightly different models in mind. >> I think you're right. :) << You, I think, are looking for a "prefix" unique to a machine. I was chipping in an approach to a slightly different problem: to get a unique "prefix" for an "application node" (to coin a phrase). That's a bit of an application running in one machine; but it might migrate to another; and that machine may be running more than one application, each of which needs to generate its own UUIDs. >> Right. I've used an approach somewhere between the two we're discussing, to generate OIDs (Object IDs) for an object-centric DB design. To do that, I used the GUID created by CoCreateGUID (under Windows) and re-defined one segment to use as your PREFIX segment. One use I've been thinking of, and why I think generic global uniqueness is important (beyond application specific uniqueness requirements), is that of messages and agents which may "travel". I'll be doing some more research on this, and maybe even hack up a little test. Maybe, together, we can come up with something that will work for both of us. --Gregg

 [16/18] from: joel:neely:fedex at: 18-Jan-2002 13:16


Hi, Gregg, Sunanda, et al, Gregg Irwin wrote:
> One use I've been thinking of, and why I think generic global > uniqueness is important (beyond application specific uniqueness > requirements), is that of messages and agents which may "travel". > > I'll be doing some more research on this, and maybe even hack up > a little test. Maybe, together, we can come up with something > that will work for both of us. >
Just to connect this thread with the well-deserved resurgence of interest in Rugby... Imagine a Rugby-based registry that works as follows: - A developer submits a well-defined set of parameters (e.g. developer's name, company (if relevant), project, version, date) to a Rugby service. - The reponse is an ID guaranteed to be unique because only (the single instance of!) that service distributes them. - Anyone can subsequently supply the ID to the service and get back the identification data from the initial request. Thoughts? -jn-

 [17/18] from: greggirwin:mindspring at: 18-Jan-2002 14:16


Hi Joel, << - A developer submits a well-defined set of parameters (e.g. developer's name, company (if relevant), project, version, date) to a Rugby service. - The reponse is an ID guaranteed to be unique because only (the single instance of!) that service distributes them. - Anyone can subsequently supply the ID to the service and get back the identification data from the initial request. >> I can think of lots of variations on that theme which would be useful, though not necessarily related to uniqueness. Having a central server certainly helps, but then you are dependent on it. In some cases you may want at least some element of control over the structure and content of the UID (Suananda's approach) and, in that scenario, it's very helpful. I've been thinking that the UIDs are plain, structured data. Now, you've got me thinking that an encrypted UID could also be useful. Hmmm. More to think about. --Gregg

 [18/18] from: sunandadh:aol at: 18-Jan-2002 19:06


Hi Joel,
> Imagine a Rugby-based registry that works as follows: > - A developer submits a well-defined set of parameters (e.g.
<<quoted lines omitted: 4>>
> - Anyone can subsequently supply the ID to the service and get > back the identification data from the initial request.
I see this as potentially problematic as the service must either: -- Have a hashing algorithm that is guaranteed never to clash on different strings. This takes us back to the "how unique is checksum/secure" question; or -- Keep a database of the parameters and the UUID issued for it. I wouldn't trust a database to keep its data clean enough for this over the potential decades of life cycle. Also, keeping a database means someone can't use the service anonymously My thoughts for the equivalent service was that it could just issue an integer. One way that could be done: Issue an integer based on the exact UTC time as got from a trusted atomic clock service. That way, no database needed--just some instore memory to check you didn't just issue that number a microsecond ago. (And a little bit of code to check that the clock hasn't gone backwards during a leap-second removal). If the service returned a signed 64 bit integer which was (in effect) number of microseconds from 01/Jan/2002, it could issue a million unique numbers a second for nearly 300,000 years....It might then be safe to start again from zero as many of the early applications will have been retired. If not, we could always use that 64th bit and gain another 300,000 years before upsetting our users. This isn't really a full-blown UUID process because it couldn't handle more than 1,000,000 requests a second. But remember, my model was that we are producing a unique prefix. It's up to the application to generate its own serial number. The serial number alone makes the thing unique within the application. Appending it to the number from the UUID service produces something that should be unique across the earth. More complications ensure if you want a UUID that is unique across the solar system. Sunanda.

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted