Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search

[REBOL] Re: Finding a value in a series?

From: joel:neely:fedex at: 22-May-2001 14:50

Hi, Charles, Charles Wardell wrote:
> I really appreciate your help. Do you believe that this would > perform as well as the native Select? >
The obvious off-the-top-of-the-head answers are: 1) Of course (if "as well" means "as correctly"), and 2) Of course not (if "as well" means "as quickly"). However, to avoid sounding like a smart-alec, I decided to run some real benchmarks and quantify the answer. In the course of doing so, I ran into some real surprises! I tried to think of several obvious implementation variations on the first quick-and-dirty LOCATE -- still returning a position within a block for the reasons described earlier. I came up with these: locatepath: func [b [block!] t [integer!]] [ forall b [ if t = b/1/1 [return b] ] ] locatepick: func [b [block!] t [integer!]] [ forall b [ if t = pick pick b 1 1 [return b] ] ] locatefirst: func [b [block!] t [integer!]] [ forall b [ if t = first first b [return b] ] ] pickat: func [b [block!] t [integer!]] [ repeat i length? b [ if t = pick pick b i 1 [return at b i] ] tail b ] All of these would use the block-in-a-block scheme from the original question: blk2: [[1 a b] [2 c d] [3 e f]] Based on your second "perform as well?" question, they would need to be compared with the simplistic find/skip block1 t 2 which uses a flattened data structure that would look like this: blk1: [1 [a b] 2 [c d] 3 [e f]] I built a pair of such data blocks with the same content, with a thousand logical records each, with the test code searching for the last key in the block via all of the above methods. I ran several tests of 10,000 searches, averaging times and recycling memory every 1,000 searches. One mild surprise was that there was significant variation between the averages from different tests. I can only assume from those variations (and from observing the system behavior otherwise) that there were intermittent activities (e.g., garbage collection?) that would happen "every now and then" and perturb the measurements. It's no surprise that the FIND/SKIP approach is by far the fastest alternative (but, of course, it doesn't use your desired data structure). The first major surprise is that the PICKAT beats all other options for performance against your desired two-level block structure. It takes about 17 to 20 times as long as the FIND/SKIP code. Given that PICKAT uses as much or more high-level REBOL code per inner cycle as any other option, I expected it to be one of the slower ones. Not so! The next major surprise is that LOCATEFIRST, LOCATEPICK, and LOCATEPATH consistenly rank in a different order on W2K and Linux. On W2K, LOCATEPATH takes about 2 1/3 times as long as PICKAT, LOCATEFIRST takes a little under 2 1/2 times as long, and LOCATEPICK takes a little over 2 1/2 times as long. However, on Linux, LOCATEPATH is the slowest, running about 10% longer than LOCATEFIRST. Again, one might speculate about the affects of different memory managment schemes, but I really have no explanation for the W2k-vs-Linux performance diffs. However, the punchline is clear. Shorter-looking code is *NOT* necessarily shorter-running. Among these contenders, PICKAT is the winner. -jn-