Mailing List Archive: Re: Finding a value in a series?

[REBOL] Re: Finding a value in a series?

From: joel:neely:fedex at: 22-May-2001 14:50


Hi, Charles,

Charles Wardell wrote:
> I really appreciate your help. Do you believe that this would
> perform as well as the native Select?
>

The obvious off-the-top-of-the-head answers are:

1)  Of course (if "as well" means "as correctly"), and
2)  Of course not (if "as well" means "as quickly").

However, to avoid sounding like a smart-alec, I decided to run
some real benchmarks and quantify the answer.  In the course
of doing so, I ran into some real surprises!

I tried to think of several obvious implementation variations on
the first quick-and-dirty LOCATE -- still returning a position
within a block for the reasons described earlier.   I came up
with these:

    locatepath: func [b [block!] t [integer!]] [
         forall b [
             if t = b/1/1 [return b]
         ]
    ]

    locatepick: func [b [block!] t [integer!]] [
        forall b [
            if t = pick pick b 1 1 [return b]
        ]
    ]

    locatefirst: func [b [block!] t [integer!]] [
        forall b [
            if t = first first b [return b]
        ]
    ]

    pickat: func [b [block!] t [integer!]] [
        repeat i length? b [
            if t = pick pick b i 1 [return at b i]
        ]
        tail b
   ]

All of these would use the block-in-a-block scheme from the
original question:

    blk2: [[1 a b] [2 c d] [3 e f]]

Based on your second "perform as well?" question, they would
need to be compared with the simplistic

    find/skip block1 t 2

which uses a flattened data structure that would look like this:

    blk1: [1 [a b] 2 [c d] 3 [e f]]

I built a pair of such data blocks with the same content, with
a thousand logical records each, with the test code searching
for the last key in the block via all of the above methods.
I ran several tests of 10,000 searches, averaging times and
recycling memory every 1,000 searches.  One mild surprise was
that there was significant variation between the averages from
different tests.  I can only assume from those variations (and
from observing the system behavior otherwise) that there were
intermittent activities (e.g., garbage collection?) that would
happen "every now and then" and perturb the measurements.

It's no surprise that the FIND/SKIP approach is by far the
fastest alternative (but, of course, it doesn't use your
desired data structure).

The first major surprise is that the PICKAT beats all other
options for performance against your desired two-level block
structure.  It takes about 17 to 20 times as long as the
FIND/SKIP code.  Given that PICKAT uses as much or more
high-level REBOL code per inner cycle as any other option,
I expected it to be one of the slower ones.  Not so!

The next major surprise is that LOCATEFIRST, LOCATEPICK, and
LOCATEPATH consistenly rank in a different order on W2K and
Linux.

On W2K, LOCATEPATH takes about 2 1/3 times as long as PICKAT,
LOCATEFIRST takes a little under 2 1/2 times as long, and
LOCATEPICK takes a little over 2 1/2 times as long.  However,
on Linux, LOCATEPATH is the slowest, running about 10% longer
than LOCATEFIRST.  Again, one might speculate about the
affects of different memory managment schemes, but I really
have no explanation for the W2k-vs-Linux performance diffs.

However, the punchline is clear.  Shorter-looking code is
*NOT* necessarily shorter-running.  Among these contenders,
PICKAT is the winner.

-jn-