Mailing List Archive: Re: FW: Re: [append][series]Appending to a series of strings

[REBOL] Re: FW: Re: [append][series]Appending to a series of strings

From: joel:neely:fedex at: 20-Nov-2003 13:22


Hi, Ladislav,

Actually, it's not faster (for sufficiently large cases).  See below.

Ladislav Mecir wrote:
> Hi, my solution using Parse (I think, that it is much faster, than
 > other solutions):
>

I did a bit of benchmarking with functions that use each of the three
strategies to generate a block of answers (value/count pairs).  I'll
include those functions at the end, in case anyone wants to verify
that I didn't mangle any code.

Using a SCORES block of random numbers between 0 and 100 (inclusive),
the iterative version scales up better than the parse-based version
as the size of the SCORES block increases (all times in seconds):

     size    iterative  remove-each  parse-based
      1000        1E-2        0.28         0
     10000       0.351        2.874        0.24
    100000       3.024       36.813        4.637
    200000       4.787       --            4.556
    300000       6.74        --            6.94
    400000       8.832       --           14.371
    500000      11.887       --           18.517
   1000000      21.891       --           39.227

I gave up VERY quickly on the remove-each-based version; it gets
eaten alive by memory management overhead.

Since the parse-based version sorts (a copy of) the scores block,
its time complexity must be at least O (n log n).

The iterative version is only O (n), so it will be faster for
sufficiently large n.

I should also point out that the iterative version requires only
one value at a time; it can work on an arbitrarily large set of
values (e.g., being retrieved across a network connection, read
from a huge data file, resulting from computation in a loop, etc.),
but the remove- and  parse-based versions require the entire set
to be available for sorting.

There's one final point, but I'll post it separately.

-jn-

Ladislav Mecir wrote:
>    scores: clear []
>    loop 30 [append scores random 20]
>    group: [p: set i integer! any i q: (print ["score:" i "tallies:"
> offset? p q])]
>    parse probe sort scores [any group]
>
>>-----Original Message-----
>>From: Anton Rolls [mailto:[antonr--iinet--net--au]]
>>
>>; initialize some random scores
>>scores: clear []
>>loop 30 [append scores random 20]
>>
>>; figure out how many of each score
>>tallies: clear []
>>foreach uscore sort unique scores [
>>	append/only tallies reduce [
 >>          uscore
 >>          length? remove-each score copy scores [uscore <> score]
 >>      ]
>>]
>>

SOURCE CODE FOR TIMED FUNCTIONS IS GIVEN BELOW:

;iterative version

     tally-i: func [
         scores [block!]
         /local tallies result
     ][
         tallies: copy []
         foreach score scores [
             either found? here: select tallies score [
                 here/1: here/1 + 1
             ][
                 insert tail tallies reduce [score copy [1]]
             ]
         ]
         result: make block! length? tallies
         foreach [score tally] sort/skip tallies 2 [
             insert tail result score
             insert tail result tally
         ]
         result
     ]

;remove-each-based version

     tally-r: func [
         scores [block!]
         /local tallies
     ][
         tallies: clear []
         foreach uscore sort unique scores [
	        append/only tallies reduce [
                 uscore
                 length? remove-each score copy scores [uscore <> score]
             ]
         ]
     ]

;parse-based version

     tally-p: func [
         scores [block!]
         /local group p q result
     ][
         result: copy []
         group: [
             p: set i integer! any i q: (
                 insert tail result i
                 insert tail result offset? p q
             )
         ]
         parse sort copy scores [any group]
         result
     ]

--
----------------------------------------------------------------------
Joel Neely            joelDOTneelyATfedexDOTcom           901-263-4446

Enron Accountingg in a Nutshell: 1c=$0.01=($0.10)**2=(10c)**2=100c=$1