Mailing List Archive: Re: percent! - new datatype request

[REBOL] Re: percent! - new datatype request

From: joel:neely:fedex at: 13-Jun-2002 7:45


Hi, Carl

Carl Read wrote:
> >>>    ++$tally{$word}
>

 vs.

> add-word: function [tally [block!] word [string!]][temp][
>     either temp: find/skip tally word 2 [
>         temp/2: temp/2 + 1
>     ][
>         append tally reduce [word 1]
>     ]
> ]
>

Your approach is essentially the same as the one I've used for
such tasks.  By my count, that's 16 bytes versus 179 bytes, or 11
times as much code.  I know that this is a dinky example (and if I
take out all of the whitespace used to "pretty-print" the REBOL
version, it shrinks to 139 bytes -- although that renders it
significantly less readable), but here's the point I was trying
to make with this dinky example:

    Manipulating non-trivial data structures in REBOL, even
    something as simple as a look-up table, involves much more
    code and much more interpreter overhead than in several
    other languages in the same space -- 'net applications.

    I truly believe that addressing that issue would be of
    significant value in promoting the wider use of REBOL.

Just for the exercise, I coded the word-count problem in both
Perl (using essentially the same code previously posted) and
REBOL (using essentially the same approach you provided).  I
wrote both of them to use a hard-coded file name, to avoid the
additional overhead of command-line parameter handling.  For a
sample file, I used the text of "Alice in Wonderland" from the
Gutenberg Project, with the following typical results (median
of three trials):

    (/export/home/jneely/try)# wc alice.txt
        2415   26434  145197 alice.txt

    (/export/home/jneely/try)# time tally.pl > /dev/null

    real    0m0.31s
    user    0m0.30s
    sys     0m0.01s

    (/export/home/jneely/try)# time tally.r > /dev/null

    real    0m10.35s
    user    0m10.24s
    sys     0m0.09s

For a ~142K file, Perl produced a sorted word-count list in under
1/3 second, while REBOL required over 10 1/3 seconds for the same
data, roughly 33 times as long.

I'm certainly not claiming that producing a dictionary of "Alice
in Wonderland" is typical of the workload of all REBOL programmers,
but I do believe that the type of processing involved *is* typical
of many server-side scripts, whether CGI or "back-room" data
reduction -- read one or more text files, parse them down and do
some processing on the analyzed text, then spit out some results.

The trade-off between native and mezzanine code is a significant
performance factor in REBOL.  The fact that I can write a
function to do some task doesn't change the fact that there's a
run-time cost influenced by how much of the work of that function
must be expressed in mezzanine/user code.

I'm certainly not ignoring the value of graphical user interfaces
(I've been a Mac user since 1984, and my primary personal box is
an iBook running Mac OS X), but I think it will be hard to sell
the idea of a GUI whose underlying data manipulation is awkward
to write and sluggish in performance.  (Not to mention the fact
that performance is *T*H*E* issue in server-side programming.
Although Java was widely hyped as a universal client-side
language, most of what I'm seeing in the Java world now is on the
server side -- now that there are JVMs with adequate performance.)

Hence my personal opinion that the cause of REBOL would be better
served at this point by attention to facilities for working with
non-trivial data structures, and to performance, than by adding
another data type.

As for the /View discussion, I'll reply separately.

-jn-

--
; Joel Neely                             joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]