Mailing List Archive: Re: Coffee break problem anyone?

[REBOL] Re: Coffee break problem anyone?

From: joel:neely:fedex at: 4-Nov-2003 11:32


Hi, Sunanda,

Nice little problem!  (And, I admit to being more interested in the
algorithm design aspect than the REBOL/PARSE aspect...)

[SunandaDH--aol--com] wrote:
> The problem is this......You have a block containing integers, like:
>
> sparse-array: [ 12 13 14 15 16 17
...
>              ]
>
> There are runs of consecutive, ascending integers: 12--17, 7--8 etc.
 > We want to find the location in the block of the first longest,
 > such sequence.
>
> The answer for this dataset is 20--26.   101--107 is as long, but
 > isn't the  first. 59--52 is longer, but it is descending.
>

This is a great little problem for comparing two approaches that handle
state differently:  classic stateful structured programming massages
variables to maintain state, while the Jackson Structured Methodology
maps data structure to algorithm structure as much as possible.

Incidentally, I'm extending the spec to include the requirement that
we should get a zero-length run if the array is empty...

The "classic" stateful approach says "We have an array; process it one
element at a time, and keep track of whether each new element extends
a run or begins a new one."  This normally gets a little tricky when
we have to address boundary conditions (e.g. what do we compare the
first element against, as it has no predecessor?)  Here's a standard
design based on that strategy:

find-run-stateful: func [
     data [block!]
     /local maxpos maxlen seqpos seqlen prev curr
][
     maxpos: maxlen: 0                   ; initially no runs
     seqpos: seqlen: 0                   ; imaginary first run
     prev: data/1                        ; pretend first is...
                                         ; ... its own previous
     for pos 1 length? data 1 [          ; count across array
         curr: data/:pos                 ; get new current value
         either curr - 1 = prev [        ; if current extends run...
             seqlen: seqlen + 1          ; ... just bump length
         ][                              ; ... otherwise
             if seqlen > maxlen [        ; if last run was new best...
                 maxpos: seqpos          ; ... remember position
                 maxlen: seqlen          ; ... and length
             ]
             seqpos: pos                 ; start new run here...
             seqlen: 1                   ; ... with length of 1
         ]
         prev: curr                      ; remember for comparison
     ]
                                         ; we can have a run underway
                                         ; when falling off the end
     if seqlen > maxlen [                ; so we must check its length
         maxpos: seqpos                  ; to see if it was the best
         maxlen: seqlen                  ; run in the array
     ]
     reduce [maxpos maxlen]
]

The tricky bits in this design are:
- initializing the imaginary "previous run" before the first
   element of the block, in such a way that it won't be extended
   by the first element, and it won't look like a better run than
   any that we actually find;
- remembering to check the run which goes off the end of the block.

Notice that the above design just assumes that the block is an iteration
of numbers, and so has to figure out what to do with each number
(including the first!).  It maintains two invariants:

- SEQPOS and SEQLEN represent the run in progress which each element
   either extends or terminates (by starting a new one);
- MAXPOS and MAXLEN represent the longest *completed* run yet seen.

The Jackson method says instead, "The argument block is an iteration
of runs, and a run is an iteration of consecutive numbers" and so has
two loops: the outer one iterates across runs in the block, and the
inner one iterates across numbers in a single run.

find-run-Jackson: func [
     data [block!]
     /local maxpos maxlen seqpos seqlen prev curr
][
     maxpos: maxlen: 0                   ; initially no runs
     while [not empty? data] [           ; process another run...
         seqpos: index? data             ; ... starting here
         seqlen: 0                       ; ... initially empty
         prev: none                      ; ... no previous value
         while [                         ; same run as long as...
             all [
                 not empty? data         ; ... more data and
                 any [
                     (curr: data/1) - 1  ; ... current value
                         = prev          ; ... extends this run, or
                     zero? seqlen        ; ... this is a new run
                 ]
             ]
         ][                              ; same run, so...
             prev: curr                  ; ... save value to compare
             seqlen: seqlen + 1          ; ... extend length
             data: next data             ; ... shift the data
         ]
         if seqlen > maxlen [            ; if that run is new best...
             maxpos: seqpos              ; ... remember position
             maxlen: seqlen              ; ... and length
         ]
     ]
     reduce [maxpos maxlen]
]

Notice that now there's a single, obvious place for the test for new
best run -- right after the inner loop across a run's numbers!  The
condition on the inner loop has to satisfy the outer loops condition
first, and then add its own criteria (which uses ZERO? SEQLEN to mean
that we have a new run, and therefore don't care whether the previous
value was one less than the current one).

To see the value of this structural change, imagine that we now need
to return a triplet of numbers; in addition to the position and length
of the longest consecutive run, also return the count of how many runs
of that length existed.  Where would that code go in the Jackson
version?  Where would it have to appear in the other one?

-jn-

--
----------------------------------------------------------------------
Joel Neely            joelDOTneelyATfedexDOTcom           901-263-4446

Enron Accountingg in a Nutshell: 1c=$0.01=($0.10)**2=(10c)**2=100c=$1