[REBOL] Re: Coffee break problem anyone?
From: joel:neely:fedex at: 5-Nov-2003 13:37
First, let me offer an improvement and then address some more of the
good discussion points you raise. (There's a little challenge at
the end for the interested who wade through to that point! ;-)
Minor issue first: instead of initializing run length to zero and
incrementing for each added element, just directly calculate the
length once the run is finished.
Now for the more interesting stuff...
Instead of saying that:
- a block is an iteration of runs, and
- a run is an iteration of numbers;
block --*-- run --*-- number
lets say that:
- a block is an iteration of runs, and
- a run is a first number, followed by any consecutive numbers.
+-- first number
+--*-- consecutive number
This (again) addresses a boundary issue; the first value in a run
plays a different role than the other numbers in the run. Tricky
setup of initial conditions so that the first value can be treated
just like all the others adds more complexity to the code.
That said, here's a version that implements that view:
where-j: func [
/local maxpos maxlen runpos runlen prev curr
maxpos: maxlen: 0 ; default empty run
while [not empty? block] [ ; more data = more runs
runpos: index? block ; start new run here
prev: block/1 ; remember first value
block: next block ; done with first
prev + 1 = curr: block/1 ; extending the run
prev: curr ; save comparison value
block: next block ; move on
runlen: (index? block) - runpos ; now compute length
if runlen > maxlen [ ; update best run?
reduce [maxpos maxlen] ; return best run
> ... the insight that (in effect) the procedures can fall naturally out of an
> analysis of the data structures can lead to some very elegant solutions.
> Of course there are situations where neither approach works, and then you
> need other approaches too. Object-orientation is one claim to the next step as it
> merges both approaches. I never quite saw the point there, either, but it
> does work in some areas.
I'm not clear on what you mean by "both approaches"... I view JSP
(Jackson Structured Programming) simply as a specific type of
structured programming which has strong heuristics about which
structure(s) should drive the design.
> And then there are situations that seem completely off the
> deterministic axis.
??? Do you mean non-deterministic programming, or just programming
when the criteria are only vaguely stated?
> Consider this progression:
> -- Find the sum of the numbers (simply almost any way: write a loop;
> use a map function, etc)
> -- Find the longest run with the highest end value (101--107 in the
> original data)
An easy extension of the above program:
- add a local MAXTOP initialized to anything
- change the condition at the end from
runlen > maxlen
any [runlen > maxlen all [runlen = maxlen prev > maxtop]]
- change the "new best run" block to include
(For why this was easy, see below...)
> -- Find the run (as defined in the original problem) whose run length
> is the modal length of runs
First, we have to change "the run" to "a run", since the answer is no
longer unique. For the data
[10 8 9 6 7 4 5 0 1 2]
there are runs of length 1, 2, 2, 2, and 3, so the mode length is 2,
and there are three runs that have that length.
However, something more significant has happened. The previous change
to the problem (longest run with highest upper value) actually had THE
SAME STRUCTURE as the original problem: find the "best" run, where we
have a simple test for determining whether one run is "better" than
another. In other words, we can generate runs one at a time, and test
each one against the current "best" to see which one wins.
But in this last change, we now are asking for a run based on some
property/ies of the entire collection of runs, so we must generate (and
retain) all of them. We can (trivially) change the above function to
return all runs, then write a separate "pick a median" function to
operate on that collection (or we build the additional stuff into the
end of the modified function). In either case we now have a two-step
1) transform the offered block into a collection of runs;
2) do something with the collection of runs.
That's why the first change was easy and the second one was harder.
> --Find the two runs that are most nearly similar (contain mostly the
> same numbers)
I think I can paraphrase that as "find two runs whose intersection is
at least as long as any other intersection of distinct runs", which is
easy to solve, still using the two-step approach above.
This one is also not uniquely determined without more constraints; to
see why, consider
[1 2 3 4 5 2 3 4 5 6 3 4 5 6 7]
> -- The numbers are RGB values from a JPG. Find the average color tone
> of the object the female nearest the camera is looking at. Break out
> the neural networks for this one!
Now we've jumped from a programming problem to a specification problem!
The other tasks were well-defined (there could be no argument about
whether an answer is correct or not), but not so with this last one.
(I recently saw a documentary on TV in which a critical question was
whether a certain figure in a Renaissance painting was male or female.
The "experts" couldn't agree. ;-)
> In my actual case, the patterns I was looking for grew more complex,
> so I reasoned that the requirement was becoming unreasonable as the
> preceding data structures weren't designed for easy sifting. Luckily,
> I could rewrite history (not always a luxury in this industry) and
> simplified the problem almost out of existence.
My complements!!! Sometimes the best way to solve a problem is to
redifine the terms of discourse so that the problem simply disappears!
> Finally, this is getting a bit long, but just to stick up for the
> "finite machine state" approach I took in the original code...
I certainly use FSM-inspired design at times myself, but the nasty
part (IMHO) is always the transition between states. Not implementing
the transition (usually just updating a variable) but making it clear
to the eye/mind of the reader the pattern of transitions that will
In that regard, I suggest that the (improved) JSP-inspired version
above can be regarded as a FSM as well. It's just that each "state"
corresponds to a region of code, and the state transitions are simply
the standard flow-control!
In the updated FSM version, State 0 is *always* followed by State 1
(modulo falling off of the end) and a series of occurrences of State 1
are likewise followed by a State 0 (same caveat). The JSP version
makes that organization much more obvious (IMHO). Finally, there is
still considerable redundancy in the FSM version -- it's just in time
rather than in codespace! Notice that the "update largest" test is
done *for each element* of each run, and the sets for a maximum-length
run will be done for each element after the length of its nearest
competitor has been passed. The FSM code contains these expressions
only once, but evaluates them *MANY* more times than the equivalent
ones in the JSP version.
I'm not criticizing here, but pointing out the trade-offs (which just
happen to be particularly dramatic in this case! ;-) It is possible
to optimize for one measure (in this case, minimizing source code
redundancy) in such a way that it pessimizes another measure (the
number of times the code is -- redundantly -- evaluated).
Note that the JSP design heuristics don't minimize *ALL* aspects of
code redundancy; the NOT EMPTY? test is really in there twice!!!
Brownie points to anyone who can show why! ;-)