Coffee break problem anyone?
[1/19] from: SunandaDH::aol::com at: 4-Nov-2003 7:02
Here's a little problem I needed an answer to yesterday.
As I needed the answer yesterday, I hacked together some code. But there has
got to be a more elegant way, surely. Maybe using parse on a block?
The problem is this......You have a block containing integers, like this:
sparse-array: [ 12 13 14 15 16 17
7 8
20 21 22 23 24 25 26
19
59 58 57 56 55 54 53 52
20 21 22 23
101 102 103 104 105 106 107
]
They are not in ascending order, and there are some numbers duplicated. But
there are runs of consecutive, ascending integers: 12--17, 7--8 etc. We want
to find the location in the block of the first such longest sequence.
The answer for this dataset is 20--26. 101--107 is as long, but isn't the
first. 59--52 is longer, but it is descending.
My solution is below. Someone can do better, surely!
Thanks,
Sunanda
;;=============================
find-longest-run: func [data-array [block!]
    /local current-state
           current-location
           run-start
           run-size
           largest-run-size
           largest-run-start
] [
    current-state: 0
    current-location: data-array
    run-size: 0
    largest-run-size: 0
    largest-run-start: data-array  ;; in case there are zero data items
    forever [
        if tail? current-location [
            if run-size > largest-run-size [
                largest-run-size: run-size
                largest-run-start: run-start
            ]
            break  ;; we're done
        ]
        switch current-state [
            ;; State 0: start of a new run
            ;; ---------------------------
            0 [
                run-start: current-location
                run-size: 1
                current-location: next current-location
                current-state: 1
            ]  ;; end state 0
            ;; State 1: next item
            ;; ------------------
            1 [
                either (1 + first back current-location) = current-location/1 [
                    ;; The run continues
                    ;; -----------------
                    run-size: run-size + 1
                    current-location: next current-location
                ] [
                    ;; Run's ended: check if it was the largest so far
                    ;; -----------------------------------------------
                    if run-size > largest-run-size [
                        largest-run-size: run-size
                        largest-run-start: run-start
                    ]
                    current-state: 0
                ]
            ]  ;; end state 1
        ]  ;; switch
    ]  ;; forever
    return reduce [largest-run-size largest-run-start]
]  ;; func
;; Run some tests
;; --------------
set [run-size run-start] find-longest-run sparse-array
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
set [run-size run-start] find-longest-run [1 2 3 4 5 ] ;; just one run
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
set [run-size run-start] find-longest-run [999] ;; just one data point
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
set [run-size run-start] find-longest-run [] ;; empty dataset
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
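[Editorial note] For readers more at home outside REBOL, the state machine above can be transliterated into a rough Python sketch. The function name, the (length, start) return shape, and the 0-based indices are illustrative choices of mine, not from the thread; REBOL's series positions become plain list indices:

```python
def find_longest_run(data):
    """Return (length, start_index) of the first longest run of
    consecutive ascending integers; start_index is 0-based."""
    state = 0                # 0: start a new run, 1: try to extend it
    i = run_start = run_len = 0
    best_len, best_start = 0, 0
    while i < len(data):
        if state == 0:                    # State 0: start of a new run
            run_start, run_len = i, 1
            i += 1
            state = 1
        elif data[i] == data[i - 1] + 1:  # State 1: the run continues
            run_len += 1
            i += 1
        else:                             # State 1: the run has ended
            if run_len > best_len:
                best_len, best_start = run_len, run_start
            state = 0
    if run_len > best_len:                # a run can still be open at the tail
        best_len, best_start = run_len, run_start
    return best_len, best_start

sparse = [12, 13, 14, 15, 16, 17,
          7, 8,
          20, 21, 22, 23, 24, 25, 26,
          19,
          59, 58, 57, 56, 55, 54, 53, 52,
          20, 21, 22, 23,
          101, 102, 103, 104, 105, 106, 107]
```

On the thread's dataset this finds the run of length 7 starting at index 8 (the values 20--26), and it handles the same edge cases as the tests above (single element, empty block).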
[2/19] from: g:santilli:tiscalinet:it at: 4-Nov-2003 14:05
Hi Sunanda,
On Tuesday, November 4, 2003, 1:02:28 PM, you wrote:
Sac> As I needed the answer yesterday, I hacked together some code. But there has
Sac> got to be a more elegant way, surely. Maybe using parse on a block?
With PARSE, a solution could be:
block: [
12 13 14 15 16 17
7 8
20 21 22 23 24 25 26
19
59 58 57 56 55 54 53 52
20 21 22 23
101 102 103 104 105 106 107
]
results: []
parse block [
    some [
        start:
        set value integer!
        any [
            (value: value + 1)
            1 1 value
        ]
        finish:
        (insert/only results reduce [
            negate offset? start finish
            index? start
            copy/part start finish
        ])
    ]
]
foreach result sort results [
print ["Sequence:" result/3 "(len" negate result/1 "pos" result/2 ")"]
]
This is less efficient because it uses SORT to find the best
result; however, it has the advantage of giving the other results too.
The sort can be avoided, anyway.
Regards,
Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/
[3/19] from: rotenca:telvia:it at: 4-Nov-2003 16:42
This is my try with parse:
find-longest-run: func [
    data-array [block!]
    /local x next-x start res len val h test
] [
    start: data-array
    res: reduce [0 data-array]
    test: [
        res/1 < len: offset? start h
        insert/only insert clear res len start
    ]
    parse start [
        set x number! (next-x: x + 1)
        any [
            h: [
                set x 1 1 next-x |
                set x number! (all test start: h)
            ]
            (next-x: x + 1)
        ]
        (all test)
    ]
    res
]
---
Ciao
Romano
[4/19] from: patrick:philipot:laposte at: 4-Nov-2003 16:46
Hi Gabriele,
Wooaooh, brilliant !
But could you (or some parse guru) explain a bit.
I don't understand the second part of the any rule (1 1 value).
Regards
Patrick
[5/19] from: lmecir:mbox:vol:cz at: 4-Nov-2003 16:58
Hi Sunanda,
here is my tea break result:
find-longest-run: function [
    [catch]
    data-array [block!]
] [
    longest-run-size
    longest-run-start
    size
    position
    number
] [
    if empty? data-array [throw make error! "a non-empty array expected"]
    longest-run-start: position: back tail data-array
    longest-run-size: size: 1
    while [not head? position] [
        if not number? first position [throw make error! "an integer array expected"]
        number: first position
        position: back position
        either (first position) = (number - 1) [
            size: size + 1
            if size >= longest-run-size [
                longest-run-size: size
                longest-run-start: position
            ]
        ] [size: 1]
    ]
    reduce [longest-run-size longest-run-start]
]
-L
----- Original Message -----
> Here's a little problem I needed an answer to yesterday.
[6/19] from: g:santilli:tiscalinet:it at: 4-Nov-2003 17:20
Hi Patrick,
On Tuesday, November 4, 2003, 4:46:23 PM, you wrote:
ppln> But could you (or some parse guru) explain a bit.
ppln> I don't understand the second part of the any rule (1 1 value).
If you want to match a given integer (like for example 2) in a
block, you can't just use:
parse [2] [2]
because the integer in the rule is interpreted as a repetition,
like in:
parse "aa" [2 "a"]
so you have to use:
parse [2] [1 1 2]
So 1 1 value matches the integer referred to by 'value.
Regards,
Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/
[7/19] from: joel:neely:fedex at: 4-Nov-2003 11:32
Hi, Sunanda,
Nice little problem! (And, I admit to being more interested in the
algorithm design aspect than the REBOL/PARSE aspect...)
[SunandaDH--aol--com] wrote:
> The problem is this......You have a block containing integers, like:
>
> sparse-array: [ 12 13 14 15 16 17
...
> ]
>
> There are runs of consecutive, ascending integers: 12--17, 7--8 etc.
> We want to find the location in the block of the first longest,
> such sequence.
>
> The answer for this dataset is 20--26. 101--107 is as long, but
> isn't the first. 59--52 is longer, but it is descending.
>
This is a great little problem for comparing two approaches that handle
state differently: classic stateful structured programming massages
variables to maintain state, while the Jackson Structured Methodology
maps data structure to algorithm structure as much as possible.
Incidentally, I'm extending the spec to include the requirement that
we should get a zero-length run if the array is empty...
The "classic" stateful approach says "We have an array; process it one
element at a time, and keep track of whether each new element extends
a run or begins a new one." This normally gets a little tricky when
we have to address boundary conditions (e.g. what do we compare the
first element against, as it has no predecessor?) Here's a standard
design based on that strategy:
find-run-stateful: func [
    data [block!]
    /local maxpos maxlen seqpos seqlen prev curr
] [
    maxpos: maxlen: 0              ; initially no runs
    seqpos: seqlen: 0              ; imaginary first run
    prev: data/1                   ; pretend first is...
                                   ; ... its own previous
    for pos 1 length? data 1 [     ; count across array
        curr: data/:pos            ; get new current value
        either curr - 1 = prev [   ; if current extends run...
            seqlen: seqlen + 1     ; ... just bump length
        ] [                        ; ... otherwise
            if seqlen > maxlen [   ; if last run was new best...
                maxpos: seqpos     ; ... remember position
                maxlen: seqlen     ; ... and length
            ]
            seqpos: pos            ; start new run here...
            seqlen: 1              ; ... with length of 1
        ]
        prev: curr                 ; remember for comparison
    ]
                                   ; we can have a run underway
                                   ; when falling off the end
    if seqlen > maxlen [           ; so we must check its length
        maxpos: seqpos             ; to see if it was the best
        maxlen: seqlen             ; run in the array
    ]
    reduce [maxpos maxlen]
]
The tricky bits in this design are:
- initializing the imaginary "previous run" before the first
element of the block, in such a way that it won't be extended
by the first element, and it won't look like a better run than
any that we actually find;
- remembering to check the run which goes off the end of the block.
Notice that the above design just assumes that the block is an iteration
of numbers, and so has to figure out what to do with each number
(including the first!). It maintains two invariants:
- SEQPOS and SEQLEN represent the run in progress which each element
either extends or terminates (by starting a new one);
- MAXPOS and MAXLEN represent the longest *completed* run yet seen.
The Jackson method says instead, "The argument block is an iteration
of runs, and a run is an iteration of consecutive numbers" and so has
two loops: the outer one iterates across runs in the block, and the
inner one iterates across numbers in a single run.
find-run-Jackson: func [
    data [block!]
    /local maxpos maxlen seqpos seqlen prev curr
] [
    maxpos: maxlen: 0                  ; initially no runs
    while [not empty? data] [          ; process another run...
        seqpos: index? data            ; ... starting here
        seqlen: 0                      ; ... initially empty
        prev: none                     ; ... no previous value
        while [                        ; same run as long as...
            all [
                not empty? data        ; ... more data and
                any [
                    (curr: data/1) - 1 ; ... current value
                        = prev         ; ... extends this run, or
                    zero? seqlen       ; ... this is a new run
                ]
            ]
        ] [                            ; same run, so...
            prev: curr                 ; ... save value to compare
            seqlen: seqlen + 1         ; ... extend length
            data: next data            ; ... shift the data
        ]
        if seqlen > maxlen [           ; if that run is new best...
            maxpos: seqpos             ; ... remember position
            maxlen: seqlen             ; ... and length
        ]
    ]
    reduce [maxpos maxlen]
]
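[Editorial note] The two-loop Jackson shape reads almost the same in a Python sketch (the function name and 0-based indexing are mine, not Joel's): the outer loop consumes one whole run per pass, and the best-run test appears exactly once, after the inner loop.

```python
def find_run_jackson(data):
    """Jackson structure: the outer loop walks runs, the inner loop
    walks the numbers inside one run. Returns (start_index, length),
    0-based; (0, 0) for an empty block."""
    best_start = best_len = 0
    i = 0
    while i < len(data):                 # process another run...
        start = i                        # ... starting here
        i += 1                           # first number of the run
        while i < len(data) and data[i] == data[i - 1] + 1:
            i += 1                       # consecutive numbers extend it
        if i - start > best_len:         # one test per run, after the inner loop
            best_start, best_len = start, i - start
    return best_start, best_len
```

Note that the end-of-data test effectively appears in both loop conditions here too, which mirrors the structure of the REBOL version above.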
Notice that now there's a single, obvious place for the test for new
best run -- right after the inner loop across a run's numbers! The
condition on the inner loop has to satisfy the outer loop's condition
first, and then add its own criteria (which use ZERO? SEQLEN to mean
that we have a new run, and therefore don't care whether the previous
value was one less than the current one).
To see the value of this structural change, imagine that we now need
to return a triplet of numbers; in addition to the position and length
of the longest consecutive run, also return the count of how many runs
of that length existed. Where would that code go in the Jackson
version? Where would it have to appear in the other one?
-jn-
--
----------------------------------------------------------------------
Joel Neely joelDOTneelyATfedexDOTcom 901-263-4446
Enron Accountingg in a Nutshell: 1c=$0.01=($0.10)**2=(10c)**2=100c=$1
[8/19] from: patrick:philipot:laposte at: 4-Nov-2003 18:44
Thanks Gabriele,
So to parse [1 2 3] one has to write ...
>> parse [1 2 3][1 1 1 1 1 2 1 1 3]
== true
Quite odd, isn't it?
Regards
Patrick
[9/19] from: SunandaDH:aol at: 5-Nov-2003 2:28
Thanks to (in order of message receipt at my end) Gabriele, Gabriele, Romano,
Ladislav, and Joel for showing me better approaches.
Parse is so powerful, I should try to learn to use it for more things than
just breaking strings apart.
Joel:
> This is a great little problem for comparing two approaches that handle
> state differently: classic stateful structured programming massages
> variables to maintain state, while the Jackson Structured Methodology
> maps data structure to algorithm structure as much as possible.
Interesting discussion!
I suspect most of us start out "procedural" -- trying to apply processes to
what we see as static data structures. I admit I never was a great fan of JSP,
but the insight that (in effect) the procedures can fall naturally out of an
analysis of the data structures can lead to some very elegant solutions.
Of course there are situations where neither approach works, and then you
need other approaches too. Object-orientation is one claim to the next step as it
merges both approaches. I never quite saw the point there, either, but it
does work in some areas.
And then there are situations that seem completely off the deterministic
axis. Consider this progression:
-- Find the sum of the numbers (simple almost any way: write a loop; use a
map function, etc.)
-- Find the longest run with the highest end value (101--107 in the original
data)
-- Find the run (as defined in the original problem) whose run length is the
modal length of runs
--Find the two runs that are most nearly similar (contain mostly the same
numbers)
-- The numbers are RGB values from a JPG. Find the average color tone of the
object the female nearest the camera is looking at. Break out the neural
networks for this one!
In my actual case, the patterns I was looking for grew more complex, so I
reasoned that the requirement was becoming unreasonable as the preceding data
structures weren't designed for easy sifting. Luckily, I could rewrite history
(not always a luxury in this industry) and simplified the problem almost out of
existence.
Finally, this is getting a bit long, but just to stick up for the "finite
machine state" approach I took in the original code. It isn't necessary to
duplicate tests as my hacked-out original did -- see new improved (but already
redundant) model below. But it is, as Joel notes, tricky to handle boundary
conditions -- note the non-obvious way that largest-run-size is initialised.
Thanks again everyone,
Sunanda
sparse-array: [ 12 13 14 15 16 17
7 8
20 21 22 23 24 25 26
19
59 58 57 56 55 54 53 52
20 21 22 23
101 102 103 104 105 106 107
]
;;=============================
find-longest-run: func [data-array [block!]
    /local current-state
           current-location
           run-start
           run-size
           largest-run-size
           largest-run-start
] [
    current-state: 0
    current-location: data-array
    run-size: 0
    largest-run-size: minimum 1 length? data-array  ;; in case there is exactly one element
    largest-run-start: data-array  ;; in case there are zero data items
    while [not tail? current-location] [
        switch current-state [
            ;; State 0: start of a new run
            ;; ---------------------------
            0 [
                run-start: current-location
                run-size: 1
                current-location: next current-location
                current-state: 1
            ]  ;; end state 0
            ;; State 1: next item
            ;; ------------------
            1 [
                either run-start/1 + run-size = current-location/1 [
                    ;; The run continues
                    ;; -----------------
                    run-size: run-size + 1
                    current-location: next current-location
                    ;; Update largest, if it's biggest so far
                    ;; --------------------------------------
                    if run-size > largest-run-size [
                        largest-run-size: run-size
                        largest-run-start: run-start
                    ]
                ] [
                    current-state: 0  ;; the run has ended -- go start another one
                ]
            ]  ;; end state 1
        ]  ;; switch
    ]  ;; while
    return reduce [largest-run-size largest-run-start]
]  ;; func
;; Run some tests
;; --------------
set [run-size run-start] find-longest-run sparse-array
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
set [run-size run-start] find-longest-run [1 2 3 4 5 ] ;; just one run
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
set [run-size run-start] find-longest-run [999] ;; just one data point
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
set [run-size run-start] find-longest-run [] ;; empty dataset
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
[10/19] from: nitsch-lists:netcologne at: 5-Nov-2003 14:45
On Wednesday, November 5, 2003 08:28, [SunandaDH--aol--com] wrote:
> Thanks to (in order of message receipt at my end) Gabriele, Gabriele,
> Romano, Ladislav, and Joel for showing me better approaches.
>
> Parse is so powerful, I should try to learn to use it for more things that
> just breaking strings apart.
>
Closed? Am I too late? I guess it's like Joel's classic model:
sparse-array: [12 13 14 15 16 17 18
7 8
20 21 22 23 24 25 26
19
59 58 57 56 55 54 53 52
20 21 22 23
101 102 103 104 105 106 107
]
b: p: sparse-array  ;b: begin of run
best-b: none best-l: 0
forall p [
    set [n1 n2] p
    if 1 + n1 <> n2 [  ;close run  ;and if at end n2 = none ;)
        if best-l < l: 1 + subtract index? p index? b [
            best-b: b best-l: l
        ]
        b: next p
    ]
]
? best-b ? best-l
probe copy/part best-b best-l
;Volker
[11/19] from: lmecir:mbox:vol:cz at: 5-Nov-2003 16:24
Hi Sunanda,
> ... but just to stick up for the "finite
> machine state" approach I took in the original code...
you don't need SWITCH for a Finite State Machine implementation in Rebol (as has been
discussed some time ago, Joel and others supplied nice examples), you can use functions
or code blocks and DO.
-Ladislav
[12/19] from: SunandaDH:aol at: 5-Nov-2003 13:20
Volker:
> Closed? Am I too late? I guess it's like Joel's classic model:
It's never too late for a coffee-break.
Thanks for the entry....I get the feeling that if I stare at it long enough,
it'll start to make sense; or maybe I've had too much coffee today :-)
Anyone wanting a bigger challenge (may need a whole pot of coffee), there is
an ongoing programming competition here:
http://members.aol.com/bitzenbeitz/Contests/
REBOL *ought* to be a good language for solving it -- and there's money as
prizes,
Sunanda.
[13/19] from: joel:neely:fedex at: 5-Nov-2003 13:37
Hi, Sunanda,
First, let me offer an improvement and then address some more of the
good discussion points you raise. (There's a little challenge at
the end for the interested who wade through to that point! ;-)
Minor issue first: instead of initializing run length to zero and
incrementing for each added element, just directly calculate the
length once the run is finished.
Now for the more interesting stuff...
Instead of saying that:
- a block is an iteration of runs, and
- a run is an iteration of numbers;
block --*-- run --*-- number
lets say that:
- a block is an iteration of runs, and
- a run is a first number, followed by any consecutive numbers.
+-- first number
|
block --*--+
|
+--*-- consecutive number
This (again) addresses a boundary issue; the first value in a run
plays a different role than the other numbers in the run. Tricky
setup of initial conditions so that the first value can be treated
just like all the others adds more complexity to the code.
That said, here's a version that implements that view:
where-j: func [
    block [block!]
    /local maxpos maxlen runpos runlen prev curr
] [
    maxpos: maxlen: 0                   ; default empty run
    while [not empty? block] [          ; more data = more runs
        runpos: index? block            ; start new run here
        prev: block/1                   ; remember first value
        block: next block               ; done with first
        while [
            prev + 1 = curr: block/1    ; extending the run
        ] [
            prev: curr                  ; save comparison value
            block: next block           ; move on
        ]
        runlen: (index? block) - runpos ; now compute length
        if runlen > maxlen [            ; update best run?
            maxpos: runpos
            maxlen: runlen
        ]
    ]
    reduce [maxpos maxlen]              ; return best run
]
[SunandaDH--aol--com] wrote:
> ... the insight that (in effect) the procedures can fall naturally out of an
> analysis of the data structures can lead to some very elegant solutions.
>
> Of course there are situations where neither approach works, and then you
> need other approaches too. Object-orientation is one claim to the next step as it
> merges both approaches. I never quite saw the point there, either, but it
> does work in some areas.
>
I'm not clear on what you mean by "both approaches"... I view JSP
(Jackson Structured Programming) simply as a specific type of
structured programming which has strong heuristics about which
structure(s) should drive the design.
> And then there are situations that seem completely off the
> deterministic axis.
>
??? Do you mean non-deterministic programming, or just programming
when the criteria are only vaguely stated?
> Consider this progression:
>
> -- Find the sum of the numbers (simply almost any way: write a loop;
> use a map function, etc)
>
Easy. Agreed.
> -- Find the longest run with the highest end value (101--107 in the
> original data)
>
An easy extension of the above program:
- add a local MAXTOP initialized to anything
- change the condition at the end from
runlen > maxlen
to
any [runlen > maxlen all [runlen = maxlen prev > maxtop]]
- change the "new best run" block to include
maxtop: prev
(For why this was easy, see below...)
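[Editorial note] Joel's three-line modification is easy to see in a Python sketch built on the nested-loop version (the name best_top and the tuple return shape are illustrative, not Joel's): with ties on length, the run with the higher end value wins.

```python
def find_run_highest_end(data):
    """Longest run of consecutive ascending integers; ties are broken
    in favour of the run with the higher end value."""
    best_start = best_len = 0
    best_top = None                  # end value of the best run so far
    i = 0
    while i < len(data):
        start = i
        i += 1
        while i < len(data) and data[i] == data[i - 1] + 1:
            i += 1
        length, top = i - start, data[i - 1]   # run length and its end value
        if length > best_len or (length == best_len and
                                 (best_top is None or top > best_top)):
            best_start, best_len, best_top = start, length, top
    return best_start, best_len
```

On the original dataset this now selects 101--107 rather than 20--26, since both have length 7 but 107 > 26.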
> -- Find the run (as defined in the original problem) whose run length
> is the modal length of runs
>
First, we have to change "the run" to "a run", since the answer is no
longer unique. For the data
[10 8 9 6 7 4 5 0 1 2]
there are runs of length 1, 2, 2, 2, and 3, so the mode length is 2,
and there are three runs that have that length.
However, something more significant has happened. The previous change
to the problem (longest run with highest upper value) actually had THE
SAME STRUCTURE as the original problem: find the "best" run, where we
have a simple test for determining whether one run is "better" than
another. In other words, we can generate runs one at a time, and test
each one against the current "best" to see which one wins.
But in this last change, we now are asking for a run based on some
property/ies of the entire collection of runs, so we must generate (and
retain) all of them. We can (trivially) change the above function to
return all runs, then write a separate "pick a mode" function to
operate on that collection (or we build the additional stuff into the
end of the modified function). In either case we now have a two-step
approach:
1) transform the offered block into a collection of runs;
2) do something with the collection of runs.
That's why the first change was easy and the second one was harder.
> --Find the two runs that are most nearly similar (contain mostly the
> same numbers)
>
I think I can paraphrase that as "find two runs whose intersection is
at least as long as any other intersection of distinct runs", which is
easy to solve, still using the two-step approach above.
This one is also not uniquely determined without more constraints; to
see why, consider
[1 2 3 4 5 2 3 4 5 6 3 4 5 6 7]
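[Editorial note] A possible Python sketch of that two-step approach: step 1 collects the runs as (first_value, last_value) intervals, and step 2 compares every pair by how many integers the two intervals share. This is my formulation of "intersection", so treat it as one interpretation, not the definitive one.

```python
from itertools import combinations

def runs_as_intervals(data):
    """Step 1: each run becomes a (first_value, last_value) pair."""
    runs, i = [], 0
    while i < len(data):
        start = i
        i += 1
        while i < len(data) and data[i] == data[i - 1] + 1:
            i += 1
        runs.append((data[start], data[i - 1]))
    return runs

def most_similar_runs(data):
    """Step 2: the first pair of runs sharing the most integers.
    Returns (None, -1) when there are fewer than two runs."""
    best_pair, best_shared = None, -1
    for a, b in combinations(runs_as_intervals(data), 2):
        shared = min(a[1], b[1]) - max(a[0], b[0]) + 1  # <= 0 if disjoint
        if shared > best_shared:
            best_pair, best_shared = (a, b), shared
    return best_pair, best_shared
```

On Joel's block above, the pairs (1--5, 2--6) and (2--6, 3--7) tie at four shared values, which is exactly the non-uniqueness he points out.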
> -- The numbers are RGB values from a JPG. Find the average color tone
> of the object the female nearest the camera is looking at. Break out
> the neural networks for this one!
>
Now we've jumped from a programming problem to a specification problem!
The other tasks were well-defined (there could be no argument about
whether an answer is correct or not), but not so with this last one.
(I recently saw a documentary on TV in which a critical question was
whether a certain figure in a Renaissance painting was male or female.
The "experts" couldn't agree. ;-)
> In my actual case, the patterns I was looking for grew more complex,
> so I reasoned that the requirement was becoming unreasonable as the
> preceding data structures weren't designed for easy sifting. Luckily,
> I could rewrite history (not always a luxury in this industry) and
> simplified the problem almost out of existence.
>
My compliments!!! Sometimes the best way to solve a problem is to
redefine the terms of discourse so that the problem simply disappears!
> Finally, this is getting a bit long, but just to stick up for the
> "finite machine state" approach I took in the original code...
>
I certainly use FSM-inspired design at times myself, but the nasty
part (IMHO) is always the transition between states. Not implementing
the transition (usually just updating a variable) but making it clear
to the eye/mind of the reader the pattern of transitions that will
actually occur.
In that regard, I suggest that the (improved) JSP-inspired version
above can be regarded as a FSM as well. It's just that each "state"
corresponds to a region of code, and the state transitions are simply
the standard flow-control!
In the updated FSM version, State 0 is *always* followed by State 1
(modulo falling off of the end) and a series of occurrences of State 1
are likewise followed by a State 0 (same caveat). The JSP version
makes that organization much more obvious (IMHO). Finally, there is
still considerable redundancy in the FSM version -- it's just in time
rather than in codespace! Notice that the "update largest" test is
done *for each element* of each run, and the sets for a maximum-length
run will be done for each element after the length of its nearest
competitor has been passed. The FSM code contains these expressions
only once, but evaluates them *MANY* more times than the equivalent
ones in the JSP version.
I'm not criticizing here, but pointing out the trade-offs (which just
happen to be particularly dramatic in this case! ;-) It is possible
to optimize for one measure (in this case, minimizing source code
redundancy) in such a way that it pessimizes another measure (the
number of times the code is -- redundantly -- evaluated).
Note that the JSP design heuristics don't minimize *ALL* aspects of
code redundancy; the NOT EMPTY? test is really in there twice!!!
Brownie points to anyone who can show why! ;-)
-jn-
[14/19] from: g:santilli:tiscalinet:it at: 6-Nov-2003 11:47
Hi Patrick,
On Tuesday, November 4, 2003, 6:44:28 PM, you wrote:
>>> parse [1 2 3][1 1 1 1 1 2 1 1 3]
ppln> == true
ppln> Quite odd, isn't it ?
Yes, but actually I never had to do something like that, so I
think it's not a big problem. Ladislav has proposed adding a
LITERAL keyword to PARSE to handle such a situation.
Regards,
Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/
[15/19] from: greggirwin:mindspring at: 6-Nov-2003 6:44
Hi Gabriele,
>>>> parse [1 2 3][1 1 1 1 1 2 1 1 3]
...
GS> Yes, but actually I never had to do something like that, so I
GS> think it's not a big problem. Ladislav has proposed adding a
GS> LITERAL keyword to PARSE to handle such a situation.
Some kind of escape in parse rules might be nice. I haven't needed it
either but, if you're dealing with a small set of special numbers in a
particular situation, we can at least work around it. e.g.
make-int-rule: func [val] [append copy [1 1] val]
make-int-rules: func [values [block!]] [
    foreach val values [
        set to word! join '` val make-int-rule val
    ]
]
make-int-rules [2 27 94]
; I chose ` so it looks sort of like a lit-number! format. :)
print parse [2 27 94] [`2 `27 `94]
-- Gregg
[16/19] from: SunandaDH:aol at: 7-Nov-2003 10:28
Hi Joel,
Thanks for some fascinating discussion points. I've only got time to reply to
a couple.
I think we'd pretty much agree that a problem's solution depends on the
structure of the problem. But we might disagree in specific cases about the best
approach to take.
> I'm not clear on what you mean by "both approaches"... I view JSP
> (Jackson Structured Programming) simply as a specific type of
> structured programming which has strong heuristics about which
> structure(s) should drive the design.
Sorry. I was being vague. I was distinguishing a process-oriented approach
(what do I need to do?) from a data-oriented approach (what data structures do
I need to work on?)
Your point is that it is usually best to start with the data structures. I'd
agree. But it isn't always that simple. And starting with processes can (and
probably does) lead to quite characteristically different solutions. Which is
best is often subjective -- or "proved" by the passage of time and the
ravages that maintenance makes to the original solution.
A sort of analogous example.....
I went to London a couple of weeks ago. I could describe that trip in many
different ways. Here's three:
1. In terms of processes (I used a folding bicycle and a train).
2. In terms of waypoints ("data"): Home --> Station --> Station -->
Destination
3. In terms of purpose (to discuss an organization's global internet
strategy).
I could have used the same processes if some of the intermediate waypoints
had been different (Home --> Station can be via a canal towpath, a cycle track,
or duking it out with the traffic; there is more than one train route to
London).
But I would have needed to use different processes (taxi, car, plane) if the
waypoints had been slightly different (via New York for example).
And lots of things would have had to change if the purpose had been different
(say I was delivering a sofa).
Back to IT design, it may be better, in some circumstances, to start with
why am I doing this?
rather than "what am I doing it to?" or "what am I doing
it with?".
> > -- The numbers are RGB values from a JPG. Find the average color tone
> > of the object the female nearest the camera is looking at. Break out
<<quoted lines omitted: 5>>
> whether a certain figure in a Renaissance painting was male or female.
> The "experts" couldn't agree. ;-)
I'm not so sure I completely agree. Two reasons.
First, taking a purely procedural approach, I can start a solution to this
problem:
females-in-image: copy []
foreach item my-image
[item-type: analyze-item item
if item-type/female [append females-in-image item]
]
This highlights both the strength and the weakness of a top-down procedural
approach.....
I can easily map out the code at the highest levels. But I need faith that I
will be able to fill out the lower levels. Today, analyze-item isn't going to be
easy. In a few years' time, such functions might come free in cornflakes
packets.
My second reason is that it is *always* a matter of interpretation, even with
the original "well-defined" task.
The actual original task was to find the best way to describe the differences
between two versions of the same file. There is a lot of subjectivity there.
Examples:
original-file: ["A" "B" "C"]
new-file: ["B" "C" "A"]
Topographically, of course, these are identical. But that isn't the answer
that is going to get the program signed off.
The "simplest" description of the differences, in terms of how to transform
one into the other, is:
-- Delete ["A" "B" "C"]
-- Insert ["B" "C" "A"]
That's not going to win any prizes either, although it would be "best" if the
files had been:
original-file: ["A" "B" "C"]
new-file: ["X" "Y" "Z"]
An acceptable solution might be:
--Insert ["B" "C"] at start
--Delete ["B" "C"] at end
A "better" solution might be:
--Delete ["A"] at start
--Insert ["A"] at end
But "best" and "better" depend on the resources available. In this case, they
are a little restricted:
** REBOL has a limited recursion stack (2000-odd items) so an off-the-shelf
solution using recursion is out of bounds.
** Even unravelling the algorithms to remove the recursion doesn't help --
take a look at GNU's diff.c for example. It's full of dire warnings about the
algorithm being "too expensive" in some conditions. We wanted something that was
reasonably deterministic (timing scaling according to the size of the files, not
the number of differences).
That led to a series of hacks (I'm not proud of the design) that takes a
series of linear passes at the two files with a few sorts. It works; I'm happy.
But we can still argue over the "best" way to describe some changes. Is:
original-file: ["A" "A" "A"]
new-file: ["A" "A" "A" "A"]
an insertion at the beginning? At the end? An intermediate point? Or some
combination of moves, deletions and insertions?
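[Editorial note] The same subjective choice shows up in Python's standard difflib, which anchors its description on the longest matching block; on the ["A" "B" "C"] example above it happens to pick the "better" answer. A sketch for comparison (this is not the code the poster actually used):

```python
from difflib import SequenceMatcher

original = ["A", "B", "C"]
new = ["B", "C", "A"]

# SequenceMatcher anchors on the longest matching block ("B" "C"), so it
# describes the change as: delete "A" at the start, insert "A" at the end.
ops = [op for op in SequenceMatcher(None, original, new).get_opcodes()
       if op[0] != 'equal']
for tag, i1, i2, j1, j2 in ops:
    print(tag, original[i1:i2] if tag == 'delete' else new[j1:j2])
```

A different anchoring heuristic would just as validly report "insert B C at the start, delete B C at the end"; which description is "best" remains a judgment call.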
So the original, apparently "well-defined", task emerged from an interplay of
the algorithm design (which was constrained by current limitations in REBOL),
combined with a subjective decision (to use REBOL rather than Python or C), and
a subjective debate on the "best" way to choose between alternative
presentations of the results.
The experts finally had to agree in this case, but we're still mumbling over
some of the fine details. Even the most seemingly objective task design has
hidden subjective depths.
Thanks again for the stimulation. Now I really do need a coffee break,
Sunanda.
[17/19] from: joel:neely:fedex at: 7-Nov-2003 10:44
Hi, Sunanda,
[SunandaDH--aol--com] wrote:
> Your point is that it is usually best to start with the data
> structures. I'd agree. But it isn't always that simple.
>
I agree. Sorry for not being more precise. What I should have
said (adding the omitted conditions/details) was:
For problems where the input/output (or argument/result)
data structures are already defined, it is usually very
helpful in design, implementation, and maintenance to
use the I/O (a/r) structures as much as possible as guides
for the algorithm structure.
This approach usually helps minimize redundant code, gives
unambiguous guidance to where each part of the code should
be placed in the algorithm, and minimizes the risk of bugs
arising from accidental mismatches between the flow of the
algorithm and the flow of the data.
Of course, in cases where the nature of the data are somewhat
"up in the air" (e.g. the problem is more vaguely specified,
or the designer is given latitude to choose data/representation
structures) there's clearly not so much heuristic guidance.
Also, if the structure of the data changes, it may imply
significant rework of the program.
> The actual original task was to find the best why to describe
> the differences between two version of the same file. There is
> a lot of subjectivity there.
>
That's exactly what I meant by not "well-defined". I don't mean
that as a negative description, but simply as an indication that
there may be a period of more "exploratory programming" to try
different ideas before choosing one as the basis for final design
and implementation (or that the program may very well simply
evolve, as various ideas/heuristics are added and tweaked).
I suggest that in such a case, there's benefit from a program
structure that makes it easy to figure out where to put such
heuristics (and where to find them when it's time to change or
delete them ;-)
> But "best" and "better" depend on the resources available.
> In this case, they are a little restricted:
>
Again, we'll certainly agree that the juggling and compromises
made when shoehorning a 10-pound algorithm into a 5-pound
interpreter are at least as much art as science! ;-)
--
----------------------------------------------------------------------
Joel Neely joelDOTneelyATfedexDOTcom 901-263-4446
Enron Accountingg in a Nutshell: 1c=$0.01=($0.10)**2=(10c)**2=100c=$1
[18/19] from: antonr:iinet:au at: 8-Nov-2003 15:04
Hi Sunanda,
> I can easily map out the code at the highest levels. But need
> faith that I
> will be able to fill out the lower levels. Today, analyze-item
> isn't going to be
> easy. In a few years time, such functions might come free in cornflakes
> packets.
...
Very good points, Sunanda. Pragmatism in getting the job done
can wreck idealism.
> ** REBOL has a limited recursion stack (2000-odd items) so an
> off-the-shelf
> solution using recursion is out of bounds.
Just a note:
You can implement your own stack to have a much larger size than 2000.
> Sunanda.
Anton.
[19/19] from: SunandaDH:aol at: 8-Nov-2003 10:58
Hi Joel,
> Of course, in cases where the nature of the data are somewhat
> "up in the air" (e.g. the problem is more vaguely specified,
> or the designer is given latitude to choose data/representation
> structures) there's clearly not so much heuristic guidance.
But why do I only ever get such problems!?
> Also, if the structure of the data changes, it may imply
> significant rework of the program.
That's a crucial point. You can be utterly 100% confident that anything that
involves humans will change, and usually completely unpredictably. The changes
can hit any part of a system, and cumulatively -- if not instantly -- can have
devastating effects.
I don't think any methodology addresses the need for continual, random change.
> Again, we'll certainly agree that the juggling and comprimises
> made when shoehorning a 10-pound algorithm into a 5-pound
> interpreter are at least as much art as science! ;-)
Yes, well that makes it more fun!
Sunanda.
Notes
- Quoted lines have been omitted from some messages.