Mailing List Archive: 49091 messages
 

Coffee break problem anyone?

 [1/19] from: SunandaDH::aol::com at: 4-Nov-2003 7:02


Here's a little problem I needed an answer to yesterday. As I needed the answer yesterday, I hacked together some code. But there has got to be a more elegant way, surely. Maybe using parse on a block?

The problem is this: you have a block containing integers, like this:

sparse-array: [
    12 13 14 15 16 17 7 8 20 21 22 23 24 25 26 19
    59 58 57 56 55 54 53 52 20 21 22 23
    101 102 103 104 105 106 107
]

They are not in ascending order, and there are some numbers duplicated. But there are runs of consecutive, ascending integers: 12--17, 7--8, etc. We want to find the location in the block of the first such longest sequence.

The answer for this dataset is 20--26. 101--107 is as long, but isn't the first. 59--52 is longer, but it is descending.

My solution is below. Someone can do better, surely!

Thanks, Sunanda

;;=============================
find-longest-run: func [
    data-array [block!]
    /local current-state current-location run-start run-size
        largest-run-size largest-run-start
] [
    current-state: 0
    current-location: data-array
    run-size: 0
    largest-run-size: 0
    largest-run-start: data-array  ;; in case there are zero data items
    forever [
        if tail? current-location [
            if run-size > largest-run-size [
                largest-run-size: run-size
                largest-run-start: run-start
            ]
            break  ;; we're done
        ]
        switch current-state [
            ;; State 0: start of a new run
            ;; ---------------------------
            0 [
                run-start: current-location
                run-size: 1
                current-location: next current-location
                current-state: 1
            ]  ;; end state 0
            ;; State 1: next item
            ;; ------------------
            1 [
                either (1 + first back current-location) = current-location/1 [
                    ;; The run continues
                    ;; -----------------
                    run-size: run-size + 1
                    current-location: next current-location
                ] [
                    ;; Run's ended: check if it was the largest so far
                    ;; -----------------------------------------------
                    if run-size > largest-run-size [
                        largest-run-size: run-size
                        largest-run-start: run-start
                    ]
                    current-state: 0
                ]
            ]  ;; end state 1
        ]  ;; switch
    ]  ;; forever
    return reduce [largest-run-size largest-run-start]
]  ;; func

;; Run some tests
;; --------------
set [run-size run-start] find-longest-run sparse-array
print ["Length:" run-size "Looks like:" copy/part run-start run-size]

set [run-size run-start] find-longest-run [1 2 3 4 5]  ;; just one run
print ["Length:" run-size "Looks like:" copy/part run-start run-size]

set [run-size run-start] find-longest-run [999]  ;; just one data point
print ["Length:" run-size "Looks like:" copy/part run-start run-size]

set [run-size run-start] find-longest-run []  ;; empty dataset
print ["Length:" run-size "Looks like:" copy/part run-start run-size]
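[Editor's note] For readers who don't speak REBOL, the same single-pass first-longest-run search can be sketched in Python (my illustration, not part of the original thread; it returns a 0-based start index where the REBOL version returns a series position):

```python
def find_longest_run(data):
    """Return (length, start) of the first longest run of consecutive
    ascending integers in data; start is a 0-based index."""
    best_len = best_start = 0
    run_len = run_start = 0
    for i, value in enumerate(data):
        # Extend the current run, or start a new run at this element.
        if run_len > 0 and value == data[i - 1] + 1:
            run_len += 1
        else:
            run_start, run_len = i, 1
        # Strictly greater, so the first longest run wins ties.
        if run_len > best_len:
            best_len, best_start = run_len, run_start
    return best_len, best_start

sparse = [12, 13, 14, 15, 16, 17, 7, 8, 20, 21, 22, 23, 24, 25, 26,
          19, 59, 58, 57, 56, 55, 54, 53, 52, 20, 21, 22, 23,
          101, 102, 103, 104, 105, 106, 107]
length, start = find_longest_run(sparse)
print(sparse[start:start + length])  # the 20..26 run
```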

 [2/19] from: g:santilli:tiscalinet:it at: 4-Nov-2003 14:05


Hi Sunanda,

On Tuesday, November 4, 2003, 1:02:28 PM, you wrote:

Sac> As I needed the answer yesterday, I hacked together some code. But there
Sac> has got to be a more elegant way, surely. Maybe using parse on a block?

With PARSE, a solution could be:

block: [
    12 13 14 15 16 17 7 8 20 21 22 23 24 25 26 19
    59 58 57 56 55 54 53 52 20 21 22 23
    101 102 103 104 105 106 107
]
results: []
parse block [
    some [
        start: set value integer!
        any [(value: value + 1) 1 1 value]
        finish: (
            insert/only results reduce [
                negate offset? start finish
                index? start
                copy/part start finish
            ]
        )
    ]
]
foreach result sort results [
    print ["Sequence:" result/3 "(len" negate result/1 "pos" result/2 ")"]
]

This is less efficient because it uses SORT to find the best result; however, it has the advantage of giving the other results too. The sort can be avoided, anyway.

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/

 [3/19] from: rotenca:telvia:it at: 4-Nov-2003 16:42


This is my try with parse:

find-longest-run: func [
    data-array [block!]
    /local x next-x start res len val h test
] [
    start: data-array
    res: reduce [0 data-array]
    test: [
        res/1 < len: offset? start h
        insert/only insert clear res len start
    ]
    parse start [
        set x number! (next-x: x + 1)
        any [
            h: [
                set x 1 1 next-x
                | set x number! (all test start: h)
            ] (next-x: x + 1)
        ]
        (all test)
    ]
    res
]
---
Ciao
    Romano

 [4/19] from: patrick:philipot:laposte at: 4-Nov-2003 16:46


Hi Gabriele,

Wooaooh, brilliant! But could you (or some parse guru) explain a bit? I don't understand the second part of the any rule (1 1 value).

Regards
Patrick

 [5/19] from: lmecir:mbox:vol:cz at: 4-Nov-2003 16:58


Hi Sunanda,

here is my tea break result:

find-longest-run: function [
    [catch]
    data-array [block!]
] [
    longest-run-size longest-run-start size position number
] [
    if empty? data-array [throw make error! "a non-empty array expected"]
    longest-run-start: position: back tail data-array
    longest-run-size: size: 1
    while [not head? position] [
        if not number? first position [
            throw make error! "an integer array expected"
        ]
        number: first position
        position: back position
        either (first position) = (number - 1) [
            size: size + 1
            if size >= longest-run-size [
                longest-run-size: size
                longest-run-start: position
            ]
        ] [size: 1]
    ]
    reduce [longest-run-size longest-run-start]
]

-L

----- Original Message -----
> Here's a little problem I needed an answer to yesterday.
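[Editor's note] Ladislav's trick is to scan from the tail and accept ties with >=, so the leftmost longest run wins without extra bookkeeping. A Python sketch of that backward scan (my illustration, not from the thread; 0-based index instead of a REBOL series position):

```python
def find_longest_run_backward(data):
    """Scan from the tail toward the head. Because earlier runs are
    visited later, '>=' makes a tie further left replace the current
    best, yielding the FIRST longest run. Returns (length, start)."""
    if not data:
        raise ValueError("a non-empty array expected")
    best_len = size = 1
    best_start = pos = len(data) - 1
    while pos > 0:
        if data[pos - 1] == data[pos] - 1:   # run extends leftward
            size += 1
            if size >= best_len:             # tie goes to the left run
                best_len, best_start = size, pos - 1
        else:
            size = 1                         # run broken; restart
        pos -= 1
    return best_len, best_start
```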

 [6/19] from: g:santilli:tiscalinet:it at: 4-Nov-2003 17:20


Hi Patrick,

On Tuesday, November 4, 2003, 4:46:23 PM, you wrote:

ppln> But could you (or some parse guru) explain a bit.
ppln> I don't understand the second part of the any rule (1 1 value).

If you want to match a given integer (for example 2) in a block, you can't just use:

    parse [2] [2]

because the integer in the rule is interpreted as a repetition count, as in:

    parse "aa" [2 "a"]

so you have to use:

    parse [2] [1 1 2]

So 1 1 value matches the integer referred to by 'value.

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/

 [7/19] from: joel:neely:fedex at: 4-Nov-2003 11:32


Hi, Sunanda,

Nice little problem! (And, I admit to being more interested in the algorithm design aspect than the REBOL/PARSE aspect...)

[SunandaDH--aol--com] wrote:
> The problem is this: you have a block containing integers, like:
>
> sparse-array: [ 12 13 14 15 16 17
...
> ]
>
> There are runs of consecutive, ascending integers: 12--17, 7--8 etc.
> We want to find the location in the block of the first such longest
> sequence.
>
> The answer for this dataset is 20--26. 101--107 is as long, but
> isn't the first. 59--52 is longer, but it is descending.
This is a great little problem for comparing two approaches that handle state differently: classic stateful structured programming massages variables to maintain state, while the Jackson Structured Methodology maps data structure to algorithm structure as much as possible.

Incidentally, I'm extending the spec to include the requirement that we should get a zero-length run if the array is empty...

The "classic" stateful approach says "We have an array; process it one element at a time, and keep track of whether each new element extends a run or begins a new one." This normally gets a little tricky when we have to address boundary conditions (e.g. what do we compare the first element against, as it has no predecessor?) Here's a standard design based on that strategy:

find-run-stateful: func [
    data [block!]
    /local maxpos maxlen seqpos seqlen prev curr
][
    maxpos: maxlen: 0            ; initially no runs
    seqpos: seqlen: 0            ; imaginary first run
    prev: data/1                 ; pretend first is...
                                 ; ... its own previous
    for pos 1 length? data 1 [   ; count across array
        curr: data/:pos          ; get new current value
        either curr - 1 = prev [ ; if current extends run...
            seqlen: seqlen + 1   ; ... just bump length
        ][                       ; ... otherwise
            if seqlen > maxlen [ ; if last run was new best...
                maxpos: seqpos   ; ... remember position
                maxlen: seqlen   ; ... and length
            ]
            seqpos: pos          ; start new run here...
            seqlen: 1            ; ... with length of 1
        ]
        prev: curr               ; remember for comparison
    ]
                                 ; we can have a run underway
                                 ; when falling off the end
    if seqlen > maxlen [         ; so we must check its length
        maxpos: seqpos           ; to see if it was the best
        maxlen: seqlen           ; run in the array
    ]
    reduce [maxpos maxlen]
]

The tricky bits in this design are:

- initializing the imaginary "previous run" before the first element of the block, in such a way that it won't be extended by the first element, and it won't look like a better run than any that we actually find;

- remembering to check the run which goes off the end of the block.
Notice that the above design just assumes that the block is an iteration of numbers, and so has to figure out what to do with each number (including the first!). It maintains two invariants:

- SEQPOS and SEQLEN represent the run in progress which each element either extends or terminates (by starting a new one);

- MAXPOS and MAXLEN represent the longest *completed* run yet seen.

The Jackson method says instead, "The argument block is an iteration of runs, and a run is an iteration of consecutive numbers" and so has two loops: the outer one iterates across runs in the block, and the inner one iterates across numbers in a single run.

find-run-Jackson: func [
    data [block!]
    /local maxpos maxlen seqpos seqlen prev curr
][
    maxpos: maxlen: 0              ; initially no runs
    while [not empty? data] [      ; process another run...
        seqpos: index? data        ; ... starting here
        seqlen: 0                  ; ... initially empty
        prev: none                 ; ... no previous value
        while [                    ; same run as long as...
            all [
                not empty? data    ; ... more data and
                any [
                    (curr: data/1) - 1 = prev ; ... current value
                                   ; ... extends this run, or
                    zero? seqlen   ; ... this is a new run
                ]
            ]
        ][                         ; same run, so...
            prev: curr             ; ... save value to compare
            seqlen: seqlen + 1     ; ... extend length
            data: next data        ; ... shift the data
        ]
        if seqlen > maxlen [       ; if that run is new best...
            maxpos: seqpos         ; ... remember position
            maxlen: seqlen         ; ... and length
        ]
    ]
    reduce [maxpos maxlen]
]

Notice that now there's a single, obvious place for the test for a new best run -- right after the inner loop across a run's numbers! The condition on the inner loop has to satisfy the outer loop's condition first, and then add its own criteria (which uses ZERO? SEQLEN to mean that we have a new run, and therefore don't care whether the previous value was one less than the current one).
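[Editor's note] The Jackson-style nesting can also be sketched in Python (my illustration, not Joel's code; it returns a 0-based start index rather than REBOL's 1-based series position):

```python
def find_run_jackson(data):
    """Outer loop: one iteration per run. Inner loop: one iteration
    per number inside the current run. Returns (start, length) of the
    first longest run of consecutive ascending integers, 0-based."""
    max_pos = max_len = 0
    i = 0
    while i < len(data):              # another run starts here
        seq_pos = i
        i += 1                        # the run's first number
        while i < len(data) and data[i] == data[i - 1] + 1:
            i += 1                    # still inside the same run
        seq_len = i - seq_pos         # run ended: compute its length
        if seq_len > max_len:         # single, obvious place to test
            max_pos, max_len = seq_pos, seq_len
    return max_pos, max_len
```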
To see the value of this structural change, imagine that we now need to return a triplet of numbers: in addition to the position and length of the longest consecutive run, also return the count of how many runs of that length existed. Where would that code go in the Jackson version? Where would it have to appear in the other one?

-jn-
--
----------------------------------------------------------------------
Joel Neely joelDOTneelyATfedexDOTcom 901-263-4446
Enron Accounting in a Nutshell:
    1c=$0.01=($0.10)**2=(10c)**2=100c=$1

 [8/19] from: patrick:philipot:laposte at: 4-Nov-2003 18:44


Thanks Gabriele,

So to parse [1 2 3] one has to write:

>> parse [1 2 3] [1 1 1 1 1 2 1 1 3]
== true

Quite odd, isn't it?

Regards
Patrick

 [9/19] from: SunandaDH:aol at: 5-Nov-2003 2:28


Thanks to (in order of message receipt at my end) Gabriele, Gabriele, Romano, Ladislav, and Joel for showing me better approaches.

Parse is so powerful, I should try to learn to use it for more things than just breaking strings apart.

Joel:
> This is a great little problem for comparing two approaches that handle
> state differently: classic stateful structured programming massages
> variables to maintain state, while the Jackson Structured Methodology
> maps data structure to algorithm structure as much as possible.
Interesting discussion! I suspect most of us start out "procedural" -- trying to apply processes to what we see as static data structures. I admit I never was a great fan of JSP, but the insight that (in effect) the procedures can fall naturally out of an analysis of the data structures can lead to some very elegant solutions.

Of course there are situations where neither approach works, and then you need other approaches too. Object-orientation is one claim to the next step as it merges both approaches. I never quite saw the point there, either, but it does work in some areas. And then there are situations that seem completely off the deterministic axis. Consider this progression:

-- Find the sum of the numbers (simple almost any way: write a loop; use a map function, etc.)

-- Find the longest run with the highest end value (101--107 in the original data)

-- Find the run (as defined in the original problem) whose run length is the modal length of runs

-- Find the two runs that are most nearly similar (contain mostly the same numbers)

-- The numbers are RGB values from a JPG. Find the average color tone of the object the female nearest the camera is looking at. Break out the neural networks for this one!

In my actual case, the patterns I was looking for grew more complex, so I reasoned that the requirement was becoming unreasonable as the preceding data structures weren't designed for easy sifting. Luckily, I could rewrite history (not always a luxury in this industry) and simplified the problem almost out of existence.

Finally, this is getting a bit long, but just to stick up for the "finite state machine" approach I took in the original code. It isn't necessary to duplicate tests as my hacked-out original did -- see the new improved (but already redundant) model below. But it is, as Joel notes, tricky to handle boundary conditions -- note the non-obvious way that largest-run-size is initialised.
Thanks again everyone,
Sunanda

sparse-array: [
    12 13 14 15 16 17 7 8 20 21 22 23 24 25 26 19
    59 58 57 56 55 54 53 52 20 21 22 23
    101 102 103 104 105 106 107
]

;;=============================
find-longest-run: func [
    data-array [block!]
    /local current-state current-location run-start run-size
        largest-run-size largest-run-start
] [
    current-state: 0
    current-location: data-array
    run-size: 0
    largest-run-size: minimum 1 length? data-array
        ;; in case there is exactly one element
    largest-run-start: data-array  ;; in case there are zero data items
    while [not tail? current-location] [
        switch current-state [
            ;; State 0: start of a new run
            ;; ---------------------------
            0 [
                run-start: current-location
                run-size: 1
                current-location: next current-location
                current-state: 1
            ]  ;; end state 0
            ;; State 1: next item
            ;; ------------------
            1 [
                either run-start/1 + run-size = current-location/1 [
                    ;; The run continues
                    ;; -----------------
                    run-size: run-size + 1
                    current-location: next current-location
                    ;; Update largest, if it's biggest so far
                    ;; --------------------------------------
                    if run-size > largest-run-size [
                        largest-run-size: run-size
                        largest-run-start: run-start
                    ]
                ] [
                    current-state: 0  ;; the run has ended -- go start another one
                ]
            ]  ;; end state 1
        ]  ;; switch
    ]  ;; while
    return reduce [largest-run-size largest-run-start]
]  ;; func

;; Run some tests
;; --------------
set [run-size run-start] find-longest-run sparse-array
print ["Length:" run-size "Looks like:" copy/part run-start run-size]

set [run-size run-start] find-longest-run [1 2 3 4 5]  ;; just one run
print ["Length:" run-size "Looks like:" copy/part run-start run-size]

set [run-size run-start] find-longest-run [999]  ;; just one data point
print ["Length:" run-size "Looks like:" copy/part run-start run-size]

set [run-size run-start] find-longest-run []  ;; empty dataset
print ["Length:" run-size "Looks like:" copy/part run-start run-size]

 [10/19] from: nitsch-lists:netcologne at: 5-Nov-2003 14:45


On Wednesday, 5 November 2003 08:28, [SunandaDH--aol--com] wrote:

> Thanks to (in order of message receipt at my end) Gabriele, Gabriele,
> Romano, Ladislav, and Joel for showing me better approaches.
>
> Parse is so powerful, I should try to learn to use it for more things
> than just breaking strings apart.

Closed? Am I too late? I guess it's like Joel's classic model:

sparse-array: [
    12 13 14 15 16 17 18 7 8 20 21 22 23 24 25 26 19
    59 58 57 56 55 54 53 52 20 21 22 23
    101 102 103 104 105 106 107
]

b: p: sparse-array  ;b: begin of run
best-b: none
best-l: 0
forall p [
    set [n1 n2] p
    if 1 + n1 <> n2 [  ;close run
        ;and if at end, n2 = none ;)
        if best-l < l: 1 + subtract index? p index? b [
            best-b: b
            best-l: l
        ]
        b: next p
    ]
]
? best-b
? best-l
probe copy/part best-b best-l

;Volker

 [11/19] from: lmecir:mbox:vol:cz at: 5-Nov-2003 16:24


Hi Sunanda,
> ... but just to stick up for the "finite
> state machine" approach I took in the original code...

you don't need SWITCH for a Finite State Machine implementation in REBOL (as has been discussed some time ago, Joel and others supplied nice examples); you can use functions or code blocks and DO.

-Ladislav

 [12/19] from: SunandaDH:aol at: 5-Nov-2003 13:20


Volker:
> Closed? Am I too late? I guess it's like Joel's classic model:

It's never too late for a coffee break. Thanks for the entry... I get the feeling that if I stare at it long enough, it'll start to make sense; or maybe I've had too much coffee today :-)

Anyone wanting a bigger challenge (may need a whole pot of coffee): there is an ongoing programming competition here:

http://members.aol.com/bitzenbeitz/Contests/

REBOL *ought* to be a good language for solving it -- and there's money as prizes,
Sunanda.

 [13/19] from: joel:neely:fedex at: 5-Nov-2003 13:37


Hi, Sunanda,

First, let me offer an improvement and then address some more of the good discussion points you raise. (There's a little challenge at the end for the interested who wade through to that point! ;-)

Minor issue first: instead of initializing run length to zero and incrementing for each added element, just directly calculate the length once the run is finished.

Now for the more interesting stuff... Instead of saying that:

- a block is an iteration of runs, and
- a run is an iteration of numbers;

    block --*-- run --*-- number

let's say that:

- a block is an iteration of runs, and
- a run is a first number, followed by any consecutive numbers.

               +-- first number
               |
    block --*--+
               |
               +--*-- consecutive number

This (again) addresses a boundary issue; the first value in a run plays a different role than the other numbers in the run. Tricky setup of initial conditions so that the first value can be treated just like all the others adds more complexity to the code. That said, here's a version that implements that view:

where-j: func [
    block [block!]
    /local maxpos maxlen runpos runlen prev curr
][
    maxpos: maxlen: 0                   ; default empty run
    while [not empty? block] [          ; more data = more runs
        runpos: index? block            ; start new run here
        prev: block/1                   ; remember first value
        block: next block               ; done with first
        while [
            prev + 1 = curr: block/1    ; extending the run
        ][
            prev: curr                  ; save comparison value
            block: next block           ; move on
        ]
        runlen: (index? block) - runpos ; now compute length
        if runlen > maxlen [            ; update best run?
            maxpos: runpos
            maxlen: runlen
        ]
    ]
    reduce [maxpos maxlen]              ; return best run
]

[SunandaDH--aol--com] wrote:
> ... the insight that (in effect) the procedures can fall naturally out
> of an analysis of the data structures can lead to some very elegant
> solutions.
>
> Of course there are situations where neither approach works, and then
> you need other approaches too. Object-orientation is one claim to the
> next step as it merges both approaches. I never quite saw the point
> there, either, but it does work in some areas.
I'm not clear on what you mean by "both approaches"... I view JSP (Jackson Structured Programming) simply as a specific type of structured programming which has strong heuristics about which structure(s) should drive the design.
> And then there are situations that seem completely off the
> deterministic axis.
??? Do you mean non-deterministic programming, or just programming when the criteria are only vaguely stated?
> Consider this progression:
>
> -- Find the sum of the numbers (simply almost any way: write a loop;
> use a map function, etc)
Easy. Agreed.
> -- Find the longest run with the highest end value (101--107 in the
> original data)
An easy extension of the above program:

- add a local MAXTOP initialized to anything
- change the condition at the end from

      runlen > maxlen

  to

      any [runlen > maxlen all [runlen = maxlen prev > maxtop]]

- change the "new best run" block to include

      maxtop: prev

(For why this was easy, see below...)
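[Editor's note] Those three changes, applied to a Python rendering of the Jackson version, look roughly like this (my sketch under the thread's tie-breaking rule, not code from the thread; the names are mine):

```python
def find_run_highest_top(data):
    """First longest run of consecutive ascending integers, with ties
    broken in favor of the run with the higher ending value.
    Returns (start, length), 0-based. For the thread's dataset this
    selects the 101..107 run instead of 20..26."""
    max_pos = max_len = 0
    max_top = None                    # end value of the best run so far
    i = 0
    while i < len(data):
        seq_pos = i
        i += 1
        while i < len(data) and data[i] == data[i - 1] + 1:
            i += 1
        seq_len = i - seq_pos
        top = data[i - 1]             # last value of this run
        if seq_len > max_len or (seq_len == max_len
                                 and max_top is not None
                                 and top > max_top):
            max_pos, max_len, max_top = seq_pos, seq_len, top
    return max_pos, max_len
```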
> -- Find the run (as defined in the original problem) whose run length
> is the modal length of runs
First, we have to change "the run" to "a run", since the answer is no longer unique. For the data

    [10 8 9 6 7 4 5 0 1 2]

there are runs of length 1, 2, 2, 2, and 3, so the mode length is 2, and there are three runs that have that length.

However, something more significant has happened. The previous change to the problem (longest run with highest upper value) actually had THE SAME STRUCTURE as the original problem: find the "best" run, where we have a simple test for determining whether one run is "better" than another. In other words, we can generate runs one at a time, and test each one against the current "best" to see which one wins.

But in this last change, we now are asking for a run based on some property/ies of the entire collection of runs, so we must generate (and retain) all of them. We can (trivially) change the above function to return all runs, then write a separate "pick a mode" function to operate on that collection (or we build the additional stuff into the end of the modified function). In either case we now have a two-step approach:

1) transform the offered block into a collection of runs;
2) do something with the collection of runs.

That's why the first change was easy and the second one was harder.
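[Editor's note] The two-step shape described above can be sketched in Python (an illustration, not thread code): step 1 produces the whole collection of runs; step 2 picks a run whose length is the modal run length.

```python
from collections import Counter

def all_runs(data):
    """Step 1: transform the block into a list of (start, length) runs
    of consecutive ascending integers, 0-based."""
    runs, i = [], 0
    while i < len(data):
        start = i
        i += 1
        while i < len(data) and data[i] == data[i - 1] + 1:
            i += 1
        runs.append((start, i - start))
    return runs

def modal_length_run(data):
    """Step 2: return the first run whose length is the modal length
    over the whole collection of runs (None for empty input)."""
    runs = all_runs(data)
    if not runs:
        return None
    mode = Counter(length for _, length in runs).most_common(1)[0][0]
    return next(run for run in runs if run[1] == mode)
```

For Joel's example data the run lengths are 1, 2, 2, 2, 3, so the mode is 2 and the first run of length 2 (the one starting at 8) is returned.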
> -- Find the two runs that are most nearly similar (contain mostly the
> same numbers)
I think I can paraphrase that as "find two runs whose intersection is at least as long as any other intersection of distinct runs", which is easy to solve, still using the two-step approach above. This one is also not uniquely determined without more constraints; to see why, consider [1 2 3 4 5 2 3 4 5 6 3 4 5 6 7]
> -- The numbers are RGB values from a JPG. Find the average color tone
> of the object the female nearest the camera is looking at. Break out
> the neural networks for this one!
Now we've jumped from a programming problem to a specification problem! The other tasks were well-defined (there could be no argument about whether an answer is correct or not), but not so with this last one. (I recently saw a documentary on TV in which a critical question was whether a certain figure in a Renaissance painting was male or female. The "experts" couldn't agree. ;-)
> In my actual case, the patterns I was looking for grew more complex,
> so I reasoned that the requirement was becoming unreasonable as the
> preceding data structures weren't designed for easy sifting. Luckily,
> I could rewrite history (not always a luxury in this industry) and
> simplified the problem almost out of existence.
My compliments!!! Sometimes the best way to solve a problem is to redefine the terms of discourse so that the problem simply disappears!
> Finally, this is getting a bit long, but just to stick up for the
> "finite state machine" approach I took in the original code...
I certainly use FSM-inspired design at times myself, but the nasty part (IMHO) is always the transition between states. Not implementing the transition (usually just updating a variable) but making it clear to the eye/mind of the reader the pattern of transitions that will actually occur.

In that regard, I suggest that the (improved) JSP-inspired version above can be regarded as an FSM as well. It's just that each "state" corresponds to a region of code, and the state transitions are simply the standard flow-control! In the updated FSM version, State 0 is *always* followed by State 1 (modulo falling off of the end) and a series of occurrences of State 1 are likewise followed by a State 0 (same caveat). The JSP version makes that organization much more obvious (IMHO).

Finally, there is still considerable redundancy in the FSM version -- it's just in time rather than in codespace! Notice that the "update largest" test is done *for each element* of each run, and the sets for a maximum-length run will be done for each element after the length of its nearest competitor has been passed. The FSM code contains these expressions only once, but evaluates them *MANY* more times than the equivalent ones in the JSP version.

I'm not criticizing here, but pointing out the trade-offs (which just happen to be particularly dramatic in this case! ;-) It is possible to optimize for one measure (in this case, minimizing source code redundancy) in such a way that it pessimizes another measure (the number of times the code is -- redundantly -- evaluated).

Note that the JSP design heuristics don't minimize *ALL* aspects of code redundancy; the NOT EMPTY? test is really in there twice!!! Brownie points to anyone who can show why! ;-)

-jn-

 [14/19] from: g:santilli:tiscalinet:it at: 6-Nov-2003 11:47


Hi Patrick,

On Tuesday, November 4, 2003, 6:44:28 PM, you wrote:

ppln> >> parse [1 2 3] [1 1 1 1 1 2 1 1 3]
ppln> == true
ppln> Quite odd, isn't it?

Yes, but actually I never had to do something like that, so I think it's not a big problem. Ladislav has proposed adding a LITERAL keyword to PARSE to handle such a situation.

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amiga Group Italia sez. L'Aquila --- SOON: http://www.rebol.it/

 [15/19] from: greggirwin:mindspring at: 6-Nov-2003 6:44


Hi Gabriele,
>>>> parse [1 2 3][1 1 1 1 1 2 1 1 3]
...

GS> Yes, but actually I never had to do something like that, so I
GS> think it's not a big problem. Ladislav has proposed adding a
GS> LITERAL keyword to PARSE to handle such a situation.

Some kind of escape in parse rules might be nice. I haven't needed it either but, if you're dealing with a small set of special numbers in a particular situation, we can at least work around it. E.g.:

make-int-rule: func [val] [append copy [1 1] val]

make-int-rules: func [values [block!]] [
    foreach val values [
        set to word! join '` val make-int-rule val
    ]
]

make-int-rules [2 27 94]  ; I chose ` so it looks sort of like a lit-number! format. :)

print parse [2 27 94] [`2 `27 `94]

-- Gregg

 [16/19] from: SunandaDH:aol at: 7-Nov-2003 10:28


Hi Joel,

Thanks for some fascinating discussion points. I've only got time to reply to a couple. I think we'd pretty much agree that a problem solution depends on the structure of the problem. But we might disagree in specific cases about the best approach to take.
> I'm not clear on what you mean by "both approaches"... I view JSP
> (Jackson Structured Programming) simply as a specific type of
> structured programming which has strong heuristics about which
> structure(s) should drive the design.
Sorry. I was being vague. I was distinguishing a process-oriented approach (what do I need to do?) from a data-oriented approach (what data structures do I need to work on?)

Your point is that it is usually best to start with the data structures. I'd agree. But it isn't always that simple. And starting with processes can (and probably does) lead to quite characteristically different solutions. Which is best is often subjective -- or "proved" by the passage of time and the ravages that maintenance makes to the original solution.

A sort of analogous example... I went to London a couple of weeks ago. I could describe that trip in many different ways. Here are three:

1. In terms of processes (I used a folding bicycle and a train).
2. In terms of waypoints ("data"): Home --> Station --> Station --> Destination
3. In terms of purpose (to discuss an organization's global internet strategy).

I could have used the same processes if some of the intermediate waypoints had been different (Home --> Station can be via a canal towpath, a cycle track, or duking it out with the traffic; there is more than one train route to London). But I would have needed to use different processes (taxi, car, plane) if the waypoints had been slightly different (via New York, for example). And lots of things would have had to change if the purpose had been different (say I was delivering a sofa).

Back to IT design: it may be better, in some circumstances, to start with "why am I doing this?" rather than "what am I doing it to?" or "what am I doing it with?".
> > -- The numbers are RGB values from a JPG. Find the average color tone
> > of the object the female nearest the camera is looking at. Break out
<<quoted lines omitted: 5>>
> whether a certain figure in a Renaissance painting was male or female.
> The "experts" couldn't agree. ;-)
I'm not so sure I completely agree. Two reasons.

First, taking a purely procedural approach, I can start a solution to this problem:

females-in-image: copy []
foreach item my-image [
    item-type: analyze-item item
    if item-type/female [append females-in-image item]
]

This highlights both the strength and the weakness of a top-down procedural approach... I can easily map out the code at the highest levels, but need faith that I will be able to fill out the lower levels. Today, analyze-item isn't going to be easy. In a few years' time, such functions might come free in cornflakes packets.

My second reason is that it is *always* a matter of interpretation, even with the original "well-defined" task. The actual original task was to find the best way to describe the differences between two versions of the same file. There is a lot of subjectivity there. Examples:

original-file: ["A" "B" "C"]
new-file:      ["B" "C" "A"]

Topographically, of course, these are identical. But that isn't the answer that is going to get the program signed off. The "simplest" description of the differences, in terms of how to transform one into the other, is:

-- Delete ["A" "B" "C"]
-- Insert ["B" "C" "A"]

That's not going to win any prizes either, although it would be "best" if the files had been:

original-file: ["A" "B" "C"]
new-file:      ["X" "Y" "Z"]

An acceptable solution might be:

-- Insert ["B" "C"] at start
-- Delete ["B" "C"] at end

A "better" solution might be:

-- Delete ["A"] at start
-- Insert ["A"] at end

But "best" and "better" depend on the resources available. In this case, they are a little restricted:

** REBOL has a limited recursion stack (2000-odd items), so an off-the-shelf solution using recursion is out of bounds.

** Even unravelling the algorithms to remove the recursion doesn't help -- take a look at GNU's diff.c for example. It's full of dire warnings about the algorithm being "too expensive" in some conditions.
We wanted something that was reasonably deterministic (timing scaling according to the size of the files, not the number of differences). That led to a series of hacks (I'm not proud of the design) that takes a series of linear passes at the two files with a few sorts. It works, I'm happy. But we can still argue over the "best" way to describe some changes. Is:

original-file: ["A" "A" "A"]
new-file:      ["A" "A" "A" "A"]

an insertion at the beginning? At the end? At an intermediate point? Or some combination of moves, deletions and insertions?

So the original, apparently "well-defined", task emerged from an interplay of the algorithm design (which was constrained by current limitations in REBOL), combined with a subjective decision to use REBOL rather than Python or C, and a subjective debate on the "best" way to choose between alternative presentations of the results. The experts finally had to agree in this case, but we're still mumbling over some of the fine details. Even the most seemingly objective task design has hidden subjective depths.

Thanks again for the stimulation. Now I really do need a coffee break,
Sunanda.
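[Editor's note] As an aside (a Python illustration, not part of the thread): the standard-library difflib gives one concrete answer to the "which description is best" question. For the ["A" "B" "C"] --> ["B" "C" "A"] example above, SequenceMatcher's longest-matching-block heuristic produces the "better" description -- delete "A" at the start, insert "A" at the end:

```python
from difflib import SequenceMatcher

old = ["A", "B", "C"]
new = ["B", "C", "A"]

# get_opcodes() yields (tag, i1, i2, j1, j2) edit instructions that
# transform old[i1:i2] into new[j1:j2].
for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
    print(tag, old[i1:i2], "->", new[j1:j2])
```

Of course this only defers the subjectivity to the matcher's heuristic: a different diff algorithm could legitimately report the delete-all/insert-all description instead.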

 [17/19] from: joel:neely:fedex at: 7-Nov-2003 10:44


Hi, Sunanda, [SunandaDH--aol--com] wrote:
> Your point is that it is usually best to start with the data
> structures. I'd agree. But it isn't always that simple.
I agree. Sorry for not being more precise. What I should have said (adding the omitted conditions/details) was:

For problems where the input/output (or argument/result) data structures are already defined, it is usually very helpful in design, implementation, and maintenance to use the I/O (a/r) structures as much as possible as guides for the algorithm structure.

This approach usually helps minimize redundant code, gives unambiguous guidance as to where each part of the code should be placed in the algorithm, and minimizes the risk of bugs arising from accidental mismatches between the flow of the algorithm and the flow of the data.

Of course, in cases where the nature of the data is somewhat up in the air (e.g. the problem is more vaguely specified, or the designer is given latitude to choose data/representation structures) there's clearly not so much heuristic guidance. Also, if the structure of the data changes, it may imply significant rework of the program.
> The actual original task was to find the best way to describe
> the differences between two versions of the same file. There is
> a lot of subjectivity there.
>
That's exactly what I meant by not "well-defined". I don't mean that as a negative description, but simply as an indication that there may be a period of more "exploratory programming" to try different ideas before choosing one as the basis for final design and implementation (or that the program may very well simply evolve, as various ideas/heuristics are added and tweaked).

I suggest that in such a case, there's benefit from a program structure that makes it easy to figure out where to put such heuristics (and where to find them when it's time to change or delete them ;-)
> But "best" and "better" depend on the resources available.
> In this case, they are a little restricted:
>
Again, we'll certainly agree that the juggling and compromises made when shoehorning a 10-pound algorithm into a 5-pound interpreter are at least as much art as science! ;-)

--
----------------------------------------------------------------------
Joel Neely                              joelDOTneelyATfedexDOTcom
901-263-4446
Enron Accounting in a Nutshell: 1c=$0.01=($0.10)**2=(10c)**2=100c=$1

 [18/19] from: antonr:iinet:au at: 8-Nov-2003 15:04


Hi Sunanda,
> I can easily map out the code at the highest levels. But need
> faith that I will be able to fill out the lower levels. Today,
> analyze-item isn't going to be easy. In a few years time, such
> functions might come free in cornflakes packets.
... Very good points, Sunanda. Pragmatism in getting the job done can wreck idealism.
> ** REBOL has a limited recursion stack (2000-odd items) so an
> off-the-shelf solution using recursion is out of bounds.
Just a note: You can implement your own stack to have a much larger size than 2000.
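For example (a sketch only -- the names are made up): a deep traversal of nested blocks, which would normally be written recursively, can instead keep an explicit stack in an ordinary block!:

;; Sketch: visit every non-block value in an arbitrarily nested
;; block, without recursion, using a block! as an explicit stack
walk-deep: func [tree [block!] action [function!]
    /local stack item
] [
    stack: reduce [tree]            ;; blocks still to be scanned
    while [not empty? stack] [
        item: first stack
        remove stack                ;; pop the top block
        foreach x item [
            either block? x [
                insert/only stack x ;; push nested blocks for later
            ] [
                action x            ;; leaf: apply the action
            ]
        ]
    ]
]

count: 0
walk-deep [1 [2 [3]] 4 [5 6]] func [x] [count: count + 1]
print count  ;; 6

The stack grows with the data, not with REBOL's function-call depth, so nesting far beyond 2000 levels is no problem.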
> Sunanda.
Anton.

 [19/19] from: SunandaDH:aol at: 8-Nov-2003 10:58


Hi Joel,
> Of course, in cases where the nature of the data is somewhat
> "up in the air" (e.g. the problem is more vaguely specified,
> or the designer is given latitude to choose data/representation
> structures) there's clearly not so much heuristic guidance.
But why do I only ever get such problems!?
> Also, if the structure of the data changes, it may imply
> significant rework of the program.
That's a crucial point. You can be utterly 100% confident that anything that involves humans will change, and usually completely unpredictably. The changes can hit any part of a system, and cumulatively -- if not instantly -- can have devastating effects. I don't think any methodology addresses the need for continual, random change.
> Again, we'll certainly agree that the juggling and compromises
> made when shoehorning a 10-pound algorithm into a 5-pound
> interpreter are at least as much art as science! ;-)
Yes, well, that makes it more fun!
Sunanda.
