Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

More on expressions

 [1/25] from: joel::neely::fedex::com at: 1-Feb-2003 10:21


Inspired by the discussions of ANY and ALL, I thought I'd share this little puzzle, prefaced with a tad of background. Some languages are described as "imperative" or "procedural" because the programmer is writing commands to be obeyed. PASCAL, COBOL, FORTRAN, and (the way most published code appears) C come to mind. Other languages are designated as "functional" because the programmer is defining functions whose (side-effect-free) evaluation yields the desired result. SCHEME, ML, and HASKELL come to mind. I think of REBOL as a hybrid which I'll call "expressional" because we write expressions to be evaluated, but often the (side-)effect of those evaluations is the goal. For instance APPEND has both a resulting value and an effect; sometimes we want one, or the other, or both. Thinking clearly about what our expressions can do for us is a very stimulating way to explore the power of programming concepts. Here is a little puzzle along that vein. GIVEN: Three variables, A, B, and C which we'll assume have been set to numeric values. CHALLENGE: Write an expression whose value is the median of the values. A trival (but time-consuming) solution is second sort reduce [a b c] since the median of a set of values is "the one in the middle if they are in order". CONSTRAINT: Just to make it iteresting ;-) the expression MAY NOT use any of these REBOL words/functions: sort if either first second third nor any path expressions. The trick, of course, is to find an expression that eliminates the need for any explicit "decision" or "selection" activity. Have fun! -jn-

 [2/25] from: sunandadh:aol at: 1-Feb-2003 13:56


Joel:
> GIVEN: Three variables, A, B, and C which we'll assume have > been set to numeric values.
<<quoted lines omitted: 4>>
> sort if either first second third > nor any path expressions.
I'll have a go. The median is the largest left after you remove the largest, so: pick maximum-of head remove maximum-of reduce [a b c] 1 Sunanda.

 [3/25] from: joel:neely:fedex at: 1-Feb-2003 13:18


Hi, Sunanda, Ingenious! But I'll have to be picky... (Who, me? ;-) [SunandaDH--aol--com] wrote:
> Joel: > > GIVEN: Three variables, A, B, and C which we'll assume have
<<quoted lines omitted: 11>>
> largest, so: > pick maximum-of head remove maximum-of reduce [a b c] 1
Since pick some-expresion 1 is really the same as first expression I'll have to say that this violates the spirit (if not the letter) of the constraints. Just to repeat the hint at the end of the original post:
> The trick, of course, is to find an expression that eliminates the > need for any explicit "decision" or "selection" activity. >
How about another go? -jn-

 [4/25] from: sunandadh:aol at: 1-Feb-2003 15:01


Joel:
> How about another go?
My pleasure! But something tells me you won't like this either. Using the same logic as before but with a temporary variable: temp: head remove maximum-of reduce [a b c] max temp/1 temp/2 Or (another one you won't like) the median is what is left if you remove the maximum and the minimum. If I can't "pick ... 1" to convert the integer in a block to an integer, I'll do it the hard way: to integer! form head remove minimum-of head remove maximum-of reduce [a b c] More broken spirits!? Sunanda.

 [5/25] from: g:santilli:tiscalinet:it at: 1-Feb-2003 21:01


Hi Joel, On Saturday, February 1, 2003, 8:18:33 PM, you wrote: JN> How about another go? The hint was in the beginning of the email, so I'll take that. I you still want to work your answer, do not read further. any [ all [a < b b < c b] all [b < c c < a c] all [c < a a < b a] all [c < b b < a b] all [b < a a < c a] all [a < c c < b c] ] But this is making a decision actually, so maybe you had something different in mind. Hmm, now that I think of it, a + b + c - (min a min b c) - max a max b c but MIN and MAX are doing decisions so we're still deciding somehow. :-) Regards, Gabriele. -- Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer Amigan -- AGI L'Aquila -- REB: http://web.tiscali.it/rebol/index.r

 [6/25] from: rgombert:essentiel at: 1-Feb-2003 21:12


Maybe you'll find this better ? any [all all [a > b a < c a] [a > c a < b a] all [b > a b < c b] all [b > c b < a b] c] Renaud GOMBERT -------------------------------- www.essentiel.net N° SIRET : 418 620 159 N° MdA : G316527 NAF/APE : 923A

 [7/25] from: rgombert:essentiel at: 1-Feb-2003 22:32


Ooops, sorry, one of my 'all had some problem when copypasting ;-) So here it is... and indented : any [ all [a > b a < c a] all [a > c a < b a] all [b > a b < c b] all [b > c b < a b] c ] Renaud GOMBERT -------------------------------- www.essentiel.net N° SIRET : 418 620 159 N° MdA : G316527 NAF/APE : 923A

 [8/25] from: rotenca:telvia:it at: 2-Feb-2003 11:35


Look down to see my solution: any [all [(a > b) xor (a > c) a] all [(a > c) xor (b > c) c] b] --- Ciao Romano

 [9/25] from: sunandadh:aol at: 2-Feb-2003 6:37


Just to demonstrate how sticking to the rules may not lead to an elegant solution.... The median is the one in the middle -- provided the first one is lower and the third one is higher. If those conditions aren't true, throw the numbers in the air and try again: until [set [a b c] random/secure reduce [a b c] all [a <= b b <= c]] b There is no explicit "decision" just a termination condition whose precise timing is indeterminate. No "selection" activity either -- random shuffling isn't selection. (We'd of course put this in a func with local variables to stop messing up the caller's three variables. I'm not entirely unreasonable when it comes to coding). :-) Sunanda.

 [10/25] from: joel:neely:fedex at: 2-Feb-2003 7:34


Hi, Sunanda, <rotfl> This is the first "practical" application I've ever seen anyone make of the infamous BogoSort algorithm! What a blast! </rotfl> [SunandaDH--aol--com] wrote:

 [11/25] from: fuka:fuxoft:cz at: 2-Feb-2003 15:15


Joel Neely wrote:
>> until [set [a b c] random/secure reduce [a b c] all [a <= b b <= c]] b
random/secure ??? The Rebol Dictionary says that "random/secure - Returns a cryptographically secure random number" without giving any examples or explaining what "cryptographically secure random number" might be...? -- Frantisek Fuka (yes, that IS my real name) (and it's pronounced "Fran-tjee-shek Foo-kah") ---------------------------------------------------- My E-mail: [fuka--fuxoft--cz] My Homepage: http://www.fuxoft.cz My ICQ: 2745855

 [12/25] from: joel:neely:fedex at: 2-Feb-2003 8:19


Many thanks to all who've participated in this thread. Anyone who wants to keep playing, skip reading this email until you've had your fun! For those who'd like a hint about another approach, keep reading this email below. . . . . . . . . . . . Joel Neely wrote:
> Inspired by the discussions of ANY and ALL, I thought I'd share this > little puzzle, prefaced with a tad of background. >
Gabriele took that hint in an interesting way I hadn't thought of. (Thanks, Gabriele, for the new food for thought!) Let me give a nudge in the direction I was thinking. We've discussed how ANY and ALL implicitly provide a way to implement chains of tests. I intended to reinforce that hint at the end of the email.
> The trick, of course, is to find an expression that eliminates the > need for any explicit "decision" or "selection" activity. >
Are there any other words that provide implicit tests? Stop here if you want to take some time for thought. For more along this line, keep reading. Gabriele got "dangerously" close to the strategy I was pondering, with his alternate suggestion:
> a + b + c - (min a min b c) - max a max b c >
and rightfully so, since I am the one who specified that the values would be numeric. Second nudge: suppose that A, B, and C were not numeric (so we can't use addition and subtraction). As before, stop reading unless you want more hints ( which will come faster as we get closer to the end ). Think about the relationship between decisions and sorting. What can can we say is accomplished by each iteration of something like the classic BubbleSort? We can "unwrap" BubbleSort for three values into the following sequence of operations: 1) compare A and B, swapping their values if out of order 2) compare B and C, swapping their values if out of order 3) compare A and B, swapping their values if out of order (and there are some combinations of values that require all three of these steps). We can diagram this using the notion of "sort networks", as presented by Knuth, in this way: A ---+-------+--- A' | | B ---+---+---+--- B' | C -------+------- C' where the diagram is read left-to-right, with each vertical bar as an instance of "compare and swap if out of order" using the two values at either end. At the right-hand-end, A', B', and C' contain the sorted values from A, B, and C. This is an ENORMOUS hint, if we think about how to implement the compare and swap if out of order without using IF. One last chance to play with these hints on your own before I spoil the secret. We could do the "compare and swap if out of order" with MIN and MAX, as follows: set [a b] reduce [a min b a max b] set [b c] reduce [b min c b max c] set [a b] reduce [a min b a max c] which provides a literal transcription of the sort network diagram if read from left to right. However, we didn't ask for the sorted list, but just the median, so let's skip the SET operations, and just use the expressions. Substituting the expression for the modified B into the second line above gives us set [b c] reduce [(a max b) min c (a max b) max c] Then we can substitute *that* expression for B (and the one for the new value of A) into the third line: set [a b] reduce [ (a min b) min ((a max b) min c) (a min b) max ((a max b) min c) ] and then we recall that we only wanted the median, which means we only wanted the ultimate value for B, giving us our final answer: (a min b) max ((a max b) min c) which will work for any datatype (number pair char money date time tuple series) that MIN and MAX handle. There's actually a practical (well, at least if you think "philosophy of programming" is practical ;-) point to this little puzzle. We programmers are biased to think operationally. It's very easy to approach a p[roblem by thinking, "If I were solving this, I'd first do blah, and then do blah-blah, and then if yadda-yadda I'd do ..." Most experienced programmers could see an expression in a problem specification such as a * b + a * c + a * d + ... + a * z and would almost instinctivly 1) know that it could be written as a * (b + c + c + ... + z) because we've been trained to know the properties of * and + as operators, and how they interact. 2) probably desire to rewrite it as shown above, becase we've been trained to think about saving compute time. However, we typically aren't trained to think of other parts of our language (such as MIN, MAX, or the inspiring ANY, ALL, EITHER, etc.) in the same way. It makes me wonder how often we miss opportunities to do simplifications just as valid as the above arithmetic example because don't think about our operators enough... Thanks for playing! -jn-

 [13/25] from: sunandadh:aol at: 2-Feb-2003 10:15


Joel, Spoiler on Joel's solution below: . .. ... ..... ...... ..... .... ... .. .
> set [a b] reduce [a min b a max b] > set [b c] reduce [b min c b max c] > set [a b] reduce [a min b a max c]
Just to be picky myself here for a few lines, a min b is not valid Rebol -- you mean min a b. That makes your first step: set [a b] reduce [min a b max a b] set [b c] reduce [min b c max b c] set [a b] reduce [min a b max a b] Which clearly does make b the median value. Your subsequent simplifications confused me, perhaps because of the wrong syntax -- but is seems to me that you need to do at least one swap to make the thing work. My attempted reformulation of your final result doesn't work: max (min a b) (min (max a b) c) But I may have missed a step there somewhere. Try again!? Sunanda

 [14/25] from: joel:neely:fedex at: 2-Feb-2003 15:14


Hi, Sunanda, ... proving once again that notation can be too much fun ... ;-) Thanks for the correction of my prefix-vs-infix blunder! I'm so used to thinking of the operations in question as first-class operators (comparable to + - * and /) instead of programming language procedures that I typed what I was thinking instead of what I should have been writing! Now on to the real issue... Skip the rest of this email for those still playing with the puzzle ... [SunandaDH--aol--com] wrote:
> Joel, > Spoiler on Joel's solution below:
<<quoted lines omitted: 9>>
> . > max (min a b) (min (max a b) c)
I'm puzzled that you said it didn't work for you. Here's a small torture test that shows the right result every time: median3: func [a b c] [max (min a b) (min (max a b) c)] loop 40 [ trio: random [1 2 3] print [mold trio tab median3 trio/1 trio/2 trio/3] ] ... which produces results resembling ... [2 1 3] 2 [2 1 3] 2 [3 1 2] 2 [3 2 1] 2 [1 3 2] 2 [3 1 2] 2 [3 1 2] 2 [1 2 3] 2 [1 3 2] 2 ... [2 3 1] 2 [2 3 1] 2 [2 1 3] 2 [1 3 2] 2 ... and a non-numeric variation ... loop 40 [ trio: random ["1-A" "1-B" "1-C"] print [mold trio tab median3 trio/1 trio/2 trio/3] ] ... which produces results resembling ... ["1-B" "1-C" "1-A"] 1-B ["1-C" "1-B" "1-A"] 1-B ["1-C" "1-A" "1-B"] 1-B ["1-A" "1-C" "1-B"] 1-B ... ["1-B" "1-A" "1-C"] 1-B ["1-A" "1-B" "1-C"] 1-B ["1-B" "1-C" "1-A"] 1-B ["1-A" "1-B" "1-C"] 1-B Can you reproduce the problem using the above code? -jn-

 [15/25] from: sunandadh:aol at: 2-Feb-2003 16:55


Joel:
> I'm puzzled that you said it didn't work for you. Here's a small > torture test that shows the right result every time:
Sorry for that. My mistake. I cut and pasted the code incorrectly into my torture test. When pasted straight, it does indeed produce the median every time. By way of compensation or entertainment, here's another slightly demented way of doing it. The first three additions must give us Median + twice Maximum. So a quick subtraction of twice the maximum, and the job is done. (max a b) + (max b c) + (max a c) - (2 * (max a max b c)) Of course this only works on the original spec -- numbers -- but it's better than your solution because it uses less min's.....So this is *the* solution to remember if challenged to do it without using if, sort, first, second, third, either or min. One day that might save your life. To make it work with non-numbers, tweak it into inserting and removing items in a block, Sunanda Whose hoping his code examples will win him the first ever "Rebol without a clue" award.

 [16/25] from: lmecir:mbox:vol:cz at: 4-Feb-2003 7:41


Ahoj Franto, ----- Original Message ----- From: "Frantisek Fuka"
> "random/secure"??? The Rebol Dictionary says that "random/secure - > Returns a cryptographically secure random number" without giving any > examples or explaining what "cryptographically secure random number" > might be...?
The RANDOM/SECURE is just a better (statistically) RANDOM, but it is much more expensive (looks six times slower). -L

 [17/25] from: carl:cybercraft at: 5-Feb-2003 21:51


On 03-Feb-03, Joel Neely wrote:
> Many thanks to all who've participated in this thread. Anyone who > wants to keep playing, skip reading this email until you've had > your fun!
Okay, I'm late to this, although I didn't skip the following, but just skimmed it to the end to see Joel's solution. ;-) And, for anyone still playing I'll leave a few lines in saying nothing so you won't see Joel's solution, or mine. So...
> For those who'd like a hint about another approach, > keep
<<quoted lines omitted: 12>>
> . > .
[snip]
> and then we recall that we only wanted the median, which means we > only wanted the ultimate value for B, giving us our final answer: > (a min b) max ((a max b) min c)
Which was wrong as it happens, but here's Joel's corrected (in a later post) version, written as a function: median3: func [a b c] [max (min a b) (min (max a b) c)] As I said, I just skimmed Joel's post that explained how he arrived at his solution, but I did notice the use of min and max. So now for a brief explaination of how I arrived at my soloution. Firstly, I worked out what I considered the easiest way to get the maximum and minimum values from A, B and C...
>> a: 1 b: 2 c: 3
== 3
>> max max a b c
== 3
>> min min a b c
== 1 This gave me the idea for my first solution...
>> difference reduce [a b c] reduce [max max a b c min min a b c]
== [2] which returned the correct result, but in a block, and difference may be against the spirit of the game anyway, since sort and so on was out. (Difference is the word I thought of first when I read Joel's original post containing his challenge.) After more doodling, I hit on the thought that this... min a b min b c min a c would produce three values, none of which would be the maximum value of A, B and C, but which would include the median value, which of course would be the maximum value of these three new values. So, put a max and a max in front of those and my solution is...
>> max max min a b min b c min a c
== 2 A triffle longer than Joel's, but very REBOL I think. (And I do so hate parens;) -- Carl Read

 [18/25] from: joel:neely:fedex at: 5-Feb-2003 6:43


Thanks, Carl, I almost always find the train of thought leading to a solution as interesting as the final result (and usually more so). I agree with you about the parentheses (mostly! ;-). In the case of my solution, I left them there to make it clear where the sub- expressions came from, but I should have pointed out that all of the parentheses in my last version were actually redundant and could simply be dropped. (That's actually what I did when I set up some of the solutions for benchmarking, the results of which I'll post Real Soon Now...) -jn- Carl Read wrote:

 [19/25] from: carl:cybercraft at: 6-Feb-2003 11:08


On 06-Feb-03, Joel Neely wrote:
> Thanks, Carl, > I almost always find the train of thought leading to a solution as > interesting as the final result (and usually more so).
I thought I'd include them, since I've no formal training as a programmer and don't understand "real" maths, either. Might help you to understand how I get by with these computer things. ;-)
> I agree with you about the parentheses (mostly! ;-). In the case > of my solution, I left them there to make it clear where the sub-
<<quoted lines omitted: 3>>
> some of the solutions for benchmarking, the results of which I'll > post Real Soon Now...)
Well, yours will be the quicker, since it has a min or max less in it, but I'd be interested to know what kind of speed differences the parens make, so could you do a test of yours with them in as well? -- Carl Read

 [20/25] from: joel:neely:fedex at: 7-Feb-2003 17:24


Hi, Carl, Here are the benchmark results: Carl Read wrote:
> Well, yours will be the quicker, since it has a min or max > less in it, but I'd be interested to know what kind of speed > differences the parens make, so could you do a test of yours > with them in as well? >
First a bit of preamble; I took all of the submitted solutions (with a bit of cleanup, such as fixing the infix/prefix typo and replacing </> with <=/>= to allow equal values among the args) and built a testing and benchmarking harness. The benchmarking is built on the infrastructure I'm building (in my spare time with my left foot) for the "shootout" effort, so I won't take the time to explain that here -- documentation will be forthcoming Real Soon Now -- but the test cases appear below. Each case has a string label, a "setup" block to do any preliminary evaluations, and a "final" block that gives the expression whose value should be the median of the three arguments A B and C . test-specs: [ "benchmark baseline" [] [0] "second sort" [] [second sort reduce [a b c]] "pick max remove max" [] [ pick maximum-of head remove maximum-of reduce [a b c] 1 ] "any all (explicit)" [] [ any [ all [a <= b b <= c b] all [b <= c c <= a c] all [c <= a a <= b a] all [c <= b b <= a b] all [b <= a a <= c a] all [a <= c c <= b c] ] ] "any all (w/ default)" [] [ any [ all [a >= b a <= c a] all [a >= c a <= b a] all [b >= a b <= c b] all [b >= c b <= a b] c ] ] "sum less extremes" [] [ a + b + c - (min a min b c) - max a max b c ] "max of remove max" [ dummy: head remove maximum-of reduce [a b c] ][ max dummy/1 dummy/2 ] "int form remove extremes" [] [ to integer! form head remove minimum-of head remove maximum-of reduce [a b c] ] "any xor with default" [] [ any [ all [(a > b) xor (a > c) a] all [(a > c) xor (b > c) c] b ] ] "unwrapped sort with sets" [ set [a b] reduce [min a b max a b] set [b c] reduce [min b c max b c] set [a b] reduce [min a b max a b] ][ b ] "sums less twice max" [] [ (max a b) + (max b c) + (max a c) - (2 * (max a max b c)) ] "max of mins" [] [max max min a b min b c min a c] "sort network (parens)" [] [max (min a b) (min (max a b) c)] "sort network" [] [max min a b min max a b c] ] The only one missing is the "bogosort" approach, which is way slow compared to everything else (about 13 times the time needed for the fastest solution). The above block was first submitted to a testing harness to verify that all expressions yielded the correct results for a comprehensive set of test data; here's that output: [1 1 1] [1 1 1] 1 [1 1 2] [1 1 2] 1 [1 1 3] [1 1 3] 1 [1 2 1] [1 1 2] 1 [1 2 2] [1 2 2] 2 [1 2 3] [1 2 3] 2 [1 3 1] [1 1 3] 1 [1 3 2] [1 2 3] 2 [1 3 3] [1 3 3] 3 [2 1 1] [1 1 2] 1 [2 1 2] [1 2 2] 2 [2 1 3] [1 2 3] 2 [2 2 1] [1 2 2] 2 [2 2 2] [2 2 2] 2 [2 2 3] [2 2 3] 2 [2 3 1] [1 2 3] 2 [2 3 2] [2 2 3] 2 [2 3 3] [2 3 3] 3 [3 1 1] [1 1 3] 1 [3 1 2] [1 2 3] 2 [3 1 3] [1 3 3] 3 [3 2 1] [1 2 3] 2 [3 2 2] [2 2 3] 2 [3 2 3] [2 3 3] 3 [3 3 1] [1 3 3] 3 [3 3 2] [2 3 3] 3 [3 3 3] [3 3 3] 3 0 errors in second sort 0 errors in pick max remove max 0 errors in any all (explicit) 0 errors in any all (w/ default) 0 errors in sum less extremes 0 errors in max of remove max 0 errors in int form remove extremes 0 errors in any xor with default 0 errors in unwrapped sort with sets 0 errors in sums less twice max 0 errors in max of mins 0 errors in sort network (parens) 0 errors in sort network Then the various cases were benchmarked competitively. I used a baseline of evaluating a literal zero as a result; the benchmark engine subtracts out the timings for the baseline, and then gives ratios between all the "real" cases relative to the smallest one. With a bit of formatting (percentages and labels), we get these comparative results, fastest to slowest: 100 sort network 122.4 max of mins 122.4 sort network (parens) 167.7 any xor with default 175.5 sum less extremes 191.6 any all (w/ default) 219 sums less twice max 219.5 pick max remove max 245.3 any all (explicit) 250.8 second sort 264.2 max of remove max 338.1 int form remove extremes 549.6 unwrapped sort with sets Interestingly enough, adding the parens to the "sort network" expression adds 20%-25% to the running time, which is essentially the same cost as the extra min or max of the next alternative. Thanks to all who participated! -jn- -- ---------------------------------------------------------------------- Joel Neely joelDOTneelyATfedexDOTcom 901-263-4446

 [21/25] from: carl:cybercraft at: 8-Feb-2003 14:20


On 08-Feb-03, Joel Neely wrote:
> Hi, Carl, > Here are the benchmark results:
<<quoted lines omitted: 4>>
>> differences the parens make, so could you do a test of yours >> with them in as well?
[snip of the code and comments - see Joel's previous post]
> 100 sort network > 122.4 max of mins
<<quoted lines omitted: 12>>
> expression adds 20%-25% to the running time, which is essentially > the same cost as the extra min or max of the next alternative.
RT documentation suggests it's best to remove parens if possible due to speed reasons, which is partly why I asked the question. Those coming from compiled languages might expect no slowdown because of them and so leave them in for clarity, but this nicely shows that there can be quite a bit of cost to leaving them in. And thanks for doing the work Joel. I'm sure all here appreciate it. -- Carl Read

 [22/25] from: sunandadh:aol at: 8-Feb-2003 6:47


Joel:
> Here are the benchmark results:
Thanks for doing all that work, Joel! I'm glad to see that one of my "solutions" was so slow that it didn't make the cut into the benchmarks. That's as much as distinction as being the fastest :-) Other dimensions for scoring how "good" the various solutions are include: Range of data types handled: ** Some existing solutions only work with integers ** Some only work with numbers ** Some won't work with pairs (try a: 6x2 b: 2x6 c: 4x4) ** Some work with most Rebol data type -- though they fail on things like a: true b: false c: false ** There's just one (I think) that'll work with any Rebol data type. Flexibility in ordering ** All solutions so far use Rebol's built-in logic for ordering. But if we're trying to find the middle Czech surname from a list, how quickly can the solution be adapted to Czech collating sequence? Flexibility in processing the data * Find the median, but ignore missing values when doing so -- how many of our solutions return 10 from this data? a: 10 b: none c: 10 or from this: a: none b: none c: 10 Number of items to compare ** Very few scale easily beyond three items How quickly can the various solutions be adapted to 13 items? or N where N isn't known until run time? The fastest solution usually isn't the one that runs longest (if you see what I mean) before being replace by something more generalised. Sunanda.

 [23/25] from: joel:neely:fedex at: 8-Feb-2003 11:26


Hi, Sunanda, Interesting observations, all! You've certainly taken the discussion to another "level" (pun intended). Let me suggest that some of your points take us out of the arena of algorithm design into the arenae of system design and language design. [SunandaDH--aol--com] wrote:
> I'm glad to see that one of my "solutions" was so slow that it > didn't make the cut into the benchmarks... >
I'll rectify that omission shortly! (The original version using RANDOM/SECURE took about 13x as long as "sort network"; when I changed it to use plain RANDOM a quick check looked to be about 8x times as long as "sort network".) I was originally being paranoid about how long I would need to run the entire competition, feeling pressure to get results out. I'm tidying up the benchmark harness for publication, and will have that published to the community soon.
> Other dimensions for scoring how "good" the various solutions are > include:
<<quoted lines omitted: 5>>
> like a: true b: false c: false > ** There's just one (I think) that'll work with any Rebol data type.
Of course, some of that is my responsibility; my original post said: Three variables, A, B, and C which we'll assume have been set to numeric values. Equally obvious is the point that there's a different mind-set (and experience level) involved in addressing each of the following issues: - designing an algorithm/function/program to accomplish a narrowly- defined goal; - designing a solution for a class of tasks/goals you think are likely in a given environment, given a specific (narrow) instance of that class; or - designing a generalized tool for reuse in the widest possible set of environments/cases. The further we go in that direction, the more likely we are: - to tradeoff optimal performance for generality (which you address later on); and - to need support from the language (as in the next point).
> Flexibility in ordering > ** All solutions so far use Rebol's built-in logic for ordering. > But if we're trying to find the middle Czech surname from a > list, how quickly can the solution be adapted to Czech > collating sequence? >
I see that as raising at least three questions: - How much work would it be to *replace* a "smaller" solution with one which is extended to cover a larger domain? - With that extension done, how much is the performance over the original domain affected? - Alternately, could the underlying design of a "smaller" solution be (re)used in the development of *another* solution tailored for a different domain? In all three cases, of course, the generality of the design (not necessarily the initial implementation!) is crucial. Below is my attempt to identify the key design dependencies of each of the tested solutions. 1) Requires arithmetic and ordering of two elements: sums less twice max +/- max sum less extremes +/- min/max 2) Requires abiltity to test two elements for "in order" or "out of order": any all (explicit) >=/< any all (w/ default) >=/< any xor with default >=/< second bogosort < 3) Requires ability to order two elements: sort network min/max max of mins min/max sort network (parens) min/max unwrapped sort with sets min/max 4) Requires ability to find extrema of any number of elements: pick max remove max maximum-of int form remove extremes minimum-of/maximum-of 5) Requires both 2-element ordering and n-element extrema: max of remove max maximum-of/max 6) Requires ability to sort any number of values: second sort sort Clearly group (1) can only be used with numbers (or some abstract type with an ordering relation and an order-preserving invertible composition operator ;-) so I won't try to generalize that group. Groups (2) thru (5) could be generalized in two ways: - Modify the design to accept data-appropriate comparison/ordering operators to be specified by the caller (passed as parameters or set as attributes of an object containing a median method based on those attributes. - Reuse the design as a pattern, replacing the generic operators with functions over the alternate data type to be supported in a different version. This takes us to a language-design issue, in that we might expect the key functions of (2) thru (5) to be related nicely, as in min: func [a b] [either a <= b [a] [b]] minimum-of: func [s /local a] [ a: first s foreach b s [a: min a b] a ] sort: func [s /abstract... rearrange elements of s until a <= b is true for all pairs of consecutive elements ] Alas, in REBOL this is not the case! SORT and the extrema functions (MINIMUM-OF and MAXIMUM-OF) will order values that can't be compared with the standard ordering/comparison operators. Of course we could turn things around, and use SORT as the basis of the known universe, as follows: sort: ...native... minimum-value-of: func [s] [first sort copy s] min: func [a b] [minimum-value-of reduce [a b]] <=: func [a b] [a = min a b] (except we'd need a new name for <= because of REBOL lexical syntax rules!) If REBOL were truly object-oriented, we'd just call the appropriate methods on the values being compared/ordered/sorted, and let dynamic dispatch and inheritance "sort it out" (pun intended ;-)!
> Flexibility in processing the data > * Find the median, but ignore missing values when doing so -- how > many of our solutions return 10 from this data? a: 10 b: none > c: 10 or from this: a: none b: none c: 10 >
Now we're into a different set of issues: 1) What is the "contract" between caller and callee? 2) Who is responsible for enforcing that contract? 3) What should the callee do if the contract is broken? 4) What should the callee do if asked to perform the impossible? Some writers argue that error-checking and policy-enforcement should be done at the highest/outermost levels of a system, and the remainder of the system should be build on the assumption that this has been done. Their argument is based on the notion that performance is diminished if validity is constantly being vetted. The other side of the argument is that pushing the error-checking (at least -- policy-enforcement tends to beg the question of "whose policies?" and cause application-concept leakage) as far *down* as possible makes the lower-level routines more re-usable. Of course in the physical world, we confront this all the time! My house has a breaker panel to prevent circuit overload, and I can use a plain extension cord with any outlet. The error-checking duties are pushed "upstream". I could buy extension cords with a fuse or circuit breaker in each strand, I suppose... ;-) On the other hand, earlier this morning the circuit-breaker in the power conditioner that feeds this computer and monitor popped, which took down the computer but none of the other outlets in the room. Part of this gets to the heart of the distinction between dynamically- typed languages (REBOL, Python, Scheme, ...) versus statically-typed languages (PASCAL, JAVA, ...) and how much work we can do early on to avoid work later. (Look at the definition of FOR in REBOL to see an interesting example of this issue.)
> Number of items to compare > ** Very few scale easily beyond three items How quickly can the > various solutions be adapted to 13 items? or N where N isn't > known until run time? >
And how many special cases (e.g. 13) will one tolerate before just implementing the generalized case and trading programmer/coding cycles for CPU cycles?
> The fastest solution usually isn't the one that runs longest (if you > see what I mean) before being replace by something more generalised. >
The truth of that statement IMHO is highly dependent on the application domain. Real-time embedded systems are much more likely to require the fastest solution (even if that means having multiple type-specific routines) versus throw-away exploratory programs. Of course, that means that we typically use different tools for those domains! -jn-

 [24/25] from: sunandadh:aol at: 8-Feb-2003 15:50


Hi Joel, Thanks for a comprehensive and interesting discussion as usual. You are right that I attempted to move the issue from algorithm design to application design. I guess that's my bias. I enjoy the fact that we can come up with so may algorithms for doing the same thing. We need lots of different algorithms, each with its own datasheet of strengths and weaknesses. That way, application designers can pick the one they need. Different application designers will pick different algorithms depending on their needs. The right solution for a real-time controller (needing a determinate response time) may be different from another embedded-chip application (needing smallest footprint) and so on. Me, I've been warped by three decades of commercial application design. Means I normally instinctively go for the solution with the fewest built-in assumptions -- the most "general" usual, but seldom the most elegant. I know -- believe me I do -- that within 24 hours of implementing a median algorithm that handled numerics, I'd be on the wrong end of this phone call: It's broken! It's not displaying the median.....Yes, I know we said they'd be numbers -- but not always....On a Tuesday afternoon of course the three things could be city names....No of course we don't want the middle one alphabetically in that case -- why on earth would anyone want *that*!?....We need the median *population* -- unless the user logged on is a member of the ethics committee then they'd probably want the city with the median distance from the North Pole -- but check with them first, that's only a guess.....Why should I know if there are any other special cases" like that? -- you wrote the code: it's your job to know what median means.....And we need it fixed within 20 minutes." Sunanda.

 [25/25] from: rotenca:telvia:it at: 9-Feb-2003 12:06


Hi Sunanda,
> I know -- believe me I do -- that within 24 hours of implementing a median > algorithm that handled numerics, I'd be on the wrong end of this phone call:
<<quoted lines omitted: 5>>
> ethics committee then they'd probably want the city with the median distance > from the North Pole -- but check with them first, that's only a
guess.....Why
> should I know if there are any other "special cases" like that? -- you wrote > the code: it's your job to know what median means.....And we need it fixed > within 20 minutes."
You are right! This is the real world. :-( --- Ciao Romano

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted