More on expressions
[1/25] from: joel::neely::fedex::com at: 1-Feb-2003 10:21
Inspired by the discussions of ANY and ALL, I thought I'd share this
little puzzle, prefaced with a tad of background.
Some languages are described as "imperative" or "procedural" because
the programmer is writing commands to be obeyed. PASCAL, COBOL,
FORTRAN, and (the way most published code appears) C come to mind.
Other languages are designated as "functional" because the programmer
is defining functions whose (side-effect-free) evaluation yields the
desired result. SCHEME, ML, and HASKELL come to mind.
I think of REBOL as a hybrid which I'll call "expressional" because
we write expressions to be evaluated, but often the (side-)effect
of those evaluations is the goal. For instance APPEND has both a
resulting value and an effect; sometimes we want one, or the other,
or both.
Thinking clearly about what our expressions can do for us is a very
stimulating way to explore the power of programming concepts. Here
is a little puzzle along that vein.
GIVEN: Three variables, A, B, and C which we'll assume have
been set to numeric values.
CHALLENGE: Write an expression whose value is the median of the
values. A trival (but time-consuming) solution is
second sort reduce [a b c]
since the median of a set of values is "the one in the
middle if they are in order".
CONSTRAINT: Just to make it iteresting ;-) the expression MAY NOT
use any of these REBOL words/functions:
sort if either first second third
nor any path expressions.
The trick, of course, is to find an expression that eliminates the
need for any explicit "decision" or "selection" activity.
Have fun!
-jn-
[2/25] from: sunandadh:aol at: 1-Feb-2003 13:56
Joel:
> GIVEN: Three variables, A, B, and C which we'll assume have
> been set to numeric values.
<<quoted lines omitted: 4>>
> sort if either first second third
> nor any path expressions.
I'll have a go. The median is the largest left after you remove the largest,
so:
pick maximum-of head remove maximum-of reduce [a b c] 1
Sunanda.
[3/25] from: joel:neely:fedex at: 1-Feb-2003 13:18
Hi, Sunanda,
Ingenious! But I'll have to be picky... (Who, me? ;-)
[SunandaDH--aol--com] wrote:
> Joel:
> > GIVEN: Three variables, A, B, and C which we'll assume have
<<quoted lines omitted: 11>>
> largest, so:
> pick maximum-of head remove maximum-of reduce [a b c] 1
Since
pick some-expresion 1
is really the same as
first expression
I'll have to say that this violates the spirit (if not the letter) of
the
constraints. Just to repeat the hint at the end of the original post:
> The trick, of course, is to find an expression that eliminates the
> need for any explicit "decision" or "selection" activity.
>
How about another go?
-jn-
[4/25] from: sunandadh:aol at: 1-Feb-2003 15:01
Joel:
> How about another go?
My pleasure!
But something tells me you won't like this either. Using the same logic as
before but with a temporary variable:
temp: head remove maximum-of reduce [a b c] max temp/1 temp/2
Or (another one you won't like) the median is what is left if you remove the
maximum and the minimum. If I can't "pick ... 1" to convert the integer in a
block to an integer, I'll do it the hard way:
to integer! form head remove minimum-of head remove maximum-of reduce [a b c]
More broken spirits!?
Sunanda.
[5/25] from: g:santilli:tiscalinet:it at: 1-Feb-2003 21:01
Hi Joel,
On Saturday, February 1, 2003, 8:18:33 PM, you wrote:
JN> How about another go?
The hint was in the beginning of the email, so I'll take that.
I you still want to work your answer, do not read further.
any [
all [a < b b < c b]
all [b < c c < a c]
all [c < a a < b a]
all [c < b b < a b]
all [b < a a < c a]
all [a < c c < b c]
]
But this is making a decision actually, so maybe you had something
different in mind. Hmm, now that I think of it,
a + b + c - (min a min b c) - max a max b c
but MIN and MAX are doing decisions so we're still deciding
somehow. :-)
Regards,
Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amigan -- AGI L'Aquila -- REB: http://web.tiscali.it/rebol/index.r
[6/25] from: rgombert:essentiel at: 1-Feb-2003 21:12
Maybe you'll find this better ?
any [all all [a > b a < c a] [a > c a < b a] all [b > a b < c b] all [b > c
b < a b] c]
Renaud GOMBERT
--------------------------------
www.essentiel.net
N° SIRET : 418 620 159
N° MdA : G316527
NAF/APE : 923A
[7/25] from: rgombert:essentiel at: 1-Feb-2003 22:32
Ooops, sorry, one of my 'all had some problem when copypasting ;-)
So here it is... and indented :
any [
all [a > b a < c a]
all [a > c a < b a]
all [b > a b < c b]
all [b > c b < a b]
c
]
Renaud GOMBERT
--------------------------------
www.essentiel.net
N° SIRET : 418 620 159
N° MdA : G316527
NAF/APE : 923A
[8/25] from: rotenca:telvia:it at: 2-Feb-2003 11:35
Look down to see my solution:
any [all [(a > b) xor (a > c) a] all [(a > c) xor (b > c) c] b]
---
Ciao
Romano
[9/25] from: sunandadh:aol at: 2-Feb-2003 6:37
Just to demonstrate how sticking to the rules may not lead to an elegant
solution....
The median is the one in the middle -- provided the first one is lower and
the third one is higher. If those conditions aren't true, throw the numbers
in the air and try again:
until [set [a b c] random/secure reduce [a b c] all [a <= b b <= c]] b
There is no explicit "decision" just a termination condition whose precise
timing is indeterminate. No "selection" activity either -- random shuffling
isn't selection.
(We'd of course put this in a func with local variables to stop messing up
the caller's three variables. I'm not entirely unreasonable when it comes to
coding).
:-)
Sunanda.
[10/25] from: joel:neely:fedex at: 2-Feb-2003 7:34
Hi, Sunanda,
<rotfl>
This is the first "practical" application I've ever seen anyone
make of the infamous BogoSort algorithm! What a blast!
</rotfl>
[SunandaDH--aol--com] wrote:
[11/25] from: fuka:fuxoft:cz at: 2-Feb-2003 15:15
Joel Neely wrote:
>> until [set [a b c] random/secure reduce [a b c] all [a <= b b <= c]] b
random/secure
??? The Rebol Dictionary says that "random/secure -
Returns a cryptographically secure random number" without giving any
examples or explaining what "cryptographically secure random number"
might be...?
--
Frantisek Fuka
(yes, that IS my real name)
(and it's pronounced "Fran-tjee-shek Foo-kah")
----------------------------------------------------
My E-mail: [fuka--fuxoft--cz]
My Homepage: http://www.fuxoft.cz
My ICQ: 2745855
[12/25] from: joel:neely:fedex at: 2-Feb-2003 8:19
Many thanks to all who've participated in this thread. Anyone who
wants to keep playing, skip reading this email until you've had
your fun!
For those who'd like a hint about another approach,
keep
reading
this
email
below.
.
.
.
.
.
.
.
.
.
.
.
Joel Neely wrote:
> Inspired by the discussions of ANY and ALL, I thought I'd share this
> little puzzle, prefaced with a tad of background.
>
Gabriele took that hint in an interesting way I hadn't thought of.
(Thanks, Gabriele, for the new food for thought!)
Let me give a nudge in the direction I was thinking. We've discussed
how ANY and ALL implicitly provide a way to implement chains of tests.
I intended to reinforce that hint at the end of the email.
> The trick, of course, is to find an expression that eliminates the
> need for any explicit "decision" or "selection" activity.
>
Are there any other words that provide implicit tests?
Stop here if you want to take some time for thought.
For
more
along
this
line,
keep
reading.
Gabriele got "dangerously" close to the strategy I was pondering, with
his alternate suggestion:
> a + b + c - (min a min b c) - max a max b c
>
and rightfully so, since I am the one who specified that the values
would be numeric.
Second nudge: suppose that A, B, and C were not numeric (so we can't
use addition and subtraction).
As
before,
stop
reading
unless
you
want
more
hints
(
which
will
come
faster
as
we
get
closer
to
the
end
).
Think about the relationship between decisions and sorting. What can
can we say is accomplished by each iteration of something like the
classic BubbleSort?
We can "unwrap" BubbleSort for three values into the following
sequence of operations:
1) compare A and B, swapping their values if out of order
2) compare B and C, swapping their values if out of order
3) compare A and B, swapping their values if out of order
(and there are some combinations of values that require all three of
these steps).
We can diagram this using the notion of "sort networks", as presented
by Knuth, in this way:
A ---+-------+--- A'
| |
B ---+---+---+--- B'
|
C -------+------- C'
where the diagram is read left-to-right, with each vertical bar as an
instance of "compare and swap if out of order" using the two values at
either end. At the right-hand-end, A', B', and C' contain the sorted
values from A, B, and C.
This is an ENORMOUS hint, if we think about how to implement the
compare and swap if out of order
without using IF.
One
last
chance
to
play
with
these
hints
on
your
own
before
I
spoil
the
secret.
We could do the "compare and swap if out of order" with MIN and MAX,
as follows:
set [a b] reduce [a min b a max b]
set [b c] reduce [b min c b max c]
set [a b] reduce [a min b a max c]
which provides a literal transcription of the sort network diagram if
read from left to right. However, we didn't ask for the sorted list,
but just the median, so let's skip the SET operations, and just use
the expressions. Substituting the expression for the modified B into
the second line above gives us
set [b c] reduce [(a max b) min c (a max b) max c]
Then we can substitute *that* expression for B (and the one for the
new value of A) into the third line:
set [a b] reduce [
(a min b) min ((a max b) min c)
(a min b) max ((a max b) min c)
]
and then we recall that we only wanted the median, which means we only
wanted the ultimate value for B, giving us our final answer:
(a min b) max ((a max b) min c)
which will work for any datatype (number pair char money date time
tuple series) that MIN and MAX handle.
There's actually a practical (well, at least if you think "philosophy
of programming" is practical ;-) point to this little puzzle.
We programmers are biased to think operationally. It's very easy to
approach a p[roblem by thinking, "If I were solving this, I'd first
do blah, and then do blah-blah, and then if yadda-yadda I'd do ..."
Most experienced programmers could see an expression in a problem
specification such as
a * b + a * c + a * d + ... + a * z
and would almost instinctivly
1) know that it could be written as
a * (b + c + c + ... + z)
because we've been trained to know the properties of * and + as
operators, and how they interact.
2) probably desire to rewrite it as shown above, becase we've been
trained to think about saving compute time.
However, we typically aren't trained to think of other parts of our
language (such as MIN, MAX, or the inspiring ANY, ALL, EITHER, etc.)
in the same way. It makes me wonder how often we miss opportunities
to do simplifications just as valid as the above arithmetic example
because don't think about our operators enough...
Thanks for playing!
-jn-
[13/25] from: sunandadh:aol at: 2-Feb-2003 10:15
Joel,
Spoiler on Joel's solution below:
.
..
...
.....
......
.....
....
...
..
.
> set [a b] reduce [a min b a max b]
> set [b c] reduce [b min c b max c]
> set [a b] reduce [a min b a max c]
Just to be picky myself here for a few lines, a min b is not valid Rebol --
you mean min a b. That makes your first step:
set [a b] reduce [min a b max a b]
set [b c] reduce [min b c max b c]
set [a b] reduce [min a b max a b]
Which clearly does make b the median value.
Your subsequent simplifications confused me, perhaps because of the wrong
syntax -- but is seems to me that you need to do at least one swap to make
the thing work. My attempted reformulation of your final result doesn't work:
max (min a b) (min (max a b) c)
But I may have missed a step there somewhere.
Try again!?
Sunanda
[14/25] from: joel:neely:fedex at: 2-Feb-2003 15:14
Hi, Sunanda,
... proving once again that notation can be too much fun ... ;-)
Thanks for the correction of my prefix-vs-infix blunder! I'm so used
to thinking of the operations in question as first-class operators
(comparable to + - * and /) instead of programming language procedures
that I typed what I was thinking instead of what I should have been
writing!
Now on to the real issue...
Skip
the
rest
of
this
email
for
those
still
playing
with
the
puzzle
...
[SunandaDH--aol--com] wrote:
> Joel,
> Spoiler on Joel's solution below:
<<quoted lines omitted: 9>>
> .
> max (min a b) (min (max a b) c)
I'm puzzled that you said it didn't work for you. Here's a small
torture test that shows the right result every time:
median3: func [a b c] [max (min a b) (min (max a b) c)]
loop 40 [
trio: random [1 2 3]
print [mold trio tab median3 trio/1 trio/2 trio/3]
]
... which produces results resembling ...
[2 1 3] 2
[2 1 3] 2
[3 1 2] 2
[3 2 1] 2
[1 3 2] 2
[3 1 2] 2
[3 1 2] 2
[1 2 3] 2
[1 3 2] 2
...
[2 3 1] 2
[2 3 1] 2
[2 1 3] 2
[1 3 2] 2
... and a non-numeric variation ...
loop 40 [
trio: random ["1-A" "1-B" "1-C"]
print [mold trio tab median3 trio/1 trio/2 trio/3]
]
... which produces results resembling ...
["1-B" "1-C" "1-A"] 1-B
["1-C" "1-B" "1-A"] 1-B
["1-C" "1-A" "1-B"] 1-B
["1-A" "1-C" "1-B"] 1-B
...
["1-B" "1-A" "1-C"] 1-B
["1-A" "1-B" "1-C"] 1-B
["1-B" "1-C" "1-A"] 1-B
["1-A" "1-B" "1-C"] 1-B
Can you reproduce the problem using the above code?
-jn-
[15/25] from: sunandadh:aol at: 2-Feb-2003 16:55
Joel:
> I'm puzzled that you said it didn't work for you. Here's a small
> torture test that shows the right result every time:
Sorry for that. My mistake. I cut and pasted the code incorrectly into my
torture test. When pasted straight, it does indeed produce the median every
time.
By way of compensation or entertainment, here's another slightly demented way
of doing it.
The first three additions must give us Median + twice Maximum. So a quick
subtraction of twice the maximum, and the job is done.
(max a b) + (max b c) + (max a c) - (2 * (max a max b c))
Of course this only works on the original spec -- numbers -- but it's
better
than your solution because it uses less min's.....So this is *the*
solution to remember if challenged to do it without using if, sort, first,
second, third, either or min. One day that might save your life. To make it
work with non-numbers, tweak it into inserting and removing items in a block,
Sunanda
Whose hoping his code examples will win him the first ever "Rebol without a
clue" award.
[16/25] from: lmecir:mbox:vol:cz at: 4-Feb-2003 7:41
Ahoj Franto,
----- Original Message -----
From: "Frantisek Fuka"
> "random/secure"??? The Rebol Dictionary says that "random/secure -
> Returns a cryptographically secure random number" without giving any
> examples or explaining what "cryptographically secure random number"
> might be...?
The RANDOM/SECURE is just a better (statistically) RANDOM, but it is much
more expensive (looks six times slower).
-L
[17/25] from: carl:cybercraft at: 5-Feb-2003 21:51
On 03-Feb-03, Joel Neely wrote:
> Many thanks to all who've participated in this thread. Anyone who
> wants to keep playing, skip reading this email until you've had
> your fun!
Okay, I'm late to this, although I didn't skip the following, but just
skimmed it to the end to see Joel's solution. ;-) And, for anyone
still playing I'll leave a few lines in saying nothing so you won't
see Joel's solution, or mine. So...
> For those who'd like a hint about another approach,
> keep
<<quoted lines omitted: 12>>
> .
> .
[snip]
> and then we recall that we only wanted the median, which means we
> only wanted the ultimate value for B, giving us our final answer:
> (a min b) max ((a max b) min c)
Which was wrong as it happens, but here's Joel's corrected (in a later
post) version, written as a function:
median3: func [a b c] [max (min a b) (min (max a b) c)]
As I said, I just skimmed Joel's post that explained how he arrived at
his solution, but I did notice the use of min and max. So now for a
brief explaination of how I arrived at my soloution.
Firstly, I worked out what I considered the easiest way to get the
maximum and minimum values from A, B and C...
>> a: 1 b: 2 c: 3
== 3
>> max max a b c
== 3
>> min min a b c
== 1
This gave me the idea for my first solution...
>> difference reduce [a b c] reduce [max max a b c min min a b c]
== [2]
which returned the correct result, but in a block, and difference may
be against the spirit of the game anyway, since sort and so on was
out. (Difference is the word I thought of first when I read Joel's
original post containing his challenge.)
After more doodling, I hit on the thought that this...
min a b min b c min a c
would produce three values, none of which would be the maximum value
of A, B and C, but which would include the median value, which of
course would be the maximum value of these three new values. So, put
a max and a max in front of those and my solution is...
>> max max min a b min b c min a c
== 2
A triffle longer than Joel's, but very REBOL I think. (And I do so
hate parens;)
--
Carl Read
[18/25] from: joel:neely:fedex at: 5-Feb-2003 6:43
Thanks, Carl,
I almost always find the train of thought leading to a solution as
interesting as the final result (and usually more so).
I agree with you about the parentheses (mostly! ;-). In the case
of my solution, I left them there to make it clear where the sub-
expressions came from, but I should have pointed out that all of
the parentheses in my last version were actually redundant and
could simply be dropped. (That's actually what I did when I set up
some of the solutions for benchmarking, the results of which I'll
post Real Soon Now...)
-jn-
Carl Read wrote:
[19/25] from: carl:cybercraft at: 6-Feb-2003 11:08
On 06-Feb-03, Joel Neely wrote:
> Thanks, Carl,
> I almost always find the train of thought leading to a solution as
> interesting as the final result (and usually more so).
I thought I'd include them, since I've no formal training as a
programmer and don't understand "real" maths, either. Might help you
to understand how I get by with these computer things. ;-)
> I agree with you about the parentheses (mostly! ;-). In the case
> of my solution, I left them there to make it clear where the sub-
<<quoted lines omitted: 3>>
> some of the solutions for benchmarking, the results of which I'll
> post Real Soon Now...)
Well, yours will be the quicker, since it has a min or max less in it,
but I'd be interested to know what kind of speed differences the
parens make, so could you do a test of yours with them in as well?
--
Carl Read
[20/25] from: joel:neely:fedex at: 7-Feb-2003 17:24
Hi, Carl,
Here are the benchmark results:
Carl Read wrote:
> Well, yours will be the quicker, since it has a min or max
> less in it, but I'd be interested to know what kind of speed
> differences the parens make, so could you do a test of yours
> with them in as well?
>
First a bit of preamble; I took all of the submitted solutions
(with a bit of cleanup, such as fixing the infix/prefix typo and
replacing </> with <=/>= to allow equal values among the args)
and built a testing and benchmarking harness. The benchmarking
is built on the infrastructure I'm building (in my spare time with
my left foot) for the "shootout" effort, so I won't take the time
to explain that here -- documentation will be forthcoming Real Soon
Now -- but the test cases appear below.
Each case has a string label, a "setup" block to do any preliminary
evaluations, and a "final" block that gives the expression whose
value should be the median of the three arguments A B and C .
test-specs: [
"benchmark baseline" [] [0]
"second sort" [] [second sort reduce [a b c]]
"pick max remove max" [] [
pick maximum-of head remove maximum-of reduce [a b c] 1
]
"any all (explicit)" [] [
any [
all [a <= b b <= c b] all [b <= c c <= a c]
all [c <= a a <= b a] all [c <= b b <= a b]
all [b <= a a <= c a] all [a <= c c <= b c]
]
]
"any all (w/ default)" [] [
any [
all [a >= b a <= c a] all [a >= c a <= b a]
all [b >= a b <= c b] all [b >= c b <= a b]
c
]
]
"sum less extremes" [] [
a + b + c - (min a min b c) - max a max b c
]
"max of remove max" [
dummy: head remove maximum-of reduce [a b c]
][
max dummy/1 dummy/2
]
"int form remove extremes" [] [
to integer! form head remove minimum-of
head remove maximum-of reduce [a b c]
]
"any xor with default" [] [
any [
all [(a > b) xor (a > c) a]
all [(a > c) xor (b > c) c]
b
]
]
"unwrapped sort with sets" [
set [a b] reduce [min a b max a b]
set [b c] reduce [min b c max b c]
set [a b] reduce [min a b max a b]
][
b
]
"sums less twice max" [] [
(max a b) + (max b c) + (max a c) - (2 * (max a max b c))
]
"max of mins" [] [max max min a b min b c min a c]
"sort network (parens)" [] [max (min a b) (min (max a b) c)]
"sort network" [] [max min a b min max a b c]
]
The only one missing is the "bogosort" approach, which is way slow
compared to everything else (about 13 times the time needed for
the fastest solution).
The above block was first submitted to a testing harness to verify
that all expressions yielded the correct results for a comprehensive
set of test data; here's that output:
[1 1 1] [1 1 1] 1
[1 1 2] [1 1 2] 1
[1 1 3] [1 1 3] 1
[1 2 1] [1 1 2] 1
[1 2 2] [1 2 2] 2
[1 2 3] [1 2 3] 2
[1 3 1] [1 1 3] 1
[1 3 2] [1 2 3] 2
[1 3 3] [1 3 3] 3
[2 1 1] [1 1 2] 1
[2 1 2] [1 2 2] 2
[2 1 3] [1 2 3] 2
[2 2 1] [1 2 2] 2
[2 2 2] [2 2 2] 2
[2 2 3] [2 2 3] 2
[2 3 1] [1 2 3] 2
[2 3 2] [2 2 3] 2
[2 3 3] [2 3 3] 3
[3 1 1] [1 1 3] 1
[3 1 2] [1 2 3] 2
[3 1 3] [1 3 3] 3
[3 2 1] [1 2 3] 2
[3 2 2] [2 2 3] 2
[3 2 3] [2 3 3] 3
[3 3 1] [1 3 3] 3
[3 3 2] [2 3 3] 3
[3 3 3] [3 3 3] 3
0 errors in second sort
0 errors in pick max remove max
0 errors in any all (explicit)
0 errors in any all (w/ default)
0 errors in sum less extremes
0 errors in max of remove max
0 errors in int form remove extremes
0 errors in any xor with default
0 errors in unwrapped sort with sets
0 errors in sums less twice max
0 errors in max of mins
0 errors in sort network (parens)
0 errors in sort network
Then the various cases were benchmarked competitively. I used a
baseline of evaluating a literal zero as a result; the benchmark
engine subtracts out the timings for the baseline, and then gives
ratios between all the "real" cases relative to the smallest one.
With a bit of formatting (percentages and labels), we get these
comparative results, fastest to slowest:
100 sort network
122.4 max of mins
122.4 sort network (parens)
167.7 any xor with default
175.5 sum less extremes
191.6 any all (w/ default)
219 sums less twice max
219.5 pick max remove max
245.3 any all (explicit)
250.8 second sort
264.2 max of remove max
338.1 int form remove extremes
549.6 unwrapped sort with sets
Interestingly enough, adding the parens to the "sort network"
expression adds 20%-25% to the running time, which is essentially
the same cost as the extra min or max of the next alternative.
Thanks to all who participated!
-jn-
--
----------------------------------------------------------------------
Joel Neely joelDOTneelyATfedexDOTcom 901-263-4446
[21/25] from: carl:cybercraft at: 8-Feb-2003 14:20
On 08-Feb-03, Joel Neely wrote:
> Hi, Carl,
> Here are the benchmark results:
<<quoted lines omitted: 4>>
>> differences the parens make, so could you do a test of yours
>> with them in as well?
[snip of the code and comments - see Joel's previous post]
> 100 sort network
> 122.4 max of mins
<<quoted lines omitted: 12>>
> expression adds 20%-25% to the running time, which is essentially
> the same cost as the extra min or max of the next alternative.
RT documentation suggests it's best to remove parens if possible due
to speed reasons, which is partly why I asked the question. Those
coming from compiled languages might expect no slowdown because of
them and so leave them in for clarity, but this nicely shows that
there can be quite a bit of cost to leaving them in.
And thanks for doing the work Joel. I'm sure all here appreciate it.
--
Carl Read
[22/25] from: sunandadh:aol at: 8-Feb-2003 6:47
Joel:
> Here are the benchmark results:
Thanks for doing all that work, Joel!
I'm glad to see that one of my "solutions" was so slow that it didn't make
the cut into the benchmarks. That's as much as distinction as being the
fastest :-)
Other dimensions for scoring how "good" the various solutions are include:
Range of data types handled:
** Some existing solutions only work with integers
** Some only work with numbers
** Some won't work with pairs (try a: 6x2 b: 2x6 c: 4x4)
** Some work with most Rebol data type -- though they fail on things like a:
true b: false c: false
** There's just one (I think) that'll work with any Rebol data type.
Flexibility in ordering
** All solutions so far use Rebol's built-in logic for ordering. But if we're
trying to find the middle Czech surname from a list, how quickly can the
solution be adapted to Czech collating sequence?
Flexibility in processing the data
* Find the median, but ignore missing values when doing so -- how many of our
solutions return 10 from this data? a: 10 b: none c: 10 or from this: a: none
b: none c: 10
Number of items to compare
** Very few scale easily beyond three items How quickly can the various
solutions be adapted to 13 items? or N where N isn't known until run time?
The fastest solution usually isn't the one that runs longest (if you see what
I mean) before being replace by something more generalised.
Sunanda.
[23/25] from: joel:neely:fedex at: 8-Feb-2003 11:26
Hi, Sunanda,
Interesting observations, all! You've certainly taken the discussion
to another "level" (pun intended). Let me suggest that some of your
points take us out of the arena of algorithm design into the arenae
of system design and language design.
[SunandaDH--aol--com] wrote:
> I'm glad to see that one of my "solutions" was so slow that it
> didn't make the cut into the benchmarks...
>
I'll rectify that omission shortly! (The original version using
RANDOM/SECURE took about 13x as long as "sort network"; when I
changed it to use plain RANDOM a quick check looked to be about
8x times as long as "sort network".)
I was originally being paranoid about how long I would need to run
the entire competition, feeling pressure to get results out. I'm
tidying up the benchmark harness for publication, and will have that
published to the community soon.
> Other dimensions for scoring how "good" the various solutions are
> include:
<<quoted lines omitted: 5>>
> like a: true b: false c: false
> ** There's just one (I think) that'll work with any Rebol data type.
Of course, some of that is my responsibility; my original post said:
Three variables, A, B, and C which we'll assume have been
set to numeric values.
Equally obvious is the point that there's a different mind-set (and
experience level) involved in addressing each of the following issues:
- designing an algorithm/function/program to accomplish a narrowly-
defined goal;
- designing a solution for a class of tasks/goals you think are
likely in a given environment, given a specific (narrow) instance
of that class; or
- designing a generalized tool for reuse in the widest possible set
of environments/cases.
The further we go in that direction, the more likely we are:
- to tradeoff optimal performance for generality (which you address
later on); and
- to need support from the language (as in the next point).
> Flexibility in ordering
> ** All solutions so far use Rebol's built-in logic for ordering.
> But if we're trying to find the middle Czech surname from a
> list, how quickly can the solution be adapted to Czech
> collating sequence?
>
I see that as raising at least three questions:
- How much work would it be to *replace* a "smaller" solution with
one which is extended to cover a larger domain?
- With that extension done, how much is the performance over the
original domain affected?
- Alternately, could the underlying design of a "smaller" solution
be (re)used in the development of *another* solution tailored for
a different domain?
In all three cases, of course, the generality of the design (not
necessarily the initial implementation!) is crucial. Below is my
attempt to identify the key design dependencies of each of the
tested solutions.
1) Requires arithmetic and ordering of two elements:
sums less twice max +/- max
sum less extremes +/- min/max
2) Requires abiltity to test two elements for "in order" or "out
of order":
any all (explicit) >=/< any all (w/ default) >=/< any xor with
default >=/< second bogosort <
3) Requires ability to order two elements:
sort network min/max
max of mins min/max
sort network (parens) min/max
unwrapped sort with sets min/max
4) Requires ability to find extrema of any number of elements:
pick max remove max maximum-of
int form remove extremes minimum-of/maximum-of
5) Requires both 2-element ordering and n-element extrema:
max of remove max maximum-of/max
6) Requires ability to sort any number of values:
second sort sort
Clearly group (1) can only be used with numbers (or some abstract
type with an ordering relation and an order-preserving invertible
composition operator ;-) so I won't try to generalize that group.
Groups (2) thru (5) could be generalized in two ways:
- Modify the design to accept data-appropriate comparison/ordering
operators to be specified by the caller (passed as parameters or
set as attributes of an object containing a median method based
on those attributes.
- Reuse the design as a pattern, replacing the generic operators
with functions over the alternate data type to be supported in
a different version.
This takes us to a language-design issue, in that we might expect
the key functions of (2) thru (5) to be related nicely, as in
min: func [a b] [either a <= b [a] [b]]
minimum-of: func [s /local a] [
a: first s
foreach b s [a: min a b]
a
]
sort: func [s /abstract...
rearrange elements of s until a <= b is true for all
pairs of consecutive elements
]
Alas, in REBOL this is not the case! SORT and the extrema functions
(MINIMUM-OF and MAXIMUM-OF) will order values that can't be compared
with the standard ordering/comparison operators. Of course we could
turn things around, and use SORT as the basis of the known universe,
as follows:
sort: ...native...
minimum-value-of: func [s] [first sort copy s]
min: func [a b] [minimum-value-of reduce [a b]]
<=: func [a b] [a = min a b]
(except we'd need a new name for <= because of REBOL lexical syntax
rules!)
If REBOL were truly object-oriented, we'd just call the appropriate
methods on the values being compared/ordered/sorted, and let dynamic
dispatch and inheritance "sort it out" (pun intended ;-)!
> Flexibility in processing the data
> * Find the median, but ignore missing values when doing so -- how
> many of our solutions return 10 from this data? a: 10 b: none
> c: 10 or from this: a: none b: none c: 10
>
Now we're into a different set of issues:
1) What is the "contract" between caller and callee?
2) Who is responsible for enforcing that contract?
3) What should the callee do if the contract is broken?
4) What should the callee do if asked to perform the impossible?
Some writers argue that error-checking and policy-enforcement should
be done at the highest/outermost levels of a system, and the remainder
of the system should be build on the assumption that this has been
done. Their argument is based on the notion that performance is
diminished if validity is constantly being vetted.
The other side of the argument is that pushing the error-checking (at
least -- policy-enforcement tends to beg the question of "whose
policies?" and cause application-concept leakage) as far *down* as
possible makes the lower-level routines more re-usable.
Of course in the physical world, we confront this all the time! My
house has a breaker panel to prevent circuit overload, and I can use
a plain extension cord with any outlet. The error-checking duties
are pushed "upstream". I could buy extension cords with a fuse or
circuit breaker in each strand, I suppose... ;-) On the other hand,
earlier this morning the circuit-breaker in the power conditioner
that feeds this computer and monitor popped, which took down the
computer but none of the other outlets in the room.
Part of this gets to the heart of the distinction between dynamically-
typed languages (REBOL, Python, Scheme, ...) versus statically-typed
languages (PASCAL, JAVA, ...) and how much work we can do early on to
avoid work later. (Look at the definition of FOR in REBOL to see an
interesting example of this issue.)
> Number of items to compare
> ** Very few scale easily beyond three items How quickly can the
> various solutions be adapted to 13 items? or N where N isn't
> known until run time?
>
And how many special cases (e.g. 13) will one tolerate before just
implementing the generalized case and trading programmer/coding
cycles for CPU cycles?
> The fastest solution usually isn't the one that runs longest (if you
> see what I mean) before being replace by something more generalised.
>
The truth of that statement IMHO is highly dependent on the application
domain. Real-time embedded systems are much more likely to require the
fastest solution (even if that means having multiple type-specific
routines) versus throw-away exploratory programs. Of course, that means
that we typically use different tools for those domains!
-jn-
[24/25] from: sunandadh:aol at: 8-Feb-2003 15:50
Hi Joel,
Thanks for a comprehensive and interesting discussion as usual. You are right
that I attempted to move the issue from algorithm design to application
design. I guess that's my bias.
I enjoy the fact that we can come up with so may algorithms for doing the
same thing. We need lots of different algorithms, each with its own
datasheet
of strengths and weaknesses. That way, application designers can
pick the one they need.
Different application designers will pick different algorithms depending on
their needs. The right solution for a real-time controller (needing a
determinate response time) may be different from another embedded-chip
application (needing smallest footprint) and so on.
Me, I've been warped by three decades of commercial application design. Means
I normally instinctively go for the solution with the fewest built-in
assumptions -- the most "general" usual, but seldom the most elegant.
I know -- believe me I do -- that within 24 hours of implementing a median
algorithm that handled numerics, I'd be on the wrong end of this phone call:
It's broken! It's not displaying the median.....Yes, I know we said they'd
be numbers -- but not always....On a Tuesday afternoon of course the three
things could be city names....No of course we don't want the middle one
alphabetically in that case -- why on earth would anyone want *that*!?....We
need the median *population* -- unless the user logged on is a member of the
ethics committee then they'd probably want the city with the median distance
from the North Pole -- but check with them first, that's only a guess.....Why
should I know if there are any other
special cases" like that? -- you wrote
the code: it's your job to know what median means.....And we need it fixed
within 20 minutes."
Sunanda.
[25/25] from: rotenca:telvia:it at: 9-Feb-2003 12:06
Hi Sunanda,
> I know -- believe me I do -- that within 24 hours of implementing a median
> algorithm that handled numerics, I'd be on the wrong end of this phone call:
<<quoted lines omitted: 5>>
> ethics committee then they'd probably want the city with the median distance
> from the North Pole -- but check with them first, that's only a
guess.....Why
> should I know if there are any other "special cases" like that? -- you wrote
> the code: it's your job to know what median means.....And we need it fixed
> within 20 minutes."
You are right! This is the real world. :-(
---
Ciao
Romano
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted