## [REBOL] Re: More on expressions

### From: joel:neely:fedex at: 8-Feb-2003 11:26

Hi, Sunanda,

Interesting observations, all!  You've certainly taken the discussion
to another "level" (pun intended).  Let me suggest that some of your
points take us out of the arena of algorithm design into the arenae
of system design and language design.

[SunandaDH--aol--com] wrote:

> I'm glad to see that one of my "solutions" was so slow that it
> didn't make the cut into the benchmarks...

I'll rectify that omission shortly!  (The original version using
RANDOM/SECURE took about 13x as long as "sort network"; when I changed
it to use plain RANDOM, a quick check looked to be about 8x as long as
"sort network".)  I was originally being paranoid about how long I
would need to run the entire competition, feeling pressure to get
results out.  I'm tidying up the benchmark harness for publication,
and will have the results published to the community soon.

> Other dimensions for scoring how "good" the various solutions are
> include:
>
> Range of data types handled:
> ** Some existing solutions only work with integers
> ** Some only work with numbers
> ** Some won't work with pairs (try a: 6x2 b: 2x6 c: 4x4)
> ** Some work with most Rebol data types -- though they fail on
>    things like a: true b: false c: false
> ** There's just one (I think) that'll work with any Rebol data type.

Of course, some of that is my responsibility; my original post said:

    Three variables, A, B, and C, which we'll assume have been set
    to numeric values.

Equally obvious is the point that there's a different mind-set (and
experience level) involved in addressing each of the following issues:

- designing an algorithm/function/program to accomplish a narrowly-
  defined goal;

- designing a solution for a class of tasks/goals you think are likely
  in a given environment, given a specific (narrow) instance of that
  class; or

- designing a generalized tool for reuse in the widest possible set of
  environments/cases.
The further we go in that direction, the more likely we are:

- to trade off optimal performance for generality (which you address
  later on); and

- to need support from the language (as in the next point).

> Flexibility in ordering
> ** All solutions so far use Rebol's built-in logic for ordering.
>    But if we're trying to find the middle Czech surname from a
>    list, how quickly can the solution be adapted to the Czech
>    collating sequence?

I see that as raising at least three questions:

- How much work would it be to *replace* a "smaller" solution with
  one which is extended to cover a larger domain?

- With that extension done, how much is the performance over the
  original domain affected?

- Alternately, could the underlying design of a "smaller" solution be
  (re)used in the development of *another* solution tailored for a
  different domain?

In all three cases, of course, the generality of the design (not
necessarily the initial implementation!) is crucial.  Below is my
attempt to identify the key design dependencies of each of the tested
solutions.

1) Requires arithmetic and ordering of two elements:

       sums less twice max          +/- max
       sum less extremes            +/- min/max

2) Requires ability to test two elements for "in order" or
   "out of order":

       any all (explicit)           >=/<
       any all (w/ default)         >=/<
       any xor with default         >=/<
       second bogosort              <

3) Requires ability to order two elements:

       sort network                 min/max
       max of mins                  min/max
       sort network (parens)        min/max
       unwrapped sort with sets     min/max

4) Requires ability to find extrema of any number of elements:

       pick max remove max          maximum-of
       int form remove extremes     minimum-of/maximum-of

5) Requires both 2-element ordering and n-element extrema:

       max of remove max            maximum-of/max

6) Requires ability to sort any number of values:

       second sort                  sort

Clearly group (1) can only be used with numbers (or some abstract
type with an ordering relation and an order-preserving invertible
composition operator ;-) so I won't try to generalize that group.
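As a sketch of how a group (3) solution could accept a caller-supplied
ordering (one answer to the Czech-collation question quoted above),
here is a three-element sort network parameterized by a comparison
function.  The names MEDIAN-OF-3 and LESSER? are mine for
illustration, not one of the benchmarked entries:

    median-of-3: func [
        a b c
        lesser? [any-function!] "true if first arg orders before second"
        /local t
    ] [
        ; three compare-and-swap steps leave the middle value in b
        if lesser? b a [t: a  a: b  b: t]
        if lesser? c b [t: b  b: c  c: t]
        if lesser? b a [t: a  a: b  b: t]
        b
    ]

    median-of-3 "b" "a" "c" func [x y] [x < y]
    ; == "b"

Swapping in a Czech collation is then a matter of passing a different
LESSER?.  (REBOL's native SORT offers the same hook via its /compare
refinement.)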
Groups (2) thru (5) could be generalized in two ways:

- Modify the design to accept data-appropriate comparison/ordering
  operators specified by the caller (passed as parameters, or set as
  attributes of an object containing a median method based on those
  attributes).

- Reuse the design as a pattern, replacing the generic operators with
  functions over the alternate data type to be supported in a
  different version.

This takes us to a language-design issue, in that we might expect the
key functions of (2) thru (5) to be related nicely, as in

    min: func [a b] [either a <= b [a] [b]]

    minimum-of: func [s /local a] [
        a: first s
        foreach b s [a: min a b]
        a
    ]

    sort: func [s] [
        ; abstract:  rearrange elements of s until  a <= b  is true
        ; for all pairs of consecutive elements
    ]

Alas, in REBOL this is not the case!  SORT and the extrema functions
(MINIMUM-OF and MAXIMUM-OF) will order values that can't be compared
with the standard ordering/comparison operators.  Of course we could
turn things around, and use SORT as the basis of the known universe,
as follows:

    sort: ...native...

    minimum-value-of: func [s] [first sort copy s]

    min: func [a b] [minimum-value-of reduce [a b]]

    <=: func [a b] [a = min a b]

(except we'd need a new name for <= because of REBOL lexical syntax
rules!)

If REBOL were truly object-oriented, we'd just call the appropriate
methods on the values being compared/ordered/sorted, and let dynamic
dispatch and inheritance "sort it out" (pun intended ;-)!

> Flexibility in processing the data
> * Find the median, but ignore missing values when doing so -- how
>   many of our solutions return 10 from this data?  a: 10 b: none
>   c: 10  or from this:  a: none b: none c: 10

Now we're into a different set of issues:

1) What is the "contract" between caller and callee?
2) Who is responsible for enforcing that contract?
3) What should the callee do if the contract is broken?
4) What should the callee do if asked to perform the impossible?
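One way to make the "ignore missing values" contract explicit is to
state it in the code itself.  A sketch, with MEDIAN-IGNORING-NONE as
my own illustrative name (and "lower of two" for the even case as a
deliberate, documented design decision):

    median-ignoring-none: func [a b c /local vals] [
        vals: reduce [a b c]
        remove-each v vals [none? v]    ; drop the missing values
        sort vals
        switch length? vals [
            0 [none]        ; asked to perform the impossible
            1 [vals/1]
            2 [vals/1]      ; lower of the two; a design decision
            3 [vals/2]
        ]
    ]

This returns 10 for both of the quoted cases, and NONE when all three
inputs are missing -- one possible answer to question (4) above.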
Some writers argue that error-checking and policy-enforcement should
be done at the highest/outermost levels of a system, and that the
remainder of the system should be built on the assumption that this
has been done.  Their argument is based on the notion that performance
is diminished if validity is constantly being vetted.  The other side
of the argument is that pushing the error-checking (at least --
policy-enforcement tends to beg the question of "whose policies?" and
cause application-concept leakage) as far *down* as possible makes
the lower-level routines more reusable.

Of course in the physical world, we confront this all the time!  My
house has a breaker panel to prevent circuit overload, and I can use
a plain extension cord with any outlet.  The error-checking duties
are pushed "upstream".  I could buy extension cords with a fuse or
circuit breaker in each strand, I suppose... ;-)  On the other hand,
earlier this morning the circuit-breaker in the power conditioner
that feeds this computer and monitor popped, which took down the
computer but none of the other outlets in the room.

Part of this gets to the heart of the distinction between dynamically-
typed languages (REBOL, Python, Scheme, ...) versus statically-typed
languages (PASCAL, JAVA, ...) and how much work we can do early on to
avoid work later.  (Look at the definition of FOR in REBOL to see an
interesting example of this issue.)

> Number of items to compare
> ** Very few scale easily beyond three items.  How quickly can the
>    various solutions be adapted to 13 items?  Or N, where N isn't
>    known until run time?

And how many special cases (e.g. 13) will one tolerate before just
implementing the generalized case and trading programmer/coding
cycles for CPU cycles?

> The fastest solution usually isn't the one that runs longest (if
> you see what I mean) before being replaced by something more
> generalised.

The truth of that statement IMHO is highly dependent on the
application domain.
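For what it's worth, the generalized case being traded toward can be
quite small if we lean on SORT instead of special-casing each N.  A
sketch for an odd number of values (MEDIAN-OF is my illustrative
name):

    median-of: func [
        "Median of an odd number of values"
        s [block!]
    ] [
        s: sort copy s                 ; copy: don't disturb the caller's block
        pick s (1 + length? s) / 2     ; middle element of the sorted copy
    ]

    median-of [5 1 4 2 3]
    ; == 3

The cost, of course, is an O(n log n) sort where the three-value
specialists get by on a handful of comparisons -- which is exactly the
programmer-cycles-for-CPU-cycles trade described above.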
Real-time embedded systems are much more likely to require the fastest solution (even if that means having multiple type-specific routines) versus throw-away exploratory programs. Of course, that means that we typically use different tools for those domains! -jn-