Cross-language benchmarking
[1/5] from: joel::neely::fedex::com at: 5-Nov-2002 11:38
Hello, all,
Pardon my manners in not replying to individual remarks, but "so many
emails, so little time!" ;-)
Let me raise a couple of points about what we might accomplish with
such an effort, and who might be in the audience, as a way of thinking
about how such an effort might be done, and what languages might be
included.
Option 1
========
Goal: Demonstrate to the world that REBOL is a viable,
competitive language for many common programming tasks
(where "competitive" is defined in terms of run time
performance).
Audience: Developers at large.
Languages: Commonly used languages, to maximize likelihood that the
reader will be familiar with one or more of the other
"competitors": c/c++, Java, Perl, Python, VB, ...
Tasks: Toy problems that allow specific aspects of performance
to be instrumented/measured. Also some small "real"
tasks in REBOL's "sweet spot" of performance.
Comment: The tests must be fair, and must be seen to be fair.
We've all seen the kind of unvarnished advocacy that
claims things like "X is faster than Tcl, uses fewer
special characters than Perl, and is cheaper than
COBOL on an IBM mainframe, therefore it's the best!"
which only hurts the credibility of the claimant.
Option 2
========
Goal: Demonstrate to the world that REBOL is a viable notation
for use in RAD/prototyping tasks, and makes a good "tool
to think with".
Audience: Same as (1)
Languages: Same as (1)
Tasks: Reasonably small "spike" (in the XP sense) tasks that
would be recognizable as related to application-level
needs.
Comment: It's also fair to include code size and programmer effort
in such discussions, but these are notoriously difficult
to instrument objectively.
Option 3
========
Goal: Identify and document REBOL programming techniques that
have substantial effect on performance, and build a
vocabulary of "performance patterns" for future use.
Audience: REBOL programmers.
Languages: REBOL only
Tasks: Those situations where reasonably small effort in
refactoring/strategy could produce significant gains in
performance.
Option 4
========
Goal: Help identify "hot spots" in REBOL where performance
optimization would have significant perceived value.
Audience: RT
Languages: REBOL and limited number of "compare for reference"
alternative languages.
Comment: I want to be clear on this one: I'm not suggesting that
RT staff are unaware of performance issues! Not at all!
However, several on this list (including RT staff) have
observed that the good folks at RT don't get any more
hours in the day than the rest of us (and therefore have
to pick and choose where to put their mere 20 work hours
per day ;-)
If the list members can help spread the load of finding
issues worthy of attention, help (by participation) to
indicate which performance issues are considered higher
priority than others, or even find one glitch that has
escaped notice to date, then I think the effort would be
a net gain.
General Comment
===============
Benchmarking is tricky business at best, and A Dark Art at worst.
For results to be meaningful, the sample base must be large enough
(and the individual tests must be large enough) that transient or
special-case effects get averaged out (e.g., garbage collection
that happens now-and-again during series flogging, differences in
performance due to different CPU speeds, free memory, disk I/O
bandwidth, network bandwidth/congestion, concurrent processing on
computers with real operating systems, etc).
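To make the averaging-out idea concrete, here is a minimal sketch (in Python, purely illustrative and not part of any proposed benchmark suite) of repeating a measurement several times and taking the median, so that a one-off garbage collection or a busy neighbor process does not skew the result:

```python
import statistics
import timeit

def bench(stmt, runs=7, number=1_000):
    """Time 'stmt' several times and report the median sample, so that
    transient effects (GC pauses, other processes) get averaged out."""
    samples = timeit.repeat(stmt, repeat=runs, number=number)
    return statistics.median(samples)

# Illustrative workload; any small, repeatable task would do.
t = bench("sorted(range(100, 0, -1))")
```

The median (rather than the mean) is one conventional way to discount the special-case outliers described above.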
It will be of little use (except to the submitter! ;-) to have a
single benchmark comparing REBOL to Glypnir on an Illiac IV. The
strong benefit IMHO to using primarily cross-platform languages is
that it allows us to perform the tests under the widest possible
range of conditions, thus improving the quality of our averages.
That said, there's probably room for a widely-used proprietary
language (e.g., VB) since that's likely familiar to a significant
portion of the target audience for options (1) and (2). We just
need to be careful to have the widest possible set of alternatives
run on *the*same*boxen* as the proprietary cases, so that we can
make meaningful use of the results. (E.g., a single comparison
of REBOL vs C++ on a box running XPpro would be hard to interpret
beyond the obvious "which one is faster?")
-jn-
--
----------------------------------------------------------------------
Joel Neely joelDOTneelyATfedexDOTcom 901-263-4446
[2/5] from: edanaii:cox at: 5-Nov-2002 12:25
Joel Neely wrote:
[Snip]
>Option 1
>>
<<quoted lines omitted: 28>>
> in such discussions, but these are notoriously difficult
> to instrument objectively.
I personally believe that option two is the best choice. However, there
is no reason why submissions can't be categorized: speed as a criterion
where it makes sense; intuitiveness, code size, and programmer effort
where those make sense.
In terms of the site that I was contemplating, since it was meant to be
a "how to" site, option 2 fits this scenario best. Option one, IMHO, is
more likely to attract Computer Nerds. This is not a bad thing, in and
of itself, but you want professionals, trying to do their job, who would
hopefully stay and look at the competitions, if only out of professional
curiosity.
As for evaluating programmer effort, if a standard algorithm is used for
all comparisons, I think the lines of code written to implement the
algorithm is a good measure of effort. Also, since not all languages may
be able to implement all parts of the algorithm, since they do not all
approach solutions in the same way, "completeness", for lack of a better
term, would be an important standard as well...
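A rough sketch of the lines-of-code measure Ed describes (in Python; counting only non-blank, non-comment lines, and assuming a single per-language comment prefix, which is a deliberate simplification):

```python
def count_loc(source, comment_prefix="#"):
    """Rough 'lines of code': non-blank lines that aren't pure comments.
    A crude effort metric in the spirit of the discussion above; real
    comparisons would need per-language comment handling."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count

sample = """
# compute a sum
total = 0
for x in range(10):
    total += x
"""
print(count_loc(sample))   # counts the 3 code lines
```

LOC is of course only a proxy for effort, which is part of why "completeness" matters as a companion standard.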
>General Comment
>>
<<quoted lines omitted: 19>>
>of REBOL vs C++ on a box running XPpro would be hard to interpret
>beyond the obvious "which one is faster?")
Well said and well written, Joel.
As to cross-platform testing, the standard I would prefer to judge such
code by would be "completeness", similar to assessing effort. In other
words, does the same program perform as specified when tested on
differing hardware?
--
Sincerely, | We're Human Beings, with the blood of a million
Ed Dana | savage years on our hands! But we can stop it! We
Software Developer | can admit we're killers, but we're not going to
1Ghz Athlon Amiga | kill today. That's all it takes! Knowing that we're
| not going to kill... Today! -- Star Trek.
[3/5] from: carl:s:rebol at: 5-Nov-2002 16:16
Thanks Joel, for breaking it down. I like Option 1.
And, we know between us that we're not trying to convert everyone for all
uses, but offer a useful tool that we personally find to save us time in
the long run. A certain percentage of developers need just this kind of
comparison, and it's something RT has been asked for many times (although
you know that we never bash other languages). So, how can it be done?
-Carl
At 11:38 AM 11/5/02 -0600, you wrote:
[4/5] from: joel:neely:fedex at: 6-Nov-2002 7:56
Hi, Carl, and all,
OBTW, I didn't mean to imply that the options were mutually
exclusive. I believe we can satisfy multiple goals, as long
as we keep ourselves clear on what we're working toward.
See below:
Carl at REBOL wrote:
> And, we know between us that we're not trying to convert everyone
> for all uses, but offer a useful tool that we personally find to
> save us time in the long run. A certain percentage of developers
> need just this kind of comparison, and it's something that RT has
> been asked many times (although you know that we never bash other
> languages). So, how can it be done?
>
I suggest we pick a small set of languages, test with a small set
of benchmark tasks, publish the results, and let it grow over time
to include more languages and tasks as needed. I also suggest we
run the tests on multiple platforms (wxx, Unices, Linux, Mac OS X,
... ?others?) and average the normalized results to provide some
degree of platform neutrality. I suggest normalizing all run times
against c (= 1) to avoid dependence on CPU speed, etc.
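The normalize-against-c-and-average idea can be sketched like so (in Python, one of the languages under discussion; the platform names and timing figures below are invented for illustration, not measurements):

```python
import math

# Hypothetical wall-clock times (seconds) per platform and language.
timings = {
    "linux":   {"c": 1.0, "rebol": 8.0, "perl": 4.0},
    "windows": {"c": 1.2, "rebol": 9.6, "perl": 6.0},
}

def normalized_scores(timings):
    """Normalize each platform's times against c (= 1), then take the
    geometric mean of the ratios across platforms, so that differences
    in raw CPU speed drop out and no single box dominates."""
    langs = next(iter(timings.values())).keys()
    scores = {}
    for lang in langs:
        ratios = [times[lang] / times["c"] for times in timings.values()]
        scores[lang] = math.exp(sum(map(math.log, ratios)) / len(ratios))
    return scores

scores = normalized_scores(timings)   # c comes out at exactly 1.0
```

The geometric mean is the customary way to average ratios; an arithmetic mean of normalized times would overweight the platforms where the baseline happens to be slow.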
> >Goal: Demonstrate to the world that REBOL is a viable,
> > competitive language for many common programming tasks
> > (where "competitive" is defined in terms of run time
> > performance).
> >
...
> >Languages: Commonly used languages, to maximize likelihood that the
> > reader will be familiar with one or more of the other
> > "competitors": c/c++, Java, Perl, Python, VB, ...
My nominees for languages are:
Language Reason
-------- --------------------------------------------------
c It serves as a baseline for optimal speed.
Java Widespread enterprise usage.
Perl Probably the most popular platform-neutral language,
and main "competitor" for 'net-related applications
on the back end (cgi, etc.)
Python Second only to Perl ...
I personally would be interested in some of the "academic" languages
(e.g. Scheme, Haskell, Prolog), but I'm *not* including those in my
list of nominees because I think they are insufficiently "mainstream"
to be relevant to most of the audience of Option 1 who would be
looking to build and deploy personally or professionally.
-jn-
--
; Joel Neely joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]
[5/5] from: joel:neely:fedex at: 6-Nov-2002 8:13
Hi again, Carl ...
... on a slightly different tack.
Carl at REBOL wrote:
...
> >Goal (1): Demonstrate to the world that REBOL is a viable,
> > competitive language for many common programming tasks
> > (where "competitive" is defined in terms of run time
> > performance).
> >
...
> >Goal (3): Identify and document REBOL programming techniques that
> > have substantial effect on performance, and build a
> > vocabulary of "performance patterns" for future use.
...
> >Goal (4): Help identify "hot spots" in REBOL where performance
> > optimization would have significant perceived value.
Speaking for myself, the Ackermann discussion has already given me
some ROI on goal (3) with ideas about refactoring expressions to
minimize depth/nesting.
Now, WRT goal (4) ...
This is totally a shot in the dark, as I have no clue about the
internal structure of the REBOL interpreter, but here goes anyway!
Testing inspired by Ladislav's comments would indicate that stack
is being consumed by native functions (such as EITHER) with the
consequence that user-level recursion depth is diminished. Perl
has a mechanism that allows a subroutine to delegate to another
subroutine in a way that does not increase the call stack depth.
In pseudo-REBOL notation, one can replace something like
    foo: func [x y] [
        either phase-of-moon x y [
            bletch x y
        ][
            ;; otherwise, do something else
        ]
    ]

    bletch: func [x y] [
        ;; transmogrify x y and return something
    ]

with

    foo: func [x y] [
        if phase-of-moon x y [become bletch]
        ;; otherwise, do something else
    ]
(for the same BLETCH). IOW, instead of creating a new frame and
invoking BLETCH with a subsequent return to FOO and thence to FOO's
caller, the evaluation simply twiddles the state so that BLETCH is
invoked (with FOO's arguments) and BLETCH returns directly to FOO's
caller upon completion (sort of vaguely like an exec() in Unix...)
Now, I'm *NOT* suggesting that we have such a mechanism in high-
level REBOL *AT*ALL*! But I'm wondering if it would be feasible
to allow NATIVE! functions to make use of such a mechanism in
special cases, so as to minimize the stack penalty of using IF,
EITHER, etc. Specifically, could e.g. the native code for EITHER
directly proceed to the evaluation of the first or second block
alternatives without nesting a call, since we're guaranteed (I
am assuming) that the result of the nested block evaluation is
actually going to be passed back to the expression where EITHER
appeared without any further manipulation?
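The delegation mechanism described above has a well-known analogue: a trampoline. As an illustration only (in Python, since BECOME is hypothetical notation and exists in neither REBOL nor Perl under that name), a function can return a thunk describing the delegated call instead of making it, and a driver loop invokes thunks until a plain value appears, so the call stack never deepens:

```python
def countdown(n):
    # Instead of recursing (which would grow the stack), return a thunk
    # describing the next call -- the analogue of "become countdown".
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)

def trampoline(result):
    """Repeatedly invoke thunks until a non-callable value appears.
    The stack depth stays constant no matter how many delegations occur."""
    while callable(result):
        result = result()
    return result

# 100,000 levels of "recursion" without touching the recursion limit:
print(trampoline(countdown(100_000)))
```

This is the user-level version of the trick; what Joel asks about is whether the interpreter could do the equivalent internally for natives like EITHER, so block evaluation proceeds without nesting a call.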
-jn-
--
; Joel Neely joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted