Mailing List Archive: 49091 messages

[REBOL] Re: Cross-language benchmarking

From: edanaii:cox at: 5-Nov-2002 12:25

Joel Neely wrote: [Snip]
> Option 1
>
> Goal:      Demonstrate to the world that REBOL is a viable,
>            competitive language for many common programming tasks
>            (where "competitive" is defined in terms of run time
>            performance).
>
> Audience:  Developers at large.
>
> Languages: Commonly used languages, to maximize likelihood that the
>            reader will be familiar with one or more of the other
>            "competitors": c/c++, Java, Perl, Python, VB, ...
>
> Tasks:     Toy problems that allow specific aspects of performance
>            to be instrumented/measured. Also some small "real"
>            tasks in REBOL's "sweet spot" of performance.
>
> Comment:   The tests must be fair, and must be seen to be fair.
>            We've all seen the kind of unvarnished advocacy that
>            claims things like "X is faster than Tcl, uses fewer
>            special characters than Perl, and is cheaper than
>            COBOL on an IBM mainframe, therefore it's the best!"
>            which only hurts the credibility of the claimant.
>
> Option 2
>
> Goal:      Demonstrate to the world that REBOL is a viable notation
>            for use in RAD/prototyping tasks, and makes a good "tool
>            to think with".
>
> Audience:  Same as (1)
>
> Languages: Same as (1)
>
> Tasks:     Reasonably small "spike" (in the XP sense) tasks that
>            would be recognizable as related to application-level
>            needs.
>
> Comment:   It's also fair to include code size and programmer effort
>            in such discussions, but these are notoriously difficult
>            to instrument objectively.
I personally believe that option two is the best choice. However, there is no reason why submissions can't be categorized: speed as a criterion where it makes sense; intuitiveness, code size, and programmer effort where those make sense. In terms of the site that I was contemplating, since it was meant to be a "how to" site, option 2 fits this scenario best. Option one, IMHO, is more likely to attract Computer Nerds. This is not a bad thing, in and of itself, but you want professionals, trying to do their job, who would hopefully stay and look at the competitions, if only out of professional curiosity.

As for evaluating programmer effort, if a standard algorithm is used for all comparisons, I think the lines of code needed to implement the algorithm is a good measure of effort. Also, since not all languages may be able to implement all parts of the algorithm (they do not all approach solutions in the same way), "completeness", for lack of a better term, would be an important standard as well...
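The lines-of-code measure suggested above could be computed mechanically. Here is a minimal sketch in Python (one of the "competitor" languages named earlier); the choice of comment prefixes is an assumption for illustration (`;` for REBOL, `#` for Perl/Python, `//` for C++/Java), and inline comments are deliberately ignored:

```python
def count_loc(source: str, comment_prefixes=(";", "#", "//")) -> int:
    """Crude lines-of-code count: skip blank lines and full-line comments.

    The comment_prefixes default is an assumption for this sketch;
    lines with trailing (inline) comments still count as code.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line: no effort expended here
        if stripped.startswith(comment_prefixes):
            continue  # full-line comment: documentation, not code
        count += 1
    return count

# Hypothetical REBOL submission used as sample input:
rebol_sample = """\
; sum the numbers 1..10
total: 0
repeat i 10 [total: total + i]
print total
"""
print(count_loc(rebol_sample))  # -> 3
```

The same counter can be run over every language's submission for a given task, so the "effort" column of the comparison is produced by one rule applied uniformly rather than by hand.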
> General Comment
>
> Benchmarking is tricky business at best, and A Dark Art at worst.
> For results to be meaningful, the sample base must be large enough
> (and the individual tests must be large enough) that transient or
> special-case effects get averaged out (e.g., garbage collection
> that happens now-and-again during series flogging, differences in
> performance due to different CPU speeds, free memory, disk I/O
> bandwidth, network bandwidth/congestion, concurrent processing on
> computers with real operating systems, etc).
>
> It will be of little use (except to the submitter! ;-) to have a
> single benchmark comparing REBOL to Glypnir on an Illiac IV. The
> strong benefit IMHO to using primarily cross-platform languages is
> that it allows us to perform the tests under the widest possible
> range of conditions, thus improving the quality of our averages.
>
> That said, there's probably room for a widely-used proprietary
> language (e.g., VB) since that's likely familiar to a significant
> portion of the target audience for options (1) and (2). We just
> need to be careful to have the widest possible set of alternatives
> run on *the*same*boxen* as the proprietary cases, so that we can
> make meaningful use of the results. (E.g., a single comparison
> of REBOL vs C++ on a box running XPpro would be hard to interpret
> beyond the obvious "which one is faster?")
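Joel's point about averaging out transient effects can be made concrete. A sketch in Python, assuming its standard `timeit` module as the measurement tool (the repeat/number values are arbitrary for illustration): each statement is timed several times, and the minimum and median per-call times are reported rather than a single raw number, so one GC pause or background process doesn't define the result.

```python
import statistics
import timeit

def bench(stmt, setup="pass", repeats=7, number=10_000):
    """Time `stmt` over several independent trials so transient
    effects (GC pauses, cache state, competing processes) are
    visible in the spread rather than hiding in one measurement."""
    samples = timeit.repeat(stmt, setup=setup, repeat=repeats, number=number)
    per_call = [s / number for s in samples]  # seconds per single execution
    return {
        "min": min(per_call),                   # least-interference trial
        "median": statistics.median(per_call),  # robust central tendency
    }

result = bench("sum(range(100))")
# The gap between min and median is a rough gauge of how noisy
# the measurement environment was during the run.
print(result["min"], result["median"])
```

Running the same harness for each language's implementation of a task, on the same box, is what makes the cross-language numbers comparable in the sense Joel describes.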
Well said and well written, Joel. As to cross-platform testing, the standard I would prefer to judge such code by would be "completeness", similar to assessing effort: in other words, does the same program perform as specified when tested on differing hardware?

--
Sincerely,         | We're Human Beings, with the blood of a million
Ed Dana            | savage years on our hands! But we can stop it! We
Software Developer | can admit we're killers, but we're not going to
1Ghz Athlon Amiga  | kill today. That's all it takes! Knowing that we're
                   | not going to kill... Today! -- Star Trek.