
[REBOL] byte frequency as benchmark

From: joel::neely::fedex::com at: 8-Jul-2001 8:30

Hi, all,

FWIW, I also ran the "byte frequencies" computation using Perl (on the same files, on the same box). I found the results interesting; perhaps some of you will as well...

Joel Neely wrote:
> I just happened to have a 32Mb core dump lying around,
> so it was easy to dd some test files of different sizes
> for benchmarking. The relative times, normalized to the
> fastest function, for various test files are:
>
...
> Anything below about the 10% level is probably noise,
> but the trend seems consistent so far...
>

The more-or-less equivalent code to CFCPY and CFBUF in Perl reads as follows.

For CFCPY the strategy was to chomp through the file in 4k chunks, tallying the characters in each chunk, so we get

    open (IN, "<$filename") or die "Can't open '$filename'!\n";
    $/ = \4096;
    @cfb = ();
    while (<IN>) { map ++$cfb[$_], unpack("C*", $_); }
    close (IN);

* Open the file whose name is in $filename.
* Set input to read 4k at a time.
* Set up an empty array for the tallies.
* Loop through the file (while...), chopping each 4k string read into a list of unsigned numeric byte values (unpack...), and incrementing the appropriate counter for each (map...).
* Close the file.

For CFBUF the strategy was to slurp the entire file into memory and tally all the characters at once, so we get

    open (IN, "<$filename") or die "Can't open '$filename'!\n";
    undef $/;
    $foo = <IN>;
    @cfu = ();
    map ++$cfu[$_], unpack ("C*", $foo);
    close (IN);

* Open the file as before.
* Specify unbroken input (i.e. all at once).
* Read the entire file into a single string.
...

The rest is similar to the first program. The all-at-once strategy began to thrash at the 2Mb level, so I didn't bother to run it for the larger files.

The modified report now contains timings for these as well, labeled P(4k) and P(all). Times are still relative to cfcpy (= 1.00), so lower is faster:

                ------ test file size ------
    Function    1 Mb    2 Mb    4 Mb    8 Mb
    --------    ----    ----    ----    ----
    cfdir       3.38    3.71    3.75    3.84
    cfcpy       1.00    1.00    1.00    1.00
    cfbuf       1.08    1.06    1.10    1.06
    P(4k)       0.55    0.47    0.51    0.42
    P(all)      0.75    2.71    n/a     n/a

This offers a couple of interesting take-aways, IMHO.

* For REBOL, running at about half the speed of a language that is highly optimized for performance means doing fairly well!
* REBOL's iteration-through-characters-in-a-string is a win over Perl's transform-huge-string-to-huge-array approach in the comparison of CFBUF to P(all)!
* A significant portion of Perl's performance came from having a built-in map operation that applies a function to every element of an array.
  Replacing that operation with an explicit loop slowed the Perl version(s) down significantly. Therefore, one gains from using (and having) language-native capabilities for frequently-used processes.

Enjoy!

-jn-
---------------------------------------------------------------
There are two types of science: physics and stamp collecting!
                                        -- Sir Arthur Eddington
joel-dot-neely-at-fedex-dot-com
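For anyone wanting to try the two Perl strategies from this post on a current perl, here is a minimal sketch. It assumes strict mode, lexical filehandles and binmode for raw bytes, none of which appear in the original snippets, and the subroutine names tally_chunked and tally_slurp, like the "hello world" test file, are hypothetical. It checks that reading 4096-byte records (the P(4k) approach) and slurping the whole file (the P(all) approach) produce identical tallies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# CFCPY-style (P(4k)): read fixed 4096-byte records and tally each one.
sub tally_chunked {
    my ($filename) = @_;
    open(my $in, '<', $filename) or die "Can't open '$filename': $!\n";
    binmode($in);                      # raw bytes, no newline translation
    local $/ = \4096;                  # ref to an integer => fixed-size record reads
    my @counts = (0) x 256;            # unlike the original, unseen bytes read 0, not undef
    while (defined(my $chunk = <$in>)) {
        map ++$counts[$_], unpack('C*', $chunk);
    }
    close($in);
    return \@counts;
}

# CFBUF-style (P(all)): slurp the whole file, then tally it in one pass.
sub tally_slurp {
    my ($filename) = @_;
    open(my $in, '<', $filename) or die "Can't open '$filename': $!\n";
    binmode($in);
    local $/;                          # undef => read the entire file at once
    my $data = <$in>;
    close($in);
    $data = '' unless defined $data;   # guard against an empty file
    my @counts = (0) x 256;
    map ++$counts[$_], unpack('C*', $data);
    return \@counts;
}

# Hypothetical test input: "hello world" repeated 1000 times (11000 bytes),
# long enough to span several 4096-byte records.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} 'hello world' x 1000;
close($tmp);

my $chunked = tally_chunked($path);
my $slurped = tally_slurp($path);

print "strategies agree\n" if "@$chunked" eq "@$slurped";
printf "byte %d ('l') seen %d times\n", ord('l'), $chunked->[ord('l')];
```

Running it should print "strategies agree" followed by "byte 108 ('l') seen 3000 times". The chunked version keeps memory use bounded by the record size, which is why it keeps working on files where the slurp version starts to thrash.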
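The map-versus-explicit-loop observation can be checked with Perl's core Benchmark module. This is only a sketch: the post doesn't show the explicit-loop version that was tried, so the foreach loop below is an assumed stand-in, and the 64 KB of pseudo-random bytes is hypothetical data, not the core-dump files timed above. Both subroutines are verified to produce the same tallies before cmpthese times them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: 64 KB of pseudo-random bytes (not the original test files).
srand(42);
my $data = join '', map { chr(int(rand(256))) } 1 .. 65536;

# Tally using Perl's built-in map, as in the snippets above.
sub tally_map {
    my ($buf) = @_;
    my @counts = (0) x 256;
    map ++$counts[$_], unpack('C*', $buf);
    return \@counts;
}

# Same tally with an explicit foreach loop (an assumed form of the
# "explicit loop" variant; the post does not show the exact code).
sub tally_loop {
    my ($buf) = @_;
    my @counts = (0) x 256;
    foreach my $byte (unpack('C*', $buf)) {
        ++$counts[$byte];
    }
    return \@counts;
}

# Make sure both versions compute the same thing before timing them.
die "tallies differ\n"
    unless "@{ tally_map($data) }" eq "@{ tally_loop($data) }";

# Run each for at least 1 CPU second and print a relative-rate table.
cmpthese(-1, {
    'map'  => sub { tally_map($data) },
    'loop' => sub { tally_loop($data) },
});
```

Which variant wins depends on the perl version and the hardware (the handling of map in void context has changed since 2001), so this sketch is a way to measure the effect on your own box rather than a reproduction of the numbers in the table.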