Shamless request for improving function speed.

[1/15] from: reboler::programmer::net at: 26-Mar-2002 15:05

This group is really good at tweaking functions to make them faster. This is a shameless request for help making the following functions _faster_. {not necessarily shorter, just faster :)} Any help would be deeply appreciated. The second set (object) is a REBOL version of Ron Rivest's (formerly of RSA) RC4 encryption algorithm, used in some secure internet transactions. ;***** begin first function set comp-pass: func [ {Compress a string with a passphrase. Only casually secure! Needs DECOMP-PASS function and passphrase to decompress} text [string! binary!] {Data to compress with a passphrase. Requires 'decomp-pass to decompress.} passphrase [string! binary!] {passphrase to compress the data with} /local comp-text p ][ comp-text: compress text p: 0 for n 3 ((length? comp-text) - 4) 1 [ poke comp-text n to-char xor (pick comp-text n) (pick passphrase (p + 1)) p: (p + 1) // (length? passphrase) ] comp-text ] decomp-pass: func [ {Decompress a string compressed with 'comp-pass, requires passphrase.} comp-text [binary!] {Data to decompress with a passphrase} passphrase [string! binary!] {passphrase to decompress the data with} /local text p ][ p: 0 for n 3 ((length? comp-text) - 4) 1[ poke comp-text n to-char xor (pick comp-text n) (pick passphrase (p + 1)) p: (p + 1) // (length? passphrase) ] either error? try [text: decompress comp-text][ "** decomp-pass error ** Wrong passphrase? " ][ text ] ] ;***** set two ******** arc4: make object! [ s-swap: func [{swaps the elements of two series at the given index positions} a [series!] "1st series" index-a [integer!] "index position to swap in 1st series" b [series!] "2nd series" index-b [integer!] "index position to swap in 2nd series" /local holder ][ holder: pick a index-a poke a index-a pick b index-b poke b index-b holder ] get-key: func [key] [ while [(length? key) > 256] [ print "Key must be 256 characters or less, with IV" key: ask/hide "key :>> " ] key ] mix-state: func [key /local j n] [ j: 0 state: copy make block! 256 for i 0 255 1 [append state i] if not value? 'iterations [iterations: 1] loop iterations [ for i 0 255 1 [ n: i // (length? key) state-i: to-integer pick state (i + 1) key-n: to-integer pick key (n + 1) j: (j + state-i + key-n) // 256 s-swap state (i + 1) state (j + 1) ] ] ] produce-ciphertext: func [plaintext key /local j n] [ j: 0 ciphertext: make string! length? plaintext for count 0 ((length? plaintext) - 1) 1 [ i: ((count + 1) // 256) state-i: pick state (i + 1) j: ((j + state-i) // 256) s-swap state (i + 1) state (j + 1) n: (((pick state (i + 1)) + (pick state (j + 1))) // 256) append ciphertext (xor (to-char pick state (n + 1)) (to-char pick plaintext (count + 1))) ] clear state ciphertext ] arc4: func [{ARCFour Function to encrypt/decrypt a string! or binary!. The function should be given a string! or binary!, and outputs a string!. "arcfour plaintext key" will encrypt, "arcfour ciphertext key" decrypts} plaintext [string! binary!] "String to en/decrypt." key [string! binary!] {Supply a key [string! binary!].} ][ mix-state get-key key produce-ciphertext plaintext key unset [plaintext key iterations] ciphertext ] ] ;***** end If you use these, please note that you will have to convert the output of decrypted data back to 'binary! yourself, in such cases as when the original data was 'binary!. -- _______________________________________________ Sign-up for your own FREE Personalized E-mail at Mail.com http://www.mail.com/?sr=signup Win the Ultimate Hawaiian Experience from Travelocity. http://ad.doubleclick.net/clk;4018363;6991039;n?http://svc.travelocity.com/promos/winhawaii/

[2/15] from: holger:rebol at: 26-Mar-2002 16:57

On Tue, Mar 26, 2002 at 03:05:09PM -0500, alan parman wrote:

> This group is really good at tweaking functions to make them faster. > This is a shameless request for help making the following functions _faster_. > {not necessarily shorter, just faster :)} > > Any help would be deeply appreciated. > > The second set (object) is a REBOL version of Ron Rivest's (formerly of RSA) RC4 encryption algorithm, used in some secure internet transactions.

Please do not post source code for cryptographic functions to the mailing list. This list is hosted in the US and some subscribers are outside of the US, so posting source code constitutes a violation of US export laws on cryptography. -- Holger Kruse [kruse--nordicglobal--com]

[3/15] from: holger:rebol at: 26-Mar-2002 17:22

On Tue, Mar 26, 2002 at 04:57:41PM -0800, Holger Kruse wrote:

> On Tue, Mar 26, 2002 at 03:05:09PM -0500, alan parman wrote: > > This group is really good at tweaking functions to make them faster.

<<quoted lines omitted: 8>>

> the US, so posting source code constitutes a violation of US export laws > on cryptography.

Btw, there is another problem, specifically with RC4: AFAIR RSADSI still considers that algorithm to be their trade secret, and the name "RC4" is trademarked. Because of that this particular algorithm should not be discussed publically. You can, however discuss publically (just discuss, not post source code) a different algorithm, called ARCFOUR, which is assumed to be compatible with RC4. ARCFOUR is implemented by a lot of software packages (OpenSSL etc.) as a compatible replacement for RC4 (e.g. for SSL). ARCFOUR is also supported by REBOL/Command 2.0. -- Holger Kruse [kruse--nordicglobal--com]

[4/15] from: ryanc:iesco-dms at: 26-Mar-2002 17:30

It looks pretty darn good to me. Just a few minor things... My tests have shown that while making a series to size is and then appending it is fast, but using 'poke or 'change and reusing the space is even faster--about double and thats not counting the time saved by not having to allocate the memory. It seems to take an amazingly minute amount of time in rebol to assign a word to a value. It is therefore advantageous when using a value more than once, to go ahead and assign it to a word. This remains true even in cases of the most simple calculations (ie 1 + 1). --Ryan alan parman wrote:

[5/15] from: reboler:programmer at: 26-Mar-2002 21:25

Holger, Sorry, I should have been more careful and specified that the code is ARCFOUR, not RC4. (stands for _A_pparently RC4 (four)). (Isn't RSA's patent expired?) And not to stir up trouble, but my understanding of the latest export laws is that source code is free speech and therefore has no export restrictions. I will, however, abide by the rules of the List. Thanks, Ryan, for pointing out that 'poke may be faster then 'append. I see a couple of places where I can substitute 'poke for 'append, and will try them out for speed. -- _______________________________________________ Sign-up for your own FREE Personalized E-mail at Mail.com http://www.mail.com/?sr=signup Win the Ultimate Hawaiian Experience from Travelocity. http://ad.doubleclick.net/clk;4018363;6991039;n?http://svc.travelocity.com/promos/winhawaii/

[6/15] from: holger:rebol at: 26-Mar-2002 19:17

On Tue, Mar 26, 2002 at 09:25:15PM -0500, alan parman wrote:

> Holger, > Sorry, I should have been more careful and specified that the code is ARCFOUR, not RC4. (stands for _A_pparently RC4 (four)). (Isn't RSA's patent expired?)

It's not a patent, but a combination of trade secret and trademark. Unlike patents, they do not expire.

> And not to stir up trouble, but my understanding of the latest export laws is that source code is free speech and therefore has no export restrictions.

AFAIK not universally or automatically. Pretty much anything that is encryption-related still requires at least the filing of paperwork, in some cases even a waiting period and/or a permit. There is also the issue that some countries, like France, even put restrictions on the use of encryption code, not just on their export, so publically distributed encryption source code would have to come with a disclaimer :-). It is probably best to simply keep encryption source code off this list... -- Holger Kruse [kruse--nordicglobal--com]

[7/15] from: lmecir:mbox:vol:cz at: 27-Mar-2002 13:31

Hi Alan, I noticed that your functions call the FOR function. The FOR function still contains some bugs I reported pretty long ago and it is a very slow beast compared to virtually any other cycle function. I suggest you to use WHILE, UNTIL, REPEAT, LOOP, or my CFOR instead. Cheers L

[8/15] from: gchiu:compkarori at: 28-Mar-2002 8:26

On Wed, 27 Mar 2002 13:31:04 +0100 "Ladislav Mecir" <[lmecir--mbox--vol--cz]> wrote:

> I noticed that your functions call the FOR function. The > FOR function still > contains some bugs I reported pretty long ago and it is a > very slow beast

Hi Ladislav, There's a beta of the new rebol core out there on IOS. -- Graham Chiu

[9/15] from: joel:neely:fedex at: 28-Mar-2002 7:34

Hi, Alan, alan parman wrote:

> This group is really good at tweaking functions to make them > faster. >

While I have absolutely nothing to say on the subject of cryptography ;-) your email reminded me of an interesting question in REBOL data structure manipulation that I had thought about some time ago. So I dusted off a few ideas that I had left on a mental shelf and did a little bit of benchmarking... The REBOL data structure manipulation question is this: What is a fast way to iterate through two series values in parallel, cycling the shorter one as needed, and producing a result that is some function of corresponding pairs of values, one from each series? Interestingly enough, the fastest way seems to be to make a new series, rather than updating the longer one in place, as shown by the following test cases: CYCLES0 - uses FOREACH to iterate across the first (assumed to be the longer) argument, and explicitly plays with the second series to cycle it through its positions. CYCLES1 - uses FOREACH on the first/longer series, but uses explicit indexing on the second/shorter one. The mod-and-add-one trick cycles an integer counter through the set of values from one to the modulus. CYCLES2 - uses explicit indexing on both series arguments. All of the above variations create a new series to hold the result, so... CYCLES3 - modifies the first/longer series in place, rather than taking up the storage for a modified copy. 8<----------begin code---------- cycles0: func [b0 [block!] b1 [block!] /local r] [ r: make block! length? b0 foreach a b0 [ insert tail r a + b1/1 if empty? b1: next b1 [b1: head b1] ] r ] cycles1: func [b0 [block!] b1 [block!] /local j y r] [ r: make block! length? b0 y: length? b1 j: 1 foreach a b0 [ insert tail r a + b1/:j j: j // y + 1 ] r ] cycles2: func [b0 [block!] b1 [block!] /local i j x y r] [ r: make block! length? b0 x: length? b0 y: length? b1 i: j: 1 while [i <= x] [ insert tail r b0/:i + b1/:j i: i + 1 j: j // y + 1 ] r ] cycles3: func [b0 [block!] b1 [block!] /local j y] [ y: length? b1 j: 1 forall b0 [ change b0 b0/1 + b1/:j j: j // y + 1 ] b0 ] 8<-----------end code----------- The timings went as follows (with line-wrapping for email formatting): 8<-------begin transcript-------

>> x: make block! 500000

for i 0 499999 1 [insert tail x i] print length? x 500000

>> y: make block! 10

for i 0 9 1 [insert tail y i] print length? y 10

>> t: now/time/precise z: cycles0 x y now/time/precise - t

== 0:00:30.65

>> t: now/time/precise z: cycles1 x y now/time/precise - t

== 0:00:25.43

>> t: now/time/precise z: cycles2 x y now/time/precise - t

== 0:00:33.28

>> t: now/time/precise z: cycles3 x y now/time/precise - t

== 0:00:40.86

>> copy/part z 50

== [0 2 4 6 8 10 12 14 16 18 10 12 14 16 18 20 22 24 26 28 20 22 24 26 28 30 32 34 36 38 30 32 34 36 38 40 42 44 46 48 40 42 44 46 ... 8<--------end transcript-------- Note that the longer series was made long enough to really require some time (on a slower box, at least...), and the shorter series was made short enough that its manipulation was stressed (and so that the result could be examined by eye for correctness). My conclusions (based on preliminary testing only, the usual disclaimers about benchmarks apply...) are as follows: - Explicit indexing wins over series manipulation for the shorter series. - WHILE loses to FOREACH (duh. included for completeness). - Preallocation is already known to be necessary for speed. - Constructing a new series wins over modifying the original series in place. Of course, these tests were performed using blocks, so it is only an assumption that the same performance trade-offs would apply to other series types. -jn- -- ; Joel Neely joeldotneelyatfedexdotcom REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] { | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]

[10/15] from: oliva:david:seznam:cz at: 28-Mar-2002 15:10

HK> Please do not post source code for cryptographic functions to the mailing HK> list. This list is hosted in the US and some subscribers are outside of HK> the US, so posting source code constitutes a violation of US export laws HK> on cryptography. reading lines like this one always reminds me that for some people we are second class persons living on this planet [just because our ancestors were not killing aborigines in America or Australia?] I now that you just had to write this mail as I wanted to write this one... Oldes ----- -----[you never know who is observing you!]

[11/15] from: reboler:programmer at: 28-Mar-2002 12:17

Re: Improving function speed (no code) Thanks Ryan, and Ladislav, for your help with improving function speed. After removing one 'append from ARCFOUR, and changing the code such that the series is modified in place I increased speed by 16-17%. After replacing all the 'for in ARCFOUR with 'loop, I gained another 15% on top of that. Overall function execution time was decreased by a solid 28% ! I used FX5's Profiler to do my time tests. Great Proggie! Too bad I can't show you the improved code... ;^/ An interesting phenom: for iterations (k) over 1000, 'loop beats 'for always, but under 1000 it's a toss-up which one is faster. (I tried the following... cp-orig: func [string][for n 1 k 1 [xor #"1" #"a"]] cp-new: func [string][loop k [xor #"1" #"a"]] ) It does not seem to be related to some 1k (1024) barrier. PS: I've just read Joel's excellent analysis and now I may have to try constructing a new series instead of mod-in-place! I still can't get rid of one 'append though ... ... s: j: i: 0 state: make block! 256 loop 256 [append state s s: s + 1] ... -- _______________________________________________ Sign-up for your own FREE Personalized E-mail at Mail.com http://www.mail.com/?sr=signup Win the Ultimate Hawaiian Experience from Travelocity. http://ad.doubleclick.net/clk;4018363;6991039;n?http://svc.travelocity.com/promos/winhawaii/

[12/15] from: greggirwin:mindspring at: 28-Mar-2002 10:55

Thanks (once again) Joel for an informative post! I need to get organized and gather together all the various performance related findings posted to the list. Good stuff. --Gregg

[13/15] from: joel:neely:fedex at: 28-Mar-2002 13:31

Hi, Alan, alan parman wrote:

> Re: Improving function speed (no code) >

...

> I still can't get rid of one 'append though ... > ... > s: j: i: 0 > state: make block! 256 > loop 256 [append state s s: s + 1] > ... >

It *can* be sped up slightly (saving a millisecond a month somewhere ;-) b: make block! n i: 0 t0a: now/time/precise loop n [append b i i: i + 1] t0b: now/time/precise b: make block! n i: -1 t1a: now/time/precise loop n [append b i: i + 1] t1b: now/time/precise b: make block! n i: -1 t2a: now/time/precise loop n [insert tail b i: i + 1] t2b: now/time/precise print [t0b - t0a t1b - t1a t2b - t2a] The first version is similar to yours, the second eliminates the doubled use of the counter, and the third uses the well- known APPEND ==> INSERT TAIL optimization. Timing the above five times for N of one million yields these results: 0:00:04.687 0:00:04.507 0:00:02.113 0:00:04.707 0:00:04.547 0:00:02.143 0:00:04.677 0:00:04.526 0:00:02.113 0:00:04.687 0:00:04.536 0:00:02.113 0:00:04.656 0:00:04.586 0:00:02.113 Most of the savings, as expected, results from the second optimization, but avoiding the "double-dipping" does have a measurable value. -jn-

[14/15] from: ryanc:iesco-dms at: 28-Mar-2002 12:38

Wow Joel, I was blown away when I seen 'foreach is cruise through that so fast. They must have improved it along the way. Anyhow, here are some optomized versions... ; Really fast! hyper-cycles: func [b0 [block!] b1 [block!] /local j y r] [ r: copy b0 y: length? b1 j: 1 foreach a b0 [ r: change r a + pick b1 j j: j // y + 1 ] head r ] ; Less memory consuming, yet still fast... lean-cycles: func [b0 [block!] b1 [block!] /local j y] [ y: length? b1 j: 1 repeat B0-I length? b0 [ poke b0 B0-I b0/:B0-I + b1/:j J: J // y + 1 ] b0 ] --Ryan

[15/15] from: al:bri:xtra at: 29-Mar-2002 8:45

Alan wrote:

> I still can't get rid of one 'append though ... > ... > s: j: i: 0 > state: make block! 256 > loop 256 [append state s s: s + 1]

Try: S: 0 State: make block! 256 loop length? State [ insert state S S: S + 1 ] reverse State I think that might be faster. A better alternative is to pregenerate this block, save to a file and then load it as needed. For example: State: [ 0 1 2 3 4 ; and so on. ] save %State.r State ;... State: load %State.r Andrew Martin ICQ: 26227169 http://valley.150m.com/

Notes

Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted