[REBOL] Re: UTF-8 revisited

From: jan:skibinski:sympatico:ca at: 27-Nov-2002 14:10

Hi Romano,

While the first set of changes reduces the timings to about 65%, the second has a much smaller impact: 61% at best, which is about 0.34s for 1000 loops on the "chars: =EC=9A=E8=F8=9E=FD=E1=ED=E9" Latin-1 sequence (case k=1), where utf8-encode gets its best timing of 0.18s. But heck, every single percent counts! :-) Timings vary, so the above data is just for your orientation. But I am sure you already know the results. :-)

I found it quite easy to get my original version down to 5s with the first improvements, but then I got stuck at 1.35s. I was so "desperate" that I even tried simulated bit registers, injecting "10" bits in front of every group of six bits, travelling from the tail. Amazingly, I reached similar timings of about 1.5s that way, so do not dismiss such approaches offhand if you ever need shifts and other such manipulations.

But breaking the 1s barrier happened only after I completely revised the algorithm and started working from the least significant bits up. This way I could get rid of most of the tables and use a hardcoded magic "64" integer instead. I was so caught up in the official algorithm description that I missed the obvious - which is what 'utf8-encode is in fact based on. The register simulation clearly helped me here.

Best regards,
Jan

Romano Paolo Tenca wrote:
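The "least significant bits up" idea in Jan's message above can be sketched roughly as follows. This is a Python illustration, not the REBOL code discussed in the thread. The function name `utf8_encode_lsb_first` is hypothetical, and it assumes the magic "64" is used to divide off one six-bit group at a time, with the "10" marker bits added to each group on the way.

```python
def utf8_encode_lsb_first(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8, peeling six-bit groups
    off the low end. Hypothetical sketch, not the thread's REBOL code."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if code_point < 0x80:
        return bytes([code_point])  # ASCII: a single byte, no markers

    tail = []      # bytes collected in reverse order (last byte first)
    marker = 0xC0  # lead-byte marker for a 2-byte sequence: 110xxxxx
    limit = 0x20   # a 2-byte lead byte can hold values below 32

    while True:
        # Dividing by 64 splits off the lowest six bits.
        code_point, low6 = divmod(code_point, 64)
        tail.append(0x80 | low6)  # prefix the group with the "10" bits
        if code_point < limit:
            break  # what is left fits in the lead byte
        marker = (marker >> 1) | 0x80  # 0xC0 -> 0xE0 -> 0xF0
        limit >>= 1                    # 0x20 -> 0x10 -> 0x08

    tail.append(marker | code_point)
    return bytes(reversed(tail))
```

For example, `utf8_encode_lsb_first(0xE9)` yields the two bytes C3 A9. No lookup tables are needed: the only constants are 64 and the lead-byte markers, and each marker is derived from the previous one by a shift.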