[REBOL] Re: UTF-8 revisited
From: jan:skibinski:sympatico:ca at: 27-Nov-2002 14:10
Hi Romano,
While the first set of changes reduces the timings to about 65%,
the second has a much smaller impact - 61% at best, which is about
0.34s for 1000 loops on the "chars: =EC=9A=E8=F8=9E=FD=E1=ED=E9" Latin-1 sequence (case k=1),
where utf8-encode gets its best timing of 0.18s.
But heck, every single percent counts! :-)
Timings vary, so the above data is just for your orientation. But I am
sure you already know the results. :-)
I found it quite easy to get my original version down to 5s with the
first improvements, but then I got stuck at 1.35s. I was so "desperate"
that I even tried simulated bit registers: injecting "10" bits in front
of every six bits, travelling from the tail.
Amazingly, I reached similar timings of 1.5s that way, so do not
dismiss such approaches offhand if you ever need shifts and other
such manipulations.
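
The simulated bit register idea can be sketched roughly as follows.
This is Python rather than REBOL, and it is not the code that was
actually timed: the name utf8_encode_bits and the restriction to code
points up to 0xFFFF are assumptions made only for illustration.

```python
def utf8_encode_bits(code: int) -> bytes:
    """Encode one code point (0..0xFFFF) via a simulated bit register.

    The "register" is a string of '0'/'1' characters. Hypothetical
    helper for illustration only.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("this sketch only handles code points up to 0xFFFF")
    if code < 0x80:
        return bytes([code])              # ASCII passes through unchanged

    bits = format(code, "b")              # the simulated register
    n_cont = 1 if code < 0x800 else 2     # number of continuation bytes

    groups = []
    for _ in range(n_cont):
        # Travel from the tail: take the last six bits and inject "10"
        # in front of them to form a continuation byte.
        groups.insert(0, "10" + bits[-6:].rjust(6, "0"))
        bits = bits[:-6]

    # Whatever is left goes into the lead byte behind its length marker.
    prefix = "110" if n_cont == 1 else "1110"
    lead = prefix + bits.rjust(8 - len(prefix), "0")

    return bytes(int(b, 2) for b in [lead] + groups)
```

For example, utf8_encode_bits(0xE9) gives the two bytes C3 A9.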
But breaking the 1s barrier happened only after I completely revised
the algorithm and started working from the least significant bits up.
This way I could get rid of most of the tables and use the hardcoded
magic integer 64 instead.
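
A minimal sketch of working from the least significant bits up, in
Python rather than REBOL. It is not the actual 'utf8-encode source:
the name utf8_encode_lsb and the loop layout are assumptions, and the
sketch does not reject surrogate code points.

```python
def utf8_encode_lsb(code: int) -> bytes:
    """Encode one code point (0..0x10FFFF), low bits first.

    Only the integer 64 is needed to peel off six bits at a time.
    Hypothetical helper for illustration only.
    """
    if not 0 <= code <= 0x10FFFF:
        raise ValueError("code point out of range")
    if code < 128:
        return bytes([code])              # ASCII passes through unchanged

    out = []
    lead_mark = 192   # 0b11000000, lead marker of a two-byte sequence
    limit = 32        # a two-byte lead has room for values below 32
    while True:
        # The remainder modulo 64 is the low six bits; adding 128 sets
        # the "10" continuation marker in front of them.
        out.insert(0, 128 + code % 64)
        code //= 64
        if code < limit:
            break
        # One more byte is needed: the lead marker gains a 1 bit and
        # the room left for payload in the lead byte halves.
        lead_mark = lead_mark // 2 + 128  # 192 -> 224 -> 240
        limit //= 2
    out.insert(0, lead_mark + code)
    return bytes(out)
```

For example, utf8_encode_lsb(0x20AC) gives the three bytes E2 82 AC.
No lookup tables are involved; the only constants besides 64 are the
markers 128 and 192 and the initial limit 32.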
I was so caught up in the official algorithm description that I missed
the obvious - which is what 'utf8-encode is in fact based on.
The register simulation clearly helped me here.
Best regards,
Jan
Romano Paolo Tenca wrote: