[REBOL] Re: UTF-8 revisited
From: jan:skibinski:sympatico:ca at: 29-Nov-2002 10:48
Hi Romano, RebOldes and All,
Version 1.0.1 of utf-8 has been posted to the library.
Of the three functions there ('encode, 'decode and 'to-ucs),
only 'decode does not use 'parse. It differs from the other
two in that it works on a variable number of input bytes (1-6)
for every wide character to be decoded.
I would not know how to convert its 'while loop to a 'parse loop,
since the 'skip value would not be constant. But if you find it
doable and beneficial for speed, please take a shot at it.
Otherwise 'decode behaves quite well and is only about 20% slower
than 'encode.
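
To illustrate why the step size is not constant, here is a minimal sketch of a variable-length decode loop. It is written in Python rather than REBOL and is not the code from the posted library; the names `sequence_length` and `decode` are hypothetical. It assumes the original 1-6 byte UTF-8 layout and does not check for overlong forms.

```python
def sequence_length(lead):
    """Return how many bytes the sequence starting with this lead byte uses."""
    if lead < 0x80:
        return 1          # 0xxxxxxx
    if 0xC0 <= lead < 0xE0:
        return 2          # 110xxxxx
    if 0xE0 <= lead < 0xF0:
        return 3          # 1110xxxx
    if 0xF0 <= lead < 0xF8:
        return 4          # 11110xxx
    if 0xF8 <= lead < 0xFC:
        return 5          # 111110xx
    if 0xFC <= lead < 0xFE:
        return 6          # 1111110x
    raise ValueError("invalid lead byte: 0x%02X" % lead)


def decode(data):
    """Decode UTF-8 bytes into a list of integer code points."""
    points = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = sequence_length(lead)   # the step varies from 1 to 6
        if i + n > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        if n == 1:
            cp = lead
        else:
            # Keep only the payload bits of the lead byte.
            cp = lead & (0x7F >> n)
            for b in data[i + 1:i + n]:
                if b & 0xC0 != 0x80:
                    raise ValueError("bad continuation byte: 0x%02X" % b)
                cp = (cp << 6) | (b & 0x3F)
        points.append(cp)
        i += n                      # advance by the length just decoded
    return points
```

The lead byte alone determines how far to advance, so the loop has to inspect each sequence before it knows the size of the next step.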
The new version is completely redesigned, much simplified
and well documented. It also contains a sample of a simple
phrase in a bunch of languages from Latin-1, 2, 4 and 5.
If anyone is interested in hosting it, I can provide a somewhat
bigger (7K) UTF-8 sample (lokomotywa.html), which is a Polish-English
side-by-side onomatopoeic poem for children. Good for testing
and fun for kids too. I did the UTF-8-ization; someone else did
the translation, which I found well done and rhythmically superb.
Jan