Mailing List Archive: 49091 messages

[REBOL] Re: UTF-8 revisited

From: jan:skibinski:sympatico:ca at: 29-Nov-2002 10:48

Hi Romano, RebOldes and All,

Version 1.0.1 of utf-8 has been posted to the library. Of the three functions there ('encode, 'decode and 'to-ucs), only 'decode does not use 'parse. It differs from the other two in that it consumes a variable number of input bytes (1-6) for every wide character to be decoded. I do not know how to convert its 'while loop to a 'parse loop, because the 'skip value would not be constant. But if you find it doable and beneficial for speed, please take a shot at it. Otherwise 'decode behaves quite well and is only about 20% slower than 'encode.

The new version is completely redesigned, much simplified and well documented. It also contains a sample of a simple phrase in a number of languages covered by Latin-1, 2, 4 and 5.

If anyone is interested in hosting it, I can provide a somewhat bigger (7K) UTF-8 sample (lokomotywa.html), which is a Polish-English side-by-side onomatopoeic poem for children. Good for testing, and fun for kids too. I did the UTF-8-ization; someone else did the translation, which I found well done and rhythmically superb.

Jan
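
To illustrate the variable-length stepping described above, here is a minimal sketch in Python. It is not the library's REBOL code: the names `sequence_length` and `decode` are hypothetical, and the 1-6 byte layout is assumed to be the original UTF-8 scheme in which the lead byte announces how many bytes follow. The point is only that the loop index advances by a different amount for each character, which is what makes a fixed 'skip awkward.

```python
def sequence_length(lead):
    """Return how many bytes the sequence starting with `lead` occupies (1-6)."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: plain ASCII
    if lead < 0xC0:
        raise ValueError("continuation byte where a lead byte was expected")
    if lead < 0xE0:
        return 2          # 110xxxxx
    if lead < 0xF0:
        return 3          # 1110xxxx
    if lead < 0xF8:
        return 4          # 11110xxx
    if lead < 0xFC:
        return 5          # 111110xx
    if lead < 0xFE:
        return 6          # 1111110x
    raise ValueError("0xFE and 0xFF never appear in UTF-8")


def decode(data):
    """Decode UTF-8 bytes into a list of integer code points (hypothetical helper)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        n = sequence_length(lead)
        if i + n > len(data):
            raise ValueError("truncated sequence at end of input")
        if n == 1:
            value = lead
        else:
            # Keep only the payload bits of the lead byte (5 bits for n=2, 4 for n=3, ...).
            value = lead & (0x7F >> n)
            for b in data[i + 1:i + n]:
                if b & 0xC0 != 0x80:
                    raise ValueError("expected a continuation byte (10xxxxxx)")
                value = (value << 6) | (b & 0x3F)
        code_points.append(value)
        i += n            # the step size varies from 1 to 6 per character
    return code_points
```

For example, `decode("ż".encode("utf-8"))` consumes two bytes in one step, while each letter of `"lokomotywa"` consumes one. This sketch does not reject overlong encodings or surrogate values, which a production decoder would normally check.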