Mailing List Archive: 49091 messages

[REBOL] Re: UTF-8

From: rebol-list2:seznam:cz at: 20-Oct-2004 14:35

Hello Alain,

Sunday, October 17, 2004, 7:27:41 PM, you wrote:

AG> Hi all,
AG> I got interested in manipulating Unicode with REBOL and tried the UTF-8 script by Jan Skibinski.
AG> It seems there is an error in the encode function which did not convert correctly my test case : the 1st letter of Khmer alphabet which code is U+1780, should become #{E19E80} in UTF-8, according
AG> to my understanding (based on http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html).
AG> In case it may be helpful to someone this version should work (though not optimized and tested only with k=2 on U+1780 :-) :

Hi,

it looks like you were using an older version. My latest utf-8.r script is available here:

    http://oldes.multimedia.cz/rss/projects/utf-8_latest.rip (4kB)

I removed the to-ucs2 function because I now use this ucs2.r script instead:

    http://oldes.multimedia.cz/rss/projects/ucs2_latest.rip (2.5MB !!!)

The archive is pretty large because it includes all the charmaps I have collected, together with pre-generated Rebol parsing rules for each of them. I only use cp1250 and ISO-8859-2 myself, so I'm not sure whether the others work correctly, but they should if the included charmap sources are correct.

So if I need to encode a text which was written using 'cp1250' to utf-8, I do:

    ucs2/load-rules "cp1250"
    utf-8/encode-2 ucs2/encode "text with special char "

Theoretically I can also change the encoding of the text:

    ucs2/load-rules "cp1250"
    ucstext: ucs2/encode "text with special char "
    ucs2/load-rules "iso-8859-2"
    to-string ucs2/decode ucstext
    == "text with special char "

(but I have never used this, so it's not tested at all, and there may be a problem if the text contains Unicode chars which the decoder rule doesn't know)

In the UCS2 archive there is also a script which creates PHP code for ucs2 encoding (according to the charmap you need), as I was missing this in my PHP build. Isn't Rebol a great tool? :)

Feel free to let me know if you run into any trouble.

Cheers,
Oldes

PS: I'm still a Unicode newbie! I just made a script which works the way I need it to, that's all.
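
For readers who want to check the U+1780 test case and the cp1250 round trip without the scripts linked above, here is a minimal sketch in Python rather than REBOL. It is not the code from utf-8.r or ucs2.r: the function names `utf8_encode_code_point` and `transcode` are made up for this illustration, and Python's built-in `cp1250` and `iso-8859-2` codecs stand in for the charmap rules that `ucs2/load-rules` loads.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand.

    Hypothetical helper for illustration only; it follows the usual
    UTF-8 bit layout (1 to 4 bytes) and rejects surrogates.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value: %r" % cp)
    if cp < 0x80:
        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one charset to another via Unicode.

    Hypothetical helper: the decode step plays the role of
    ucs2/load-rules + ucs2/encode, the encode step the role of
    utf-8/encode-2 or ucs2/decode. Raises UnicodeEncodeError if the
    target charset lacks a character, which is the problem case the
    post warns about.
    """
    return data.decode(source).encode(target)


if __name__ == "__main__":
    # First letter of the Khmer alphabet, U+1780
    print(utf8_encode_code_point(0x1780).hex().upper())
    # "s with caron" is byte 0x9A in cp1250
    print(transcode(b"\x9a", "cp1250", "utf-8").hex().upper())
    print(transcode(b"\x9a", "cp1250", "iso-8859-2").hex().upper())
```

U+1780 falls in the three-byte range, which is why the result quoted in the post has the shape E1 9E 80. The cp1250 to ISO-8859-2 conversion only succeeds when every character exists in both charmaps.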