[REBOL] Re: UTF-8
From: rebol-list2:seznam:cz at: 20-Oct-2004 14:35
Hello Alain,
Sunday, October 17, 2004, 7:27:41 PM, you wrote:
AG> Hi all,
AG> I got interested in manipulating Unicode with REBOL and tried the UTF-8 script by
AG> Jan Skibinski.
AG> It seems there is an error in the encode function which did not convert correctly
AG> my test case : the 1st letter of Khmer alphabet which code is U+1780, should become
AG> #{E19E80} in UTF-8, according to my understanding
AG> (based on http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html).
AG> In case it may be helpful to someone this version should work (though not optimized
AG> and tested only with k=2 on U+1780 :-) :
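
The expected bytes for that test case can be cross-checked outside Rebol. Below is a minimal Python sketch of the UTF-8 bit layout; it is not taken from utf-8.r or from Jan Skibinski's script, the function name utf8_encode is made up for illustration, and it assumes the current U+10FFFF upper limit rather than the longer sequences RFC 2279 allowed.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# U+1780 (first letter of the Khmer alphabet) falls in the 3-byte range.
khmer_ka = utf8_encode(0x1780)
print(khmer_ka.hex().upper())  # E19E80
```

U+1780 is 0001 0111 1000 0000 in binary, which splits into 0001, 011110 and 000000, giving E1, 9E and 80, so #{E19E80} is indeed the expected result.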
Hi, it looks like you were using an older version. My latest utf-8.r
script is available here:
http://oldes.multimedia.cz/rss/projects/utf-8_latest.rip (4kB)
I removed the to-ucs2 function as I'm using this ucs2.r script:
http://oldes.multimedia.cz/rss/projects/ucs2_latest.rip ( 2.5MB !!!)
The archive is pretty large because it includes all the charmaps I have
collected, together with pre-generated Rebol parsing rules for each of them.
I use only cp1250 and ISO-8859-2, so I'm not sure whether the others work
well, but they should if the included charmap sources are correct.
So if I need to convert text written in cp1250 to UTF-8, I do:
ucs2/load-rules "cp1250"
utf-8/encode-2 ucs2/encode "text with special char Š"
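
For comparison, the same two-step idea (legacy code page to Unicode, then Unicode to UTF-8) can be sketched in Python with its built-in codecs. This is only an illustration of the expected bytes, not what ucs2.r or utf-8.r do internally; the variable names are arbitrary.

```python
# The sample text as cp1250 bytes; 0x8A is "Š" in that code page.
raw_cp1250 = b"text with special char \x8a"

# Step 1: code page bytes -> Unicode text (roughly the role of ucs2/encode).
text = raw_cp1250.decode("cp1250")

# Step 2: Unicode text -> UTF-8 bytes (roughly the role of utf-8/encode-2).
utf8_bytes = text.encode("utf-8")

print(text[-1])               # Š (U+0160)
print(utf8_bytes[-2:].hex())  # c5a0
```

The single cp1250 byte 8A becomes the two UTF-8 bytes C5 A0, while the ASCII part of the text is unchanged.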
In theory I can also change the encoding of a text:
ucs2/load-rules "cp1250"
ucstext: ucs2/encode "text with special char Š"
ucs2/load-rules "iso-8859-2"
to-string ucs2/decode ucstext
== "text with special char ©"
(but I have never used this, so it's not tested at all, and there may be
a problem if the text contains Unicode chars which the decoder rules don't know)
In the UCS2 archive there is also a script which generates PHP code for
ucs2 encoding (according to the charmap you need), as I was missing this
in my PHP build.
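
The "©" in the result above is probably just a display effect: "Š" is byte A9 in ISO-8859-2, and a console that assumes Latin-1 shows A9 as "©". A small Python sketch of the same round trip, assuming Python's built-in cp1250 and iso-8859-2 codecs match the charmaps in the archive (the helper name transcode is hypothetical), also shows what happens with a character the target encoding doesn't know:

```python
def transcode(data: bytes, src: str, dst: str, errors: str = "strict") -> bytes:
    """Re-encode bytes from one legacy code page to another via Unicode."""
    return data.decode(src).encode(dst, errors)


# "Š" is 0x8A in cp1250 and 0xA9 in ISO-8859-2.
converted = transcode(b"text with special char \x8a", "cp1250", "iso-8859-2")
print(converted[-1:].hex())               # a9
print(converted[-1:].decode("latin-1"))   # © (same byte read as Latin-1)

# The euro sign (0x80 in cp1250) does not exist in ISO-8859-2.
try:
    transcode(b"\x80", "cp1250", "iso-8859-2")
except UnicodeEncodeError:
    print("no mapping in the target code page")

# With errors="replace" the unmappable character becomes "?".
print(transcode(b"\x80", "cp1250", "iso-8859-2", errors="replace"))  # b'?'
```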
Isn't Rebol a great tool? :)
Feel free to let me know if you run into any trouble.
Cheers, Oldes
PS: I'm still a Unicode newbie! I just made a script which works
the way I need it to, that's all.