[REBOL] Re: utf8-encode
From: oliva::david::seznam::cz at: 7-Jun-2002 0:04
R> nobody has utf8-encoder?
R> probably will have to write one by myself
OK... why is there NO native right/left shift function in Rebol?!
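For what it's worth, a shift can be expressed with plain arithmetic: shifting right by n places is integer division by 2^n, and shifting left is multiplication by 2^n (keeping only the low 8 bits when working on a single byte). Here is a minimal sketch of that idea in C, written without the shift operators. The names pow2, byte_shift_right and byte_shift_left are made up for this illustration and are not part of any library:

```c
/* Hypothetical helpers for illustration only, not from any library. */

/* 2 raised to n, computed by repeated multiplication (no shift operator). */
static unsigned pow2(unsigned n)
{
    unsigned p = 1;
    while (n-- > 0)
        p *= 2;
    return p;
}

/* Right shift of a byte by n places (0..8): integer division by 2^n. */
unsigned char byte_shift_right(unsigned char v, unsigned n)
{
    return (unsigned char)(v / pow2(n));
}

/* Left shift of a byte by n places (0..8): multiply by 2^n, keep the low 8 bits. */
unsigned char byte_shift_left(unsigned char v, unsigned n)
{
    return (unsigned char)((v * pow2(n)) % 256);
}
```

For example, byte_shift_right(0xE9, 6) gives 0x03, the same as 0xE9 >> 6.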
Here is how UTF-8 encoding works:
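#include <stdio.h>

/* Write the code point c to stdout as UTF-8.
   Named put_utf8 here because putwchar is a standard C library name. */
void put_utf8(unsigned long c)
{
    if (c < 0x80) {                           /* 1 byte:  0xxxxxxx */
        putchar((int)c);
    }
    else if (c < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        putchar((int)(0xC0 | (c >> 6)));
        putchar((int)(0x80 | (c & 0x3F)));
    }
    else if (c < 0x10000) {                   /* 3 bytes: 1110xxxx + 2 continuation bytes */
        putchar((int)(0xE0 | (c >> 12)));
        putchar((int)(0x80 | ((c >> 6) & 0x3F)));
        putchar((int)(0x80 | (c & 0x3F)));
    }
    else if (c < 0x200000) {                  /* 4 bytes: 11110xxx + 3 continuation bytes */
        putchar((int)(0xF0 | (c >> 18)));
        putchar((int)(0x80 | ((c >> 12) & 0x3F)));
        putchar((int)(0x80 | ((c >> 6) & 0x3F)));
        putchar((int)(0x80 | (c & 0x3F)));
    }
}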
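To make the bit layout concrete, here is a small sketch of the same branching that writes into a buffer instead of stdout, so the resulting bytes can be inspected. The function name utf8_encode_cp and its return convention are assumptions made for this example, not part of the code above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores the UTF-8 encoding of code point c in out
   (room for 4 bytes) and returns the number of bytes written, or 0 if c
   is outside the range covered by the four branches. */
size_t utf8_encode_cp(uint32_t c, unsigned char out[4])
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c < 0x200000) {
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}
```

Worked by hand: U+00E9 is 11101001 in binary, so the top two bits give C0 | 03 = C3 and the low six bits give 80 | 29 = A9, i.e. the two bytes C3 A9. Likewise U+20AC comes out as the three bytes E2 82 AC.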
And here is my Rebol version:
rebol [
title: "UTF-8 encode"
purpose: {Encodes the string data to UTF-8}
author: "oldeS"
email: [oliva--david--seznam--cz]
date: 7-Jun-2002/0:03:27+2:00
usage: {
>> utf8-encode "czech chars: ìšèøžýáíé"
== "czech chars: ìšèøžýáÃé"}
comment: {More info: http://czyborra.com/utf/ }
]
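shift: func [
    "Shifts the bits of data by the given number of places (left unless /right)"
    data [string! binary!] places [integer!] /left /right
][
    data: enbase/base data 2                        ; bits as a string of "0" and "1"
    either right [
        remove/part tail data negate places         ; drop the low bits
        data: head insert/dup head data #"0" places ; pad with zeros on the left
    ][
        remove/part data places                     ; drop the high bits
        insert/dup tail data #"0" places            ; pad with zeros on the right
    ]
    return debase/base data 2                       ; back to binary!
]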
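utf8-encode: func [
    "Encodes the string data to UTF-8 (each input byte is taken as one character)"
    str [any-string!] "string to encode"
    /local c
][
    str: to-binary str
    forall str [
        ; bytes 00-7F are plain ASCII and stay as they are; anything above needs two bytes
        if #{7F} < c: to-binary to-char first str [
            remove str
            ; lead byte: C0 or the top two bits; second byte: 80 or the low six bits
            insert str join (#{c0} or shift/right c 6) (c and #{3F} or #{80})
            str: next str  ; step onto the second inserted byte so forall continues after it
        ]
    ]
    to-string head str
]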
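For comparison, here is a rough C sketch of the same idea: each input byte is treated as one character in the range 00-FF, bytes below 0x80 are copied unchanged, and every other byte becomes a two-byte sequence. The function name latin1_to_utf8 and the buffer convention (the caller provides room for 2*n bytes) are assumptions made for this example:

```c
#include <stddef.h>

/* Hypothetical helper: converts n single-byte characters from in to UTF-8
   in out, which must have room for 2*n bytes. Returns the bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[j++] = c;                                   /* ASCII: copied as is */
        } else {
            out[j++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            out[j++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    return j;
}
```

With the input bytes 61 E9 7A this yields 61 C3 A9 7A: only the byte E9 is expanded, and the ASCII letters pass through untouched.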