utf8-encode
[1/10] from: oliva::david::seznam::cz at: 7-Jun-2002 0:04
R> nobody has utf8-encoder?
R> probably will have to write one by myself
ok... why is there NO native right/left shift function in Rebol?!
here is how utf8 works:
putwchar(c)
{
    if (c < 0x80) {
        putchar (c);
    }
    else if (c < 0x800) {
        putchar (0xC0 | c>>6);
        putchar (0x80 | c & 0x3F);
    }
    else if (c < 0x10000) {
        putchar (0xE0 | c>>12);
        putchar (0x80 | c>>6 & 0x3F);
        putchar (0x80 | c & 0x3F);
    }
    else if (c < 0x200000) {
        putchar (0xF0 | c>>18);
        putchar (0x80 | c>>12 & 0x3F);
        putchar (0x80 | c>>6 & 0x3F);
        putchar (0x80 | c & 0x3F);
    }
}
and here is my Rebol version:
rebol [
title: "UTF-8 encode"
purpose: {Encodes the string data to UTF-8}
author: "oldeS"
email: [oliva--david--seznam--cz]
date: 7-Jun-2002/0:03:27+2:00
usage: {
>> utf8-encode "czech chars: ìšèøžýáíé"
== "czech chars: ìšèøžýáÃé"}
comment: {More info: http://czyborra.com/utf/ }
]
shift: func [
    "Takes a base-2 binary string and shifts bits"
    data [string! binary!] places [integer!] /left /right
][
    data: enbase/base data 2
    either right [
        remove/part tail data negate places
        data: head insert/dup head data #"0" places
    ][
        remove/part data places
        insert/dup tail data #"0" places
    ]
    return debase/base data 2
]
utf8-encode: func [
    "Encodes the string data to UTF-8"
    str [any-string!] "string to encode"
    /local c
][
    str: to-binary str
    forall str [
        ; only bytes above 127 (#{7F}) need two bytes; #{79} would also catch "z" to "~"
        if #{7F} < c: to-binary to-char first str [
            remove str
            insert str join (#{c0} or shift/right c 6) (c and #{3F} or #{80})
            str: next str
        ]
    ]
    to-string head str
]
[2/10] from: oliva:david:seznam:cz at: 6-Jun-2002 22:06
Hello rebol-list,
nobody has utf8-encoder?
probably will have to write one by myself
[3/10] from: nitsch-lists:netcologne at: 7-Jun-2002 9:11
On Friday, 7 June 2002 00:04, RebOldes wrote:
> R> nobody has utf8-encoder?
> R> probably will have to write one by myself
>
> ok... why is there NO native right/left shift function in Rebol?!
>
Because it can be replaced by multiplication/division by powers of 2? It is
rarely needed, and performance is not so critical. Something like [ shift: [
1 2 4 8 16 ..] my-number * shift/2 ] ?
[4/10] from: rpgwriter:y:ahoo at: 7-Jun-2002 9:26
--- RebOldes <[oliva--david--seznam--cz]> wrote:
> shift: func [
> "Takes a base-2 binary string and shifts bits"
<<quoted lines omitted: 11>>
> return debase/base data 2
> ]
Why convert to a binary string? Why not something
like:
shift: func [
    "shifts bits in an integer, by default to the right"
    data [integer!] places [integer!] /right /left
] [
    ; ** returns a decimal!, so convert the result back to integer!
    return to integer! data * (2 ** either left [places] [0 - places])
]
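
As a rough illustration of the point made in the two replies above, a shift on an unsigned integer can be written as multiplication or division by a power of two. The C sketch below is not from the thread; the helper names pow2, shift_right_div and shift_left_mul are hypothetical, and it assumes 32-bit unsigned values only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes 2^places by repeated multiplication. */
static uint32_t pow2(unsigned places)
{
    uint32_t p = 1;
    while (places-- > 0)
        p *= 2;
    return p;
}

/* Right shift written as integer division by a power of two. */
static uint32_t shift_right_div(uint32_t value, unsigned places)
{
    assert(places < 32);
    return value / pow2(places);
}

/* Left shift written as multiplication by a power of two
   (unsigned arithmetic wraps modulo 2^32, like a shift). */
static uint32_t shift_left_mul(uint32_t value, unsigned places)
{
    assert(places < 32);
    return value * pow2(places);
}
```

For example, shift_right_div(0xE9, 6) gives 3, the same as 0xE9 >> 6, which is the value the encoder combines with 0xC0 to form the lead byte.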
[5/10] from: ethanak:interclub:pl at: 7-Jun-2002 23:20
On Jun 07 at 00:04 RebOldes wrote:
> R> nobody has utf8-encoder?
> R> probably will have to write one by myself
[cut code]
Very clever... but it's not utf-8. Don't you think you should
translate ISO-2 characters into UCS-2 encoding?
ethanak
[6/10] from: oliva:david:seznam:cz at: 12-Jun-2002 14:16
Or this shorter and faster version, though it is not as easy to understand:
rebol [
title: "UTF-8 encode"
purpose: {Encodes the string data to UTF-8}
author: "oldeS"
email: [oliva--david--seznam--cz]
date: 7-Jun-2002/0:24:44+2:00
usage: {
>> utf8-encode "chars: ìšèøžýáíé"
== "chars: ìšèøžýáÃé"}
comment: {More info: http://czyborra.com/utf/ }
]
utf8-encode: func [
    "Encodes the string data to UTF-8 (from Latin-1)"
    str [any-string!] "string to encode"
    /local c
][
    str: to binary! str
    forall str [
        if 127 < c: first str [
            change str to char! (c and 63 or 128)
            c: enbase/base to binary! to char! c 2
            remove/part tail c -6
            c: head insert/dup head c #"0" 6
            str: insert str (#{c0} or debase/base c 2)
        ]
    ]
    to string! head str
]
Now I just need to find out how to encode the Latin-2 charset to get real Czech
chars :(
[7/10] from: rotenca:telvia:it at: 12-Jun-2002 23:36
> or this shorter and faster version but not so clear to understand:
And this is mine (faster and more cryptic :-)
I do not know UTF-8, I copied the logic of your code.
This also works under the current View 1.2.1.3.1
I think parse is the best way to go.
utf8-encode: func [
    "Encodes the string data to UTF-8"
    str [any-string!] "string to encode"
    /local c h
][
    ;if you remove 'copy you can change the original string
    parse/all copy str [
        any [
            h: skip (
                if 127 < c: first h [
                    h: change h c / 64 or 192
                    insert h c and 63 or 128
                ]
            )
            :h
            skip
        ]
    ]
    head h
]
---
Ciao
Romano
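
For readers who do not know Rebol's parse dialect, the logic of the function above can be sketched in C as a single pass over the input. This is only an illustration, not code from the thread: the name latin1_to_utf8 and the caller-supplied output buffer are assumptions, and like the Rebol versions it treats every input byte as a Latin-1 character.

```c
#include <stddef.h>

/* Sketch: converts n Latin-1 bytes from `in` to UTF-8 in `out`.
   `out` must have room for 2 * n bytes. Returns the bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c > 127) {
            /* lead byte: 0xC2 or 0xC3, same as c / 64 or 192 in Rebol */
            out[w++] = (unsigned char)(c / 64 | 192);
            /* continuation byte: low six bits with the top bit set */
            out[w++] = (unsigned char)((c & 63) | 128);
        } else {
            out[w++] = c; /* ASCII passes through unchanged */
        }
    }
    return w;
}
```

With the three input bytes 'a', 0xE9, 'z' this writes four bytes: 'a', 0xC3, 0xA9, 'z'.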
[8/10] from: oliva:david:seznam:cz at: 14-Jun-2002 11:43
Hello Bohdan,
Friday, June 7, 2002, 11:20:16 PM, you wrote:
BRR> On Jun 07 at 00:04 RebOldes wrote:
>> R> nobody has utf8-encoder?
>> R> probably will have to write one by myself
BRR> [cut code]
BRR> Very clever... but it's not utf-8. Don't you think you should
BRR> translate ISO-2 characters into UCS-2 encoding?
BRR> ethanak
Hmm... I only looked at it for a few minutes, and it gave the same
result as the utf8_encode() function in PHP. But you are right: the
script I sent does not solve my problem with our Czech extended
characters, because it only converts from Latin-1 to UTF-8.
The problem is that I don't know how to convert ISO-2 to UCS-2 (yet), but if
I have some time, I will try to find some documentation and...
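
One possible shape for the missing step, sketched in C rather than Rebol: map each ISO-8859-2 byte to its Unicode code point first, then UTF-8 encode that code point. Nothing here comes from the thread. The names latin2_to_unicode, utf8_put and latin2_to_utf8 are hypothetical, and the table is deliberately partial (lowercase Czech letters only); every other byte from 0xA1 upward becomes U+FFFD, so a real converter would need the full ISO-8859-2 table.

```c
#include <stddef.h>
#include <stdint.h>

struct l2_entry { unsigned char byte; uint16_t code_point; };

/* Partial ISO-8859-2 table: lowercase Czech letters only (assumption). */
static const struct l2_entry czech_lower[] = {
    {0xB9, 0x0161}, /* s caron */ {0xBB, 0x0165}, /* t caron */
    {0xBE, 0x017E}, /* z caron */ {0xE1, 0x00E1}, /* a acute */
    {0xE8, 0x010D}, /* c caron */ {0xE9, 0x00E9}, /* e acute */
    {0xEC, 0x011B}, /* e caron */ {0xED, 0x00ED}, /* i acute */
    {0xEF, 0x010F}, /* d caron */ {0xF2, 0x0148}, /* n caron */
    {0xF3, 0x00F3}, /* o acute */ {0xF8, 0x0159}, /* r caron */
    {0xF9, 0x016F}, /* u ring  */ {0xFA, 0x00FA}, /* u acute */
    {0xFD, 0x00FD}, /* y acute */
};

/* Bytes below 0xA1 map to the same code point; unlisted bytes
   become U+FFFD because this sketch's table is incomplete. */
uint32_t latin2_to_unicode(unsigned char c)
{
    if (c < 0xA1)
        return c;
    for (size_t i = 0; i < sizeof czech_lower / sizeof czech_lower[0]; i++)
        if (czech_lower[i].byte == c)
            return czech_lower[i].code_point;
    return 0xFFFD;
}

/* Encodes one code point below 0x10000; returns bytes written (1 to 3). */
size_t utf8_put(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* `out` must have room for 3 * n bytes. Returns the bytes written. */
size_t latin2_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++)
        w += utf8_put(latin2_to_unicode(in[i]), out + w);
    return w;
}
```

Under these assumptions the Latin-2 byte 0xEC (e with caron, U+011B) comes out as the two bytes 0xC4 0x9B, whereas the Latin-1 routine would have produced 0xC3 0xAC.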
[9/10] from: oliva:david:seznam:cz at: 14-Jun-2002 10:35
Hello Romano,
Wednesday, June 12, 2002, 11:36:51 PM, you wrote:
>> or this shorter and faster version but not so clear to understand:
RPT> And this is mine (faster and more cryptic :-)
RPT> I do not know UTF-8, I copied the logic of your code.
RPT> This also works under the current View 1.2.1.3.1
RPT> I think parse is the best way to go.
RPT> utf8-encode: func[
RPT> "Encodes the string data to UTF-8"
RPT> str [any-string!] "string to encode"
RPT> /local c h
RPT> ][
RPT> ;if you remove 'copy you can change the original string
RPT> parse/all copy str [
RPT> any [
RPT> h: skip (
RPT> if 127 < c: first h [
RPT> h: change h c / 64 or 192
RPT> insert h c and 63 or 128
RPT> ]
RPT> )
RPT> :h
RPT> skip
RPT> ]
RPT> ]
RPT> head h
RPT> ]
RPT> ---
RPT> Ciao
RPT> Romano
Great.... I have to say... you are the winner:-) I did some tests and
your code is the fastest. If you don't mind I will upload the script
to the library.
[10/10] from: rotenca:telvia:it at: 17-Jun-2002 12:29
> If you don't mind I will upload the script
> to the library.
No problem at all!
---
Ciao
Romano
Notes
- Quoted lines have been omitted from some messages.