Mailing List Archive: 49091 messages

utf8-encode

 [1/10] from: oliva::david::seznam::cz at: 7-Jun-2002 0:04


R> nobody has utf8-encoder?
R> probably will have to write one by myself

OK... why is there NO native right/left shift function in Rebol?!

Here is how UTF-8 works:

    putwchar(c)
    {
      if (c < 0x80) {
        putchar (c);
      }
      else if (c < 0x800) {
        putchar (0xC0 | c>>6);
        putchar (0x80 | c & 0x3F);
      }
      else if (c < 0x10000) {
        putchar (0xE0 | c>>12);
        putchar (0x80 | c>>6 & 0x3F);
        putchar (0x80 | c & 0x3F);
      }
      else if (c < 0x200000) {
        putchar (0xF0 | c>>18);
        putchar (0x80 | c>>12 & 0x3F);
        putchar (0x80 | c>>6 & 0x3F);
        putchar (0x80 | c & 0x3F);
      }
    }

And here is my Rebol version:

    rebol [
        title: "UTF-8 encode"
        purpose: {Encodes the string data to UTF-8}
        author: "oldeS"
        email: [oliva--david--seznam--cz]
        date: 7-Jun-2002/0:03:27+2:00
        usage: {
    >> utf8-encode "czech chars: "
    == "czech chars: ìšèøžýáíé"}
        comment: {More info: http://czyborra.com/utf/ }
    ]

    shift: func [
        "Takes a base-2 binary string and shifts bits"
        data [string! binary!] places [integer!] /left /right
    ][
        data: enbase/base data 2
        either right [
            ; drop the low bits, then pad with zeros on the left
            remove/part tail data negate places
            data: head insert/dup head data #"0" places
        ][
            ; drop the high bits, then pad with zeros on the right
            remove/part data places
            insert/dup tail data #"0" places
        ]
        return debase/base data 2
    ]

    utf8-encode: func [
        "Encodes the string data to UTF-8"
        str [any-string!] "string to encode"
        /local c
    ][
        str: to-binary str
        forall str [
            ; only bytes above #{7F} (127) are outside ASCII and need two bytes
            if #{7F} < c: to-binary to-char first str [
                remove str
                insert str join (#{c0} or shift/right c 6) (c and #{3F} or #{80})
                str: next str
            ]
        ]
        to-string head str
    ]
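For comparison, the putwchar() logic quoted in this message can be sketched in Python (not Rebol, so that it can be checked against Python's built-in encoder). The name utf8_encode_code_point is hypothetical. Like the C original, this sketch accepts values up to 0x1FFFFF and does not reject surrogates or values above U+10FFFF, which a strict encoder would.

```python
def utf8_encode_code_point(c: int) -> bytes:
    """Encode one code point using the same branches as the quoted putwchar()."""
    if c < 0:
        raise ValueError("code point must be non-negative")
    if c < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([c])
    if c < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | c >> 6, 0x80 | c & 0x3F])
    if c < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | c >> 12, 0x80 | c >> 6 & 0x3F, 0x80 | c & 0x3F])
    if c < 0x200000:
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xF0 | c >> 18,
            0x80 | c >> 12 & 0x3F,
            0x80 | c >> 6 & 0x3F,
            0x80 | c & 0x3F,
        ])
    raise ValueError("code point out of range for this scheme")


# U+00E9 (e with acute) becomes the two bytes C3 A9.
print(utf8_encode_code_point(0xE9))  # b'\xc3\xa9'
```

Only the first two branches matter for single-byte input such as Latin-1, which is why the Rebol function in this message handles just the two-byte case.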

 [2/10] from: oliva:david:seznam:cz at: 6-Jun-2002 22:06


Hello rebol-list,

nobody has utf8-encoder?
probably will have to write one by myself

 [3/10] from: nitsch-lists:netcologne at: 7-Jun-2002 9:11


On Friday, 7 June 2002 at 00:04, RebOldes wrote:
> R> nobody has utf8-encoder?
> R> probably will have to write one by myself
>
> ok... why there is NO native right/left shift function in Rebol?!
>
Because it can be replaced by multiplication or division by powers of 2? It is rarely needed, and performance is not so critical. Something like this?

    [
        shift: [ 1 2 4 8 16 ..]
        my-number * shift/2
    ]
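The equivalence suggested here can be checked with a small Python sketch. It assumes non-negative integers and a lookup table of powers of two; POWERS, shift_left and shift_right are hypothetical names, and the Python list is indexed from zero, unlike a Rebol path such as shift/2.

```python
# POWERS[n] == 2 ** n, enough for shifts of 0 to 8 places.
POWERS = [1, 2, 4, 8, 16, 32, 64, 128, 256]


def shift_left(value: int, places: int) -> int:
    """Left shift expressed as multiplication by a power of two."""
    return value * POWERS[places]


def shift_right(value: int, places: int) -> int:
    """Right shift expressed as integer division by a power of two."""
    return value // POWERS[places]


# 0xE9 >> 6 is 3, the value that gets OR-ed into the 0xC0 lead byte.
print(shift_right(0xE9, 6))  # 3
```

Integer division matters for the right shift: a plain floating-point division would leave a fractional part that has to be truncated before the result is used as a byte.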

 [4/10] from: rpgwriter:yah:oo at: 7-Jun-2002 9:26


--- RebOldes <[oliva--david--seznam--cz]> wrote:
> shift: func [
>     "Takes a base-2 binary string and shifts bits"
<<quoted lines omitted: 11>>
>     return debase/base data 2
> ]
Why convert to a binary string? Why not something like:

    shift: func [
        "shifts bits in an integer, by default to the right"
        data [integer!] places [integer!] /right /left
    ][
        ; ** yields a decimal!, so convert the scaled value back to an integer
        return to-integer data * (2 ** either left [places] [0 - places])
    ]

 [5/10] from: ethanak:interclub:pl at: 7-Jun-2002 23:20


On Jun 07 at 00:04 RebOldes wrote:
> R> nobody has utf8-encoder?
> R> probably will have to write one by myself
[cut code]

Very clever... but it's not UTF-8. Don't you think you should translate ISO-2 characters into UCS-2 encoding?

ethanak

 [6/10] from: oliva:david:seznam:cz at: 12-Jun-2002 14:16


Or this shorter and faster version, though it is not as easy to understand:

    rebol [
        title: "UTF-8 encode"
        purpose: {Encodes the string data to UTF-8}
        author: "oldeS"
        email: [oliva--david--seznam--cz]
        date: 7-Jun-2002/0:24:44+2:00
        usage: {
    >> utf8-encode "chars: "
    == "chars: ìšèøžýáíé"}
        comment: {More info: http://czyborra.com/utf/ }
    ]

    utf8-encode: func [
        "Encodes the string data to UTF-8 (from Latin-1)"
        str [any-string!] "string to encode"
        /local c
    ][
        str: to binary! str
        forall str [
            if 127 < c: first str [
                ; overwrite the byte with the continuation byte (low 6 bits)
                change str to char! (c and 63 or 128)
                ; shift c right by 6 places via its base-2 string form
                c: enbase/base to binary! to char! c 2
                remove/part tail c -6
                c: head insert/dup head c #"0" 6
                ; insert the lead byte in front of the continuation byte
                str: insert str (#{c0} or debase/base c 2)
            ]
        ]
        to string! head str
    ]

Now I just need to find out how to encode the Latin-2 charset to get real Czech chars :(
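The two-byte case handled by this version can be sketched in Python as follows. The sketch assumes the input bytes are Latin-1 (ISO-8859-1), where each byte value equals its Unicode code point; latin1_to_utf8 is a hypothetical name.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8, one byte at a time."""
    out = bytearray()
    for c in data:
        if c > 127:
            # Lead byte carries the top 2 bits, continuation byte the low 6 bits.
            out.append(0xC0 | c >> 6)
            out.append(0x80 | c & 0x3F)
        else:
            # ASCII passes through unchanged.
            out.append(c)
    return bytes(out)


print(latin1_to_utf8(b"caf\xe9"))  # b'caf\xc3\xa9'
```

Because every Latin-1 byte is below 0x100, the lead byte is always 0xC2 or 0xC3, so no three-byte or four-byte branch is needed for this input.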

 [7/10] from: rotenca:telvia:it at: 12-Jun-2002 23:36


> or this shorter and faster version but not so clear to understand:
And this is mine (faster and more cryptic :-)
I do not know UTF-8; I copied the logic of your code.
This also works under the current View 1.2.1.3.1.
I think parse is the best way to go.

    utf8-encode: func [
        "Encodes the string data to UTF-8"
        str [any-string!] "string to encode"
        /local c h
    ][
        ;if you remove 'copy you can change the original string
        parse/all copy str [
            any [
                h: skip (
                    if 127 < c: first h [
                        h: change h c / 64 or 192
                        insert h c and 63 or 128
                    ]
                )
                :h
                skip
            ]
        ]
        head h
    ]

---
Ciao
Romano

 [8/10] from: oliva:david:seznam:cz at: 14-Jun-2002 11:43


Hello Bohdan,

Friday, June 7, 2002, 11:20:16 PM, you wrote:

BRR> On Jun 07 at 00:04 RebOldes wrote:
>> R> nobody has utf8-encoder?
>> R> probably will have to write one by myself
BRR> [cut code]
BRR> Very clever... but it's not UTF-8. Don't you think you should
BRR> translate ISO-2 characters into UCS-2 encoding?
BRR> ethanak

Hmm... I looked at it for just a few minutes, and it gave the same result as the utf8_encode() function in PHP. But you are right: the script I sent does not solve my problem with our Czech extended characters, because it only converts from Latin-1 to UTF-8. The problem is that I don't know how to convert ISO-2 to UCS-2 (yet), but when I have some time I will try to find some documentation and...
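One possible way to do the missing step, sketched in Python rather than Rebol: decode the bytes as ISO-8859-2 to get Unicode code points, then encode those as UTF-8. The sketch relies on Python's built-in "iso8859_2" codec as the mapping table; the thread itself does not settle on a Rebol solution, and latin2_to_utf8 is a hypothetical name.

```python
def latin2_to_utf8(data: bytes) -> bytes:
    """Map ISO-8859-2 bytes to Unicode code points, then encode them as UTF-8."""
    text = data.decode("iso8859_2")  # byte -> Unicode code point (table lookup)
    return text.encode("utf-8")      # code point -> UTF-8 bytes


# Byte 0xEC is U+011B (e with caron) in ISO-8859-2 but U+00EC (i with grave)
# in Latin-1, so the two interpretations give different UTF-8 output.
print(latin2_to_utf8(b"\xec"))                      # b'\xc4\x9b'
print(b"\xec".decode("latin-1").encode("utf-8"))    # b'\xc3\xac'
```

This is why a Latin-1 based encoder produces valid UTF-8 but the wrong characters for Czech text: the byte-to-code-point table has to match the charset the text was written in.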

 [9/10] from: oliva:david:seznam:cz at: 14-Jun-2002 10:35


Hello Romano,

Wednesday, June 12, 2002, 11:36:51 PM, you wrote:

>> or this shorter and faster version but not so clear to understand:
RPT> And this is mine (faster and more cryptic :-)
RPT> I do not know UTF-8; I copied the logic of your code.
RPT> This also works under the current View 1.2.1.3.1.
RPT> I think parse is the best way to go.
RPT> utf8-encode: func[
RPT>     "Encodes the string data to UTF-8"
RPT>     str [any-string!] "string to encode"
RPT>     /local c h
RPT> ][
RPT>     ;if you remove 'copy you can change the original string
RPT>     parse/all copy str [
RPT>         any [
RPT>             h: skip (
RPT>                 if 127 < c: first h [
RPT>                     h: change h c / 64 or 192
RPT>                     insert h c and 63 or 128
RPT>                 ]
RPT>             )
RPT>             :h
RPT>             skip
RPT>         ]
RPT>     ]
RPT>     head h
RPT> ]
RPT> ---
RPT> Ciao
RPT> Romano

Great... I have to say, you are the winner :-) I did some tests and your code is the fastest.
If you don't mind, I will upload the script to the library.

 [10/10] from: rotenca:telvia:it at: 17-Jun-2002 12:29


> If you don't mind I will upload the script
> to the library.
No problem at all!

---
Ciao
Romano

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted