ISO-8859-1 question

[1/3] from: spike:dal at: 23-Jan-2001 15:03

Been about a year since I've made use of this list, and I hope it is still here :) I've already contacted feedback@ and they have at least concurred that this would take some work to fix.

A page I used to grab and parse through Rebol using a

    page: read to-url <urlhere>

command .. seems to have switched from sending its content as ISO-8859-1 (Latin1) and seems to be doing it as UTF-8 now. I really need the content now stored in "page" to be ISO-8859-1.

So, can anyone tell me a way to force the "read" to pull it down as ISO-8859-1? Barring that, does anyone have any functions to convert "page" to it? (I would actually need another function to go in the reverse direction I think)

I'd love to hear from anyone with any ideas. Thanks.

MirclMax [spike--dal--net]

[2/3] from: kgd03011::nifty::ne::jp at: 24-Jan-2001 17:30

Hello MirclMax,

>A page I used to grab and parse through Rebol using a
>page: read to-url <urlhere>

<<quoted lines omitted: 4>>

>Barring that, does anyone have any functions to convert "page" to it? (I
>would actually need another function to go in the reverse direction I think)

This is something I've wanted to have a whack at for quite a while, so I threw something together. It seems to work OK, but I haven't tested it on illegal or broken UTF-8 to see what it does in such cases. Anything that can't be expressed as ISO-8859-1 is converted to "?", but you can easily modify it to substitute some other string, or ignore such characters.

You can use it like this:

    page: utf-iso read <some url>

Cheers,
Eric

    utf-iso: func [
        {convert a string from UTF-8 encoding to ISO-8859-1}
        s [string!]
        /local res normal skipn skipped stretch one iso
    ] compose [
        ; character classes, built once when the function is created
        normal:  (make bitset! [#"^(0)" - #"^(7F)"])    ; plain ASCII
        iso:     (make bitset! [#"^(C2)" - #"^(C3)"])   ; lead bytes that map into ISO-8859-1
        skipn:   (make bitset! [#"^(80)" - #"^(FF)"])   ; any non-ASCII byte
        skipped: (make bitset! [#"^(80)" - #"^(BF)"])   ; continuation bytes
        res: copy ""
        parse/all s [
            any [
                ; runs of ASCII are copied through unchanged
                copy stretch some normal (append res stretch) |
                ; C2/C3 plus one continuation byte becomes a single ISO-8859-1 character
                copy one iso copy stretch skipped (
                    append res to-char (first one) - #"^(C0)" * #"^(40)" + ((first stretch) - #"^(80)")
                ) |
                ; anything else cannot be expressed in ISO-8859-1
                skipn any skipped (append res "?") |
                some skipped (append res "!")
            ]
        ]
        res
    ]
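For readers who want to check the byte arithmetic in utf-iso outside REBOL, here is a minimal sketch in Python. It is not part of the original post, and the function names utf8_to_latin1 and latin1_to_utf8 are hypothetical. The first is intended to mirror the approach above: a C2 or C3 lead byte plus one continuation byte becomes one ISO-8859-1 character, and any other non-ASCII sequence becomes "?". The second covers the reverse direction mentioned in the question. Neither has been hardened against every form of malformed UTF-8.

```python
def utf8_to_latin1(data: bytes, replacement: bytes = b"?") -> bytes:
    """Convert UTF-8 bytes to ISO-8859-1 bytes, replacing what does not fit."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII is identical in both encodings.
            out.append(b)
            i += 1
        elif b in (0xC2, 0xC3) and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # Two-byte sequence: low bits of the lead byte, then 6 bits of the next byte.
            out.append((b - 0xC0) * 0x40 + (data[i + 1] - 0x80))
            i += 2
        else:
            # Outside ISO-8859-1 (or malformed): skip the byte and any
            # continuation bytes that follow, then emit one replacement.
            i += 1
            while i < len(data) and 0x80 <= data[i] <= 0xBF:
                i += 1
            out += replacement
    return bytes(out)


def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 bytes (every input byte is representable)."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)
        else:
            out.append(0xC0 + (b >> 6))    # lead byte: C2 or C3
            out.append(0x80 + (b & 0x3F))  # continuation byte
    return bytes(out)
```

As a worked case, the UTF-8 bytes C3 A9 give (0xC3 - 0xC0) * 0x40 + (0xA9 - 0x80) = 3 * 64 + 41 = 233, which is 0xE9, the ISO-8859-1 code for "é".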

[3/3] from: spike:dal at: 24-Jan-2001 10:35

Eric: Thanks so much, it works perfectly. I'm very happy to have seen this, as I was just about to start having a Perl program parse all of the stuff before it got to the Rebol program. I'm much happier having it all in one. One note though, your syntax:

>You can use it like this:
>
>    page: utf-iso read <some url>

doesn't seem to work as it looks for some string and the like.. I just did:

    page: read to-url <urlhere>
    page2: utf-iso page

Works great. Thanks again..

(Note, I will likely be unsubscribing from this list, any followups should be CC:'d to me)

MirclMax [spike--dal--net]

Quoth Eric Long at 05:30 PM 1/24/2001 +0900:
