Mailing List Archive: 49091 messages

[REBOL] Problem with parsing ISO-8859-1 encoded text

From: didec:tiscali at: 30-Sep-2003 21:12

Hi all,

All of you have seen in this list some sender e-mail addresses "junked" by "=?ISO-8859-1?" and other "=22" characters (Robert M. Munch's, for example). I'm working on a function to replace text in that format with its decoded value (for those who use it, it's for the %delete-emails.r script, but it can also be used for this list ;-)

This encoding is described in RFC 2047 (and others).

The sample script below can "decode" this sort of text, but I have a problem and, of course, I need your help.

The aim is to replace, in the original string, the encoded text with its decoded value. It works (almost, with the example) if the text is encoded in the quoted-printable format, which is identified by the "Q?" following the "=?ISO-8859-1?": the full string is replaced by its decoded value. But with base64 encoding (identified by the "B?" at the same place), the parser seems not to evaluate the rest of the "encoded-word" rule after replacing the base64-encoded text with its original value.

If you have an improvement or a correction to the parsing rule, please tell me.

******************************
REBOL []

token: complement charset [#" " #"^-" #"^(00)" - #"^(1f)" {()<>@,;:"/[]?.=}]
ascii: charset [#" " - #"^(7f)"]
car: exclude ascii charset [" ?_=^-"]

quoted-printable: [
    dc: "_" (change dc #" ")
    | car
    | dc: "=" (change/part dc do rejoin [ {#"} "^^" {(} copy/part next dc 2 {)"} ] 3 dc: next dc) :dc
]

encoded-word: [
    ds: "=?ISO-8859-1?" [
        "q?" dt: some quoted-printable ft:
        | "b?" dt: to "?" ft: (ft: change/part dt to-string debase copy/part dt ft ft) :ft
    ]
    "?=" fs: (remove/part ft fs remove/part ds dt)
]

str: "=?ISO-8859-1?Q?Re:_Photos_de_la_Journ=E9e_Rebol?="
parse/all str [ any [encoded-word | "^/ "] to end]
probe str
print "-------------------------------"
str: "=?iso-8859-1?b?VGhpcyB3b3JrZWQgZm9yIG1lLCBhbmQgd2lsbCBmb3IgeW91IQ==?="
parse/all str [ any [encoded-word | "^/ "] to end]
probe str
halt
***************************

DideC
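One possible way to sidestep the position bookkeeping altogether, offered only as a rough sketch: instead of changing the input string while parse is walking over it, collect the decoded text in a new string and return that. It is not a diagnosis of the base64 behaviour described above. The function name decode-words and the local names (out, word, s, hex-digit, plain-char, b64-char) are made up for this illustration and are not part of the original script. The sketch assumes REBOL 2, where parse matches strings case-insensitively unless /case is given, and it handles the ISO-8859-1 charset only.

```rebol
REBOL []

; Hypothetical helper, not from the original script: returns a decoded
; copy of the text and leaves the input string untouched.
decode-words: func [
    "Return a copy of text with ISO-8859-1 encoded-words decoded"
    text [string!]
    /local out word s hex-digit plain-char b64-char
][
    out: copy ""
    hex-digit: charset "0123456789ABCDEFabcdef"
    ; characters allowed unencoded in this sketch's "Q" text (an assumption)
    plain-char: complement charset " ?_="
    b64-char: charset [#"A" - #"Z" #"a" - #"z" #"0" - #"9" "+/="]

    parse/all text [
        any [
            ; word is a scratch buffer: it is appended to out only once
            ; the closing "?=" has matched, so a malformed encoded-word
            ; falls through to the plain-copy alternative below
            "=?ISO-8859-1?" (word: copy "") [
                "Q?" any [
                    "_" (append word #" ")
                    | "=" copy s 2 hex-digit (append word to-string debase/base s 16)
                    | copy s plain-char (append word s)
                ]
                | "B?" copy s some b64-char (append word to-string debase s)
            ] "?=" (append out word)
            ; anything else is copied through one character at a time
            | copy s skip (append out s)
        ]
    ]
    out
]

probe decode-words "=?ISO-8859-1?Q?Re:_Photos_de_la_Journ=E9e_Rebol?="
probe decode-words "=?iso-8859-1?b?VGhpcyB3b3JrZWQgZm9yIG1lLCBhbmQgd2lsbCBmb3IgeW91IQ==?="
```

Because nothing is inserted into or removed from the input during the parse, no saved position (ds, dt, ft, fs in the script above) has to be kept in step with the series. The price is that the caller must use the returned string rather than rely on the original being changed in place.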