[REBOL] Problem with parsing ISO-8859-1 encoded text
From: didec:tiscali at: 30-Sep-2003 21:12
Hi all,
All of you have seen on this list some senders' e-mail addresses "junked" by
=?ISO-8859-1? markers and other "=22"-style characters (Robert M. Munch's, for
example).
I'm working on a function to replace text in that format with its decoded value
(for those who use it, it's for the %delete-emails.r script, but it could also be
used for this list ;-) ).
This encoding is described in RFC 2047 (and others).
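For reference, RFC 2047 lays out an "encoded-word" as
=?charset?encoding?encoded-text?=
Here is a minimal sketch that only splits one into its three fields. The words
sample, charset-name, enc and payload are names made up for this illustration;
they are not part of my script.

```rebol
REBOL []

; First test string from the sample script
sample: "=?ISO-8859-1?Q?Re:_Photos_de_la_Journ=E9e_Rebol?="

; Copy each "?"-delimited field without modifying the string
parse/all sample [
    "=?" copy charset-name to "?" skip
    copy enc to "?" skip
    copy payload to "?=" 2 skip end
]

print [charset-name enc payload]
; prints: ISO-8859-1 Q Re:_Photos_de_la_Journ=E9e_Rebol
```

The enc field ("Q" or "B") tells which decoding applies to the payload.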
The sample script below can "decode" this sort of text, but I have a problem and,
of course, I need your help.
The aim is to replace, in the original string, the encoded text with its decoded
value.
It works (almost, with the example) if the text is encoded in the
quoted-printable format, which is identified by the "Q?" following the
=?ISO-8859-1? prefix: the full string is replaced by its decoded value.
But with base64 encoding (identified by "B?" in the same place), the
parser seems not to evaluate the rest of the "encoded-word" rule after
replacing the enbased (base64-encoded) text with its original value.
If you have improvements or corrections to the parsing rule, please tell me.
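To make the expected results concrete, here is a small sketch of what each
payload should decode to. It builds a new string instead of changing the
original in place, so it is a different technique from the rules in my script
and does not address the parse problem. decode-q and hex-digit are hypothetical
names used only for this illustration.

```rebol
REBOL []

hex-digit: charset "0123456789ABCDEFabcdef"

; Hypothetical helper: decode a "Q" payload into a new string
decode-q: func [text [string!] /local out code ch] [
    out: copy ""
    parse/all text [
        any [
            "_" (append out #" ")                      ; "_" stands for a space
            | "=" copy code 2 hex-digit                ; "=XX" is one byte in hex
              (append out to-char first debase/base code 16)
            | copy ch skip (append out ch)             ; anything else is kept as is
        ]
    ]
    out
]

; "Q" payload of the first test string ("=E9" is the Latin-1 e-acute)
probe decode-q "Re:_Photos_de_la_Journ=E9e_Rebol"

; "B" payload of the second test string
probe to-string debase "VGhpcyB3b3JrZWQgZm9yIG1lLCBhbmQgd2lsbCBmb3IgeW91IQ=="
; == "This worked for me, and will for you!"
```

These are the values that should end up in str once the whole encoded-word
has been replaced.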
******************************
REBOL []
token: complement charset [#" " #"^-" #"^(00)" - #"^(1f)" {()<>@,;:"/[]?.=}]
ascii: charset [#" " - #"^(7f)"]
car: exclude ascii charset [" ?_=^-"]
quoted-printable: [
    dc: "_" (change dc #" ") |
    car |
    dc: "=" (change/part dc do rejoin [ {#"} "^^" {(} copy/part next dc 2 {)"} ] 3 dc: next dc) :dc
]
encoded-word: [
    ds: "=?ISO-8859-1?" [
        "q?" dt: some quoted-printable ft: |
        "b?" dt: to "?" ft: (ft: change/part dt to-string debase copy/part dt ft ft) :ft
    ]
    "?=" fs: (remove/part ft fs remove/part ds dt)
]
str: "=?ISO-8859-1?Q?Re:_Photos_de_la_Journ=E9e_Rebol?="
parse/all str [ any [encoded-word | "^/ "] to end]
probe str
print "-------------------------------"
str: "=?iso-8859-1?b?VGhpcyB3b3JrZWQgZm9yIG1lLCBhbmQgd2lsbCBmb3IgeW91IQ==?="
parse/all str [ any [encoded-word | "^/ "] to end]
probe str
halt
***************************
DideC