Mailing List Archive: 49091 messages

Problem with parsing ISO-8859-1 encoded text

 [1/4] from: didec:tiscali at: 30-Sep-2003 21:12


Hi all,

All of you have seen sender e-mail addresses on this list "junked" by =?ISO-8859-1? and other "=22" character sequences (Robert M. Munch's, for example).

I'm working on a function to replace text in that format by its decoded value. For those who use it, it's for the %delete-emails.r script, but it can also be used for this list ;-)

This encoding is described in RFC 2047 (and others).

The sample script below can "decode" this sort of text, but I have a problem and, of course, I need your help.

The aim is to replace the encoded text in the original string by its decoded value. It works (at least with the example) when the text is encoded in the quoted-printable format, which is identified by the "Q?" following =?ISO-8859-1?: the full string is replaced by its decoded value. But with the base64 encoding (identified by "B?" at the same place), parse does not seem to evaluate the rest of the "encoded-word" rule after replacing the base64 text by its decoded value.

If you have improvements or corrections to the parsing rule, please tell me.

******************************
REBOL []

token: complement charset [#" " #"^-" #"^(00)" - #"^(1f)" {()<>@,;:"/[]?.=}]
ascii: charset [#" " - #"^(7f)"]
car: exclude ascii charset [" ?_=^-"]

quoted-printable: [
    dc: "_" (change dc #" ")
    | car
    | dc: "=" (change/part dc do rejoin [ {#"} "^^" {(} copy/part next dc 2 {)"} ] 3 dc: next dc) :dc
]

encoded-word: [
    ds: "=?ISO-8859-1?" [
        "q?" dt: some quoted-printable ft:
        | "b?" dt: to "?" ft: (ft: change/part dt to-string debase copy/part dt ft ft) :ft
    ]
    "?=" fs: (remove/part ft fs remove/part ds dt)
]

str: "=?ISO-8859-1?Q?Re:_Photos_de_la_Journ=E9e_Rebol?="
parse/all str [ any [encoded-word | "^/ "] to end]
probe str
print"-------------------------------"
str: "=?iso-8859-1?b?VGhpcyB3b3JrZWQgZm9yIG1lLCBhbmQgd2lsbCBmb3IgeW91IQ==?="
parse/all str [ any [encoded-word | "^/ "] to end]
probe str
halt
***************************

DideC

 [2/4] from: rotenca::telvia::it at: 30-Sep-2003 23:38


Changing the string while it is being parsed is not an easy programming style. It is better to copy it to another location; that is almost always faster too. But if you want to do it, keep these rules in mind:

1) get the next location and pass it back to parse
2) use opt for paren expressions

Example:

parse "12" [h: "12" opt (h: change/part h "3" 2) :h]
;==true

whereas, without opt:

parse "12" [h: "12" (h: change/part h "3" 2) :h]
;==false

There is at least one thread on this mailing list about this "feature". The point is that paren expressions, too, are checked against the string, if only to see whether the string is already at its end. If the paren rule shortens the string, parse has nothing left to match, and the paren rule makes parse fail.

---
Ciao
Romano
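A small sketch of both approaches Romano describes, assuming REBOL/Core 2.x parse semantics as explained in this thread. The first part edits the string in place, saving the position after each change and handing it back with :here, with the paren wrapped in opt so that a shortened string cannot make the paren fail. The second part leaves the input alone and appends to a separate output string. The words str, here, out and ch are just illustrative names.

```rebol
REBOL [Title: "Changing input during parse vs. building a copy (sketch)"]

; Assumes REBOL/Core 2.x parse semantics as described in this thread.

; 1) In-place editing, following Romano's two rules: save the position
;    after the change and pass it back to parse with :here, and wrap the
;    paren in OPT so a shortened string cannot make the paren fail.
str: copy "ab-ab-ab"
parse/all str [
    any [
        here: "ab" opt (here: change/part here "x" 2) :here
        | skip
    ]
]
probe str    ; "x-x-x"

; 2) The style Romano recommends: leave the input untouched and append
;    the result to a separate output string.
out: make string! 8
parse/all "ab-ab-ab" [
    any [
        "ab" (append out "x")
        | copy ch skip (append out ch)
    ]
]
probe out    ; "x-x-x"
```

The second form never has to reposition the parse pointer, which is why it is simpler to get right.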

 [3/4] from: didec:tiscali at: 1-Oct-2003 0:26


Thanks, Romano.

The 'opt made the difference and now it works, but I'm not sure I understand why 'opt should be used for a paren!. I suppose your last remark (about the string being shorter after modifying it) was the problem.

My understanding of 'parse needs to improve a lot. Anyway, I will rewrite my parse so that it doesn't change the input; that will be easier.

DideC
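A non-destructive version of the decoder, as DideC plans, might look roughly like the sketch below. It is only a sketch under stated assumptions: REBOL/Core 2.x, only ISO-8859-1 encoded-words with Q or B encoding are decoded, the base64 payload is assumed to be valid, and whitespace between adjacent encoded-words is kept (RFC 2047 says it should be dropped). The names decode-q, decode-rfc2047, hex-digit and payload-char are hypothetical, not part of any existing script.

```rebol
REBOL [Title: "Non-destructive RFC 2047 decoder for ISO-8859-1 (sketch)"]

; Assumptions: REBOL/Core 2.x; only ISO-8859-1 encoded-words with Q or B
; encoding are decoded; whitespace between adjacent encoded-words is kept
; as-is. The input string is never modified.

hex-digit: charset "0123456789ABCDEFabcdef"
; encoded-text may not contain "?" or spaces (RFC 2047)
payload-char: complement charset "? ^-^/"

decode-q: func [
    "Decode the Q-encoded text of an encoded-word"
    text [string!] /local out hex c
][
    out: make string! length? text
    parse/all text [
        any [
            "_" (append out #" ")                   ; "_" stands for a space
            | "=" copy hex 2 hex-digit              ; =XX is a hex-encoded byte
              (append out to-string debase/base hex 16)
            | copy c skip (append out c)            ; anything else is literal
        ]
    ]
    out
]

decode-rfc2047: func [
    "Return a new string with ISO-8859-1 encoded-words decoded"
    text [string!] /local out enc payload c
][
    out: make string! length? text
    parse/all text [
        any [
            ; parse/all matches case-insensitively, so "iso-8859-1" and "q" match too
            "=?ISO-8859-1?" copy enc ["Q" | "B"] "?"
            copy payload some payload-char "?=" (
                append out either enc = "Q" [
                    decode-q payload
                ][
                    to-string debase/base payload 64
                ]
            )
            | copy c skip (append out c)            ; copy everything else unchanged
        ]
    ]
    out
]

print decode-rfc2047 "=?iso-8859-1?b?VGhpcyB3b3JrZWQgZm9yIG1lLCBhbmQgd2lsbCBmb3IgeW91IQ==?="
; prints: This worked for me, and will for you!
probe decode-rfc2047 "=?ISO-8859-1?Q?Re:_Photos_de_la_Journ=E9e_Rebol?="
```

Because the parens only append to out, the parse position never has to be reset and no opt is needed around them.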

 [4/4] from: andrew:martin:colenso:school at: 1-Oct-2003 10:34


DideC wrote:
> The 'opt has made the difference and now it works, but I'm not sure to
> understand why 'opt should be use for paren!.
> I suppose that your last remark (about shorter string after modifying
> it) was the problem.

I believe Romano meant that there's a bug in 'parse: it should keep working through the rule until it comes to a rule that requires content, and only fail there. 'Parse should not stop just because it has reached the end of the input after evaluating the paren expression.

Andrew J Martin
Attendance Officer & Grail Jedi.
Colenso High School
Arnold Street, Napier.
Tel: 64-6-8310180 ext 826
Fax: 64-6-8336759
http://colenso.net/scripts/Wiki.r?AJM
http://www.colenso.school.nz/