Mailing List Archive: 49091 messages

x-www-form-urlencoded (bug in bitset! or find ?)

 [1/5] from: alex::pini::mclink::it at: 2-Sep-2000 1:55


>- Open Your Mind -<
I've been examining the url-encode function from CookieClient.r, but there's something I'd most definitely call a bug... if I could manage to pinpoint it! :-0
>> s: copy "" for c #"^(00)" #"^(fe)" 1 [append s c] append s #"^(ff)"
>> probe url-encode s
{%00%01%02%03%04%05%06%07%08%09%0A%0B%0C%0D%0E%0F%10%11%12%13%14%15%16%17%18%19%1A%1B%1C%1D%1E%1F [%21%22%23%24%25%26%27%28%29*%2B%2C---%2F0123456789%3A%3B%3C%3D%3E%3F--ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E%7F%80%81%83] %8F%91%94%97%98%99%9A%9B%9F ¡¢£%A4¥%A6%A7%A8%A9%AA%AB%AC­%AE%AF%B0±²³´µ%B6%B7%B8¹º»¼½%BE%BF%C0%C1%C2%C3%C4Å%C6%C7%C8%C9%CA%CB%CC%CD%CE%CF%D0Ñ%D2Ó%D4Õ%D6%D7ØÙ%DAÛ%DCÝ%DE%DF%E0%E1âã%E4å%E6%E7è%E9êë%ECí%EE%EF%F0%F1%F2%F3%F4%F5%F6%F7%F8%F9%FA%FB%FC%FD%FE%FF}

The result isn't correct (just look at all those unescaped characters beyond "%7F"). Now, don't ask me why, but at different moments...
>> probe url-encode s
{%00%01%02%03%04%05%06%07%08%09%0A%0B%0C%0D%0E%0F%10%11%12%13%14%15%16%17%18%19%1A%1B%1C%1D%1E%1F [%21%22%23%24%25%26%27%28%29*%2B%2C---%2F0123456789%3A%3B%3C%3D%3E%3F--ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E%7F%80] %8E%8F%90%91%92%93%94%95%96%97%9A%9B%9C%9E%9F%A0%A1¢%A3%A4¥%A6%A7¨%A9ª%AB%AC­%AE%AF%B0±²%B3%B4µ%B6%B7¸¹º%BB%BC½%BE%BF%C0%C1%C2Ã%C4Å%C6%C7È%C9%CAË%CCÍ%CE%CF%D0Ñ%D2Ó%D4Õ%D6%D7ØÙ%DAÛ%DCÝ%DE%DF%E0%E1âã%E4å%E6%E7è%E9êë%ECí%EE%EF%F0ñòó%F4õ%F6%F7%F8%F9%FA%FB%FC%FD%FE%FF}
>> probe url-encode s
{%00%01%02%03%04%05%06%07%08%09%0A%0B%0C%0D%0E%0F%10%11%12%13%14%15%16%17%18%19%1A%1B%1C%1D%1E%1F [%21%22%23%24%25%26%27%28%29*%2B%2C---%2F0123456789%3A%3B%3C%3D%3E%3F--ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E%7F%80%81%83] %8F%91%94%97%98%99%9A%9B%9F ¡¢£%A4¥%A6%A7%A8%A9%AA%AB%AC­%AE%AF%B0±²³´µ%B6%B7%B8¹º»¼½%BE%BF%C0%C1%C2%C3%C4Å%C6%C7%C8%C9%CA%CB%CC%CD%CE%CF%D0%D1%D2%D3%D4%D5%D6%D7%D8%D9%DA%DB%DC%DD%DE%DF%E0%E1%E2%E3%E4%E5%E6%E7%E8%E9%EA%EB%EC%ED%EE%EF%F0%F1%F2%F3%F4%F5%F6%F7%F8%F9%FA%FB%FC%FD%FE%FF}

They're different. Playing around with this thing in various ways, I got mixed results, even correct ones :-). At times I seemed to catch a kind of pattern in the bug, but lost it soon after. :-0 I've examined the source several times, but I can't find anything wrong in how it works. I think there's a bug in find and/or in bitsets, because one single time I got this:
>> for c #"^(00)" #"^(fe)" 1 [if find charset "ABC" c [prin c]] prin newline
ABC²

(Notice the "²" after the "C".) Tomorrow (after some sleep :-) I'm going to try harder, but in the meantime here's to you antipodeans :-)

BTW, is there an *easy* way to get a string from a bitset? For example, to get "123ABCabc" from charset "abcABC123".

Alessandro Pini ([alex--pini--mclink--it])

Dexteeeer!!! Get me out of here!!! (Kimberly)

 [2/5] from: kgd03011:nifty:ne:jp at: 2-Sep-2000 12:23


Hi Alessandro,
> I've been examining the url-encode function from CookieClient.r, but
> there's something I'd most definitely call a bug... if I could manage to
> pinpoint it! :-0
Your problem reminded me of something, and after a good deal of poking around in my old mail I found it. The bug lies in this part:

    find normal-char first data

When you pluck a character out of a string with FIRST or PICK, the return value is not stable: it produces unpredictable errors when used with FIND and bitsets. The workaround I found for the problem is this:

    find normal-char to-char to-integer first data

Here's a copy of the feedback I sent at the beginning of last November! It sounds like you've gone through exactly what I did. Sure would be nice to have an accessible bug database! I've never had much luck getting the bugs I've reported fixed, but perhaps if you complain too, something will eventually be done about this one.
>> I've run into more trouble with FIND (using 2.2.0.3.1). I've got a bitset
>> that defines all the possible characters that can be the first part of a
<<quoted lines omitted: 25>>
>> which always seems to work.
Actually, maybe it's better to do this with PARSE:

url-encode: func [
    {URL-encode a string}
    data "String to encode"
    /local new-data normal-char c
] compose [
    new-data: make string! ""
    normal-char: (charset [
        #"A" - #"Z" #"a" - #"z"
        #"@" #"." #"*" #"-" #"_"
        #"0" - #"9"
    ])
    if not string? data [return new-data]
    parse data [
        some [
            copy c normal-char
            (append new-data c) |
            copy c skip
            (append new-data reduce ["%" skip tail (to-hex 0 + first c) -2])
        ]
    ]
    new-data
]

(Notice I've optimized a bit. Using COMPOSE means we only have to make the bitset NORMAL-CHAR when the function is constructed. And in the original line:

    rejoin ["%" to-string skip tail (to-hex to-integer first data) -2]

TO-STRING isn't necessary, 0 + works faster than TO-INTEGER, and REDUCE works just as well as REJOIN, replacing a function call with a fast native.)
> BTW, is there an *easy* way to get a string from a bitset? For example, to
> get "123ABCabc" from charset "abcABC123".
This is a function I wrote a long time ago. I think it does what you want. With the /ul refinement it does something completely different, which is return a "case insensitive" form of a bitset.

unroll-bitset: func [
    {return string listing all characters in B}
    b [bitset!]
    /ul "return B with all characters set for upper and lower case"
    /local s i
][
    b: copy b
    s: copy ""
    i: 0
    while [i <= 255] [
        if find b to char! i [insert tail s to char! i]
        i: i + 1
    ]
    s: head s
    either ul [
        insert b uppercase s
        insert b lowercase s
    ][s]
]
>> unroll-bitset charset "abcABC123"
== "123ABCabc"
>> unroll-bitset unroll-bitset/ul charset "abc123"
== "123ABCabc"

See you,
Eric
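Eric's unroll-bitset is essentially a 0-255 membership scan over the set. For readers outside REBOL, the same idea takes a few lines of Python; this is a sketch for comparison (the function names are mine, not from the thread):

```python
def unroll_charset(chars):
    """Return a string listing, in code-point order, every character
    the set contains -- the analogue of unroll-bitset without /ul."""
    members = set(chars)
    return "".join(chr(i) for i in range(256) if chr(i) in members)

def unroll_charset_ul(chars):
    """Analogue of the /ul refinement: the same set, closed under
    upper and lower case."""
    s = unroll_charset(chars)
    return set(s) | set(s.upper()) | set(s.lower())

print(unroll_charset("abcABC123"))                  # -> 123ABCabc
print(unroll_charset(unroll_charset_ul("abc123")))  # -> 123ABCabc
```

The scan also answers the "easy way to get a string from a bitset" question directly: iterating the code-point range and testing membership yields the string in sorted order, just like the REBOL versions above.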

 [3/5] from: alex:pini:mclink:it at: 3-Sep-2000 18:09


>- Open Your Mind -<
Quoting from Eric's message (02-Sep-00 05:23:14). K> Actually, maybe it's better to do this with PARSE: My thinking exactly. :-) K> url-encode: func [ K> {URL-encode a string} K> data "String to encode" K> /local new-data normal-char c K> ] compose [ K> new-data: make string! "" K> normal-char: (charset [ K> #"A" - #"Z" #"a" - #"z" K> #"@" #"." #"*" #"-" #"_" K> #"0" - #"9" K> ]) K> if not string? data [return new-data] K> parse data [ some [ K> copy c normal-char K> (append new-data c) | K> copy c skip K> (append new-data reduce ["%" skip tail (to-hex 0 + first c) -2]) K> ] ] K> new-data K> ] I've modified your version a little, striving to obtain RFC-compliance (which is impossible, since they contradict one another and sometimes themselves)-: url-encode: func [ "URL-encodes a string." value [string!] "The string to encode" /local url-encoded-string normal-char char ] compose [ url-encoded-string: make string! "" normal-char: (charset [ #"A" - #"Z" #"a" - #"z" #"0" - #"9" "$-_.!*'()," ]) parse/all value [ any [ copy char normal-char (append url-encoded-string char) | copy char " " (append url-encoded-string "+") | copy char newline (append url-encoded-string "%0D%0A") | copy char skip (append url-encoded-string reduce ["%" skip tail (to-hex 0 + first char) -2]) ] ] url-encoded-string ] The differences: datatype checking (I need to be notified when I do something wrong :-); all charset's special characters are now in a string for clarity; the special characters are now the ones specified in RFC 1738, minus the plus (huh? 
:-); white space (" ") is now transformed into a plus ("+"), as per HTML 4.01 specs, which de-facto obsolete RFC 1866; line break is now transformed into the pair "%0D%0A", as per HTML 4.01 specs; the "some" word is now "any" to deal with empty strings (even if results don't change: pedantic now, no headaches later); layout and some names changed to meet REBOL standards and to suit my tastes :-) You can try it using for example the following 2 lines. s: copy "" for c #"^(00)" #"^(fe)" 1 [append s c] append s #"^(ff)" probe url-encode s Two doubts still remain. First, I'm not sure if I'm doing the right thing by changing the special characters in the charset: some of my browsers agree with the previous version, so maybe there's a de-facto standard *I* am not respecting. If so, please point me in the right direction. :-) Second, I guess the newline check isn't sufficient for everyone. Then again, newline is right for the platform the script is running on... and if you use this function you probably know how to (easily) go around possible obstacles. Alessandro Pini ([alex--pini--mclink--it]) Did you accidentally inject yourself with some kind of... psychotropic agent? (Kim)

 [4/5] from: alex:pini:mclink:it at: 3-Sep-2000 23:26


>- Open Your Mind -<
Quoting from my message (03-Sep-00 18:09:43). a> The differences: a> datatype checking (I need to be notified when I do something wrong :-); a> all charset's special characters are now in a string for clarity; a> the special characters are now the ones specified in RFC 1738, minus the plus (huh? :-); a> white space (" ") is now transformed into a plus ("+"), as per HTML 4.01 specs, which de-facto obsolete RFC 1866; a> line break is now transformed into the pair "%0D%0A", as per HTML 4.01 specs; a> the "some" word is now "any" to deal with empty strings (even if results don't change: pedantic now, no headaches later); a> layout and some names changed to meet REBOL standards and to suit my tastes :-) ... and, of course, the /all refinement has been added to parse (I forgot, sorry). Alessandro Pini ([alex--pini--mclink--it]) So far we've got everything more or less under control. We should be OK, for now... As long as nothing *else* goes wrong. (Ivanova)

 [5/5] from: alex:pini:mclink:it at: 8-Sep-2000 16:10


>- Open Your Mind -<
Quoting from my message (03-Sep-00 18:09:43). a> First, I'm not sure if I'm doing the right thing by changing the a> special characters in the charset: some of my browsers agree with the a> previous version, so maybe there's a de-facto standard *I* am not a> respecting ... and, as a matter of fact, learning more and more about cookies, I've found one problem. :-) The comma (as well as the semicolon) can be used as a cookie separator. Here's my newest version of url-encode, complete with comments. url-encode: func [ "URL-encodes a string." value [string!] "The string to encode" /local url-encoded-string normal-char char ] compose [ url-encoded-string: make string! "" normal-char: (charset [ #"A" - #"Z" #"a" - #"z" #"0" - #"9" "$-_.!*'()" ]) ; normal chars are defined as per HTML 4.01 specs, section 17.13.4.1, referring to RFC 1738, section 2.2 ; "+" is not a normal char here because it encodes " " ; "," is not a normal char here because it is a special char as per RFC 2109, section 4.3.4 parse/all value [ any [ copy char normal-char (append url-encoded-string char) | copy char " " (append url-encoded-string "+") | copy char newline (append url-encoded-string "%0D%0A") | copy char skip (append url-encoded-string reduce ["%" skip tail (to-hex 0 + first char) -2]) ] ] url-encoded-string ] Alessandro Pini ([alex--pini--mclink--it]) Hy. Im HAL 9000, your nu ortoghrapphic corecktor.

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted