Mailing List Archive: Re:x-www-form-urlencoded (bug in bitset! or find ?)

[REBOL] Re:x-www-form-urlencoded (bug in bitset! or find ?)

From: kgd03011:nifty:ne:jp at: 2-Sep-2000 12:23


Hi Alessandro,

>I've been examining the url-encode function from CookieClient.r, but
>there's something I'd most definitely call a bug... if I could manage to
>pinpoint it! :-0

Your problem reminded me of something, and after a good deal of poking
around in my old mail I found it. The bug lies in this part:

  find normal-char first data

When you pluck a character out of a string with FIRST or PICK, the return
value doesn't behave reliably: FIND sometimes fails to find it in a bitset
even though the character is in there. The workaround I found is this:

  find normal-char to-char to-integer first data
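
If you need the test in more than one place, the workaround could be wrapped
up in a small helper so the conversion is written only once. This is just a
sketch: the name CHAR-IN? is made up for illustration (it isn't in
CookieClient.r), and I'm assuming REBOL/Core 2.x behaviour.

```rebol
; CHAR-IN? is a hypothetical helper name, not part of CookieClient.r.
; It round-trips the character through an integer before handing it
; to FIND, which is the workaround described above.
char-in?: func [
    {Return TRUE if CH is a member of the bitset CHARS}
    chars [bitset!]
    ch [char!]
][
    found? find chars to-char to-integer ch
]

normal-char: charset [#"A" - #"Z" #"a" - #"z" #"0" - #"9"]
print char-in? normal-char first "abc"
print char-in? normal-char first "%41"
```

With that bitset the first call should print true and the second false,
since "%" isn't in the set.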

Here's a copy of the feedback I sent at the beginning of last November!
It sounds like you've gone through exactly what I did. Sure would be
nice to have an accessible bug database! I've never had much luck with
getting bugs I've reported fixed. But perhaps if you complain too,
something will eventually be done for this bug.

>>I've run into more trouble with FIND (using 2.2.0.3.1). I've got a bitset
>>that defines all the possible characters that can be the first part of a
>>Shift-JIS two-byte character used to encode Japanese text:
>>
>>>> zen1: make bitset! [#"^(81)" - #"^(9F)" #"^(E0)" - #"^(FC)"]
>>
>>It works fine with FIND if a character is expressed directly:
>>
>>>> find zen1 #"^(90)"
>>== true
>>
>>But if the character is picked out of a string it doesn't:
>>
>>>> find zen1 first "^(90)M"
>>== none
>>
>>Sometimes when I fiddle around a bit it starts working:
>>
>>>> #"^(90)" = first "^(90)M"
>>== true
>>>> find zen1 first "^(90)M"
>>== true
>>
>>but not consistently. One workaround is:
>>
>>>> find zen1 make char! make integer! first "^(90)M"
>>== true
>>
>>which always seems to work.

Actually, maybe it's better to do this with PARSE:

url-encode: func [
    {URL-encode a string}
    data "String to encode"
    /local new-data normal-char c
] compose [
    new-data: make string! ""
    normal-char: (charset [
        #"A" - #"Z" #"a" - #"z"
        #"@" #"." #"*" #"-" #"_"
        #"0" - #"9"
    ])
    if not string? data [return new-data]
    ; /all stops PARSE from skipping whitespace, so spaces get encoded too
    parse/all data [ some [
        copy c normal-char
            (append new-data c) |
        copy c skip
            (append new-data reduce ["%" skip tail (to-hex 0 + first c) -2])
    ] ]
    new-data
]

(Notice I've optimized a bit. Using COMPOSE means the bitset NORMAL-CHAR is
made only once, when the function is constructed, instead of on every call.
And in the original line:

    rejoin ["%" to-string skip tail (to-hex to-integer first data) -2]

TO-STRING isn't necessary, 0 + works faster than TO-INTEGER, and
REDUCE works just as well as REJOIN, replacing a function call with a
fast native.)
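
To see the COMPOSE trick on its own, here is a cut-down sketch. DIGIT? is a
made-up example function, not anything from CookieClient.r, and it keeps the
integer round-trip workaround from above.

```rebol
; DIGIT? is a hypothetical example function.
; COMPOSE evaluates the paren once, while the function is being built,
; and puts the resulting bitset directly into the body block.
digit?: func [
    {Return TRUE if CH is a decimal digit}
    ch [char!]
] compose [
    found? find (charset [#"0" - #"9"]) to-char to-integer ch
]

; SOURCE shows the body now holds a literal bitset rather than a call
; to CHARSET, so nothing is rebuilt on each call.
source digit?
```

Only parens at the top level of the block are evaluated by plain COMPOSE,
which is why the parens nested inside the PARSE rules of URL-ENCODE are
left alone.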

>BTW, is there an *easy* way to get a string from a bitset? For example, to
>get "123ABCabc" from charset "abcABC123".

This is a function I wrote a long time ago. I think it does what you want.
With the /ul refinement it does something completely different: it
returns a "case insensitive" form of the bitset.

unroll-bitset: func [
    {return string listing all characters in B}
    b [bitset!]
    /ul "return B with all characters set for upper and lower case"
    /local s i
][
    b: copy b
    s: copy ""
    i: 0
    while [ i <= 255 ] [
        if find b to char! i [insert tail s to char! i]
        i: i + 1
    ]
    s: head s
    either ul [
        insert b uppercase s
        insert b lowercase s
    ][s]
]

>> unroll-bitset charset "abcABC123"
== "123ABCabc"
>> unroll-bitset unroll-bitset/ul charset "abc123"
== "123ABCabc"

See you,
Eric