Mailing List Archive: Parse's current index

[REBOL] Parse's current index

From: hallvard::ystad::helpinhand::com at: 17-Jan-2002 10:47


Hello everyone


I've got a decode-url function from somewhere. I tried to find out where it came from, 
and searched the escribe site as well, but had no luck. (Did I write it myself?)

Here's the code:
decode-url: func [to-decode /local hex] [
  hex: charset "0123456789ABCDEFabcdef"

  parse/all to-decode [some [copy entity insert-point: ["%" 2 hex] (
                             insert-point: remove/part insert-point 3

                             insert insert-point to-char to-integer to-issue next entity) |
                             skip ]]
  to-decode
]


I have now discovered that the code has a problem: when it finds an entity, it replaces three 
characters with one. So when two entities are adjacent, only the first one 
gets replaced, because after the replacement parse suddenly finds itself in the middle 
of the second one:
>> decode-url "http%3A%2F%2Fwww.rebol.com%2F"
== "http:%2F/www.rebol.com/"


I looked at various parse tutorials, including yours, Brett, to see how to manipulate parse's 
index. But look at this:

decode-url: func [to-decode /local hex] [
  hex: charset "0123456789ABCDEFabcdef"

  parse/all to-decode [some [copy entity insert-point: ["%" 2 hex] (
                             insert-point: remove/part insert-point 3

                             insert insert-point to-char to-integer to-issue next entity
                             print join "entity: " entity
                             print join "instert-point after replace: " insert-point
                             ) |
                             (print join "not %: " insert-point ) skip ]]
  to-decode
]
>> print decode-url "http%3A%2F%2Fwww.rebol.com%2F"
not %: http%3A%2F%2Fwww.rebol.com%2F
not %: ttp%3A%2F%2Fwww.rebol.com%2F
not %: tp%3A%2F%2Fwww.rebol.com%2F
not %: p%3A%2F%2Fwww.rebol.com%2F
entity: %3A
instert-point after replace: :%2F%2Fwww.rebol.com%2F
not %: F%2Fwww.rebol.com%2F
entity: %2F
instert-point after replace: /www.rebol.com%2F
not %: w.rebol.com%2F
not %: .rebol.com%2F
not %: rebol.com%2F
not %: ebol.com%2F
not %: bol.com%2F
not %: ol.com%2F
not %: l.com%2F
not %: .com%2F
not %: com%2F
not %: om%2F
not %: m%2F
entity: %2F
instert-point after replace: /
not %: /
not %:
http:%2F/www.rebol.com/


So insert-point is perfectly well placed to continue, but it seems that once an entity 
has been evaluated and replaced, parse continues at the index where it left off *in*the*original*string*. 
I suppose this is only natural and as it should be, but I haven't had enough coffee to 
find a workaround this morning (except this:
replace/all the_url "%3A" ":"
replace/all the_url "%2F" "/"
replace/all the_url "\" "/"
but I'd prefer my decode-url method to work).
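
The index behaviour described above can be reproduced outside parse. This is only a small sketch: the words s, pos and resume-at are made up for the illustration, and it assumes that a series reference keeps its numeric index when the underlying string is edited, which is how the trace above behaves.

```rebol
s: copy "http%3A%2F%2F"
pos: find s "%"            ; position of the first entity (index 5)
resume-at: skip pos 3      ; where matching would resume after "%3A" (index 8)

remove/part pos 3          ; drop the three characters "%3A"
insert pos ":"             ; put the decoded character in their place

print index? resume-at     ; 8, the index has not moved
print resume-at            ; F%2F, the middle of the next entity
```

The string has become two characters shorter, but the remembered index is unchanged, so it now points two characters too far to the right.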


Do I have to rewrite the rule to look only for "%", so that the next two characters are 
untouched?

~H
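
One possible way around this, sketched under the assumption that this is REBOL 2's parse, where a get-word inside a rule sets the input position: keep the position returned by insert and hand it back to parse with :insert-point. The function below is a variation on the one in the post, not a tested replacement. Declaring entity and insert-point as locals, and using any instead of some, are changes made for this sketch.

```rebol
decode-url: func [to-decode /local hex entity insert-point] [
    hex: charset "0123456789ABCDEFabcdef"
    parse/all to-decode [
        any [
            insert-point: copy entity ["%" 2 hex] (
                ; remove the three-character entity, insert the decoded
                ; character, and remember the position just past it
                insert-point: insert remove/part insert-point 3
                    to-char to-integer to-issue next entity
            )
            :insert-point      ; resume parsing from the remembered position
            | skip
        ]
    ]
    to-decode
]
```

Because parsing resumes directly after the inserted character, an entity that immediately follows another one is seen from its "%" and should be decoded as well. The rule still matches the whole "%" plus two hex digits, so it does not have to be rewritten to look only for "%".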