[REBOL] Re: Bug! Rebol's parsing of urls is incorrect.
From: brett:codeconscious at: 12-Feb-2001 15:04
> What we really need is a validating parser/loader/scanner, one for each
> scheme. I've been skimming through the schemes and noticed there's some
> repetition in Rebol's open functions, which could be abstracted out. Also,
> if a validating parser rule was incorporated into each scheme to check for
> wellformedness, that would be good. As it stands now, I can't use Rebol's
> 'load/next function to extract a URL from plain text with punctuation
> around it. For example extracting the URL from the following: "Rebol's HQ"
> http://www.rebol.com! requires me to write my own URL parser.
I suspect that even if you had such a validating parser rule in the scheme
it would not make any difference to the way Rebol scans the url! datatype -
for the reason that Holger pointed out. Thus it would not help using the
load/next function either.
You may be better off writing your own parser. By doing so you are adding
crucial knowledge to the solution that Rebol doesn't have - that being that
your input is actually plain text - not a Rebol loadable dialect.
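
To make the idea concrete, here is a minimal sketch of such a hand-written extractor. It is in Python purely for illustration and is not how Rebol scans url! values. The set of schemes, the characters allowed in a URL and the list of trailing punctuation are all simplifying assumptions of mine, and the names URL_RE, TRAILING and extract_urls are hypothetical.

```python
import re

# Assumption: only http, https and ftp URLs matter here.
# The character class is a simplification, not the full URL syntax.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+")

# Punctuation that often follows a URL in prose but is rarely part of it.
TRAILING = ".,;:!?)\"'"


def extract_urls(text):
    """Return the URLs found in plain text, minus trailing punctuation."""
    urls = []
    for match in URL_RE.finditer(text):
        # The scanner knows the input is prose, so it can safely drop
        # sentence punctuation stuck to the end of the match.
        urls.append(match.group(0).rstrip(TRAILING))
    return urls


sample = '"Rebol\'s HQ" http://www.rebol.com! requires me to write a parser.'
print(extract_urls(sample))  # ['http://www.rebol.com']
```

The point is the rstrip step: it encodes the knowledge that the input is plain text, which a general-purpose loader cannot assume.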
Also, I recall a warning that Larry gave some time back about using load -
it puts the words in system/words thus using up a finite resource. So if I
were to load this email I'm writing (apart from the errors) I would have
words like "also", "recall", "warning" and "larry" in system/words. I do use
load for tag names and attributes in my html manipulation scripts, but I
calm myself with the knowledge that the number of tag names is probably
finite. Plain text, though, comes from a much larger domain of words.
That said, having a validating parser rule as part of each scheme does seem
appropriate. It would allow you to make your own parser and know that it
will not need modification as more schemes are added.
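
As a rough sketch of that arrangement (again in Python for illustration only, with hypothetical names and deliberately loose rules that are my own assumptions, not anything Rebol's schemes provide), each scheme registers its own well-formedness rule and the caller only ever consults the table:

```python
import re

# Hypothetical per-scheme rules. Each entry is a callable that returns a
# match object for a well-formed URL of that scheme, or None otherwise.
VALIDATORS = {
    "http": re.compile(
        r"http://[A-Za-z0-9.-]+(?::\d+)?(?:/\S*)?"
    ).fullmatch,
    "ftp": re.compile(
        r"ftp://(?:[^\s:@/]+(?::[^\s@/]*)?@)?[A-Za-z0-9.-]+(?:/\S*)?"
    ).fullmatch,
}


def is_well_formed(url):
    """Check a candidate URL against the rule registered for its scheme."""
    scheme, sep, _ = url.partition(":")
    rule = VALIDATORS.get(scheme.lower())
    return bool(sep and rule and rule(url))


# Adding a scheme later means adding a rule; is_well_formed is untouched.
VALIDATORS["mailto"] = re.compile(r"mailto:[^\s@]+@[^\s@]+").fullmatch

print(is_well_formed("http://www.rebol.com"))   # True
print(is_well_formed("http://www.rebol.com!"))  # False
print(is_well_formed("mailto:user@example.com"))  # True
```

A text scanner built on top of this would propose candidates and let the scheme's rule decide, so new schemes do not force changes to the scanner itself.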