Mailing List Archive: 49091 messages

[REBOL] Re: Bug! Rebol's parsing of urls is incorrect.

From: brett:codeconscious at: 12-Feb-2001 15:04

Hi Andrew
> What we really need is a validating parser/loader/scanner, one for each
> scheme. I've been skimming through the schemes and noticed there's some
> repetition in Rebol's open functions, which could be abstracted out. Also,
> if a validating parser rule was incorporated into each scheme to check for
> wellformedness, that would be good. As it stands now, I can't use Rebol's
> 'load/next function to extract a URL from plain text with punctuation around
> it. For example extracting the URL from the following: "Rebol's HQ"
> http://www.rebol.com! requires me to write my own URL parser.
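For illustration, here is a minimal Python sketch of the kind of plain-text URL extractor the quoted paragraph describes. It is not Rebol and not code from this thread. It assumes a URL starts with a scheme name followed by "://" and ends at whitespace, and it treats trailing characters such as "!" or "." as sentence punctuation. The function name `extract_urls` and the punctuation set are hypothetical choices.

```python
import re

# Assumption: when a URL is embedded in prose, these trailing characters are
# far more likely to be sentence punctuation than part of the URL.
TRAILING_PUNCTUATION = ".,;:!?\"')]}"

# Heuristic, not RFC-complete: a scheme name, "://", then everything up to the
# next whitespace, quote or angle-bracket character.
CANDIDATE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s\"'<>]+")


def extract_urls(text: str) -> list[str]:
    """Return URL-like substrings of plain text with trailing punctuation removed."""
    urls = []
    for match in CANDIDATE.finditer(text):
        urls.append(match.group(0).rstrip(TRAILING_PUNCTUATION))
    return urls


print(extract_urls('"Rebol\'s HQ" http://www.rebol.com! requires'))
```

Stripping a trailing ")" is a trade-off. It handles URLs wrapped in parentheses, but it will truncate a URL that genuinely ends in one. Schemes written without "//", such as mailto:, are not matched by this pattern.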
I suspect that even if each scheme had such a validating parser rule, it would make no difference to the way Rebol scans the url! datatype, for the reason Holger pointed out. So it would not help with load/next either.

You may be better off writing your own parser. By doing so you add crucial knowledge that Rebol doesn't have: that your input is plain text, not a Rebol-loadable dialect.

Also, I recall a warning Larry gave some time back about using load: it puts every word it encounters into system/words, using up a finite resource. So if I were to load this email (errors aside), I would end up with words like "also", "recall", "warning" and "larry" in system/words. I do use load for tag names and attributes in my HTML manipulation scripts, but I reassure myself that the number of tag names is probably finite. Plain text, though, comes from a much larger domain of possibilities.

That said, having a validating parser rule as part of each scheme does seem appropriate. It would let you write your own parser and know that it will not need modification as more schemes are added.

Brett.