Mailing List Archive: 49091 messages

Load of trouble?

 [1/8] from: sanghabum:aol at: 29-Nov-2001 19:06


Hi all.

Romano, in his response to me about strings and blocks, reminded me of some pain I'd had trying to use 'load as a validation tool. Can I lay out what I see as the problems, and maybe the gurus can tell me what I'm missing? Thanks....

I have a raw string entered by a user:

    raw-input: " 12-may-87 "

It is supposed to be a date (in this case). I want to convert it to its internal Rebol format, and see if it is valid.

    date? cleaned-input: load raw-input

looks like I've done the job in one line. Good old Rebol.

But before I put this on my website, I test it with bad things a user could enter:

    raw-input: " Rebol [quit] "
    date? cleaned-input: load raw-input

This shuts down the console. Luckily, I remember Jeff warning about these sorts of things, and recommending 'load/all:

    raw-input: " Rebol [quit] "
    date? cleaned-input: first load/all raw-input

That doesn't shut the console down, but a Canadian or UK postcode will raise an error:

    raw-input: " W1A 4AA "
    date? cleaned-input: first load/all raw-input
    ** Syntax Error: Invalid integer -- 4AA
    ** Near: (line 1) W1A 4AA

So we need to wrap it in a try block (resetting cleaned-input first, since CLEAR would fail if it still held a date from an earlier call):

    raw-input: " W1a 4AA "
    cleaned-input: none
    error? try [cleaned-input: first load/all raw-input]
    date? cleaned-input

That works!

But using 'load at all introduces a subtle problem. Each 'load of a raw-input string potentially adds entries to System/Words. When that reaches about 4000, the system crashes. Unrecoverably.

Which means the code above can't run 24x7 on a server. And it won't last more than a few hundred lines if it's part of a clean-up operation on an incoming text file.

So my conclusion is that 'load for validation of raw data is a dead end. And if I'm going to continue using Rebol, I need to buckle down and write some serious data validation code first. Or RT need to rethink System/Words.

Comments and corrections are welcome, as usual.

Thanks,
Colin.
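Colin's point about 'load filling System/Words can be checked directly. The sketch below is a minimal REBOL/Core 2.x script, assuming that `first system/words` returns the block of global words (as `first` does for any object in REBOL 2). `count-words` and the `qzx-...` words are made-up names, chosen so they are unlikely to exist already. It counts global words before and after a LOAD, and again after a TO-BLOCK, so the TO-BLOCK alternative discussed later in this thread can be tested too.

```rebol
REBOL [Title: "Does LOAD grow system/words? (sketch)"]

; Hypothetical helper: number of words currently in the global context
count-words: does [length? first system/words]

before: count-words

; LOAD binds the words it creates into system/words
load "qzx-never-seen-1 qzx-never-seen-2"
after-load: count-words

; TO-BLOCK is the alternative Romano suggests; per his report it should
; leave the global word count unchanged
to-block "qzx-never-seen-3 qzx-never-seen-4"
after-to-block: count-words

print ["LOAD added" after-load - before "global words"]
print ["TO-BLOCK added" after-to-block - after-load "global words"]
```

Running this once per distinct user string is exactly the pattern that, per Colin, eventually exhausts the global word table.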

 [2/8] from: rotenca:telvia:it at: 30-Nov-2001 2:11


Hi, Colin
> That doesn't shut the console down, but a canadian or UK postcode will raise
> an error:
>
> raw-input: " W1A 4AA "
> date? cleaned-input: first load/all raw-input
>
> ** Syntax Error: Invalid integer -- 4AA
> ** Near: (line 1) W1A 4AA
The point is that 4AA is interpreted as an integer because it starts with a digit, but then the conversion fails (A cannot appear in an integer). No Rebol datatype! matches 4AA. The difference between Load and To-Word is that To-Word creates a global word from any sequence of characters: to-word "123" creates the global word! '123, while Load interprets "123" as an integer!.
> But using 'load at all introduces a subtle problem. Each 'load of a raw-input
> string potentially adds entries to System/Words. When that reaches about
> 4000, the system crashes. Unrecoverably.
You should use to-block, which performs the same datatype checks as load but doesn't add the words to the global context. These kinds of words can be called unloaded words or "masked words" (there is no official name for them). The drawback is that the words generated by to-block are out of context, and to use them as variables you must first bind them to some context (an object, a function or a USE block; not the global one, of course). I have written some functions to work with this kind of word. You will find them on my rebsite (Romano) under the name gcmask.r: http://web.tiscali.it/anarkick/gcmask.r

Recently I found a bug in Rebol which crashes the interpreter when you try to Set or Get a masked word that is out of context, instead of generating the error "the word is not defined in this context". So do not try to Set or Get this type of word (unloaded and undefined) before adding it to a local context. You can use them as set-words (to get an error, not a crash), but not with the functions Set or Get. Probably Set presumes that the word is in the global context. This is bad, because you must ensure that no function Sets or Gets the unloaded and undefined word. It is highly probable that a function like Layout will set/get the word if you try to use it in the layout definition.
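Romano's advice (bind TO-BLOCK words to a local context before using them as variables) can be sketched as follows. This is a minimal illustration assuming REBOL/Core 2.x, where BIND takes a word from the target context (hence `in holder 'self`) and modifies the block in place; `user-words` and `holder` are hypothetical names. Only words that already exist in `holder` get bound; any other word from the user's string stays unbound and, given the crash Romano reports, must never be passed to SET or GET.

```rebol
REBOL [Title: "Binding TO-BLOCK words to a local context (sketch)"]

; Words produced by TO-BLOCK are not added to system/words (per Romano)
user-words: to-block "alpha beta"

; Hypothetical local context holding the words we are willing to accept
holder: context [alpha: none beta: none]

; Bind the block's words to HOLDER; BIND changes the block in place.
; Words with no match in HOLDER stay unbound and must not be SET or GET.
bind user-words in holder 'self

; Only now is it safe to SET these words: they refer to holder/alpha, holder/beta
set first user-words 10
set second user-words "ten"

print [holder/alpha holder/beta]
```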
> Which means the code above can't be 24x7 on a server. And it won't last more
> than a few hundred lines if it's part of a clean-up operation on an incoming
> text file.
The solution: use to-block to create unloaded words. A consequence of this is that often (not always) you have to manipulate them in strings, not in blocks (but here you will not have any problem :-)
> So (my conclusion) is that 'load for validation of raw data is a dead end.
I agree.
> And if I'm going to continue using Rebol, I need to buckle down and write
> some serious data validation code first. Or RT need to rethink System/Words.
At least RT should correct the set bug. Perhaps there are other Rebol bugs connected to the use of unloaded words, but I'm not sure.
---
Ciao
Romano

 [3/8] from: brett:codeconscious at: 30-Nov-2001 12:26


Hi Colin, <skipped instructive comments>
> But using 'load at all introduces a subtle problem. Each 'load of a raw-input
> string potentially adds entries to System/Words. When that reaches about
> 4000, the system crashes. Unrecoverably.
>
> Which means the code above can't be 24x7 on a server. And it won't last more
> than a few hundred lines if it's part of a clean-up operation on an incoming
> text file.
True. However, Romano has shown that TO-BLOCK can be used instead of LOAD to avoid flooding system/words.
> So (my conclusion) is that 'load for validation of raw data is a dead end.
I'm thinking I agree.
> And if I'm going to continue using Rebol, I need to buckle down and write
> some serious data validation code first. Or RT need to rethink System/Words.

Serious validation code, but not too onerous. I took your idea of using TRY, added Romano's idea of using TO-BLOCK, and mixed in my own of using PARSE for validation. Poured into a function and cooked for 5 minutes:

    accept-date: function [raw-input] [cleared-input] [
        if error? try [
            either parse to-block raw-input [set cleared-input date!] [
                RETURN cleared-input
            ] [throw make error! "Not a valid date!"]
        ] [cleared-input: none]
    ]

The above function returns none if it doesn't like what it sees; otherwise it returns a date:
>> accept-date " W1a 4AA "
== none
>> accept-date " 27-may-01"
== 27-May-2001

OK, it is not a one-liner, but with more thought maybe it could handle more basic types than just date, thereby justifying its bulk. And if you want to accept date formats like "27 may 2001", then PARSE will be your friend.

HTH
Brett [part-time Parse evangelist :) ]
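Brett suggests his function could be extended to other basic types. One way to do that, as a minimal sketch assuming REBOL/Core 2.x, is to pass the wanted datatype in as an argument; `accept-value` and `wanted` are hypothetical names, not part of REBOL. It keeps Brett's TRY + TO-BLOCK + PARSE structure and relies on PARSE treating a word whose value is a datatype the same way it treats `date!` written literally in the rule.

```rebol
REBOL [Title: "Generalising accept-date to any datatype (sketch)"]

accept-value: func [
    "Return RAW converted to the WANTED type, or NONE (hypothetical helper)"
    raw [string!] "User input"
    wanted [datatype!] "Expected datatype, e.g. date!"
    /local parsed result
][
    ; TO-BLOCK rather than LOAD, so system/words is not touched (per Romano)
    if error? try [parsed: to-block raw] [return none]
    ; Accept exactly one value of the wanted type and nothing else
    either parse parsed [set result wanted] [result] [none]
]

probe accept-value " 27-may-01" date!
probe accept-value " W1a 4AA " date!
probe accept-value " 45 " integer!
```

Because PARSE must consume the whole block, input such as " 27-may-01 extra" is rejected rather than silently truncated to its first value.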

 [4/8] from: ptretter:charter at: 29-Nov-2001 19:40


What would happen if you use LOAD inside an object, such as:

    some-obj: context [locals: load [...some stuff..]]

Paul Tretter

 [5/8] from: brett:codeconscious at: 30-Nov-2001 13:04


Hi Paul, Using load affects system/words no matter what happens after you have used it. So making an object after using load does not change anything as far as I know. Brett.

 [6/8] from: sanghabum::aol::com at: 30-Nov-2001 14:09


Thanks to Brett and Romano for the (as usual) insightful explanations.

to-block is a great discovery (thanks, Romano). I doubt I would have come across it, or worked out that it corrects the flaws in Load (while adding some other issues about contexts).

And that, to some extent, sums up the strengths and weaknesses of Rebol's marketing (or education or training or whatever you want to call it).....

I write industrial-strength code, and I write proof-of-concept demos. Rebol is awesome for the demos (it makes me, at times, look semi-superhuman, rather than my usual semi-human). But I do not have the confidence that I can write industrial-strength code in it. It has too many undocumented "gotchas". I don't have a problem with gotchas (all languages have them), but I want them spelled out in advance so my mission-critical stuff doesn't run straight into them. With most other languages there's a written body of experience that can get me up to speed fast. Rebol is lacking this.... I hope the resolution is just a matter of time.

(A technical writer could probably knock up a Rebol Gotchas list from the dialogs on this list in no time squared. And it would be an asset for the language.)

Colin.

 [7/8] from: g:santilli:tiscalinet:it at: 1-Dec-2001 13:08


Hello [Sanghabum--aol--com]!

On 30-Nov-01, you wrote:

S> I have a raw string entered by a user:
S> raw-input: " 12-may-87 "
S> It is supposed to be a date (in this case). I want to convert
S> it to its internal Rebol format, and see if it is valid.

When handling dates, I'd suggest using TO-DATE instead: it handles many more formats than LOAD and just gives an error if the input is not a date.
>> to-date "12-may-87"
== 12-May-1987
>> to-date "12/5/87"
== 12-May-1987
>> to-date "12 5 87"
== 12-May-1987
>> to-date "1987 5 12"
== 12-May-1987
>> to-date "invalid"
** Script Error: Invalid argument: invalid.
** Where: to date! :value

You'll need to TRIM your strings, though:

>> to-date " 12-may-87 "
** Script Error: Invalid argument: 12-may-87 .
** Where: to date! :value

Romano's suggestion is useful if the user could input values of different datatypes (imagine accepting a file or a URL), but if you know the type in advance, TO is the best IMHO. (Of course, you'll need to do your own checks for certain types, such as all of the any-string!s...)

Regards,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/
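Gabriele's two observations (TO-DATE raises an error on bad input, and the string must be trimmed first) combine into a small helper. This is a sketch only, assuming REBOL/Core 2.x; `safe-to-date` is a hypothetical name. TRIM modifies the string it is given, so the helper trims a copy and leaves the caller's input untouched.

```rebol
REBOL [Title: "TO-DATE with TRIM and TRY (sketch)"]

safe-to-date: func [
    "Return RAW as a date!, or NONE if it is not one (hypothetical helper)"
    raw [string!]
    /local result
][
    ; TRIM modifies its argument, so work on a copy of the user's string;
    ; TRY turns TO-DATE's "Invalid argument" error into a NONE result
    if error? try [result: to-date trim copy raw] [return none]
    result
]

probe safe-to-date " 12-may-87 "
probe safe-to-date "invalid"
```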

 [8/8] from: sanghabum:aol at: 1-Dec-2001 15:04


Hi Gabriele,
> When handling dates, I'd suggest to use TO-DATE instead: it handles
> much more formats than LOAD and just gives an error if the input
<<quoted lines omitted: 5>>
> you'll need to do your own checks for certain types, such as all
> of the any-string!s...)
Thanks for the analysis. Romano and Brett's responses were pretty close to what I wanted..... That's a generic validation process for raw strings that does three things:

1. Convert the string to its internal Rebol type
2. Ensure that type is the right type for the data field
3. Apply field-specific rules to ensure the content is right for the field and the data type. (So, for example, a date field may be blank; have a date; or have any of the strings "Today" "Tomorrow" "Soonest".)

Using everything I've learned on the list in the last few days, I can do the first two with this little function (please suggest improvements!)

    ==============
    is-this?: func [
        "Checks and cleans a string"
        Types [block!] "List of acceptable Rebol"
        Raw-data [string!] "Item to be checked"
        /local Block-data Clean-Data
    ] [
        ;; Returns: either Raw-data converted to its Rebol datatype, if possible
        ;; or False - if it is not one of the types on the list
        Block-data: copy []
        error? try [block-data: to-block Raw-Data]
        either (length? Block-data) = 1
            [Clean-data: first Block-data]
            [Clean-data: Raw-data]
        foreach RebolType Types [
            if (type? Clean-data) = get to-word RebolType [Return Clean-data]
        ] ; foreach
        Return False
    ] ; func
    ============

Examples of use:

    is-this? [date!] " 24/6/44 "
    is-this? [date! tuple!] " 10.08.1976"
    is-this? [url! email!] " ftp://[dorothy--somewhere--otr]"
    is-this? [integer!] " 45.5"
    is-this? [integer! decimal!] " 45.5"

Thanks again,
--Colin.
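Colin's step 3 (field-specific rules) is not covered by `is-this?`. As an illustration only, here is a hypothetical `check-date-field` applying the example rules from his message, assuming REBOL/Core 2.x: a blank field is allowed (returns NONE), the keywords "Today", "Tomorrow" and "Soonest" are passed through as strings, anything else must convert with TO-DATE (following Gabriele's advice for known types), and FALSE signals a rejected value, matching the convention of `is-this?`.

```rebol
REBOL [Title: "Field-specific rules for a date field (sketch)"]

check-date-field: func [
    "Apply step-3 rules to a date field (hypothetical helper)"
    raw [string!]
    /local clean result
][
    clean: trim copy raw                                 ; never modify the caller's string
    if empty? clean [return none]                        ; a blank date field is allowed
    if find ["Today" "Tomorrow" "Soonest"] clean [       ; allowed keywords pass through
        return clean
    ]
    if error? try [result: to-date clean] [return false] ; anything else must be a date
    result
]

probe check-date-field "   "
probe check-date-field "Soonest"
probe check-date-field " 12-may-87 "
probe check-date-field "invalid"
```

Keeping the keyword list inside the function is the simplest choice for one field; a real form would more likely pass each field's allowed keywords in as an argument.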
