More dialects than you think, validating function input.

[1/12] from: brett:codeconscious at: 12-Sep-2002 22:05

I have a function that takes a block! as an argument. This block! should contain only numbers. The question is: how do I check for valid input, and throw an error on encountering a problem, in an efficient manner?

My first thought was something like this:

    f1: function [
        [catch] block [block!]
    ] [kount] [
        kount: 0
        repeat value block [
            if not number? value [
                throw make error! "F1 can only process numbers."
            ]
            kount: kount + 1
        ]
    ]

After reflecting on this, I thought about how a block! passed to a function is, in one sense, being interpreted by the function according to an implied grammar. That is, my requirement in the above function for only numbers implies that my function is processing a fairly simple REBOL-based grammar (i.e. a dialect). I don't think it is too outlandish to say that every function that processes a block! is implementing grammar processing, even if trivial. Hence the subject title of this message.

As a side note, I reckon every offline script can be considered an out-of-memory block! ready for loading.

Anyway, this thought leads me to PARSE, REBOL's grammar validator. So I recoded my function to look something like this:

    f2: function [
        [catch] block [block!]
    ] [kount value] [
        kount: 0
        if not parse block [
            any [set value number! (kount: kount + 1)] |
            set value any! to end
        ] [throw make error! "F2 can only process numbers."]
    ]

Then I did some quick and dirty testing to see which was faster. On my machine, F2 takes half the time that F1 does.

Just thought I'd share my little ramble. Test code below.

Cheers,
Brett.

    REBOL []

    data: copy []
    repeat i 2000 [insert tail data i]

    f1: function [
        [catch] block [block!]
    ] [kount] [
        kount: 0
        repeat value block [
            if not number? value [
                throw make error! "F1 can only process numbers."
            ]
            kount: kount + 1
        ]
    ]

    f2: function [
        [catch] block [block!]
    ] [kount value] [
        kount: 0
        if not parse block [
            any [set value number! (kount: kount + 1)] |
            set value any! to end
        ] [throw make error! "F2 can only process numbers."]
    ]

    ; Time 300 calls of F on DATA, trapping any error F throws.
    timeit: func [f] [
        t1: now/precise
        repeat i 300 [error? try [f data]]
        t2: now/precise
        t2/time - t1/time
    ]

    print ["f1" timeit :f1]
    print ["f2" timeit :f2]

    ; Put a non-number in the middle of the data and time the failure case.
    insert at data 1001 'a-word
    print ["f1 - error half way" timeit :f1]
    print ["f2 - error half way" timeit :f2]

---
Website: http://www.codeconscious.com
Rebsite: http://www.codeconscious.com/index.r

[2/12] from: brett:codeconscious at: 12-Sep-2002 22:10

Tiny correction to my code so as not to confuse. It does not change the result or conclusion.

    set value any! to end

should read

    set value any-type! to end

Brett.

[3/12] from: lmecir:mbox:vol:cz at: 12-Sep-2002 15:05

Hi Brett,

I don't know why you didn't write it as follows:

    f3: func [
        [catch] block [block!]
    ] [
        if not parse block [any [number!] end] [
            throw make error! "F3 can only process numbers."
        ]
    ]

The parse rule contains END only for compatibility with my proposed PARSE behaviour; otherwise it isn't necessary.

Cheers
-L

[4/12] from: brett:codeconscious at: 12-Sep-2002 23:32

Hi Ladislav,

I was less than clear. The kount stuff was to simulate "other processing". The value needs to be set for processing and to report the type in the error message.

My mistake for oversimplifying the examples, and for stuffing them up. Here is the actual function I was working on:

    bayes: function [
        {Calculate combined probability.}
        [catch] probabilities [any-block!]
    ] [p0 p1 d value] [  ; VALUE is listed as a local so it does not leak into the global context
        p0: p1: 1.0
        if not parse probabilities [
            any [
                set value number! (p0: value * p0 p1: 1 - value * p1)
                | set value any-type! to end skip
            ]
        ] [throw make error! reduce ['script 'cannot-use 'bayes mold type? get/any 'value]]
        if zero? d: add p0 p1 [throw make error! "The probabilities cannot be combined."]
        divide p0 d
    ]

Sorry for the confusion.

Brett.
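A minimal sketch of the arithmetic inside BAYES, with the PARSE validation left out, assuming REBOL 2 evaluation rules. The name `combine` is illustrative and not from the thread. It mainly shows that `1 - p * p1` is evaluated left to right, so it means (1 - p) * p1.

```rebol
REBOL []

; Hypothetical helper, not from the thread: the arithmetic inside BAYES,
; without any input validation.
combine: func [probabilities [block!] /local p0 p1] [
    p0: p1: 1.0
    foreach p probabilities [
        p0: p * p0          ; running product of the p values
        p1: 1 - p * p1      ; evaluated left to right, i.e. (1 - p) * p1
    ]
    p0 / (p0 + p1)          ; zero-divide error if both products are zero
]

print combine [0.5 0.5]     ; 0.5
```

An input such as [0 1] drives both products to zero, which seems to be the case the "cannot be combined" check in BAYES guards against.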

[5/12] from: g:santilli:tiscalinet:it at: 12-Sep-2002 15:26

Hi Brett,

On Thursday, September 12, 2002, 2:05:39 PM, you wrote:

BH> if not parse block [
BH>     any [set value number! (kount: kount + 1)] |
BH>     set value any! to end
BH> ] [throw make error! "F2 can only process numbers."]

The second part of the rule will never be reached. ANY does not fail if it matches 0 elements.

In your trivial example, your code could look like:

    if not parse block [any number!] [ ...

provided you are accepting empty blocks as input too. Of course, I imagine your example was actually cut down from something bigger that needed the SET VALUE ... and the count (which you could maybe do faster by just using LENGTH? ...).

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amigan -- AGI L'Aquila -- REB: http://web.tiscali.it/rebol/index.r
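A minimal sketch of Gabriele's point, assuming REBOL 2 PARSE semantics. The names `reached` and `rule` are illustrative and not from the thread. The rule has the same shape as the one in F2: because ANY succeeds even on zero matches, the alternative after `|` is never tried.

```rebol
REBOL []

reached: false

; Same shape as the rule in F2: ANY [...] | <alternative>
rule: [
    any [number!]
    | (reached: true) to end
]

print parse [1 2 a-word 3] rule   ; false: the input was not consumed to the end
print reached                     ; false: the alternative was never tried
print parse [] rule               ; true: ANY matched zero items
```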

[6/12] from: greggirwin:mindspring at: 12-Sep-2002 9:56

Brett, et al

A very good post, and related to the discussion on error handling to boot! :)

I can see at least two major ways to apply this approach.

1) Parse drives the processing of the function, automatically validating the correctness of the argument block as it goes.

2) Parse is used only to validate the block at the entry point, like a precondition in a Design by Contract design, and the operation of the function is separate. You could also use parse to validate return values (i.e. as a post-condition processor).

In a large system, and as more things are driven by dialects, this could prove very useful indeed.

Now, does anyone have a model they use to return helpful context information when a parse operation fails? Programmers are OK with "Syntax error: expected integer!" but users are probably better served by "I understood the first part ("sell 400 shares of MSFT at"), but then I was expecting to see a monetary value for the sell price (e.g. $50.00), and I didn't, so I couldn't process your request. I'm sorry for any inconvenience this might cause you. Have a nice day." Or maybe just "No sell price was specified in your request." or even a prompt for the missing info.

In any case, I agree that this is a good tool to keep in mind.

--Gregg
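One possible way to get at the "I understood the first part" information, sketched under the assumption of REBOL 2 PARSE semantics. A set-word inside a rule records the current input position, so the caller can see how far matching got. The function `check-numbers` and the word `pos` are illustrative names, not from the thread.

```rebol
REBOL []

; Hypothetical helper, not from the thread: validate a block of numbers and,
; on failure, report what was understood and where matching stopped.
check-numbers: func [block [block!] /local pos] [
    either parse block [any number! pos:] [
        "ok"
    ][
        ; POS is the input position where ANY stopped matching.
        rejoin [
            "Understood " mold copy/part block pos
            ", but item " index? pos " is not a number: " mold first pos
        ]
    ]
]

print check-numbers [1 2 3]         ; ok
print check-numbers [1 2 a-word 4]  ; Understood [1 2], but item 3 is not a number: a-word
```

This only reports the position; saying what was expected at that position still has to come from the rules themselves.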

[7/12] from: lmecir:mbox:vol:cz at: 12-Sep-2002 18:35

Thanks.

-L

[8/12] from: reffy:ulrich at: 12-Sep-2002 19:56

Watching the thread on error handling ...

Giving useful information seems to me to be at least 2 parts:

1. A specific operation fails - need to have information which readily reveals domain error, length error, mismatched operand types, etc.

2. Contextually, it might be important for the developer to control what gets displayed for tracking purposes.

Since having one scheme (as the language developer) that suits everyone's taste is very difficult, maybe a compromise could be had?

As a developer, I would like to be able to place begin/end markers in my code. The purpose of the markers is to signal to the language execution engine the type/degree of trace information to capture. I can imagine several fine/coarse grained levels. The coarseness of the tracing could be at the function level, the statement level, maybe even other levels.

I like this approach because I probably don't want the execution engine burdened with something which might yield an overall performance hit. I/you can probably be the best judge of when/what we need information about. It would be up to us to cause the performance hit of tracing/tracking flow of control.

Comments?

Dick

[9/12] from: joel:neely:fedex at: 13-Sep-2002 7:50

Hi, Dick,

You've given me an idea... (Thanks!)

[reffy--ulrich--net] wrote:

> Watching the thread on error handling ...
>

...

> As a developer, I would like to be able to place begin/end markers
> in my code. The purpose of the markers is to signal the language

<<quoted lines omitted: 7>>

> we need information about. It would be up to us to cause the
> performance hit of tracing/tracking flow of control.

This is a *real* QAD, but the idea was to write something that would let me "enclose" a block with a marker so that any error occurring within that block would generate a display of all currently active markers as well as the error message itself. Again, this is a quick hack with much room for improvement...

Here's the error zone manager, in %errzone.r:

8<----------
errzone: make object! [
    zones: []
    init: func [] [zones: copy []]
    new: func [/local newself] [
        newself: make self []
        newself/init
        newself
    ]
    nest: func [s [string!] b [block!] /local err] [
        insert tail zones s
        either error? set/any 'err try b [
            print ["^/Error within"]
            foreach zone zones [print ["^-" zone]]
            print ["was^/" mold disarm err newline]
            halt
        ][
            remove back tail zones
        ]
    ]
]
8<----------

Here are some error-prone routines that use ERRZONE, either recursively or in nested function evaluations:

8<----------
do %errzone.r

irecurrence: func [n [integer!] /local rzone _recurrence] [
    rzone: errzone/new
    do _recurrence: func [
        p [integer!] n [integer!] /local result
    ][
        rzone/nest rejoin ["tracing: " p ", " n] [
            result: _recurrence p / n n - 1
        ]
        result
    ] 1000 n
]

nrecurrence: func [n [integer!] /local rzone _recurrence] [
    rzone: errzone/new
    do _recurrence: func [
        p [number!] n [integer!] /local result
    ][
        rzone/nest rejoin ["tracing: " p ", " n] [
            result: _recurrence p / n n - 1
        ]
        result
    ] 1000 n
]

gzone: errzone/new

eenie: func [n [integer!]] [
    gzone/nest "eenie" [
        loop random 5 [print "blah "]
        print 1 / n
        loop random 5 [print "yak "]
    ]
]

meenie: func [n [integer!]] [
    gzone/nest "meenie" [
        loop random 5 [print "blah "]
        eenie n
        loop random 5 [print "yak "]
    ]
]

meinie: func [n [integer!]] [
    gzone/nest "meinie" [
        loop random 5 [print "blah "]
        meenie n
        loop random 5 [print "yak "]
    ]
]

moe: func [n [integer!]] [
    gzone/nest "moe" [
        loop random 5 [print "blah "]
        meinie n
        loop random 5 [print "yak "]
    ]
]
8<----------

Here are some annotated results of using the above "clients":

8<----------

>> irecurrence 5

Error within
    tracing: 1000, 5
    tracing: 200, 4
    tracing: 50, 3
was
make object! [
    code: 303
    type: 'script
    id: 'expect-arg
    arg1: '_recurrence
    arg2: 'p
    arg3: [integer!]
    near: [result: _recurrence p / n]
    where: 'nest
]

8<----------

I included this one because it surprised me at first. (I didn't read the error object carefully enough!) The value of 50 / 3 is not integral, so the attempt to invoke _RECURRENCE failed because of an argument type error.

8<----------

>> nrecurrence 5

Error within
    tracing: 1000, 5
    tracing: 200, 4
    tracing: 50, 3
    tracing: 16.6666666666667, 2
    tracing: 8.33333333333333, 1
    tracing: 8.33333333333333, 0
was
make object! [
    code: 400
    type: 'math
    id: 'zero-divide
    arg1: none
    arg2: none
    arg3: none
    near: [result: _recurrence p / n]
    where: 'nest
]

8<----------

Our classic poster-child error...

8<----------

>> moe 1

blah blah blah blah blah blah blah blah blah blah blah blah
1
yak yak yak yak yak yak yak yak yak yak yak yak yak yak
== []

>> moe 0

blah blah blah blah blah blah blah blah blah blah blah blah blah

Error within
    moe
    meinie
    meenie
    eenie
was
make object! [
    code: 400
    type: 'math
    id: 'zero-divide
    arg1: none
    arg2: none
    arg3: none
    near: [print 1 / n loop]
    where: 'nest
]

8<----------

The first call to MOE succeeded (producing lots of simulated work and output), but the second failed four levels of function evaluation down.

As I said above, this is a bit kludged up, but serves as a proof of concept. For example, we could define a new function-defining word, something like this

    safe-foo: zonefunc zoneobj "FOO Zone" [...args...] [...body...]

in contrast with

    foo: func [...args...] [...body...]

to wrap the body of the newly-defined function with an appropriate use of the ERRZONE concept.

-jn-

--
; Joel Neely joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] { | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]

[10/12] from: brett:codeconscious at: 13-Sep-2002 22:13

Hi Gregg,

> A very good post, and related to the discussion on error handling to boot!

Thanks!

> In a large system, and as more things are driven by dialects, this could
> prove very useful indeed.

It is probably overkill for some situations, on the other hand you never know when you might want that one-off to be included in some large undertaking. :^)

> In a large system, and as more things are driven by dialects, this could
> prove very useful indeed. Now, does anyone have a model they use to return
> helpful context information when a parse operation fails? Programmers are
> OK with "Syntax error: expected integer!" but users are probably better served
> by "I understood the first part ("sell 400 shares of MSFT at"), but then I
> was expecting to see a monetary value for the sell price (e.g. $50.00), and
> I didn't, so I couldn't process your request. I'm sorry for any
> inconvenience this might cause you. Have a nice day." Or maybe just "No sell
> price was specified in your request." or even a prompt for the missing info.

I think you've got at least three issues there. One relates to speaking the user's language. The second is determining exactly where in the input the error is, which is more difficult. The third is showing what is expected next.

Your "I understood the first part ..." is a good solution for showing the user where processing got to.

As for the expecting bit, I don't think there is a general solution. Having to tell the user "error...expecting money" adds an application requirement, making a bigger application. Each "expecting X" becomes an alternative rule to the rule that is attempting a match. So your rules bloat out quite a bit if you want to report on every grammar term.

For some apps maybe it is enough to say "Recognised two stock orders. Could not recognise the rest of the instructions."

Regards,
Brett.
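A small sketch of what "each expecting X becomes an alternative rule" can look like, assuming REBOL 2 PARSE semantics. The mini-dialect, the rule `order`, the function `check-order` and the word `expected` are all made up for illustration. The `end skip` pair can never match, so each alternative records its message and then fails.

```rebol
REBOL []

expected: none

; Hypothetical mini-dialect (illustrative only): sell <integer> shares at <money>
; END SKIP can never match, so it forces the alternative to fail after
; the message has been recorded.
order: [
    'sell
    [integer! | (expected: "a number of shares") end skip]
    'shares 'at
    [money! | (expected: "a sell price, e.g. $50.00") end skip]
]

check-order: func [request [block!]] [
    expected: none
    either parse request order ["ok"] [
        either expected [
            join "Expected " expected
        ][
            "Could not recognise the request."
        ]
    ]
]

print check-order [sell 400 shares at $50.00]   ; ok
print check-order [sell 400 shares at]          ; Expected a sell price, e.g. $50.00
```

Even in this tiny grammar, two of the four terms have doubled in size, which is the bloat Brett describes.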

[11/12] from: greggirwin:mindspring at: 13-Sep-2002 12:52

Hi Brett,

<< As for the expecting bit. I don't think there is a general solution. Having to tell the user "error...expecting money" adds an application requirement making a bigger application. Each "expecting X" becomes an alternative rule to the rule that is attempting a match. So your rules bloat out quite a bit if you want to report on every grammar term.

For some apps maybe it is enough to say "Recognised two stock orders. Could not recognise the rest of the instructions." >>

I think I didn't say that I thought more than I said. :)

What I was *thinking*, but didn't write in my message, was how many parsers provide context information when a syntax error occurs and that PARSE might be able to make available that kind of information so we don't have to do it all ourselves at the individual rule level. F'rinstance:

>> add 1
** Script Error: add expected value2 argument of type: number pair char money date time tuple
** Near: add 1

REBOL has rules about how to parse code based on the spec block (the rule) for a function, no? So, based on the rule(s) that have succeeded, you know what state the parser was in when it hit a rule that failed, and what that rule (or rules) was.

I like the way it works today, not popping an error directly to the user, but you have no idea exactly where the parse failed unless, as you point out, you add stuff to all your rules to track exactly what's going on. Maybe that ends up being a viable alternative in some cases, but if PARSE gave us some more clues, it might be useful.

Another problem I have is that I start thinking way ahead of myself about "what could be". I can see great potential for higher level dialects driving PARSE; generating rules dynamically (including specific constraints) based on data from a database, previous messages in a conversation, etc.; and tools to help visualize and debug dialects. I just about jump out of my seat when I think about some of the possibilities. :)

--Gregg
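As a rough illustration of "generating rules dynamically", here is a sketch that builds a PARSE rule from a block of data, assuming REBOL 2. The words `symbols` and `symbol-rule` are illustrative, and the list of symbols is a stand-in for whatever a database or earlier message would supply.

```rebol
REBOL []

; Hypothetical data, standing in for rows read from a database.
symbols: [MSFT IBM SUNW]

; Build the rule ['MSFT | 'IBM | 'SUNW] from the data.
symbol-rule: copy []
foreach s symbols [
    if not empty? symbol-rule [append symbol-rule first [|]]
    append symbol-rule to-lit-word s
]

print parse [IBM] symbol-rule    ; true
print parse [ORCL] symbol-rule   ; false
```

Because a rule is just a block, it can be rebuilt whenever the underlying data changes.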

[12/12] from: brett:codeconscious at: 14-Sep-2002 10:15

Hi Gregg,

> What I was *thinking*, but didn't write in my message, was how many parsers
> provide context information when a syntax error occurs and that PARSE might
> be able to make available that kind of information so we don't have to do it
> all ourselves at the individual rule level. F'rinstance:

I think I understood what the intent of your message was and even was going to suggest a new keyword to assist with the context info, but when I thought of some of the parse rules I'd written I could not imagine how any context information issued from them would help me produce a meaningful response to the user. So I unimaginatively said I don't think there is a general solution. :^)

> Another problem I have is that I start thinking way ahead of myself about
> "what could be". I can see great potential for higher level dialects driving
> PARSE; generating rules dynamically (including specific constraints) based
> on data from a database, previous messages in a conversation, etc.; and
> tools to help visualize and debug dialects. I just about jump out of my seat
> when I think about some of the possibilities. :)

Hardly a problem. Perhaps just not always immediately measurably productive. On the other hand such thinking is part of the creative search I'm sure.

Regards,
Brett.

Notes

Quoted lines have been omitted from some messages.