
Documentation for: parse-analysis.r

Parse Analysis Toolset

1. What is this all about?

A few functions to help analyse the way Parse works - good for learning too!

2. Why are these functions useful?

* Understanding how parse works
* Debugging parse rules that are not working
* Making your parse rules more efficient

This function returns some basic statistics on the Parse "events" that occur, totalled by rule name. The result is a block of rule names followed by three blocks of counts, each in the same order as the rule names: the first holds the number of tests, the second the passes and the third the fails. A test event occurs when parse begins a rule. Pass and fail events occur when the rule finishes.

This function can help identify rules that fail too often and are therefore inefficient. By restructuring, or perhaps just reordering, your parse rules to suit the most likely input sequence, you may be able to make your parsing faster.

3.1.1 Output format

    <block-of-rules>
    <block-of-test-counts>
    <block-of-pass-counts>
    <block-of-fail-counts>
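To make the counting concrete, here is a minimal sketch modelled in Python rather than REBOL. It is not the script's implementation: the tiny matcher, the rule representation and the names `match`, `hook`, `unhook` and `count_events` are all hypothetical. Each hooked rule reports a test event when it begins and a pass or fail event when it finishes, and the totals come back as four lists in the same layout as the output format above. Two orderings of the same alternatives are compared to illustrate the point about reordering.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical rule model (not the REBOL script's data structures):
#   frozenset                 -> matches one character in the set (like a charset)
#   str                       -> the name of another rule, looked up in `rules`
#   list                      -> a sequence: every item must match in order
#   ("some", r)               -> one or more repetitions of r
#   ("|", r1, r2, ...)        -> alternatives, tried left to right
#   ("hook", name, orig, fn)  -> a hooked rule that reports its events to fn
Report = Callable[[str, str, int], None]


def match(rule, text: str, pos: int, rules: Dict[str, object]) -> Optional[int]:
    """Return the position after `rule` matches at `pos`, or None on failure."""
    if isinstance(rule, str):
        return match(rules[rule], text, pos, rules)
    if isinstance(rule, frozenset):
        return pos + 1 if pos < len(text) and text[pos] in rule else None
    if isinstance(rule, list):
        for item in rule:
            pos = match(item, text, pos, rules)
            if pos is None:
                return None
        return pos
    kind = rule[0]
    if kind == "hook":
        _, name, original, report = rule
        report("test", name, pos)  # the rule begins
        end = match(original, text, pos, rules)
        report("fail" if end is None else "pass", name, pos if end is None else end)
        return end
    if kind == "|":
        for alternative in rule[1:]:
            end = match(alternative, text, pos, rules)
            if end is not None:
                return end
        return None
    if kind == "some":
        end = match(rule[1], text, pos, rules)
        if end is None:
            return None
        # Keep repeating until a failure or a zero-length match.
        while (nxt := match(rule[1], text, end, rules)) is not None and nxt != end:
            end = nxt
        return end
    raise ValueError(f"unknown rule: {rule!r}")


def hook(rules: Dict[str, object], names: List[str], report: Report) -> Dict[str, object]:
    """Replace each named rule in place with a reporting wrapper; return the originals."""
    originals = {name: rules[name] for name in names}
    for name in names:
        rules[name] = ("hook", name, originals[name], report)
    return originals


def unhook(rules: Dict[str, object], originals: Dict[str, object]) -> None:
    """Restore the original rules."""
    rules.update(originals)


def count_events(rules, names, text, start) -> list:
    """Return [names, tests, passes, fails], totalled by rule name."""
    counts = {name: {"test": 0, "pass": 0, "fail": 0} for name in names}

    def report(event: str, name: str, pos: int) -> None:
        counts[name][event] += 1

    originals = hook(rules, names, report)
    try:
        match(start, text, 0, rules)
    finally:
        unhook(rules, originals)
    return [list(names)] + [[counts[n][e] for n in names] for e in ("test", "pass", "fail")]


digit = frozenset("0123456789")
letter = frozenset("abcdefghijklmnopqrstuvwxyz")
letters_first = {"digit": digit, "letter": letter, "item": ("|", "letter", "digit")}
digits_first = {"digit": digit, "letter": letter, "item": ("|", "digit", "letter")}

print(count_events(letters_first, ["letter", "digit"], "12a34", ("some", "item")))
# [['letter', 'digit'], [6, 5], [1, 4], [5, 1]]
print(count_events(digits_first, ["letter", "digit"], "12a34", ("some", "item")))
# [['letter', 'digit'], [2, 6], [1, 4], [1, 2]]
```

On this mostly-digit input, trying `digit` first cuts the total fails from 6 to 3 while the pass counts stay the same.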

3.2 Explain-Parse

This function is useful both when learning how parse works and when debugging complex parse rules.

Prints an indented trace of the rules as they start and finish, together with the current input index position. Each event is numbered with a step number. Pass and fail events show which rule is finishing by referring to the step on which it started.

A nicer way to understand what is going on is to use visualise-parse (see parse-analysis-view.r).

3.2.1 Output format

Begin lines show when a rule begins evaluation:

    <step-number> begin <rule-identifier> at <input-index-position> level <nesting-level>

End lines show when a rule ends evaluation (successfully or not):

    <step-number> end <rule-identifier> at <input-index-position> started-on <step-number-of-begin> pass/fail

3.3 Tokenise-Parse

Treats the rule names as token names: for each successful rule it returns the rule name, the length of input matched and the input position at which the token starts.

3.3.1 Output format

    <rule-identifier> <length-of-matched-input> <input-index-position> ...
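The tokenising idea can be sketched with a small Python model, using the hexalpha/digit/sample rules from section 3.5. This is an illustration under assumptions, not the REBOL code: `match`, `hook`, `unhook` and `tokenise` are hypothetical names, tokens are assumed to be emitted in the order rules finish successfully, failed rules produce no token, and positions are reported 1-based like REBOL series indexes.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical rule model (not the REBOL script's data structures):
#   frozenset -> one character from the set;  str -> another rule by name;
#   list -> a sequence;  ("some", r) -> one or more r;
#   ("hook", name, original, report) -> a hooked rule that reports its events.
Report = Callable[[str, str, int], None]


def match(rule, text: str, pos: int, rules: Dict[str, object]) -> Optional[int]:
    """Return the position after `rule` matches at `pos`, or None on failure."""
    if isinstance(rule, str):
        return match(rules[rule], text, pos, rules)
    if isinstance(rule, frozenset):
        return pos + 1 if pos < len(text) and text[pos] in rule else None
    if isinstance(rule, list):
        for item in rule:
            pos = match(item, text, pos, rules)
            if pos is None:
                return None
        return pos
    if rule[0] == "hook":
        _, name, original, report = rule
        report("test", name, pos)
        end = match(original, text, pos, rules)
        report("fail" if end is None else "pass", name, pos if end is None else end)
        return end
    if rule[0] == "some":
        end = match(rule[1], text, pos, rules)
        if end is None:
            return None
        while (nxt := match(rule[1], text, end, rules)) is not None and nxt != end:
            end = nxt
        return end
    raise ValueError(f"unknown rule: {rule!r}")


def hook(rules: Dict[str, object], names: List[str], report: Report) -> Dict[str, object]:
    """Wrap each named rule in place; return the originals for unhooking."""
    originals = {name: rules[name] for name in names}
    for name in names:
        rules[name] = ("hook", name, originals[name], report)
    return originals


def unhook(rules: Dict[str, object], originals: Dict[str, object]) -> None:
    rules.update(originals)


def tokenise(rules, names, text, start) -> list:
    """Return a flat list of (name, matched length, 1-based start) triples."""
    tokens: list = []
    starts: List[int] = []  # start positions of the rules still running

    def report(event: str, name: str, pos: int) -> None:
        if event == "test":
            starts.append(pos)
            return
        begin = starts.pop()
        if event == "pass":  # failed rules produce no token
            tokens.extend([name, pos - begin, begin + 1])

    originals = hook(rules, names, report)
    try:
        match(start, text, 0, rules)
    finally:
        unhook(rules, originals)
    return tokens


rules = {
    "digit": frozenset("0123456789"),
    "hexalpha": frozenset("ABCDEF"),
    "sample": ["hexalpha", ("some", "digit")],
}
print(tokenise(rules, ["hexalpha", "digit", "sample"], "A23", "sample"))
# ['hexalpha', 1, 1, 'digit', 1, 2, 'digit', 1, 3, 'sample', 3, 1]
```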

3.4 Hook-Parse and Unhook-Parse

These are low-level functions used by the other functions to insert tracing code into parse rules.

3.5 Example

Here are some simple rules:

    digit: charset {0123456789}
    hexalpha: charset {ABCDEF}
    sample: [hexalpha some digit]

Running explain-parse:

    >> explain-parse [hexalpha digit sample] [parse {A23} sample]
    1 begin sample at 1 level 1
      2 begin hexalpha at 1 level 2
      3 end hexalpha at 2 started-on 2 pass
      4 begin digit at 2 level 2
      5 end digit at 3 started-on 4 pass
      6 begin digit at 3 level 2
      7 end digit at 4 started-on 6 pass
      8 begin digit at 4 level 2
      9 end digit at 4 started-on 8 fail
    10 end sample at 4 started-on 1 pass

In the output above you can see that the sample rule started on step 1 and finished on step 10.

The digit rule started on step 4 at input index position 2 and finished successfully on step 5 at input index position 3. That is, the digit rule matched exactly 1 character.
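The trace above can be reproduced with a small Python model of the hooking idea. This is a sketch under assumptions, not the REBOL code: the matcher, the rule representation and the names `match`, `hook`, `unhook` and `explain` are hypothetical, and input positions are reported 1-based to match REBOL series indexes. A stack of open step numbers supplies both the nesting level and the `started-on` value.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical rule model (not the REBOL script's data structures):
#   frozenset -> one character from the set;  str -> another rule by name;
#   list -> a sequence;  ("some", r) -> one or more r;
#   ("hook", name, original, report) -> a hooked rule that reports its events.
Report = Callable[[str, str, int], None]


def match(rule, text: str, pos: int, rules: Dict[str, object]) -> Optional[int]:
    """Return the position after `rule` matches at `pos`, or None on failure."""
    if isinstance(rule, str):
        return match(rules[rule], text, pos, rules)
    if isinstance(rule, frozenset):
        return pos + 1 if pos < len(text) and text[pos] in rule else None
    if isinstance(rule, list):
        for item in rule:
            pos = match(item, text, pos, rules)
            if pos is None:
                return None
        return pos
    if rule[0] == "hook":
        _, name, original, report = rule
        report("test", name, pos)
        end = match(original, text, pos, rules)
        report("fail" if end is None else "pass", name, pos if end is None else end)
        return end
    if rule[0] == "some":
        end = match(rule[1], text, pos, rules)
        if end is None:
            return None
        while (nxt := match(rule[1], text, end, rules)) is not None and nxt != end:
            end = nxt
        return end
    raise ValueError(f"unknown rule: {rule!r}")


def hook(rules: Dict[str, object], names: List[str], report: Report) -> Dict[str, object]:
    """Wrap each named rule in place; return the originals for unhooking."""
    originals = {name: rules[name] for name in names}
    for name in names:
        rules[name] = ("hook", name, originals[name], report)
    return originals


def unhook(rules: Dict[str, object], originals: Dict[str, object]) -> None:
    rules.update(originals)


def explain(rules, names, text, start):
    """Return (whole_input_matched, trace_lines) in the style of explain-parse."""
    lines: List[str] = []
    open_steps: List[int] = []  # steps of rules that have begun but not ended
    step = 0

    def report(event: str, name: str, pos: int) -> None:
        nonlocal step
        step += 1
        level = len(open_steps) + (1 if event == "test" else 0)
        indent = "  " * (level - 1)
        if event == "test":
            open_steps.append(step)
            lines.append(f"{indent}{step} begin {name} at {pos + 1} level {level}")
        else:
            lines.append(f"{indent}{step} end {name} at {pos + 1} started-on {open_steps.pop()} {event}")

    originals = hook(rules, names, report)
    try:
        end = match(start, text, 0, rules)
    finally:
        unhook(rules, originals)
    return end == len(text), lines


rules = {
    "digit": frozenset("0123456789"),
    "hexalpha": frozenset("ABCDEF"),
    "sample": ["hexalpha", ("some", "digit")],
}
ok, trace = explain(rules, ["hexalpha", "digit", "sample"], "A23", "sample")
print("\n".join(trace))  # the same ten lines as the explain-parse example
```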

4. When will these functions not work?

Standard parse programming rules and even some dynamic parse programming should be fine. Some advanced dynamic parse programming may not work with these functions.

A bug could cause an infinite loop: there is a lot of work going on and a few assumptions are being made. The script is therefore best used ad hoc by developers, not as part of production programs.

4.1 What do you mean by dynamic parse programming?

Changing the input or the rules while parse is executing. If you have the skills to do this, you should be able to work out whether these functions will work with your dynamic parse programming.

5. How do these functions work?

Most parse rules are identified by words. For example, the word digit here identifies a parse rule that matches a single decimal digit:

    digit: charset {0123456789}

You give the hook-parse function a block of words that identify parse rules, and it creates new parse rules that provide the extra functionality as well as calling the original rules. In this way the rules are "hooked" in place. Using this hook technique, useful information can be tracked as the REBOL Parse function evaluates the rules. Even existing built-in parse rules can be tracked (using the Bind function). Functions like tokenise-parse and explain-parse use hook-parse automatically, so you do not have to call it yourself.
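The in-place hooking idea can be modelled in Python. This is a sketch, not the REBOL implementation, and `match`, `hook` and `unhook` are hypothetical names. A dictionary stands in for REBOL words: `hook` replaces the named entries with wrappers that report events and then call the original rule, so any rule that refers to `digit` by name, such as `number` here, picks up the hooked version without itself being changed. `unhook` puts the originals back.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical rule model (not the REBOL script's data structures):
#   frozenset -> one character from the set;  str -> another rule by name;
#   list -> a sequence;  ("some", r) -> one or more r;
#   ("hook", name, original, report) -> a hooked rule that reports its events.
Report = Callable[[str, str, int], None]


def match(rule, text: str, pos: int, rules: Dict[str, object]) -> Optional[int]:
    """Return the position after `rule` matches at `pos`, or None on failure."""
    if isinstance(rule, str):  # look the rule up by name at match time
        return match(rules[rule], text, pos, rules)
    if isinstance(rule, frozenset):
        return pos + 1 if pos < len(text) and text[pos] in rule else None
    if isinstance(rule, list):
        for item in rule:
            pos = match(item, text, pos, rules)
            if pos is None:
                return None
        return pos
    if rule[0] == "hook":  # report, then delegate to the original rule
        _, name, original, report = rule
        report("test", name, pos)
        end = match(original, text, pos, rules)
        report("fail" if end is None else "pass", name, pos if end is None else end)
        return end
    if rule[0] == "some":
        end = match(rule[1], text, pos, rules)
        if end is None:
            return None
        while (nxt := match(rule[1], text, end, rules)) is not None and nxt != end:
            end = nxt
        return end
    raise ValueError(f"unknown rule: {rule!r}")


def hook(rules: Dict[str, object], names: List[str], report: Report) -> Dict[str, object]:
    """Replace each named rule in place with a wrapper; return the originals."""
    originals = {name: rules[name] for name in names}
    for name in names:
        rules[name] = ("hook", name, originals[name], report)
    return originals


def unhook(rules: Dict[str, object], originals: Dict[str, object]) -> None:
    """Put the original rules back."""
    rules.update(originals)


events = []
rules = {"digit": frozenset("0123456789"), "number": ("some", "digit")}
originals = hook(rules, ["digit"], lambda event, name, pos: events.append((event, name, pos + 1)))
print(match("number", "42", 0, rules))  # 2: both characters matched
print(events)  # test/pass for each digit, then a final test/fail at the end of input
unhook(rules, originals)
print(rules["digit"] == frozenset("0123456789"))  # True: the original rule is back
```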

5.1 How can I track built-in rules?

See the XML example in the documentation for parse-analysis-view.r.

6. New REBOL Parse changes

Some new parse keywords have been introduced that may make some of this easier, but I learnt parse a while ago and haven't caught up with all the new features, so I have not tested the functions' behaviour with them. I suspect most things will be fine.

6.1 Will paths in rules work?

The rule block supplied to the functions must be a block of words. Other than that, paths in your rules should be ok.

7. About the script author

Brett Handley started programming REBOL in early 2000 and maintains a site of REBOL information and scripts at:

http://www.codeconscious.com/rebol/