r3wp [groups: 83 posts: 189283]
World: r3wp

[I'm new] Ask any question, and a helpful person will try to answer.

mhinson
14-Apr-2009
[1702]
Hi, 
Pekr,

I appreciate that the concept for parsing is different to the use 
of regular expressions, but there are some things that do map from 
one to the other & I wondered if any table of those things existed. 
 As a noob sometimes the hardest questions to get answered are the 
ones where the answer is that there is no concept such as that sought 
by the noob. e.g. how do you grow strawberries in the sea?
 

The first match must be at the beginning of the line. If it was the 
first line in the set then it would not be after a new line, but 
in other cases it would be.


I will use parse/all from now, I like the extra control you describe.


Here are a few lines of test input. The script I am hoping to develop 
will parse the config files from Cisco devices in order to extract 
the layer 2 & 3 information together with the interface names & descriptions.

lines: {interface FastEthernet0
 description The connection to the printer
!
interface FastEthernet1
!
interface Vlan1
 description User vlan (only 1 vlan allowed)
 no ip address
!
interface Dialer0
 description Outside
 ip address negotiated
!
interface BVI1
 description Inside
 ip address 192.168.0.1 255.255.255.0
!
ip sla 3
 icmp-echo 217.0.0.1 source-interface Dialer0

ip route 0.0.0.0 0.0.0.0 Dialer0

interface ATM0.1 point-to-point
 no ip redirects
 no snmp trap link-status
 pvc 0/38
  pppoe-client dial-pool-number 1
 !
}


; sqlab, your change to use "thru newline" does what I wanted in 
this case which is good.

; my next step is to try & understand the "or" construct properly 
as the code below doesn't quite cut it.

wanted: copy []
interface: ["interface" [to #"^/" | to "point-to-point"]]

parse lines [any [[copy temp interface (insert tail wanted temp)] 
| thru newline ]]
foreach line  wanted [print line]

; thanks very much for your help, /\/\
Pekr
14-Apr-2009
[1703x2]
I am far from a parse guru, but the above rule (while it works) looks weird 
:-) Why write the interface rule that way? The line ends with 
a line terminator anyway, no?

parse/all lines [
  any [
    [ "interface" copy int-name to newline
       (print int-name)
       newline
     | skip
    ]
  ]
]
... this is really simpler, no subrule to ruin your brain is needed 
...
sqlab
14-Apr-2009
[1705]
I am not sure that I understand your intention.

If you want just  interface ATM0.1, then you have to switch the order 
of the alternatives in your interface rule, as the condition to  #"^/" (newline)  is 
already true and done, and your cursor behind  "point-to-point".
As the first part is true, the second will never be done.
Pekr
14-Apr-2009
[1706x2]
should point-to-point be filtered out? Then the rule would be a bit 
different ..
Slightly different version:

wanted: copy []

spacer: charset " ^/"
name-char: complement spacer

interface: [
  "interface "
  copy int-name some name-char
  (append wanted int-name)
  spacer
] 

parse/all lines [any [interface | skip]]

print mold wanted
mhinson
14-Apr-2009
[1708]
yes, point-to-point needs to be ignored from the result, and other 
similar cases in real life.

once the interface string & details are found the script will need 
a sub search that is looking for "description" or "ip address"


I was hoping that by extracting the rule used for each search I would 
make it easier to add new rules as the requirement becomes clear.

I tried swapping the order in the rule to
interface: ["interface" [to "point-to-point" | to #"^/"]]
but this just finds everything in the whole input.


Perhaps I am too old to learn this.  I worked programming in Pascal 
a good few years ago, but only for about a year.

I failed to grasp SmallTalk more recently & I am really struggling 
with this.

Thanks for all your help. /\/\
Pekr
14-Apr-2009
[1709x2]
to [ aaaa | bbbb] is a long-standing parse enhancement request, which is 
not yet implemented, but is planned for 3.0. It would really make 
the lives of parse beginners much easier. Your parse rule simply means 
- try to find "point-to-point" or the end of the line. But - it looks 
for the point-to-point until it reaches the end of the input string.
mhinson - just don't give up ... if you are a beginner with REBOL, 
you chose to start with a pretty advanced topic.
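
The behaviour described here can be reproduced with a small sketch. The two-line sample and the words sample, greedy and name are made up for illustration; name-char is the same kind of charset used earlier in the thread. It only shows why the first TO alternative runs past the end of the line, and one possible way to keep the match on the current line.

```rebol
REBOL []

; Hypothetical two-line sample, not taken from a real config.
sample: {interface fa1
interface ATM0.1 point-to-point
}

; TO scans ahead as far as it needs to: it finds "point-to-point" on the
; second line, so the first alternative succeeds and copies too much.
parse/all sample ["interface " copy greedy [to "point-to-point" | to newline] to end]
probe greedy    ; holds "fa1^/interface ATM0.1 " (runs on into line two)

; Bounding the match with a character set keeps it on the current line.
name-char: complement charset " ^/"
parse/all sample ["interface " copy name some name-char to end]
probe name      ; holds "fa1"
```

Because the second alternative is only tried when the first one fails, `to newline` is never reached as long as "point-to-point" occurs anywhere later in the input.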
Henrik
14-Apr-2009
[1711]
yes, parsing is one of the most difficult topics of REBOL.
mhinson
14-Apr-2009
[1712]
Thanks for the encouragement..  I won't give up yet for a good while.


Most of the programming I have done is out of a need to produce a 
specific result & that quite often needs to be fairly complex, however 
having a real need also makes the effort seem more worth while.


I appreciate that parsing is quite hard, but it also seems to be 
one of the features that differentiates REBOL from other languages 
& is often referred to as being more efficient once the concepts are 
fully grasped.   If this is not true, then perhaps I would be better 
off with php or perl etc.


I have also already had some fun with the very straight forward graphical 
stuff which is fantastic.


I am off out now, I hope to make a bit more code work tomorrow as 
I am on holiday this week. :-)

Thanks again
Pekr
14-Apr-2009
[1713x3]
you can also use rebol and call php or perl for some stuff :-) However 
- your rules could be made - you just need to split them into sections 
and find some rules for the parsed file structure.
spacer: charset " ^/"
name-char: complement spacer

interface: [
  "interface "
  copy int-text some name-char (print ["interface: " int-text])
  (append wanted int-text)
  thru newline
]

description: [
   "description "
   copy desc-text to newline (print ["description: " desc-text])
   newline
]

ip-address: [
  ["ip address "
   copy add-text to newline (print ["ip address: " add-text])
   newline
   | "no ip address" newline (print ["ip address:" "no adress"])
  ]
]


int-section: [interface any [description | ip-address | "!" break 
| skip]]

parse/all lines [any [int-section | skip]]
... ignore (append wanted int-text) above - I did not use it in 
the code, I just used print to check how sections work ...
mhinson
15-Apr-2009
[1716x2]
Hi, I have broken this down to try & understand it, but my understanding 
is still very vague, particularly in respect of the order of things 
like the copy statement & also the number of brackets needed is confusing 
me.


lines: {junk Interface fa0
!
interface fa1}

spacer: charset " ^/"
name-char: complement spacer

parse/all lines [
    any [   [   [
                "interface " copy int-text some name-char 
                (print ["interface: " int-text]) 
                thru newline
                ] any ["!" break | skip]
            ] | skip
        ]
    ]



I need to find some way to make it only get the "interface " if it 
starts at the first position on the line.  

I thought I needed to remove the word "any" to do this, but that 
did not work.
Perhaps I should also say that the structure of these Cisco config 
files tends to have the section start at the first position & sub 
sections are indented. The use of "!" is a bit sporadic & varies 
in different contexts.  I have been trying to hunt down a bunch of 
test examples without success, test data that can be shared freely 
is hard to get hold of. Thanks for your help.
PeterWood
15-Apr-2009
[1718x2]
It is quite easy to find something that starts in the first position 
of a line by matching against newline+the something.


I'm too lazy to remember the newline character so I tend to write 
something like this:

>> interface: join newline "interface "
== "^/interface "

>> spacer: charset to string! newline
== make bitset! #{
0004000000000000000000000000000000000000000000000000000000000000
}

>> name-char: complement spacer
== make bitset! #{
FFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
}

>> parse/all lines [any [interface copy int-text some name-char (print ["interface: " int-text]) | skip]]
interface:  fa1
== true
In case this isn't clear. I'll try to explain the parse rule.


First any effectively says to match any of the rules in the following 
block until the end of the string is reached.

The first rule in the block is 

 [interface copy int-text some name-char (print ["interface: " int-text])]

says match with the word interface (newline + "interface ")

then if there is a match,  copy some name-char which says copy one 
or more characters which match the criteria of a name-char

then if there are some name-char characters evaluate the rebol code 
in parentheses.


If there wasn't a match with that first rule, then the second rule 
that follows the | will be applied.

skip will pass over one character and always provides a match.
mhinson
16-Apr-2009
[1720]
Thanks for your help. 

I am beginning to wonder if what I am trying to do is not possible 
in Rebol. 

I am impressed at the number of responses, but I still can't find 
a way to use all the bits together to create a structure that is 
going to find the bits of data I am after.  


One of the problems seems to be that catching the data starting 
with new line & ending at newline uses up the "newline" for the following 
line so then that line gets missed.


Is there really no symbolic way in Rebol to identify the beginning 
of the line without using the newline char from the end of the previous 
line?
PeterWood
16-Apr-2009
[1721x2]
Mike - The method that I showed you does not use up the "newline" 
at the end of the line. If you check again, the parse rule simply 
says copy int-text some name-char. This "stops" before the newline 
at the end of the line.


In fact, guessing at your requirements a little and assuming the name-char 
is available, something along these lines should be close to what 
you want:


keywords: ["^/interface " | "^/another keyword " | "^/yet another keyword"]

parse/all lines [any
  [ 
     copy int-keyword [keywords copy int-text some name-char (
       print [int-keyword ": " int-text]
    )
    |
    skip
 ] 
]

(I obviously haven't tested this code.)
Sorry a typo, this line 
   copy int-keyword [keywords copy int-text some name-char (
should be
   copy int-keyword keywords copy int-text some name-char (
sqlab
16-Apr-2009
[1723]
I see just two ways to get what you desire

either you define different rules for interface at the beginning 
and interface after newline

or you do it in a two pass way:  first you separate the lines (either 
by parse or by read/lines) and then you process every line by itself.

I would go the easy way with two passes.
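
The first option (one rule tried at the very start of the input and again after each newline) might look roughly like the sketch below. The sample text and the words at-line-start and int-name are assumptions made for illustration only. The newline that precedes a line is consumed, but the newline that ends a matched line is not, so the following line can still be recognised.

```rebol
REBOL []

; Hypothetical sample: "interface" appears at the start of the input,
; in the middle of a line, and at the start of a later line.
lines: {interface fa0
!
junk interface fa1
interface fa2}

wanted: copy []
name-char: complement charset " ^/"

; Tried only at a line start; OPT lets lines that do not begin with
; "interface " pass without failing the surrounding rule.
at-line-start: [
    opt ["interface " copy int-name some name-char (append wanted int-name)]
]

parse/all lines [
    at-line-start                       ; very first line: no preceding newline
    any [newline at-line-start | skip]  ; later lines: right after each newline
]

probe wanted    ; the block now holds "fa0" and "fa2", but not "fa1"
```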
mhinson
16-Apr-2009
[1724]
The mist may be slowly clearing (sorry to be so slow to catch on).

The 2 stage process may be the answer, perhaps I can add a key char 
at the first line position when I read the file, then use this as 
the line start reference, but continue to use the end of line as 
normal.


I think I understand Peter's example & have tweaked it a bit to make 
it work for me.


lines: {~junk Interface fa0
~!
~interface fa1
~interface fa2 point-to-point
~!
~interface Fa3
~ description test three
~ ip address 1.1.3.3 255.255.255.0
~!
~interface Fa4
~ ip address 1.1.4.4 255.255.255.0
~!
~interface Fa3
~ description test four etc
~}

spacer: charset "^/"
name-char: complement spacer
stopwords: "point-to-point"

keywords: ["~interface " | "~ description " | "~ ip address"]

parse/all lines [any
  [ 

     copy int-keyword keywords copy int-text [to stopwords | some name-char] 
     (
       print  [int-keyword ": " int-text]
    )
    |
    skip
 ] 
]
sqlab
16-Apr-2009
[1725x2]
This got very long, but I think it should work


ifrule: [ ifa: "interface"  some [ ife: "point-to-point"  break | 
ife: newline    break | skip  ]  (append/only  append wanted copy/part 
ifa ife   interf:  copy [] ) ]

drule: [ "description" copy descr to newline (append interf descr) 
]
iprule: ["ip address" copy ip to newline (append interf ip)  ]
norule: ["no" to newline]
pvcrule: ["pvc" to newline]
pprule: ["pppoe" to  newline]
!rule: ["!" to  newline]

rule: [(wanted: copy [] ) some [ifrule | some  [

 s: " interface"  | #" "   |  drule | iprule | norule | pvcrule | 
 pprule | !rule |   break ] thru newline  ]   
] 
parse/all lines rule
There is a flaw
use this
rule: [(wanted: copy [] ) some [ifrule | some  [

 s: " interface" (interf: copy []) | #" "   |  drule | iprule | norule 
 | pvcrule | pprule | !rule |   break ] thru newline  ]   
] 
This prevents collecting the unwanted interface attributes.
Pekr
16-Apr-2009
[1727]
uh, was on a slow connection, so my reply got lost. Mhinson - there 
is no symbolic way to represent the beginning of a line. I don't know 
of one in any system. The only thing I know is end-of-line (newline). 
I know what you probably mean - you want to identify the beginning of 
your lines, including the first line (so a rule that matches newline 
first and then takes the next char as the beginning of the line is not 
enough). But - there are still various ways to do it. First - I think 
that your config files are chaos. Do they have any rules for some 
sections at all? :-) I also like what sqlab mentioned - sometimes it 
is easier to break stuff into a 2 pass strategy. Read/lines is your 
friend here. You can try it on text files and you'll see that the 
result is going to be a block of lines. I usually do:

data: read/lines %my-data-file.txt

;--- remove empty lines from block of lines ...
remove-each line data [empty? trim copy line]

foreach line data [do something with data ....]


Simply put - if rules for parser are out of my scope of capabilities 
(which happens easily with me :-), I try to find my other way around 
...
mhinson
16-Apr-2009
[1728]
sqlab, I like this as it also gives the extracted data some structure, 
which will be essential when using it.

Pekr the type of symbolic start & end of line is described as regular 
expression anchoring
http://www.regular-expressions.info/anchors.html


Matching a line using anchoring, in the implementations I have seen, does 
not preclude the following line from being matched, even in this example.

^abcd$ will match both lines.
abcd
abcd


In some contexts this is considered an extension to regular expressions, 
but it is very useful.
Izkata
16-Apr-2009
[1729x2]
Also, this is a bit slower, but avoids using complicated parse rules:
>> lines: {junk Interface fa0
{    !
{    interface fa1}
== "junk Interface fa0^/!^/interface fa1"

>> SplitLines: parse/all lines {^/}    ; {^/} is a string containing only the newline character, so this is a list of the separate lines
== ["junk Interface fa0" "!" "interface fa1"]
>> foreach line SplitLines [
[    if all [
[        not none? find line {interface}    ;Find returns none! (equivalent of NULL or NIL) on "!"
[        head? find line {interface}    ;find goes to the first instance of what is being searched for, and head? checks if it's currently at the beginning of the line
[        ][print line]
[    ]
interface fa1    ;The only match
(hah, bit late to the party... I see it's gone beyond the simple 
question now)
mhinson
16-Apr-2009
[1731]
there is a lot to be said for straight forward finds & excludes, 
particularly if it is done repeatedly on the previous output.

I am trying to understand how to use Rebol in a way that will be 
flexible enough to read maybe a few hundred Cisco config files & command 
outputs with perhaps 20 or 30 different types of rules for finding 
stuff then putting it into a structure that will be easy to search 
for patterns & extract summaries of  information. All the information 
you might have in a network diagram, but in a text or database format.
Sunanda
17-Apr-2009
[1732]
One huge parse may be technically neat. But it probably does not 
match the real world needs.


Petr's (and others') advice to break the problem down into manageable 
(and separately debuggable) chunks is a good approach.


And, remember, in the Real World, you may also need to track original 
line number (before removal of comments and blanks) to be able to 
give good error messages :  "Bad data encountered near line 12,652"
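
A two-pass version that keeps the original line numbers could be sketched as follows. The sample text and the words found, line-no and rest are invented for illustration; with a real file the block of lines would more likely come from read/lines. The counter is incremented for every line before any filtering, so the numbers still refer to the original input.

```rebol
REBOL []

; Hypothetical sample; in practice the block could come from read/lines.
lines: {interface fa0
 description printer
interface fa1}

found: copy []
line-no: 0

foreach line parse/all lines "^/" [
    line-no: line-no + 1
    ; find/match succeeds only when the line begins with the text,
    ; and returns the position just past it (none otherwise).
    if rest: find/match line "interface " [
        append found reduce [line-no copy rest]
    ]
]

probe found    ; pairs of line number and name: 1 "fa0" 3 "fa1"
```

Removing blank or comment lines before counting would shift the numbers, which is why the count is kept inside the same loop that reads each line.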
mhinson
17-Apr-2009
[1733]
I have been studying the code from sqlab but I can't understand it 
enough to modify it. This is a deconstruction of part of it with 
my comments added. I would love a hand to understand this a bit more. 
 I can't find any documentation for this sort of thing that I can 
understand. 

I have also been trying to retrieve an index number when reading 
lines so it can be used as suggested by Sunanda. I have drawn a blank so 
far.



parse/all lines [                ;; parse the whole block called lines; /all makes parsing only use values given below
                                 ;; I am not sure if this is iterated or the whole block parsed as one.
    (wanted: copy [])            ;; initialise wanted
    | some [                     ;; one or more matches needed to return true
        ifa: "interface" some [  ;; ifa is given a string value right in the middle of the parsing code
                                 ;; I see why, but not how this is able to slip into the middle here
                                 ;; then some starts another block so perhaps the "interface" is used by parse too??
            ife: "point-to-point" break   ;; no idea how the syntax works here
            | ife: newline break          ;; or here
            | skip                        ;; this skips I think till one of the OR conditions are met from below?
        ]
        (append/only append wanted copy/part ifa ife interf: copy [])
                                 ;; I don't understand what block append/only is working on here
                                 ;; append to block wanted using a part copy between ifa & ife but I
                                 ;; don't understand the source for the copy
        | some [                 ;; I think perhaps all the below rules are end or search patterns?
            s: " interface" (interf: copy [])
            | drule
            | iprule
            | norule
            | pvcrule
            | pprule
            | !rule
            | break
        ] thru newline           ;; final catchall end search pattern.
    ]
]


Sorry to ask so many questions, feel free to throw me out if this 
is just too much, but I have spent several hours on this fragment 
already. Thanks.
Henrik
17-Apr-2009
[1734]
I think we need to take them a few bits at a time
mhinson
17-Apr-2009
[1735]
:-)
Geomol
17-Apr-2009
[1736]
You can parse strings and blocks. The /all refinement is used when 
parsing strings, not blocks. From your first comments, it seems, 
you're parsing blocks, so you don't need /all. What is lines? A string 
or a block?
Henrik
17-Apr-2009
[1737]
the ifa: and ife: words you have there are not strings that are set in 
the middle of things. a set-word! will record the current position 
in the series being parsed.
mhinson
17-Apr-2009
[1738]
lines is from something like
lines: read %file.txt

or lines: {line one
line2
line3}
Geomol
17-Apr-2009
[1739x2]
Ok, you're parsing a string then. Then using /all is ok.
Put the
wanted: copy []

up front before you parse. Then drop the first or, |, just before 
SOME
Henrik
17-Apr-2009
[1741x3]
the difference between using a set-word and SET word!:

parse [a b c d] [
	w1: word! (probe w1)
	w2: word! (probe w1 probe w2)
	set w3 word! (probe w1 probe w2 probe w3)
	w4: word! (probe w1 probe w2 probe w3 probe w4/1)
]
using a get-word! will allow you to change the position of parsing.
basically you must remember that a dialect doesn't uphold normal 
REBOL syntax.
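
Applied to a string, the same idea explains the copy/part ifa ife step in sqlab's rule: two set-words mark two positions in the input, and the text between them is copied afterwards. A minimal sketch, with the sample string and the words name-start and name-end chosen only for illustration:

```rebol
REBOL []

text: "interface ATM0.1 point-to-point"

parse/all text [
    "interface "
    name-start:          ; remember where the name begins
    to " "
    name-end:            ; remember where it stops
    to end
]

probe index? name-start                 ; 11, just past "interface "
probe copy/part name-start name-end     ; "ATM0.1"
```

Both words refer to the same underlying string at different positions, which is why copy/part can take one as the start and the other as the end.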
mhinson
17-Apr-2009
[1744]
It sounds as if I have missed the understanding of what a dialect 
is.
[unknown: 5]
17-Apr-2009
[1745]
are you familiar with SQL?  SQL is a form of dialect.
mhinson
17-Apr-2009
[1746]
I know SQL in very general terms, but could not write a query
[unknown: 5]
17-Apr-2009
[1747]
That isn't important, but what is important to understand is that 
a dialect is a means of expression that is interpreted by the 
underlying language.  Consider the following:

buy two cups soda at five dollars
mhinson
17-Apr-2009
[1748]
ok
[unknown: 5]
17-Apr-2009
[1749]
Each of those words could be interpreted by an underlying function 
to create a sum of the total cost.
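
One way such a sentence could be interpreted with PARSE in block mode is sketched below. The lookup table numbers and the words order, qty, price and total are assumptions made for this example; nothing here is a standard dialect.

```rebol
REBOL []

; Hypothetical lookup table from number words to values.
numbers: [one 1 two 2 three 3 four 4 five 5]

order: [buy two cups soda at five dollars]
total: none

parse order [
    ; lit-words match fixed vocabulary; SET captures the variable parts
    'buy set qty word! 'cups word! 'at set price word! 'dollars
    (total: (select numbers qty) * (select numbers price))
]

probe total    ; 10
```

The block is never evaluated as ordinary REBOL code; the rule block alone decides what each word means.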
Henrik
17-Apr-2009
[1750x2]
A dialect is just a block of data that is processed in a certain 
way by REBOL. You can't evaluate it directly, but need some kind 
of parser to process it. You can create all sorts of crazy languages 
that way. Both the first and the second arguments to PARSE are dialects. 
The first one is the dialect block you provide to PARSE, the second 
one is the dialect used to process the first dialect. :-)
Without this understanding, the concept of PARSE is very difficult 
to grasp.