World: r3wp

[I'm new] Ask any question, and a helpful person will try to answer.

mhinson 13-Apr-2009 [1682x2]	Hi. I am struggling to understand parsing & hoping for some pointers. I have read everything I can find but still can't seem to use parsing for basic extraction of information from a number of lines (or even a single line). This is what I am trying to do & would love some help or links to documentation I may have missed please. lines: {junk wanted line1 contentA rubbish junk notNeeded line2 wanted line three content B rubbish } ;I want to extract ;wanted line1 contentA ;wanted line three content B ;That is to say everything between "wanted" up to "rubbish" but including "wanted" Thanks, /\/\
mhinson 13-Apr-2009 [1682x2]	Another (maybe foolish) question please. I am trying to use this script to help me understand the use of parsing to extract data from files. If I paste the script into my REBOL/View console it pastes in the script ok, but the examples do not work. This seems very common with a lot of the scripts in this library and is a problem I have been fighting with for several days. This is what I get. >> ini: parse-ini-file %/c/windows/win.ini Script Error: Out of range or past end Where: parse-ini-file ** Near: append last current-section parsed-line/1 append >> Am I pasting the script & examples to the wrong type of console or something? I feel it must be something I am doing as so few of the example scripts work for me. Thanks, /\/\
Graham 13-Apr-2009 [1684x2]	You need to provide some rules for what you want and what you consider rubbish.
Graham 13-Apr-2009 [1684x2]	there has to be a pattern that you recognize to determine what is what.
PeterWood 13-Apr-2009 [1686x2]	>> extract: copy [] == [] >> parse lines [any [["wanted" copy temp to "rubbish" (append extract temp)] | skip ]] == true >> extract == [" line1 contentA " " line three content B "]
PeterWood 13-Apr-2009 [1686x2]	Have you read this - http://www.codeconscious.com/rebol/parse-tutorial.html
mhinson 14-Apr-2009 [1688]	Hi, thanks very much for the fast replies. I have read the parse-tutorial and it seems very good for understanding how to create rules that will match patterns, however I only found one brief section that described using "copy" to extract the data from the line, rather than just confirming that a match was found (or not). I tried to use the copy examples but every time I modified them I ended up with errors as I don't really understand how they work. Peter, thanks for your example, it does almost what I want but the result in 'extract' does not contain the part of the string matched by "wanted". In my simple example I could just append the word "wanted", but in a real world case I would be using a pattern match to find the "wanted" key word. I also want to develop the code further to search for a different set of matches if the first set is found, in your example I am unclear where the block is that is performed if the string is found. Thanks very much for your help. /\/\
Geomol 14-Apr-2009 [1689]	There's a bit about COPY in PARSE here: http://www.rebol.com/docs/core23/rebolcore-15.html#section-7.3
Pekr 14-Apr-2009 [1690]	mhinson - dunno if somebody already replied to you, but 'copy works quite fine. The trouble is, when you change parsed string in paren. You have to put markers there, and return to correct position ...
PeterWood 14-Apr-2009 [1691x3]	Mike: A small change will include wanted: >> extract: copy [] == [] >> parse lines [any [[copy temp ["wanted" to "rubbish"] (append extract temp)] | skip ]] == true >> extract == ["wanted line1 contentA " "wanted line three content B "]
	The code that is executed in a parse rule is enclosed in parentheses (). So the parse rule that finds wanted.... is copy temp ["wanted" to "rubbish"] (append extract temp) The copy copies the part of the input that matches from the start of "wanted" to the start of "rubbish". Then the Rebol code (append extract temp) is executed. (I would normally write the Rebol as - insert tail extract temp - as it is faster than append in Rebol 2.)
	You can also insert Rebol code at the start of the parse rules to perform initialisation: parse lines [(extract: copy []) any [[copy temp ["wanted" to "rubbish"] (insert tail extract temp)] | skip ]]
sqlab 14-Apr-2009 [1694x2]	another solution >> rule: [(wanted: copy [] ) any [to "wanted" copy line to "rubbish" (append wanted line)] to end]
sqlab 14-Apr-2009 [1694x2]	better rule: [(wanted: copy [] ) any [to "wanted" copy line to "rubbish" (append wanted line) skip ] to end]
mhinson 14-Apr-2009 [1696x2]	Thanks very much Peter & sqlab. Those examples both do exactly what I was thinking. I now need to try & understand how this relates to the parse-tutorial & hopefully I will be able to start using the principles myself. Thanks again.
mhinson 14-Apr-2009 [1696x2]	Hi again. Sorry to be asking questions again so soon. I started using the syntax suggested with success, but in my input file I find the first key word is only valid if it is right at the start of the line. I have been searching through the documentation for the last hour & failed to find any references to "start of line" or similar. (like ^ in reg expressions). I wondered if there was any document to help people convert from regular expressions to Rebol parse expressions too please? Thanks, /\/\
Pekr 14-Apr-2009 [1698x3]	Regexp is quite different beast, and there are no single rules for translation to REBOL's parse. However - what do you mean by the beginning of the line? Is it the first char right after the end-of-line?
	btw - do you use parse/all? I prefer to use parse with the refinement, because using plain 'parse ignores whitespaces, and I don't like when the engine messes with things instead of me :-)
	Could you please post few lines of your input file?
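To illustrate the parse versus parse/all point made above, here is a minimal sketch using the simple string-splitting form of parse in REBOL 2. The sample string is made up for illustration and is not from the thread. Without /all, whitespace acts as a delimiter in addition to the ones you list; with /all, only the listed delimiters split the input.

```rebol
sample: "eth0 up,eth1 down"   ; made-up sample input, not from the thread

; Plain parse: whitespace splits the input as well as the comma
probe parse sample ","        ; expected: ["eth0" "up" "eth1" "down"]

; parse/all: only the comma splits the input
probe parse/all sample ","    ; expected: ["eth0 up" "eth1 down"]
```

The same distinction applies when parsing with rule blocks, which is why the later examples in this thread all use parse/all.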
sqlab 14-Apr-2009 [1701]	try this rule: [(wanted: copy [] ) any [copy line ["wanted" to "rubbish" ] (append wanted line) | thru newline] ]
mhinson 14-Apr-2009 [1702]	Hi, Pekr, I appreciate that the concept for parsing is different to the use of regular expressions, but there are some things that do map from one to the other & I wondered if any table of those things existed. As a noob sometimes the hardest questions to get answered are the ones where the answer is that there is no concept such as that sought by the noob. e.g. how do you grow strawberries in the sea? The first match must be at the beginning of the line. If it was the first line in the set then it would not be after a new line, but in other cases it would be. I will use parse/all from now, I like the extra control you describe. Here are a few lines of a test input, the script I am hoping to develop is to parse the config files from Cisco devices in order to extract the layer 2 & 3 information together with the interface names & descriptions. lines: {interface FastEthernet0 description The connection to the printer ! interface FastEthernet1 ! interface Vlan1 description User vlan (only 1 vlan allowed) no ip address ! interface Dialer0 description Outside ip address negotiated ! interface BVI1 description Inside ip address 192.168.0.1 255.255.255.0 ! ip sla 3 icmp-echo 217.0.0.1 source-interface Dialer0 ip route 0.0.0.0 0.0.0.0 Dialer0 interface ATM0.1 point-to-point no ip redirects no snmp trap link-status pvc 0/38 pppoe-client dial-pool-number 1 ! } ; sqlab, your change to use "thru newline" does what I wanted in this case which is good. ; my next step is to try & understand the "or" construct properly as the code below doesn't quite cut it. wanted: copy [] interface: ["interface" [to #"^/" | to "point-to-point"]] parse lines [any [[copy temp interface (insert tail wanted temp)] | thru newline ]] foreach line wanted [print line] ; thanks very much for your help, /\/\
Pekr 14-Apr-2009 [1703x2]	I am far from parse guru, but above rule (while works) looks weird :-) Why to produce interface rule that way? The line is ending with line terminator anyway, no? parse/all lines [ any [ [ "interface" copy int-name to newline (print int-name) newline | skip ] ] ]
Pekr 14-Apr-2009 [1703x2]	... this is really simpler, no subrule to ruin your brain is needed ...
sqlab 14-Apr-2009 [1705]	I am not sure that I understand your intention. Do you want just interface ATM0.1, then you have to switch the order of your interface rule, as the condition to #"^/" (newline) is already true and done, and your cursor behind "point-to-point". As the first part is true, the second will never be done.
Pekr 14-Apr-2009 [1706x2]	should point-to-point be filtered out? Then the rule would be a bit different ..
Pekr 14-Apr-2009 [1706x2]	Slightly different version: wanted: copy [] spacer: charset " ^/" name-char: complement spacer interface: [ "interface " copy int-name some name-char (append wanted int-name) spacer ] parse/all lines [any [interface | skip]] print mold wanted
mhinson 14-Apr-2009 [1708]	yes, point-to-point needs to be ignored from the result, and other similar cases in real life. Once the interface string & details are found the script will need a sub search that is looking for "description" or "ip address". I was hoping that by extracting the rule used for each search I would make it easier to add new rules as the requirement becomes clear. I tried swapping the order in the rule to interface: ["interface" [to "point-to-point" | to #"^/"]] but this just finds everything in the whole input. Perhaps I am too old to learn this. I worked programming in Pascal a good few years ago, but only for about a year. I failed to grasp SmallTalk more recently & I am really struggling with this. Thanks for all your help. /\/\
Pekr 14-Apr-2009 [1709x2]	to [ aaaa | bbbb] is a long-time parse enhancement request, which is not yet implemented, but is planned for 3.0. It would really make the lives of parse beginners much easier. Your parse rule simply means - try to find "point-to-point" or the end of the line. But - it looks for the point-to-point till it reaches the end of the input string.
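The behaviour being asked for here (stop at whichever of several markers comes first) can be sketched outside of parse with plain series functions. This is only an illustration for REBOL 2: copy-to-first is a hypothetical helper name that does not appear in the thread, and the sample strings are made up.

```rebol
; Hypothetical helper (not from the thread): copy STR up to whichever
; of the STOPS strings occurs first, or the whole string if none occurs.
copy-to-first: func [str [string!] stops [block!] /local best pos] [
    best: tail str                      ; default: no marker found
    foreach stop stops [
        pos: find str stop              ; none when the marker is absent
        if all [pos  lesser? index? pos index? best] [best: pos]
    ]
    copy/part str best                  ; everything before the nearest marker
]

; Made-up sample inputs
probe copy-to-first "ATM0.1 point-to-point^/ no ip redirects" ["point-to-point" "^/"]
probe copy-to-first "Vlan1^/ description User vlan" ["point-to-point" "^/"]
```

With these inputs the first call should stop at "point-to-point" and the second at the newline, which is the result the swapped to/to rule could not give because each to searches the rest of the input on its own.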
Pekr 14-Apr-2009 [1709x2]	mhinson - just don't give up ... if you are a beginner with REBOL, you chose to start with a pretty advanced topic.
Henrik 14-Apr-2009 [1711]	yes, parsing is one of the most difficult topics of REBOL.
mhinson 14-Apr-2009 [1712]	Thanks for the encouragement.. I won't give up yet for a good while. Most of the programming I have done is out of a need to produce a specific result & that quite often needs to be fairly complex, however having a real need also makes the effort seem more worthwhile. I appreciate that parsing is quite hard, but it also seems to be one of the features that differentiates REBOL from other languages & is often referred to as being more efficient once the concepts are fully grasped. If this is not true, then perhaps I would be better off with php or perl etc. I have also already had some fun with the very straightforward graphical stuff which is fantastic. I am off out now, I hope to make a bit more code work tomorrow as I am on holiday this week. :-) Thanks again
Pekr 14-Apr-2009 [1713x3]	you can also use rebol and call php or perl for some stuff :-) However - your rules could be made - you just need to split them into sections and find some rules for the parsed file structure.
	spacer: charset " ^/" name-char: complement spacer interface: [ "interface " copy int-text some name-char (print ["interface: " int-text]) (append wanted int-text) thru newline ] description: [ "description " copy desc-text to newline (print ["description: " desc-text]) newline ] ip-address: [ ["ip address " copy add-text to newline (print ["ip address: " add-text]) newline | "no ip address" newline (print ["ip address:" "no adress"]) ] ] int-section: [interface any [description | ip-address | "!" break | skip]] parse/all lines [any [int-section | skip]]
	... ignore (append wanted int-text) above - I did not use it in the code, I just used print to check how sections work ...
mhinson 15-Apr-2009 [1716x2]	Hi, I have broken this down to try & understand it, but my understanding is still very vague, particularly in respect of the order of things like the copy statement & also the number of brackets needed is confusing me. lines: {junk Interface fa0 ! interface fa1} spacer: charset " ^/" name-char: complement spacer parse/all lines [ any [ [ [ "interface " copy int-text some name-char (print ["interface: " int-text]) thru newline ] any ["!" break | skip] ] | skip ] ] I need to find some way to make it only get the "interface " if it starts at the first position on the line. I thought I needed to remove the word "any" to do this, but that did not work.
mhinson 15-Apr-2009 [1716x2]	Perhaps I should also say that the structure of these Cisco config files tends to have the section start at the first position & sub sections are indented. The use of "!" is a bit sporadic & varies in different contexts. I have been trying to hunt down a bunch of test examples without success, test data that can be shared freely is hard to get hold of. Thanks for your help.
PeterWood 15-Apr-2009 [1718x2]	It is quite easy to find something that starts in the first position of a line by matching against newline+the something. I'm too lazy to remember the newline character so I tend to write something like this: >> interface: join newline "interface " == "^/interface " >> spacer: charset to string! newline == make bitset! #{ 0004000000000000000000000000000000000000000000000000000000000000 } >> name-char: complement spacer == make bitset! #{ FFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF } >> parse/all lines [any [interface copy int-text some name-char (print ["interface: " int-text]) | skip]] interface: fa1 == true
PeterWood 15-Apr-2009 [1718x2]	In case this isn't clear, I'll try to explain the parse rule. First, any effectively says to match any of the rules in the following block until the end of the string is reached. The first rule in the block is [interface copy int-text some name-char (print ["interface: " int-text])]. It says match with the word interface (newline + "interface ") then if there is a match, copy some name-char which says copy one or more characters which match the criteria of a name-char, then if there are some name-char characters evaluate the rebol code in parentheses. If there wasn't a match with that first rule, then the second rule that follows the | will be applied. skip will pass over one character and always provides a match.
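One consequence of matching on newline + keyword is that a keyword on the very first line has no newline in front of it. A minimal sketch of one way around that in REBOL 2, assuming it is acceptable to parse a copy of the input with a newline joined onto the front; the sample text and the word found are made up for illustration.

```rebol
; Made-up sample: the first line starts with the keyword, one line has
; the keyword indented, and the last line starts with it again.
lines: {interface fa0
!
 sub interface fa9
interface fa1}

name-char: complement charset " ^/"
found: copy []

; Joining a newline onto the front means every line start, including
; the first, is preceded by a newline character.
parse/all join newline lines [
    any [
        "^/interface " copy int-text some name-char (insert tail found int-text)
        | skip
    ]
]

probe found   ; expected: ["fa0" "fa1"]
```

The indented "interface fa9" is skipped because the character before it is a space, not a newline, and the newline that ends each matched line is left in place for the next match.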
mhinson 16-Apr-2009 [1720]	Thanks for your help. I am beginning to wonder if what I am trying to do is not possible in Rebol. I am impressed at the number of responses, but I still can't find a way to use all the bits together to create a structure that is going to find the bits of data I am after. One of the problems seems to be that catching the data starting with newline & ending at newline uses up the "newline" for the following line so then that line gets missed. Is there really no symbolic way in Rebol to identify the beginning of the line without using the newline char from the end of the previous line?
PeterWood 16-Apr-2009 [1721x2]	Mike - The method that I showed you does not use up the "newline" at the end of the line. If you check again, the parse rule simply says copy int-text some name-char. This "stops" before the newline at the end of the line. In fact, guessing at your requirements a little and assuming the name-char is available, something along these lines should be close to what you want: keywords: ["^/interface " | "^/another keyword " | "^/yet another keyword"] parse/all lines [any [ copy int-keyword [keywords copy int-text some name-char ( print [int-keyword ": " int-text] ) | skip ] ] (I obviously haven't tested this code.)
PeterWood 16-Apr-2009 [1721x2]	Sorry a typo, this line copy int-keyword [keywords copy int-text some name-char ( should be copy int-keyword keywords copy int-text some name-char (
sqlab 16-Apr-2009 [1723]	I see just two ways to get what you desire: either you define different rules for interface at the beginning and interface after newline, or you do it in a two pass way: first you separate the lines (either by parse or by read/lines) and then you process every line by itself. I would go the easy way with two passes.
mhinson 16-Apr-2009 [1724]	The mist maybe slowly clearing (sorry to be so slow to catch on). The 2 stage process may be the answer, perhaps I can add a key char at the first line position when I read the file, then use this as the line start reference, but continue to use the end of line as normal. I think I understand Peter's example & have tweaked it a bit to make it work for me. lines: {~junk Interface fa0 ~! ~interface fa1 ~interface fa2 point-to-point ~! ~interface Fa3 ~ description test three ~ ip address 1.1.3.3 255.255.255.0 ~! ~interface Fa4 ~ ip address 1.1.4.4 255.255.255.0 ~! ~interface Fa3 ~ description test four etc ~} spacer: charset "^/" name-char: complement spacer stopwords: "point-to-point" keywords: ["~interface " | "~ description " | "~ ip address"] parse/all lines [any [ copy int-keyword keywords copy int-text [to stopwords | some name-char] ( print [int-keyword ": " int-text] ) | skip ] ]
sqlab 16-Apr-2009 [1725x2]	This got very long, but I think it should work ifrule: [ ifa: "interface" some [ ife: "point-to-point" break | ife: newline break | skip ] (append/only append wanted copy/part ifa ife interf: copy [] ) ] drule: [ "description" copy descr to newline (append interf descr) ] iprule: ["ip address" copy ip to newline (append interf ip) ] norule: ["no" to newline] pvcrule: ["pvc" to newline] pprule: ["pppoe" to newline] !rule: ["!" to newline] rule: [(wanted: copy [] ) some [ifrule | some [ s: " interface" | #" " | drule | iprule | norule | pvcrule | pprule | !rule | break ] thru newline ] ] parse/all lines rule
sqlab 16-Apr-2009 [1725x2]	There is a flaw. Use this: rule: [(wanted: copy [] ) some [ifrule | some [ s: " interface" (interf: copy []) | #" " | drule | iprule | norule | pvcrule | pprule | !rule | break ] thru newline ] ] This prevents collecting the unwanted interface attributes.
Pekr 16-Apr-2009 [1727]	uh, was on slow connection, so my reply got lost. Mhinson - there is no symbolic way to represent beginning of the line. I don't know any in any system. The only thing I know is end-of-line (newline). I know what you probably mean - you want to identify beginning of your lines, but even for first line (so not a rule, matching newline first, then next char = beginning of line). But - there is still various ways of how to do it. First - I think that your config files are chaos. Do they have any rules for some sections at all? :-) I also like what sqlab mentioned - sometimes it is easier to break stuff into 2 pass strategy. Read/lines is your friend here. You can try it on text files and you'll see, that the result is going to be a block of lines. I usually do: data: read/lines %my-data-file.txt ;--- remove empty lines from block of lines ... remove-each line data [empty? trim copy line] foreach line data [do something with data ....] Simply put - if rules for parser are out of my scope of capabilities (which happens easily with me :-), I try to find my other way around ...
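The two-pass idea can be sketched as follows for REBOL 2. This is only an illustration under stated assumptions: the config text is inlined and split with parse/all so the example is self-contained (with a real file, read/lines would give the same kind of block), and the words config, found and name are made up for the example.

```rebol
; Made-up sample config; indented lines are sub-sections
config: {interface FastEthernet0
 description The connection to the printer
!
interface Vlan1
 no ip address}

found: copy []

; Pass 1: split into lines. Pass 2: parse each line on its own, so a
; rule that starts with "interface " can only match at the line start.
foreach line parse/all config "^/" [
    if parse/all line ["interface " copy name to end] [
        insert tail found name
    ]
]

probe found   ; expected: ["FastEthernet0" "Vlan1"]
```

Because each line is parsed separately, the start of the input is the start of the line, so no newline needs to be consumed or remembered between lines.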
mhinson 16-Apr-2009 [1728]	sqlab, I like this as it also gives the extracted data some structure, which will be essential when using it. Pekr, the type of symbolic start & end of line is described as regular expression anchoring http://www.regular-expressions.info/anchors.html Matching a line using anchoring in the implementations I have seen does not preclude the following line from being matched even in this example. ^abcd$ will match both lines. abcd abcd In some contexts this is considered an extension to regular expressions, but it is very useful.
Izkata 16-Apr-2009 [1729x2]	Also, this is a bit slower, but avoids using complicated parse rules:
>> lines: {junk Interface fa0
{    !
{    interface fa1}
== "junk Interface fa0^/!^/interface fa1"
>> SplitLines: parse/all lines {^/} ; {^/} is a string containing only the newline character, so this is a list of the separate lines
== ["junk Interface fa0" "!" "interface fa1"]
>> foreach line SplitLines [
[    if all [
[        not none? find line {interface} ;Find returns none! (equivalent of NULL or NIL) on "!"
[        head? find line {interface} ;find goes to the first instance of what is being searched for, and head? checks if it's currently at the beginning of the line
[    ][print line]
[    ]
interface fa1 ;The only match
Izkata 16-Apr-2009 [1729x2]	(hah, bit late to the party... I see it's gone beyond the simple question now)
mhinson 16-Apr-2009 [1731]	there is a lot to be said for straightforward finds & excludes, particularly if it is done repeatedly on the previous output. I am trying to understand how to use Rebol in a way that will be flexible enough to read maybe a few hundred Cisco config files & command outputs with perhaps 20 or 30 different types of rules for finding stuff, then putting it into a structure that will be easy to search for patterns & extract summaries of information. All the information you might have in a network diagram, but in a text or database format.