Mailing List Archive: 49091 messages

help me parse this?

 [1/6] from: balayo::mindspring::com at: 5-Nov-2000 9:24


hey guys,

in my long journey towards REBOL competency, parse is still blocking my way. I can use the example of extracting the title from a web page, but how can I modify that to replace the text between the tags with something else? Even if there is a way with 'find, etc., I'd still rather do it with 'parse. anybody?

Thanks!
--
Spend less time composing sigs.
-tom

 [2/6] from: petr:krenzelok:trz:cz at: 5-Nov-2000 17:43


----- Original Message -----
From: <[balayo--mindspring--com]>
To: <[rebol-list--rebol--com]>
Sent: Sunday, November 05, 2000 9:24 AM
Subject: [REBOL] help me parse this?

> hey guys,
> in my long journey towards REBOL competency,
<<quoted lines omitted: 5>>
> rather do it with 'parse. anybody?
> Thanks!

    str: {<A HREF="http://www.rebol.com/">REBOL website</A>}
    parse str [
        thru "<A HREF=" thru ">" start-replace: to </A> end-replace:
        (change/part start-replace "REBOL Technologies www site" offset? start-replace end-replace
         end-replace: index? start-replace)
        :end-replace to end
    ]

By placing a set-word such as start-replace: in the rule you tell the parser to mark the input position, so we marked the start and end of the part of the string we want to change ... Then I used change/part, which changes the string. The function seems to be intelligent enough to insert the rest of the string and return the series just past the point of insertion. I am not sure if it is required, but by putting :end-replace back into the parser we reassign the position and tell the parser to continue right from there.

 [3/6] from: petr:krenzelok:trz:cz at: 5-Nov-2000 18:02


Answering once again, as I am not really used to Outlook Express and pressed the wrong key combination, so my email was accidentally sent to the list ... Ignore my previous message ...

----- Original Message -----
From: <[balayo--mindspring--com]>
To: <[rebol-list--rebol--com]>
Sent: Sunday, November 05, 2000 9:24 AM
Subject: [REBOL] help me parse this?

> hey guys,
> in my long journey towards REBOL competency,
<<quoted lines omitted: 5>>
> rather do it with 'parse. anybody?
> Thanks!

    str: {<A HREF="http://www.rebol.com/">REBOL website</A>}
    parse str [
        thru "<A HREF=" thru ">" start-replace: to </A> end-replace:
        (offset: offset? start-replace end-replace
         end-replace: change/part start-replace "REBOL Technologies www site" offset)
        :end-replace copy rest to end (print rest) to end
    ]

By placing a set-word such as start-replace: in the rule you tell the parser to mark the input position, so we marked the start and end of the part of the string we want to change ... Then I used change/part, which changes the string. The function seems to be intelligent enough to insert the rest of the string and return the series just past the point of insertion, so we reassigned 'end-replace ... Then we need to hand the position back to the parser with :end-replace ... To see that we are in the right position I copied the rest of the string to the 'rest word and printed it ...

Having 'replace as part of the parse dialect was one of my suggestions for the current parser implementation ...

HopeThisHelps a little bit,

Cheers,
-pekr-
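
The same mark, change, and resume pattern can be wrapped in a small function, which may be closer to the original question about replacing the text between the title tags. This is only a minimal sketch: replace-between and its argument names are made up for illustration, it handles only the first pair of markers, and it assumes both markers actually occur in the string.

```rebol
REBOL [Title: "Replace text between two markers with parse (sketch)"]

; Hypothetical helper, not part of REBOL: replaces, in place, whatever
; sits between the first open-marker and the next close-marker.
replace-between: func [
    "Replace the text between open-marker and close-marker, in place."
    text [string!] open-marker [string!] close-marker [string!] new-text [string!]
    /local start-mark end-mark
][
    parse/all text [
        thru open-marker start-mark:     ; mark just after the opening marker
        to close-marker end-mark:        ; mark just before the closing marker
        ; change/part accepts a series position as its range and returns
        ; the position just past the inserted text
        (end-mark: change/part start-mark new-text end-mark)
        :end-mark                        ; resume parsing after the new text
        to end
    ]
    text                                 ; unchanged if the markers were not found
]

page: {<html><head><title>Old title</title></head></html>}
print replace-between page "<title>" "</title>" "New title"
```

Passing end-mark directly as the /part range avoids the separate offset? calculation; whether that reads better is a matter of taste.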

 [4/6] from: pierce:athenasecurity at: 5-Nov-2000 11:24


Is there anything in particular you want to parse? I've spent the last week or so working with parse; below are some of the things I've been working on.

This is the Solaris output of 'ifconfig -a' on one of my machines. I needed to write a script to parse the information below, looking for the broadcast address for each interface (more later on why).

    lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
            inet 127.0.0.1 netmask ff000000
    hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
            inet 192.168.10.139 netmask ffffff00 broadcast 192.168.10.255
            ether 8:0:20:c6:ac:a9
    qfe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
            inet 192.168.11.254 netmask ffffff00 broadcast 192.168.11.255
            ether 8:0:20:c6:ac:a9
    qfe1: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
            inet 10.0.0.254 netmask ffff0000 broadcast 10.0.255.255
            ether 8:0:20:c6:ac:a9

I did this using REBOL/Command, with the following:

First I created a file with some pre-defined variable names, so that each instance can be configured without changing the code. For this part the file had an entry called 'interface-arp-check' where the user enters a string of interfaces to check. Let's say they put in "hme0 qfe0". The first thing I would need to do is get the information from the file:

    var: read %file.conf
    var: parse var none

The first line reads the config file into var, then the second puts everything into a block, splitting at the spaces. For example the file might look like this:

    email "[Pierce--AthenaSecurity--Com]"
    interface-arp-check "hme0 qfe0"
    percent-disk-space "75"

I use the 'none' option if I just want to break on the whitespace and put the resulting value into the variable at the front, in this case 'var'.

The second line will create a block, stored in var, that looks like this:

    ["email" "[Pierce--AthenaSecurity--Com]" "interface-arp-check" "hme0 qfe0" "percent-disk-space" "75"]

I find this helpful when I don't know how many times I'll need to loop through the block; I can now use 'while [length? var]' with a 'disarm try[]' around it (see below).

The next thing I would do is set the variables, based upon the items in the config file:

    disarm try [
        forall var [
            set to-word first var second var
        ]
    ]

This will go through and set the first item as the variable and the second item as the value, so the variables are now set as:

    email: "[Pierce--AthenaSecurity--Com]"
    interface-arp-check: "hme0 qfe0"
    percent-disk-space: "75"

All of these are strings; it's important to remember that... it bit me hard when doing the disk space checking. If I left off the quotes in the config file the email would be of type email and the percent-disk-space would be integer. However, I don't want to keep reminding people when to use quotes and when not to, so I'll tell them everything needs quotes and deal with it later.

Now that I know what interfaces to look at I need to parse the information. The purpose of this part of the script is to monitor the ARPs outside each interface. This program is installed on our firewalls to alert the administrator when someone bypasses the firewall. If they place a machine outside the firewall then it will be seen by the firewall; we can then compare the firewall's ARP table against a list of accepted machines for each interface. If anything is odd we send an email.

    ;I want to toss out any errors, since when I get to the end there will be an error signifying the end of the block.
    disarm try [
        ;Read the 'interface-arp-check' variable and compare the results to the entries found in the '<interface>.ignore' file.
        interface-arp-check: parse interface-arp-check none

        {The line above takes the values in 'interface-arp-check', in the case above "hme0 qfe0" and splits them on the space. Now interface-arp-check is a block with two items ["hme0" "qfe0"].}

        while [length? interface-arp-check] [
            interface: first interface-arp-check
            call/wait "/usr/sbin/ifconfig -a > if.aas"

            {The line above gets the ifconfig information (shown at the top of this email).}

            parse/case/all read %if.aas [thru interface copy dev-null thru "broadcast" copy broadcast to #"^/" copy ifconfig to end]

            {This line parses the information looking for the word "broadcast" in lowercase letters. The parsing is case sensitive because of the /case option. The /all option tells parse not to split on whitespace.}

            {Notice that there isn't a variable before the parse command. If you use anything other than 'none' for the word rules parse will only return true or false, which would overwrite the variable I would want to store. To get the variable, we use 'copy'.

            So a re-cap:

            parse/case/all
            This will parse, but lower and uppercase will make a difference and it won't care about whitespace.

            read %if.aas
            This will use the text in if.aas to parse on; the 'call/wait' above creates this information.

            [thru interface copy dev-null thru "broadcast" copy broadcast to #"^/" copy ifconfig to end]
            This line is your 'word rules', or how the parsing will be done. In this case we are going to read 'thru' interface. Interface is the first item in interface-arp-check (see above), which in this case is "hme0". Once we have something that matches, we copy everything into a variable called 'dev-null' until we find the term "broadcast". Because we used the /case option broadcast MUST be lowercase; there are both an uppercase and lowercase broadcast in the ifconfig -a output... so this is required to get the right one. Once we find the right broadcast we copy it to a variable called broadcast. Everything is copied until the end of the line (#"^/" represents an end of line character) and finally we copy the rest of the text to a variable called ifconfig to the end. If we don't copy everything to the end we'll get a 'false' response back. If we do copy everything to the end we get a 'true' response back. We could also use the | to give multiple word rules. But that wasn't needed in this case.

            So now we have three variables set from this parse line:

            dev-null is the stuff I throw away, but it contains the beginning of the text up to and including the term broadcast. If we use thru it will put the term broadcast into dev-null. If we used 'to' then dev-null would stop right before the term broadcast. Since we don't care for the term broadcast, and only want what comes right after it, we'll let dev-null claim it.

            broadcast has everything from right after the term broadcast (in lowercase letters) to the end of the line. This will be the broadcast address of the interface.

            ifconfig has everything after that. Since we may have to loop through the remaining interfaces we store this for future reference. Although each interface (in this script) starts over from scratch, since we don't know what order they put the interfaces in the config file.}

            call/wait "rm tmp.aas if.aas"
            tmp: call/wait/output join "/usr/sbin/ping " broadcast dev-null

            {This line takes the IP address we got from the parsing above and pings it. This will ping the entire network; each machine on the network will respond unless ping has been disabled. We do this to fill up the firewall's ARP table with all the hosts it can find. The results are put in the scratch variable, dev-null.}

            file: join interface ".ignore"
            if/else exists? to-file file: join interface ".ignore" [
                call/wait join "/usr/sbin/arp -a | grep " join interface join " | grep -v ' S' | fgrep -v -f /rebol/" join interface ".ignore > tmp.aas"
            ] [
                call/wait join "/usr/sbin/arp -a | grep " join interface " | grep -v ' S' > tmp.aas"
            ]

            {The lines above check to see if there is a local file called <interface>.ignore, such as hme0.ignore or qfe0.ignore. These files would contain the IP address of hosts that are expected to be seen on that interface, such as the POP router for the external interface.}

            if greater? length? read %tmp.aas 0 [
                ifconfig: read %tmp.aas
                append msg join "The following machines were found outside interface " join interface join ": " join #"^/" ifconfig
            ]
            remove head interface-arp-check
        ]
    ]

    {Next we check to see if anything was found by testing the size of tmp.aas; if it's greater than 0 there were machines found. If machines were found we append them to msg (this is the body of an email message that gets sent if there are any critical alerts).}

Questions?

Wayne
---
Wayne Pierce
Director of Service Development
Athena Security
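
The config-loading and broadcast-lookup steps above can be tried without a Solaris box by feeding parse literal strings instead of %file.conf and the ifconfig output. This is a minimal sketch, not the script itself: the sample strings, the placeholder address admin@example.com, and the helper name broadcast-of are assumptions made for illustration. It also walks the settings two at a time with foreach and loops over the interfaces with foreach, so nothing depends on an error ending the loop.

```rebol
REBOL [Title: "Config-driven broadcast lookup (sketch)"]

; Stand-ins for %file.conf and the `ifconfig -a` output; sample data only.
config-text: {email "admin@example.com"
interface-arp-check "hme0 qfe0"
percent-disk-space "75"}

ifconfig-text: {lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
        inet 127.0.0.1 netmask ff000000
hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 192.168.10.139 netmask ffffff00 broadcast 192.168.10.255
        ether 8:0:20:c6:ac:a9
qfe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 192.168.11.254 netmask ffffff00 broadcast 192.168.11.255
        ether 8:0:20:c6:ac:a9
}

; Split on whitespace (quoted values stay whole), then set each
; name/value pair as a global word, taking two items per step.
foreach [name value] parse config-text none [set to-word name value]

; Hypothetical helper: returns the text after the lowercase word
; "broadcast" that follows the interface name, or none if not found.
broadcast-of: func [text [string!] interface [string!] /local address][
    address: none
    parse/all/case text [
        thru interface thru "broadcast " copy address to #"^/" to end
    ]
    address
]

foreach interface parse interface-arp-check none [
    print [interface "->" broadcast-of ifconfig-text interface]
]
```

Like the original rule, this takes the first lowercase "broadcast" after the interface name, so an interface with no broadcast entry of its own (lo0 here) would pick up the next interface's address.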

 [5/6] from: balayo:mindspring at: 5-Nov-2000 15:26


wayne,

whoa, nelly, that's way more complex than I need. I'm going to read through your message a couple more times to see if I can "get" what you're doing. What I need is only this:

I have a local directory full of html with something like,

    <select>
    <option> blah blah </option>
    <option> blah blah </option>
    <option> blah blah </option>
    </select>

in each page. I have a separate file, with just

    <option> blah blah </option>
    <option> blah blah </option>
    <option> blah blah </option>

I'd like to replace the options on all the pages with the options from the other file. That's a lot less complicated than what you're doing! :-)

--
Spend less time composing sigs.
-tom

 [6/6] from: brett:codeconscious at: 6-Nov-2000 10:27


Parse may be overkill for your example. You could try a string search/replace solution like this...

    the-page: {<html><body><form><select><option>blah blah</option></select></form></body></html>}
    the-new-bit: {<select><option>Red</option><option>Green</option><option>Blue</option></select>}
    start-select: find the-page "<select>"
    end-select: find/tail start-select "</select>"
    insert remove/part start-select end-select the-new-bit
    write %new-file.html the-page

Brett.
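
To apply the same find-based replacement to a whole directory of pages, with the options coming from a separate file, something along these lines might do. It is a rough sketch under several assumptions: swap-select and update-pages are made-up names, %pages/ and %options.html are placeholder paths, every page uses a bare <select> tag with no attributes, and only the first select element on each page is replaced.

```rebol
REBOL [Title: "Swap <select> contents across pages (sketch)"]

; Replace the first <select>...</select> element in page with new-select.
; Returns the modified page, or none when no complete element is found.
swap-select: func [page [string!] new-select [string!] /local start-select end-select][
    if all [
        start-select: find page "<select>"
        end-select: find/tail start-select "</select>"
    ][
        change/part start-select new-select end-select
        page
    ]
]

; Hypothetical driver: rewrites every file in dir whose name contains ".html".
; The options file holds only the <option> lines, so it is wrapped in
; <select> tags before being swapped in.
update-pages: func [dir [file!] options-file [file!] /local new-select page][
    new-select: rejoin ["<select>" read options-file "</select>"]
    foreach file read dir [
        if find/last file ".html" [
            page: read join dir file
            if swap-select page new-select [write join dir file page]
        ]
    ]
]

; Try the replacement on a literal string first.
print swap-select {<form><select><option>blah</option></select></form>} {<select><option>Red</option></select>}

; Placeholder paths; this call overwrites files, so test on copies first:
; update-pages %pages/ %options.html
```

The name filter is deliberately loose (it would also match a file such as page.html.bak), so it is worth tightening before pointing this at real pages.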

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted