World: r3wp

[Parse] Discussion of PARSE dialect

Chris
3-Jun-2008
[2578x2]
I've been toying with this to obtain a very parsable "dialect" -- 
my goal being to scrape live game updates from a certain sports web 
site (for personal use, natch).  It's reliant on 'parse-xml though, 
so ymmv....

do http://www.ross-gill.com/r/scrape.r
probe load-xml some-xml
Result is a little like:

	from -- <tag attr="attribute">Content</tag>
	to -- <tag> /attr attribute "Content"
Anton
4-Jun-2008
[2580]
Josh, calling REMOVE-EACH so often is what makes your parse slow. 
A remove operation in the middle of a large string is slow, and you 
are doing many removes. That's why the others suggested using copy.
Josh
6-Jun-2008
[2581]
Thanks for the input.  I will have to play around with those later 
as I am trying to get this finished up and then I can go back and 
clean up the code. The data is minimal enough for the script to finish 
in under a second anyway.   Parse is pretty sweet.   Makes this much 
neater than the alternative.
Anton
7-Jun-2008
[2582]
No worries.
amacleod
30-Jun-2008
[2583]
I'm trying to copy some text from the position found while parsing 
a document.
I'm using something like: 


rule: [some digit copy text to newline]    (where "digit" has been 
defined as all digits 0 to 9)

 This copies everything after the digit. How would I copy the digit 
 itself as well?
Brock
30-Jun-2008
[2584x2]
would it not simply be....    to some digit    instead of what you 
have above?  I'll start playing around and see if I can be of any 
help (if you haven't already figured it out)
Not as easy as it seemed to be.  Will take more time than I have 
right now.
amacleod
30-Jun-2008
[2586]
Is there a difference between using "to" and "thru"?
[unknown: 5]
30-Jun-2008
[2587]
yes
Graham
30-Jun-2008
[2588]
is this block parsing?
[unknown: 5]
30-Jun-2008
[2589]
to goes to the point and thru includes the point
amacleod
30-Jun-2008
[2590x2]
No
So to newline does not include the newline?
[unknown: 5]
30-Jun-2008
[2592]
no it wouldn't
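
A minimal sketch of the to/thru difference described above, assuming REBOL 2 string parsing. The sample string and the words `before` and `including` are made up for illustration:

```rebol
REBOL [Title: "to vs. thru sketch"]

line: "abc^/def"

; TO stops in front of the newline, so the copy excludes it
parse/all line [copy before to newline to end]
probe before        ; "abc"

; THRU moves past the newline, so the copy includes it
parse/all line [copy including thru newline to end]
probe including     ; "abc^/"
```

This is why a rule written with `to newline` usually needs a following `skip` to step over the newline itself.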
Graham
30-Jun-2008
[2593x2]
rule: [ digit copy text to newline skip ]
parse stuff [ some rule ]
digits: [ some digit ]
rule: [ digits ... ]
amacleod
30-Jun-2008
[2595]
Graham, the digit string would be included in the copied text with 
this?
Graham
30-Jun-2008
[2596x3]
nope
because it matches digit and the cursor moves on
past the digit
amacleod
30-Jun-2008
[2599]
Right. Any way to capture the digit?
[unknown: 5]
30-Jun-2008
[2600]
you can always do something like set n number!
Graham
30-Jun-2008
[2601x2]
rule: [ copy d thru digits  .... ]
He's using string parsing .. not block parsing
[unknown: 5]
30-Jun-2008
[2603]
yeah can't use set then.
amacleod
30-Jun-2008
[2604]
I'll try that Graham. Thanks
Graham
30-Jun-2008
[2605x3]
or
non-digits: complement digit
parse [ copy digit-text to non-digits copy text to newline skip ]
and correct syntax helps :)
[unknown: 5]
30-Jun-2008
[2608x2]
>> str: "193920347REBOL ROCKS!^/"
== "193920347REBOL ROCKS!^/"

>> parse str compose [some (charset "0123456789") text: copy text thru newline]
== true
>> text
== "REBOL ROCKS!^/"
Something like that?
Brock
30-Jun-2008
[2610]
he was looking for the number and the string though.
amacleod
30-Jun-2008
[2611x2]
No

I have a text document with section numbers in front:

2. Hello
2.1 Hello Again
2.1.1 Hello already
3. Goodbye

I want the section number included in the copy.
It need not be included in the same copy, just as long as I can record 
it.
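
One way to record both the section number and the text for the numbered lines above might look like the sketch below. It assumes REBOL 2 string parsing, a document whose every line ends in a newline, and section numbers made only of digits and dots. The names `number-chars`, `line-rule`, and `sections` are hypothetical:

```rebol
REBOL [Title: "Section number capture sketch"]

doc: {2. Hello
2.1 Hello Again
2.1.1 Hello already
3. Goodbye
}

; Characters allowed in a section number (assumption: digits and dots only)
number-chars: charset "0123456789."

sections: copy []

line-rule: [
    copy num some number-chars      ; the section number, e.g. "2.1.1"
    some " "                        ; blank(s) between number and title
    copy text to newline skip       ; the title, then step over the newline
    (repend sections [num text])    ; store number and title as a pair
]

parse/all doc [some line-rule]
probe sections
```

The point is that `copy` is placed in front of the digit rule rather than after it, so the number is captured instead of only being matched. The result should be a flat block of number/title pairs, such as "2.1" followed by "Hello Again".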
[unknown: 5]
30-Jun-2008
[2613]
So you just want each line then really?
amacleod
30-Jun-2008
[2614x2]
Well it gets a little more complicated.
Some parts of the document will be multiline.
I thought it would be a simple thing that I was missing. I may need 
to re-think the formatting of the document.
[unknown: 5]
30-Jun-2008
[2616x2]
So even if something is multiline you would still want each line 
of the multiline correct?
Or do you mean a multiline might look something like this:

2.1 Hello
       Goodbye

Where the second line doesn't have the preceding number?
amacleod
30-Jun-2008
[2618]
Yes, and formatting may need to be retained.
[unknown: 5]
30-Jun-2008
[2619]
Ahhh yes that gets a bit more complicated.
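
A rough sketch of one way the multiline case could be handled, assuming REBOL 2 and assuming that a continuation line never begins with a digit or a dot. Lines without a leading section number are appended unchanged to the previous section, so indentation and line breaks are kept. The sample text and all names here are hypothetical:

```rebol
REBOL [Title: "Multiline section sketch"]

doc: {2.1 Hello
       Goodbye
3. Bye
}

number-chars: charset "0123456789."
sections: copy []

parse/all doc [
    some [
        ; A line that opens a new section: number, blanks, rest of line
        copy num some number-chars some " " copy text thru newline
        (repend sections [num text])
        |
        ; Any other line continues the current section, formatting intact
        copy more thru newline
        (if not empty? sections [append last sections more])
    ]
]

probe sections
```

Using `thru newline` rather than `to newline` keeps each line break inside the stored text. A continuation line that happens to start with a number would be misread as a new section, so the source formatting would need to rule that out.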
amacleod
30-Jun-2008
[2620]
Paul, while you are there...
I was considering using Tretbase for this project
[unknown: 5]
30-Jun-2008
[2621]
Excellent choice ;-)
amacleod
30-Jun-2008
[2622]
Let me briefly explain where I'm going, to see if you think it's workable 
or perhaps there is a better solution.
[unknown: 5]
30-Jun-2008
[2623]
k
amacleod
30-Jun-2008
[2624x4]
I'm trying to put a set of Fire Department related materials online.
They are now in PDF.
I'm converting them to text and reformatting them to parse.
I want to hold each section in a separate database record
so I can index for keywords and search and read only those sections 
I need.