World: r3wp


[Parse] Discussion of PARSE dialect

Chris 3-Jun-2008 [2578x2]	I've been toying with this to obtain a very parsable "dialect" -- my goal being to scrape live game updates from a certain sports web site (for personal use, natch). It's reliant on 'parse-xml though, so ymmv....
	do http://www.ross-gill.com/r/scrape.r
	probe load-xml some-xml
Chris 3-Jun-2008 [2578x2]	Result is a little like: from -- <tag attr="attribute">Content</tag> to -- <tag> /attr attribute "Content"
Anton 4-Jun-2008 [2580]	Josh, using the REMOVE-EACH very often is what makes your parse slow. A remove operation in the middle of a large string is slow, and you are doing many removes. That's why the others suggested using copy.
Josh 6-Jun-2008 [2581]	Thanks for the input. I will have to play around with those later as I am trying to get this finished up and then I can go back and clean up the code. The data is minimal enough for the script to finish in under a second anyway. Parse is pretty sweet. Makes this much neater than the alternative
Anton 7-Jun-2008 [2582]	No worries.
amacleod 30-Jun-2008 [2583]	I'm trying to copy some text from the position found while parsing a document. I'm using something like: rule: [some digit copy text to newline] (where "digit" has been defined as all digits 0 to 9). This copies everything after the digit. How would I copy the digit itself as well?
Brock 30-Jun-2008 [2584x2]	would it not simply be.... to some digit instead of what you have above? I'll start playing around and see if I can be of any help (if you haven't already figured it out)
Brock 30-Jun-2008 [2584x2]	Not as easy as it seemed to be. Will take more time than I have right now.
amacleod 30-Jun-2008 [2586]	Is there a difference between using "to" and "thru"?
[unknown: 5] 30-Jun-2008 [2587]	yes
Graham 30-Jun-2008 [2588]	is this block parsing?
[unknown: 5] 30-Jun-2008 [2589]	to goes to the point and thru includes the point
amacleod 30-Jun-2008 [2590x2]	No
amacleod 30-Jun-2008 [2590x2]	So to newline does not include the newline?
[unknown: 5] 30-Jun-2008 [2592]	no it wouldn't
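The to/thru difference described above can be checked at the console. This is only a minimal sketch, assuming REBOL 2 (where parse/all keeps whitespace significant); the sample string and the words a and b are made up for illustration.

```rebol
; Sample input (hypothetical): two lines, each ending in a newline
line: "2.1 Hello Again^/3. Goodbye^/"

; TO stops just before the match, so the newline is not copied
parse/all line [copy a to newline to end]
; a is now "2.1 Hello Again"

; THRU moves past the match, so the newline is copied as well
parse/all line [copy b thru newline to end]
; b is now "2.1 Hello Again^/"

print (length? b) - (length? a)   ; the two copies differ by the newline only
```

After a TO the cursor still sits on the newline, which is why Graham's rule further down adds a SKIP to step over it.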
Graham 30-Jun-2008 [2593x2]	rule: [ digit copy text to newline skip ] parse stuff [ some rule ]
Graham 30-Jun-2008 [2593x2]	digits: [ some digit ] rule: [ digits ... ]
amacleod 30-Jun-2008 [2595]	Graham, the digit string would be included in the copied text with this?
Graham 30-Jun-2008 [2596x3]	nope
	because it matches digit and the cursor moves on
	past the digit
amacleod 30-Jun-2008 [2599]	Right. Any way to capture the digit?
[unknown: 5] 30-Jun-2008 [2600]	you can always do something like set n number!
Graham 30-Jun-2008 [2601x2]	rule: [ copy d thru digits .... ]
Graham 30-Jun-2008 [2601x2]	He's using string parsing .. not block parsing
[unknown: 5] 30-Jun-2008 [2603]	yeah can't use set then.
amacleod 30-Jun-2008 [2604]	I'll try that Graham. Thanks
Graham 30-Jun-2008 [2605x3]	or
	non-digits: complement digit parse [ copy digit-text to non-digits copy text to newline skip ]
	and correct syntax helps :)
[unknown: 5] 30-Jun-2008 [2608x2]	>> str: "193920347REBOL ROCKS!^/"
	== "193920347REBOL ROCKS!^/"
	>> parse str compose [some (charset "0123456789") text: copy text thru newline]
	== true
	>> text
	== "REBOL ROCKS!^/"
[unknown: 5] 30-Jun-2008 [2608x2]	Something like that?
Brock 30-Jun-2008 [2610]	he was looking for the number and the string though.
amacleod 30-Jun-2008 [2611x2]	No, I have a text document with section numbers in front:
	2. Hello
	2.1 Hello Again
	2.1.1 Hello already
	3. Goodbye
	I want the section number included in the copy.
amacleod 30-Jun-2008 [2611x2]	It need not be included in the same copy, just as long as I can record it.
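One way to record the section number separately from the text, along the lines Graham suggests, is to give each part its own COPY. The following is a rough sketch only, assuming REBOL 2 and single-line sections that each end in a newline; the words number-chars, doc, num and sections are invented for the example and are not from anyone's actual script.

```rebol
digit: charset "0123456789"
number-chars: charset "0123456789."   ; digits plus the dots between them (assumed numbering format)

; Sample document (hypothetical), one section per line
doc: "2. Hello^/2.1 Hello Again^/2.1.1 Hello already^/3. Goodbye^/"

sections: copy []
parse/all doc [
    any [
        copy num [digit any number-chars]   ; the section number, e.g. "2.1.1"
        any " "                             ; drop the spaces after the number
        copy text to newline                ; rest of the line, newline excluded
        newline                             ; step over the newline
        (append sections reduce [num text])
    ]
]
probe sections
```

Here sections ends up as a flat block of number/text pairs. Multi-line sections, where a continuation line has no leading number, are not handled by this sketch.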
[unknown: 5] 30-Jun-2008 [2613]	So you just want each line then really?
amacleod 30-Jun-2008 [2614x2]	Well, it gets a little more complicated. Some parts of the document will be multilined.
amacleod 30-Jun-2008 [2614x2]	I thought it would be a simple thing that I was missing. I may need to re-think the formatting of the document.
[unknown: 5] 30-Jun-2008 [2616x2]	So even if something is multiline you would still want each line of the multiline correct?
[unknown: 5] 30-Jun-2008 [2616x2]	Or do you mean a multiline might look something like this:
	2.1 Hello
	Goodbye
	Where the second line doesn't have the preceding number?
amacleod 30-Jun-2008 [2618]	Yes, and formatting may need to be retained
[unknown: 5] 30-Jun-2008 [2619]	Ahhh yes that gets a bit more complicated.
amacleod 30-Jun-2008 [2620]	Paul, while you are there... I was considering using Tretbase for this project
[unknown: 5] 30-Jun-2008 [2621]	Excellent choice ;-)
amacleod 30-Jun-2008 [2622]	Let me briefly explain where I'm going, to see if you think it's workable or perhaps there is a better solution
[unknown: 5] 30-Jun-2008 [2623]	k
amacleod 30-Jun-2008 [2624x4]	I'm trying to put a set of Fire department related materials online. They are now in PDF
	I'm converting them to text and reformatting them to parse
	I want to hold each section in a separate database record
	So I can index for keywords and search and read only those sections I need.