
AltME groups: search


results summary

world   hits
r4wp    5907
r3wp    58701
total   64608

results window for this page: [start: 30001 end: 30100]

world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
btiffin:
16-Mar-2008
context [ ] is just a shortcut for make object! [ ], and it's 
great. The more we hide in objects, the easier it will be to share, 
or at the least, the easier it will be to use code from a variety of developer sources. 
Programming in the Many is important in our context, as there are 
relatively few of us in the "many", so far. So when even our small 
stuff is shareable, we all win.
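A minimal REBOL 2 sketch of that equivalence (the names `counter`, `same-thing` and `bump` are made up for illustration): both forms build an object, and the words defined inside each object stay local to it, so two shared pieces of code can use the same names without clashing.

```rebol
; CONTEXT builds an object from a spec block, exactly like MAKE OBJECT!
counter: context [
    count: 0
    bump: does [count: count + 1]
]
same-thing: make object! [
    count: 0
    bump: does [count: count + 1]
]
counter/bump
counter/bump
same-thing/bump
; Each object keeps its own COUNT, so the two never interfere
print [counter/count same-thing/count]   ; prints 2 1
```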
BrianH:
17-Mar-2008
Does a bind/copy on its code block every time it is used.
Oldes:
17-Mar-2008
I should probably not use code evaluation so much directly 
in the parse rule block, and should instead call a function if I need a lot 
of temp variables to process the action.
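A sketch of that approach, assuming REBOL 2 PARSE/ALL semantics; `store-pair`, `results`, `k-str`, `v-str` and the `a=1;b=22;` input are hypothetical. The paren stays a single call, and the temporaries live in the helper's /local words instead of cluttering the rule.

```rebol
digit: charset "0123456789"
results: copy []
; The helper owns its temporaries (K and V) as /local words
store-pair: func [key-text val-text /local k v] [
    k: to-word key-text
    v: to-integer val-text
    append results reduce [k v]
]
; The paren only hands the captured strings to the helper
parse/all "a=1;b=22;" [
    some [copy k-str to "=" skip copy v-str some digit ";" (store-pair k-str v-str)]
]
probe results   ; [a 1 b 22]
```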
Henrik:
28-Apr-2008
(note this block can only be made without a space at the end in rebol 
2.7)
Henrik:
11-May-2008
if I have a rule-block that does not exist in the same context as 
the main parse block, is there a simple way to rebind it without 
composing it into the main parse block? my current solution is to 
bind it to a temp block and use the temp block as a rule in the main 
parse block, which is less than optimal, I think.
Chris:
11-May-2008
Assuming you want to assign values to function locals from the external 
parse rules, you can a) bind as you are doing, b) create a larger 
context for the function encompassing your rules or c) compile the 
parse rule, either on creation of the function or for each instance.

a)
rule: [set tag tag!]
test: func [data /local tag][bind rule 'data parse data rule tag]

b)
test: use [tag][
    rule: [set tag tag!]
    func [data][parse data rule tag]
]

c)
rule: [set tag tag!]
test: func [data /local tag] compose/only [parse data (rule) tag]


Also, note that when you bind, it alters the original block -- no 
need to reassign to a new word.
Chris:
11-May-2008
When it comes to complex rules, I opt for b).  At that, I'd go for 
context [] where there are a lot of associated words...
Henrik:
12-May-2008
the function is recursive, so that may put a twist on b). I forgot 
that detail with BIND on a) so thanks for that. c) seems to work 
best.
amacleod:
15-May-2008
I'm trying to parse a text document that I've formatted into lines 
of text, with blank lines between them, similar to MakeDoc format.
amacleod:
15-May-2008
Most lines begin with a section number (2.), or a sub-section (2.3) 
or a sub-sub-section (2.3.5).
BrianH:
16-May-2008
If the section numbers always end with a period, you can do this:
    some [some digits "."]
If the section numbers don't end with a period, you can do this:
    some digits any ["." some digits]
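A quick check of both rules, assuming `digits` is a charset of the ten digit characters (the rules above leave its definition out). The names `with-period` and `without-period` are illustrative.

```rebol
; Assumption: DIGITS is a charset, so SOME DIGITS matches one or more digit characters
digits: charset "0123456789"
with-period: [some [some digits "."]]
without-period: [some digits any ["." some digits]]
print parse/all "2.3.5." with-period     ; true
print parse/all "2.3.5" without-period   ; true
print parse/all "2.3.5" with-period      ; false: the last component has no period
```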
BrianH:
16-May-2008
Look up recursive descent parsing, and take note of the difference 
between left recursion and right recursion.
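As an illustration (a sketch, not taken from the thread; `section-number` and `bad-number` are hypothetical names), a right-recursive PARSE rule consumes input before it calls itself, so every recursive step moves forward. A left-recursive rule calls itself at the same position and never makes progress.

```rebol
digit: charset "0123456789"
; Right recursion: SOME DIGIT consumes input BEFORE the rule calls itself
section-number: [some digit opt ["." section-number]]
print parse/all "2.3.5" section-number   ; true
print parse/all "2.3." section-number    ; false: trailing period with no digits
; Left recursion would recurse before consuming anything:
; bad-number: [bad-number "." some digit | some digit]   ; do not run
```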
Chris:
16-May-2008
Don't want to add too much, but with parse you can really build up 
a vocabulary based on the patterns you know:


	section: [integer! ["." | 1 4 ["." integer!]]] ; -- or whatever rule covers all permutations
	chars-sp: charset " " space: [some chars-sp]

	parse/all line [copy sn section space [to newline | to end]] ; line: one line of the document


Vocabularies are easy to wrap in their own context too. Note also 
that [integer!] is a shorthand for [some digit] -- very useful :)
amacleod:
16-May-2008
Oldes, thanks for your suggestion. It works when I do a simple one 
line rule as you suggested but when I try to use multiple rules it 
fails.

Example of what I'm trying to do:
Example of the text document:
amacleod:
16-May-2008
3. CONSTRUCTION OF PORTABLE ALUMINUM LADDERS


3.1 Aluminum ladders are divided into two basic types of construction, 
viz:, solid beam and truss.


3.1.1  Solid Beam Aluminum Construction- This type of ladder has 
a solid side rail construction with aluminum rungs connecting with 
the side rails at fourteen inch intervals. The connection is generally 
either by a welded joint between rung and side rails, or by an expansion 
plug pinching the rung tightly to the side rails and internal backup 
plates. (Figure 2 A)


3.1.2  Aluminum Truss Construction- In the aluminum truss design, 
the top and bottom rails are connected to rung assemblies or rung 
blocks by rivets. The rungs are either welded or expansion plugged 
to the rung plate assemblies, which are supported by the top and 
bottom rails. (Figure 2B)
	

3.2 The base of the portable aluminum ladder is provided with either 
steel spikes or swiveling rubber safety shoes and aluminum spikes. 
For ladders equipped with the swiveling device, the rubber pads should 
be utilized when the ladder is to be raised and used on hard surfaces. 
(Figure 2A, 2B)
BrianH:
16-May-2008
Any reason that the headings with one number have a trailing period 
and the rest don't?
amacleod:
16-May-2008
BrianH, sorry Brian, the text above is just from a random and simpler 
section of the document.

If I copied it from the beginning, the first line would not have a 
number at all.
BrianH:
16-May-2008
But I made a mistake.
amacleod:
16-May-2008
This will give me a hit on any section, sub-section or sub-sub-section?


I may want to do something different depending on each. Does this 
allow me to?
BrianH:
16-May-2008
If you are making your decisions on a per-line basis, you might consider 
doing a read/lines and parsing each line individually, maintaining 
your own state to tell you where you are in the greater document. 
It's the only way to parse documents greater than memory in size.
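A minimal sketch of the per-line approach, assuming REBOL 2 and headings shaped like the examples in this thread (`2.`, `2.1`, `2.1.1`). `level-of`, `lines`, `current` and `sections` are illustrative names; in practice the lines would come from READ/LINES on the real document rather than a literal block.

```rebol
digit: charset "0123456789"
; Heading depth of a line: 1 for "2.", 2 for "2.1", 3 for "2.1.1", 0 otherwise
level-of: func [line /local n] [
    n: 0
    either parse/all line [
        some digit (n: 1) any ["." some digit (n: n + 1)] opt "." [" " to end | end]
    ] [n] [0]
]
; Assumption: stands in for READ/LINES on the real document
lines: ["2. Hello" "2.1 Hello Again" "    continued text" "2.1.1 Hello already" "3. Goodbye"]
current: none          ; state: the heading we are currently inside
sections: copy []
foreach line lines [
    either 0 < level-of line [
        current: copy line
        append sections current
    ][
        ; An unnumbered line continues the current section; CURRENT is the
        ; same string that is stored in SECTIONS, so this updates it in place
        if current [append append current " " trim copy line]
    ]
]
probe sections
```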
Anton:
17-May-2008
BrianH, eh? read/lines would still try to read the whole document, 
wouldn't it?

Or are you just suggesting that as a way which is then easily modified 
to allow larger than memory documents?
Chris:
17-May-2008
That would suck -- I use it.  Seems like a common enough scenario....
BrianH:
19-May-2008
I mean you can do open/lines/direct and stream - then you would only 
need the memory for one line and a state machine.
Josh:
3-Jun-2008
I'm finally digging into parse now, but I have a question about HTML. 
Big idea: pulling the data out of an HTML table (made in Word--ugh!). 
Where I am stuck: is there a way to create a rule for opening tags 
such as <tr> that include a lot of formatting, i.e. <tr style="mso........>? 
I want to pull the info in between the opening and closing tags.
Josh:
3-Jun-2008
I came up with a rule:  [some [thru "<td" thru ">" y: to "</td>" 
(a: remove-each tag load/markup y [tag? tag])]]  but it seems to 
not be as efficient as it could be.
Geomol:
3-Jun-2008
Josh, if you do a load/markup on the whole string, you get a block 
with tags and strings. You can then pick the string from the block, 
maybe doing TRIM on them to sort out newlines and spaces. Like:

blk: load/markup your-data
foreach f blk [if all [string? f "" <> trim f] [print f]]
Chris:
3-Jun-2008
I've been toying with this to obtain a very parsable "dialect" -- 
my goal being to scrape live game updates from a certain sports web 
site (for personal use, natch).  It's reliant on 'parse-xml though, 
so ymmv....

do http://www.ross-gill.com/r/scrape.r
probe load-xml some-xml
Chris:
3-Jun-2008
Result is a little like:

	from -- <tag attr="attribute">Content</tag>
	to -- <tag> /attr attribute "Content"
Anton:
4-Jun-2008
Josh, using the REMOVE-EACH very often is what makes your parse slow. 
A remove operation in the middle of a large string is slow, and you 
are doing many removes. That's why the others suggested using copy.
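A sketch of the copy-based version, assuming REBOL 2 (`load/markup`) and a made-up table fragment: PARSE copies each cell's content, then only the string pieces of each cell are kept, so nothing is ever removed from the large source string. `html`, `cells`, `cell`, `texts` and `str` are hypothetical names.

```rebol
html: {<table><tr style="mso-x"><td class="c1"><p>Apples</p></td><td>42</td></tr></table>}
cells: copy []
; Skip past each opening <td ...> tag and COPY everything up to </td>
parse/all html [
    some [thru "<td" thru ">" copy cell to "</td>" (append cells cell)]
    to end
]
texts: copy []
foreach cell cells [
    str: copy ""
    ; LOAD/MARKUP splits a cell into tags and strings; keep only the strings
    foreach item load/markup cell [if string? item [append str item]]
    append texts trim str
]
probe texts   ; ["Apples" "42"]
```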
Josh:
6-Jun-2008
Thanks for the input.  I will have to play around with those later 
as I am trying to get this finished up and then I can go back and 
clean up the code. The data is minimal enough for the script to finish 
in under a second anyway. Parse is pretty sweet. It makes this much 
neater than the alternative.
amacleod:
30-Jun-2008
I'm trying to copy some text from the position found while parsing 
a document.
I'm using something like: 


rule: [some digit copy text to newline]    (where digit has been 
defined as all digits 0 to 9)

This copies everything after the digit. How would I copy the digit 
itself as well?
amacleod:
30-Jun-2008
Is there a difference between using "to" and "thru"
amacleod:
30-Jun-2008
No

I have a text document with section numbers in front:

2. Hello
2.1 Hello Again
2.1.1 Hello already
3. Goodbye

I want the section number included in the copy
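One way to do it, as a sketch (not an answer given in the thread; `doc`, `heading`, `headings` and `with-nl` are hypothetical names, REBOL 2 assumed): put COPY in front of a sub-rule that includes the digits, so the section number becomes part of the copied text. It also shows the TO/THRU difference: TO stops just before the match, THRU stops just after it.

```rebol
digit: charset "0123456789"
doc: "2. Hello^/2.1 Hello Again^/2.1.1 Hello already^/3. Goodbye^/"
headings: copy []
; COPY covers the digits too; TO NEWLINE stops before the newline, SKIP steps over it
parse/all doc [some [copy heading [some digit to newline] skip (append headings heading)]]
probe headings   ; ["2. Hello" "2.1 Hello Again" "2.1.1 Hello already" "3. Goodbye"]
; THRU includes the newline in the copied text
parse/all "2. Hello^/" [copy with-nl [some digit thru newline]]
probe with-nl    ; "2. Hello^/"
```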
amacleod:
30-Jun-2008
Well, it gets a little more complicated.
Some parts of the document will be multi-line.
amacleod:
30-Jun-2008
I thought it would be a simple thing that I was missing. I may need 
to re-think the formatting of the document.
[unknown: 5]:
30-Jun-2008
Or do you mean a multi-line section might look something like this:

2.1 Hello
       Goodbye

where the second line doesn't have the preceding number?
[unknown: 5]:
30-Jun-2008
Ahhh yes that gets a bit more complicated.
amacleod:
30-Jun-2008
Let me briefly explain where I'm going, to see if you think it's workable, 
or perhaps there is a better solution.
amacleod:
30-Jun-2008
I'm trying to put a set of Fire department related materials online.
They are now in PDF.
amacleod:
30-Jun-2008
I want to hold each section in a separate database record
[unknown: 5]:
30-Jun-2008
Well, TRETBASE 1.0 is the only finished product right now.  So the 
only available TRETBASE app is 1.0 which is really not a multi-user 
solution.
amacleod:
30-Jun-2008
I'm using mysql for the online component but I need a local storage 
method too for offline use
amacleod:
30-Jun-2008
What I would need is a simple method to sync them
amacleod:
18-Jul-2008
Is there a difference between a "space" and a "tab"? Can you parse 
for tab and not space?
Graham:
18-Jul-2008
I would think you would have to parse/all .. and a space is #" " 
and a tab is #"^-"
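A small sketch of Graham's point, assuming REBOL 2: with PARSE/ALL whitespace is not skipped, so tab and space can be matched as separate characters. The counters `n-tabs` and `n-spaces` are made up for the example.

```rebol
n-tabs: 0
n-spaces: 0
; PARSE/ALL keeps whitespace significant; SKIP steps over any other character
parse/all "a^-b c^-d" [
    some [#"^-" (n-tabs: n-tabs + 1) | #" " (n-spaces: n-spaces + 1) | skip]
]
print [n-tabs n-spaces]   ; prints 2 1
```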
btiffin:
21-Aug-2008
A long time ago, I offered to try a lecture.  Don't feel worthy. 
 So I thought I'd throw out a few (mis)understandings and have them 
corrected to build up a level of comfort that I wouldn't be leading 
a group of high potential rebols down a garden path.


So, one of the critical mistakes in PARSE can be remembered as "so 
many", or a butchery of some [ any [ , so many.

SOME asks for a truth among alternatives, and ANY says "yep, got 
zero of the thing I was looking for", but doesn't consume anything. 
SOME says great, and then asks for a truth again. ANY says "yep, got zero 
of the thing I was looking for" and still doesn't move, ready to 
answer yes to every question SOME can ask. An infinite PARSE loop.


Aside: to protect against infinite loops, always start a fresh PARSE 
block with [(). The "immediate block" of the paren! allows 
a keyboard Escape, rather than the more drastic Ctrl-C.


So, I'd like to ask the audience; what other PARSE command sequences 
can cause infinite loops?


END? Is it only END and TO END, while THRU END will alleviate 
that one? END END END END being true?

>> parse "" [some [() end end end]]
(escape)
>> parse "" [some [() thru end end end]]
== false
>> parse "" [some [() to end end end]]
(escape)
>> 


Ok, but thru end is false.  Is there an idiom to avoid looping on 
end, but still being true on the first hit?

Other trip ups?
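A sketch of the safer loop shape (one possible idiom, not one given in the thread): make every alternative inside SOME or ANY consume input, and test END once, outside the loop. The looping form is left commented out because, as described above, it hangs REBOL 2.

```rebol
; Hangs in REBOL 2: ANY succeeds without consuming input, so SOME keeps
; getting "yes" at the same position
; parse "abc" [some [any "x"]]   ; do not run

; Every alternative inside the loop consumes input; END is tested outside it
print parse/all "xxy" [some ["x" | "y"] end]   ; true
print parse/all "" [any ["x" | "y"] end]       ; true: ANY accepts zero matches
print parse/all "xz" [some ["x" | "y"] end]    ; false
```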
Henrik:
28-Sep-2008
parse [a] ['a] ;== true

parse ['a] reduce [to-lit-word 'a] ; == false (why?)
Henrik:
28-Sep-2008
forget it. I was confused for a second, but is there a way to parse 
that 'a correctly? The same goes for get-word! and set-word!.
Henrik:
28-Sep-2008
I should clarify: I would like to parse a specific get-word!, lit-word! 
or set-word! as opposed to parsing on the type and then checking 
the value in some kind of action afterwards:


parse ['a 'b 'c] ['a 'b 'c] ;== true (I know this is the wrong parser block, but it's something to that effect I would like to see)
Anton:
28-Sep-2008
If I remember correctly, this was a problem of parse (and may still 
be)...
Anton:
28-Sep-2008
You may have to use a workaround.
Geomol:
28-Sep-2008
If you can go with a reduced block, this can work:

parse reduce ['a 'b 'c] ['a 'b 'c]
Henrik:
28-Sep-2008
what if there are set-words in it? I wanted to parse the content 
of an object, which can be a mixture of word types.
BrianH:
28-Sep-2008
In general that restriction of parse is part of an overall pattern 
in REBOL of encouraging you to use lit-words as lit-words rather 
than some other kind of datatype. Lit-words in REBOL are generally 
used to express literal expressions of words, rather than being used 
as a distinct datatype. In general you convert them to words before 
use.
BrianH:
28-Sep-2008
It's usually a bad idea to use lit-words as keywords - they make 
better values. If you are comparing to a particular lit-word value, 
that is using it as a keyword. If any lit-word value would do and 
their meaning is semantic rather than syntactic, that works. In general, 
PARSE is better for determining syntactic stuff - use the DO dialect 
code in the parens for semantic stuff.
BrianH:
28-Sep-2008
Not that I don't want a LIT or LITERAL directive in PARSE that would 
turn off the PARSE-dialect treatment of the next value in the spec.
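In the meantime, a REBOL 2 workaround sketch: match by type, then let a paren choose a rule that always succeeds (`[]`) or always fails (`[end skip]`). The names `lit-a`, `w` and `check` are hypothetical; the same shape should work with `set-word!` or `get-word!` in place of `lit-word!`.

```rebol
w: none
check: none
; [] always succeeds without consuming input; [end skip] can never succeed
lit-a: [set w lit-word! (check: either 'a = to-word w [[]] [[end skip]]) check]
print parse ['a] lit-a   ; true
print parse ['b] lit-a   ; false
```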
Anton:
10-Oct-2008
term: [word! | into term]
parse [a b [c]] [some term]  ;== true
parse [a b [c d]] [some term]  ;== false
Anton:
10-Oct-2008
I'm a bit confused by that.  I need to parse recursively.
Anton:
10-Oct-2008
terms: [some [word! | into terms]]
parse [a b [c d]] terms  ;== true
Terry:
12-Oct-2008
blk: [aa "test" bb "two"  cc  "#block"]
rules: [some [cc set cc string! ]]
parse blk rules

no go? 

I have a more complicated rule set that chokes on the "#block" string.. 
does it think it's an issue! ?
sqlab:
30-Oct-2008
Yes, this is an old bug.
It does not work, if " is next to your delimiter.
Insert a blank, and it works again.
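For comparison, a version of Terry's rule that matches as intended (a sketch, assuming REBOL 2 block parsing; `found` is a hypothetical name): in a block rule a bare word such as `cc` is looked up as a sub-rule, so the keyword is written as the lit-word `'cc`, and a second alternative steps over the other word/string pairs.

```rebol
blk: [aa "test" bb "two" cc "#block"]
found: none
; 'cc matches the word cc in the data; WORD! STRING! skips the other pairs
parse blk [some ['cc set found string! | word! string!]]
probe found   ; "#block"
```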
Graham:
3-Nov-2008
This is a result of using parse-xml and some cleanup

[document
	[soapenv:Envelope
		[soapenv:Body
			[ns1:getSpellingSuggestionsResponse
				[getSpellingSuggestionsReturn
					[getSpellingSuggestionsReturn "Penicillin G"]
					[getSpellingSuggestionsReturn "Penicillin V"]
					[getSpellingSuggestionsReturn "Penicillamine"]
					[getSpellingSuggestionsReturn "Polycillin"]
				]
			]
		]
	]
]
Graham:
3-Nov-2008
drugs: [set drugblock into ['getSpellingSuggestionsReturn set drugname string! (print drugname)]]

parse a ['document set envelope into [
    'soapEnv:envelope set body into [
        'soapEnv:body set response into [
            'ns1:GetSpellingsuggestionsresponse set returns into [
                'getspellingsuggestionsreturn some drugs to end
            ]
        ]
    ]
]]

works but is very long winded
Gregg:
4-Nov-2008
It's not so bad Graham. And whether you can shorten things depends 
on how exact you need to be.

rule: [
	'getspellingsuggestionsreturn some drugs
	| url! into rule
]
parse a ['document into rule]
PeterWood:
4-Nov-2008
This is a bit shorter but recursive:

drug-names: copy []   ; collects the names found

pr: [
    any [
        [set b block! (parse b pr)]
        |
        ['getSpellingSuggestionsReturn set s string! (insert drug-names s)
            |
            skip
        ]
    ]
]
Graham:
4-Nov-2008
the output I presented looks so close to being a rebol object .. 
and then I can use paths to access the data
PeterWood:
4-Nov-2008
Sorry about the formatting ... can't cut and paste in AltME on a 
Mac without reformatting.
PeterWood:
4-Nov-2008
If it's not fast enough you can speed it up by adding a rule to consume 
the unwanted parts.
PeterWood:
4-Nov-2008
gxs is a string of your xml listed above.
BrianH:
5-Nov-2008
So far we have been accepting proposals in these categories:
- Recognition: LIT, NOT, OF, TO and THRU extensions
- Modification: CHANGE, INSERT, REMOVE

- Structural and control flow: FAIL (may not be the final name), 
USE, CHECK (still debate here), REVERSE


There is still some debate even within these proposals (name of FAIL 
for example) and some of them might not make it. Some of the old 
PARSE REPs have been definitively rejected or changed, and some are 
still under debate and won't make it in without a lot more thought.
BrianH:
5-Nov-2008
These changes to PARSE are another example of changes to the R3 core 
happening as a side effect of the new GUI work :)
BrianH:
5-Nov-2008
Yup. We've been working on the Parse Project article a lot today. 
The last 2 things from the REP that might make it are the THROW and 
INTO-STRING proposals, though both will need some changes first. 
The rest are covered or rejected.
BrianH:
5-Nov-2008
Peter Wood's RETURN proposal is really interesting. I have been thinking 
about how to make a variant of it work.
Anton:
5-Nov-2008
I'd like to understand Peter Wood's START command a bit better. It's 
not clear to me from the example why it's needed. (or even how the 
example works..)
Anton:
5-Nov-2008
Peter's example, from the blog:
parse [a b c d] [
    any [
      start (acc: 0)
      |
      set inc integer! (acc: acc + inc)
      |
      end
    ]
  ]
BrianH:
5-Nov-2008
Here's a working version of that example:
parse [1 2 3 4] [
	(acc: 0)
	any [set inc integer! (acc: acc + inc)]
]
BrianH:
5-Nov-2008
Perhaps he thought a paren could only follow a rule.
BrianH:
5-Nov-2008
I like the RETURN proposal as this:

	RETURN rule


Match the rule and return a copy of the value from the PARSE function. 
Like COPY then BREAK, but without the temporary variable.
Anton:
5-Nov-2008
I vaguely remember suggesting PARSE dialect be extended into parens 
with a few commands. Parens are executed as normal rebol dialect 
(not parse dialected in any way). If I remember correctly, it was 
thought better to keep the parens 'pure' rebol. If that is to be 
maintained, then I think Peter's RETURN command ought to be morphed 
into a parse command, as you suggest above, Brian.
Anton:
5-Nov-2008
-- ie. that's a good idea.
BrianH:
5-Nov-2008
More importantly it will override the meaning of the RETURN function 
at a point where you would expect it to work.
PeterWood:
5-Nov-2008
Clearly my proposal for START is based on my ignorance and inability 
to search the documents properly :-)

It wouldn't hurt as a form of self-documenting code, though.
BrianH:
5-Nov-2008
Actually, I think it would hurt (no offence). The word start is a 
common name for parse rules and every keyword we add can't be used 
as a parse rule name. Something to consider when making proposals.
Anton:
5-Nov-2008
Perhaps, Peter, you could post a withdrawal for START on the blog.
Chris:
5-Nov-2008
Other side of the coin, if 'end is a keyword, 'start is an intuitive 
companion.
BrianH:
5-Nov-2008
HEAD would be a better name for a directive to reset the position 
to the beginning of the data. That behavior would be more consistent 
with the series accessors :)
BrianH:
5-Nov-2008
It was an initialization proposal. Nonetheless, your HEAD? proposal 
sounds interesting. What problem are you solving that would need 
such a directive?
Pekr:
5-Nov-2008
Anton - but at some point we should start to make rebol 
bigger by adding unnecessary things, or we will never reach a 100MB 
executable size, and the outer world might not consider us a relevant 
alternative :-)
Anton:
5-Nov-2008
One NOP keyword at a time :)
BrianH:
5-Nov-2008
In particular, it would return a copy, like the COPY directive, not 
the SET directive.
Chris:
5-Nov-2008
Like!  Would that work for values? -- [to "<" copy a thru ">" "<" 
return a] ; - returns a if there is a < next?
Anton:
5-Nov-2008
What would you do when you need to process the data a bit first ?

eg. You return tags from different places in a rule, and to distinguish 
them you need to also return something extra, by prepending a code 
to the beginning, for example.
BrianH:
5-Nov-2008
Carl was kinda weirded out by the modifying operations, but I pointed 
out that people do this anyway and get it wrong a lot.
BrianH:
5-Nov-2008
Everything in that Parse Proposals page has already been discussed 
with Carl and could go in, barring insurmountable problems with implementation. 
I stopped putting stuff in when he stopped working for the day. There 
will likely be a couple going in tonight, but Carl is actively involved 
in this process.
BrianH:
5-Nov-2008
The main thing that Carl is concerned about now is that some of the 
proposals make use of the value calculated in a paren on occasion. 
I don't know why this would be a problem, but I'm sure it will be 
worked out or around.
Chris:
5-Nov-2008
Using 'remove -- a) removing a bracket only at the end of a string 
(as per Graham's example):

	parse "[this]" [remove "[" to "]" remove ["]" end]]

b) where you go down a false path:

	parse "abcdef123" [remove "abc" "123" | remove "abcd" "ef123"]
Chris:
5-Nov-2008
Would a) work?  Would b) reset the string as the first rule didn't 
match?
BrianH:
5-Nov-2008
a) would work.

b) would not likely reset the string, just like code blocks don't 
undo.
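REBOL 2 has no REMOVE directive, so a) has to be written by hand there. A sketch of that hand-written form (the marker words `a` and `b` are arbitrary): mark a position, modify the series in a paren, then reset the input position with a get-word so PARSE carries on from a valid spot.

```rebol
s: copy "[this]"
parse/all s [
    a: "[" (remove a) :a       ; drop the opening bracket, then jump back to the mark
    to "]" b: "]" (remove b) :b  ; drop the closing bracket the same way
    end
]
probe s   ; "this"
```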
BrianH:
6-Nov-2008
You might be able to do b) like this:

	parse "abcdef123" [use [a] [remove ["abc" a: "123" :a] | remove ["abcd" a: "ef123" :a] to end]]

or like this:

	parse "abcdef123" [use [a] [remove ["abc" a: "123" :a | "abcd" a: "ef123" :a] to end]]
Chris:
6-Nov-2008
How about this?

	parse "abc" ["a" to end reverse "bc"]