World: r3wp

[Parse] Discussion of PARSE dialect

Chris
28-Sep-2008
[2663x2]
Is there any objection to matching type -> checking value other than 
the inconvenience?
You could also preprocess the block using an alternative to 'reduce --

parse blk [any [mk: lit-word! (mk: change mk switch mk/1 [...]) :mk | skip]]
BrianH
28-Sep-2008
[2665x4]
In general that restriction of parse is part of an overall pattern 
in REBOL of encouraging you to use lit-words as lit-words rather 
than some other kind of datatype. Lit-words in REBOL are generally 
used to express literal expressions of words, rather than being used 
as a distinct datatype. In general you convert them to words before 
use.
It's usually a bad idea to use lit-words as keywords - they make 
better values. If you are comparing to a particular lit-word value, 
that is using it as a keyword. If any lit-word value would do and 
their meaning is semantic rather than syntactic, that works. In general, 
PARSE is better for determining syntactic stuff - use the DO dialect 
code in the parens for semantic stuff.
Not that I don't want a LIT or LITERAL directive in PARSE that would 
turn off the PARSE-dialect treatment of the next value in the spec.
It would only be for block parsing though.
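A minimal sketch of the "match the type, then check the value" approach discussed above, assuming REBOL 2 block parsing. The names blk, w, s and found are hypothetical and only serve the illustration:

```rebol
; Hypothetical data: a block holding a lit-word followed by a string.
blk: ['cc "two"]
found: none

parse blk [
    some [
        ; syntactic part: any lit-word followed by a string
        set w lit-word! set s string!
        ; semantic part: compare the value in DO dialect code
        (if 'cc = to-word :w [found: s])
        | skip
    ]
]

print mold found
```

The rule accepts any lit-word and leaves the decision about which lit-word it was to the code in the parens, after converting it to a word.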
Anton
10-Oct-2008
[2669x5]
term: [word! | into term]
parse [a b [c]] [some term]  ;== true
parse [a b [c d]] [some term]  ;== false
I'm a bit confused by that.  I need to parse recursively.
duh... never mind.
Solution:
terms: [some [word! | into terms]]
parse [a b [c d]] terms  ;== true
Terry
12-Oct-2008
[2674x2]
blk: [aa "test" bb "two"  cc  "#block"]
rules: [some [cc set cc string! ]]
parse blk rules

no go? 

I have a more complicated rule set that chokes on the "#block" string.. 
does it think it's an issue! ?
... rules looks like this rather.. 
rules: [some ['cc set cc string! ]]
Henrik
12-Oct-2008
[2676]
Your parser would stop at 'aa, since you never specify it in the 
rule block.

Perhaps something like:

rules: [some [['cc set cc string!] | [word! string!]]]
sqlab
12-Oct-2008
[2677]
rules: [some [set ww word! set ss string! (do reduce [to-set-word ww ss])]]
Henrik
30-Oct-2008
[2678]
>> parse/all {2008-10-30|"This is" NOK|http://www.example.com} "|"
== ["2008-10-30" "This is" " NOK" "http://www.example.com"]

I caught this on the mailing list. Bug?
sqlab
30-Oct-2008
[2679]
Yes, this is an old bug.
It does not work, if " is next to your delimiter.
Insert a blank, and it works again.
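One possible workaround, sketched here as an assumption rather than an official fix, is to split on the delimiter with an explicit rule instead of the built-in splitting mode, so double quotes are treated as ordinary characters. The helper name split-on is hypothetical:

```rebol
; Hypothetical helper: split STR on DELIM, leaving double quotes untouched.
split-on: func [str [string!] delim [string!] /local out field] [
    out: copy []
    parse/all str [
        ; each pass copies one field and consumes the delimiter after it
        any [copy field to delim delim (append out any [field copy ""])]
        ; whatever follows the last delimiter is the final field
        copy field to end (append out any [field copy ""])
    ]
    out
]

probe split-on {2008-10-30|"This is" NOK|http://www.example.com} "|"
```

The `any [field copy ""]` guard is there because COPY can yield none for an empty field in REBOL 2. With this rule the quoted text and NOK stay together in one field, quotes included.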
Graham
3-Nov-2008
[2680x3]
This is a result of using parse-xml and some cleanup

[document
	[soapenv:Envelope
		[soapenv:Body
			[ns1:getSpellingSuggestionsResponse
				[getSpellingSuggestionsReturn
					[getSpellingSuggestionsReturn "Penicillin G"]
					[getSpellingSuggestionsReturn "Penicillin V"]
					[getSpellingSuggestionsReturn "Penicillamine"]
					[getSpellingSuggestionsReturn "Polycillin"]
				]
			]
		]
	]
]
what's the cleanest way to extract the drug names?
drugs: [set drugblock into ['getSpellingSuggestionsReturn set drugname string! (print drugname)]]

parse a [
	'document set envelope into [
		'soapEnv:envelope set body into [
			'soapEnv:body set response into [
				'ns1:GetSpellingsuggestionsresponse set returns into [
					'getspellingsuggestionsreturn some drugs to end
				]
			]
		]
	]
]

works but is very long winded
Gregg
4-Nov-2008
[2683]
It's not so bad Graham. And whether you can shorten things depends 
on how exact you need to be.

rule: [
	'getspellingsuggestionsreturn some drugs
	| url! into rule
]
parse a ['document into rule]
PeterWood
4-Nov-2008
[2684x3]
This is a bit shorter but recursive:

pr: [any [
	set b block! (parse b pr)
	| 'getSpellingSuggestionsReturn set s string! (insert drug-names s)
	| skip
]]
Usage:

>> drug-names: copy []
>> parse gx pr
== true
>> drug-names
== ["Polycillin" "Penicillamine" "Penicillin V" "Penicillin G"]
If all you're extracting is the drug names, wouldn't it be simpler 
to just parse the XML string directly?
Graham
4-Nov-2008
[2687x7]
not sure if it is
<?xml version="1.0" encoding="utf-8" ?> 

- <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
- <soapenv:Body>

- <ns1:getSpellingSuggestionsResponse soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:ns1="http://db.rxnorm.nlm.nih.gov">

- <getSpellingSuggestionsReturn soapenc:arrayType="soapenc:string[4]" 
xsi:type="soapenc:Array" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/">

  <getSpellingSuggestionsReturn xsi:type="soapenc:string">Penicillin G</getSpellingSuggestionsReturn> 

  <getSpellingSuggestionsReturn xsi:type="soapenc:string">Penicillin V</getSpellingSuggestionsReturn> 

  <getSpellingSuggestionsReturn xsi:type="soapenc:string">Penicillamine</getSpellingSuggestionsReturn> 

  <getSpellingSuggestionsReturn xsi:type="soapenc:string">Polycillin</getSpellingSuggestionsReturn> 
  </getSpellingSuggestionsReturn>
  </ns1:getSpellingSuggestionsResponse>
  </soapenv:Body>
  </soapenv:Envelope>
forget about the " - " present ...
I always find parsing xmlstrings somewhat fragile ....
I'm not even sure how your parsing works!  But it does :)
the output I presented looks so close to being a rebol object .. 
and then I can use paths to access the data
rule: [
	'getspellingsuggestionsreturn some drugs
	| word! into rule
]

is I think what Gregg wanted to write
Pekr
4-Nov-2008
[2694]
Graham - what xml REBOL tool do you use? I might need to parse XML 
stuff soon. In the past I used one tool (don't remember the author), 
which made object from parsed data automatically ...
Graham
4-Nov-2008
[2695x2]
I tried rebelxml.r but I can't get it to work
it works for simple stuff as in the documentation ... but can't cope 
with the example above.
PeterWood
4-Nov-2008
[2697x4]
pr: [any [
  [{<getSpellingSuggestionsReturn xsi:type="soapenc:string">}
    copy s to "</getSpellingSuggestionsReturn>"
    (insert tail drug-names s)
  ]
  |
  skip
  ]
]
usage:

>> drug-names: copy []
== []
>> parse/all gxs pr
== true
>> drug-names
== ["Penicillin G" "Penicillin V" "Penicillamine" "Polycillin"]
Sorry about the formatting ... can't cut and paste in AltME on a 
Mac without reformatting.
pr: [any [
  [{<getSpellingSuggestionsReturn xsi:type="soapenc:string">}
    copy s to "</getSpellingSuggestionsReturn>"
    (insert tail drug-names s)
  ]
  |
  skip
  ]
]
Graham
4-Nov-2008
[2701]
Are the { } required ?
PeterWood
4-Nov-2008
[2702x3]
If it's not fast enough you can speed it up by adding a rule to consume 
the unwanted parts.
I think so because of the "soapenc:string"
gxs is a string of your xml listed above.
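A rough sketch of the speed-up mentioned above, assuming REBOL 2 string parsing: instead of skipping one character at a time, the rule jumps straight to the next opening tag with THRU. The names open-tag, close-tag and sample are hypothetical, and sample is a cut-down stand-in for the real XML string:

```rebol
drug-names: copy []

open-tag: {<getSpellingSuggestionsReturn xsi:type="soapenc:string">}
close-tag: "</getSpellingSuggestionsReturn>"

; Hypothetical, shortened input with two entries.
sample: {<r><getSpellingSuggestionsReturn xsi:type="soapenc:string">Penicillin G</getSpellingSuggestionsReturn><getSpellingSuggestionsReturn xsi:type="soapenc:string">Polycillin</getSpellingSuggestionsReturn></r>}

pr: [
    ; THRU consumes everything up to and including the next opening tag
    any [thru open-tag copy s to close-tag (append drug-names s)]
    ; once no opening tag is left, consume the rest of the input
    to end
]

parse/all sample pr
probe drug-names
```

Whether this is noticeably faster depends on the input; the point is only that the unwanted parts are consumed by one rule rather than by repeated skips.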
Graham
4-Nov-2008
[2705]
I guess it avoids all the other stuff I was doing to force it to 
rebol blocks to allow block parsing :)
Gregg
4-Nov-2008
[2706]
I pasted your code here, which loads the block. I guess the XML parser 
produces the output with those values as words.
Tomc
5-Nov-2008
[2707x2]
foreach item load/markup xml [if not tag? item [print item]]
not parse and not pretty  but  you get the idea
Graham
5-Nov-2008
[2709]
rebelxml works well for most things .. just not the ones where the 
namespace is in the tag name
Pekr
5-Nov-2008
[2710]
http://www.rebol.net/r3blogs/0155.html - if you want some improvements 
to parse, now is the time to ask for them ...
BrianH
5-Nov-2008
[2711x2]
We've hammered out some proposals so far, but we are really interested 
in more ideas, especially if we can make them fit.
So far we have been accepting proposals in these categories:
- Recognition: LIT, NOT, OF, TO and THRU extensions
- Modification: CHANGE, INSERT, REMOVE
- Structural and control flow: FAIL (may not be the final name), 
USE, CHECK (still debate here), REVERSE

There is still some debate even within these proposals (name of FAIL 
for example) and some of them might not make it. Some of the old 
PARSE REPs have been definitively rejected or changed, and some are 
still under debate and won't make it in without a lot more thought.