World: r3wp
[Parse] Discussion of PARSE dialect
Henrik 18-Oct-2010 [5303] | how do you specify an element to be of the type any-type! except none! ? |
Ladislav 18-Oct-2010 [5304] | I am afraid that you need to list all types except none! |
Henrik 18-Oct-2010 [5305] | does R3 solve this? if not, maybe that would be a good problem to solve. |
Ladislav 18-Oct-2010 [5306] | R3 can let you define that typeset and use it any time you like |
Henrik 18-Oct-2010 [5307] | ok, that is possibly good enough for generating specs. |
Gregg 18-Oct-2010 [5308] | I don't remember what all we did Henrik, but some of our test generation stuff on another world had some support for typesets IIRC. |
Henrik 18-Oct-2010 [5309] | Gregg, ok |
Steeve 18-Oct-2010 [5310] | Henrik, with a parse rule ? |
Henrik 18-Oct-2010 [5311] | Steeve, yes. |
Steeve 18-Oct-2010 [5312] | R3 does it |
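A minimal sketch of the typeset idea from the exchange above, assuming an R3 console. The name `non-none!` is hypothetical, and only three datatypes are listed here for brevity; a real "any-type! except none!" typeset would list every other datatype. It also assumes that R3 block parsing accepts a word bound to a typeset in the same place a datatype would go, as Steeve's remark suggests.

```rebol
; R3 only (assumption): build a custom typeset once, reuse it anywhere.
; `non-none!` is an illustrative name, and the type list is deliberately partial.
non-none!: make typeset! [integer! string! word!]

; Use the typeset word in a block-parsing rule, like a datatype.
probe parse [1 "two" three] [some non-none!]

; A block that contains a none! value should not be matched by this rule.
probe parse reduce [1 none] [some non-none!]
```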
AdrianS 18-Oct-2010 [5313] | Graham, try http://gskinner.com/RegExr for working out regexes. It has a really nice UI where you can hover over the components of the regex and see exactly what they do. |
GrahamC 18-Oct-2010 [5314] | Thanks |
Sunanda 4-Nov-2010 [5315] | Question on StackOverflow.....there must be a better answer than mine, and I'd suspect it involves PARSE (better answers usually do:) http://stackoverflow.com/questions/4093714/is-there-finer-granularity-than-load-next-for-reading-structured-data |
GrahamC 4-Nov-2010 [5316x3] | Use fixed length records |
Anyone got a parse rule that strips out everything between tags in an "xml" document? | |
whitespace: charset [ "^/^- " ] swsp: [ any whitespace ] result: copy "" parse/all pqri-xml [ some [ copy t thru ">" (append result t) swsp to "<" ]] | |
Ladislav 4-Nov-2010 [5319] | Posted an answer mentioning the test framework, which does almost exactly what Fork asked |
Gabriele 5-Nov-2010 [5320x3] | also, Carl's clean-script and script colorizer use parse + load/next to do the same thing. my Wetan uses the same method. |
http://www.colellachiara.com/soft/MD3/emitters/wetan.html#section-4.2 | |
basically, as long as you skip over [, (, ), and ] you can just use load/next. I'm also skipping over #[ because I want to preserve literal values while formatting (that is, preserve what the user typed) | |
Oldes 1-Dec-2010 [5323] | How to use the new INTO parse keyword? Could it be used to avoid the temp parse like in this (very simplified example)? parse "<a>123</a>" [thru "<a>" copy tmp to "</a>" (probe tmp probe parse tmp ["123"]) to end] Note that I know that in this example it's easy to use just one parse and avoid the temp. |
Ladislav 1-Dec-2010 [5324x3] | INTO is neither new, nor is it meant for string parsing |
You can take advantage of using it when parsing a block and needing to parse a subblock (of any-block! type) or a substring | |
(of the said block) | |
Oldes 1-Dec-2010 [5327] | can you give me a simple example, please? |
Ladislav 1-Dec-2010 [5328x2] | >> parse [a b "123" c] [2 word! into [3 skip] word!] == true |
>> parse [a b c/d/e] [2 word! into [3 word!]] == true | |
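For comparison with the two console lines above, here is a small sketch of the most common use of INTO: descending into a nested block while block parsing. The data is made up for illustration.

```rebol
; INTO switches parsing to the nested block at the current position,
; and the sub-rule must match that nested block to its end.
probe parse [a [1 2 3] b] [word! into [3 integer!] word!]   ; prints true

; The nested block has only two integers, so the sub-rule fails.
probe parse [a [1 2] b] [word! into [3 integer!] word!]     ; prints false
```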
Oldes 1-Dec-2010 [5330x2] | I understand now, thanks. |
it's very useful, I wonder why I've not found it earlier :) | |
Ladislav 1-Dec-2010 [5332] | The substring property is just a recent addition |
Oldes 1-Dec-2010 [5333] | And is there any nice solution for my string parsing above? I can live with the temps, just was thinking if it could be done better.. anyway, at least I know how to use INTO:) |
Ladislav 1-Dec-2010 [5334x2] | That is normally a "job" for a subrule |
it looks like you could use e.g. the REJECT keyword | |
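A minimal sketch of the subrule approach applied to the simplified "<a>123</a>" example, so that no temporary second parse is needed. The names `digits` and `number` are hypothetical, and the input contains no whitespace, so plain PARSE is enough here.

```rebol
digits: charset "0123456789"
number: [some digits]        ; the subrule that replaces the inner parse

; One pass: find the opening tag, capture the content with the subrule,
; then require the closing tag.
probe parse "<a>123</a>" [thru "<a>" copy tmp number "</a>"]   ; prints true
probe tmp                                                      ; prints "123"
```

The subrule constrains what may appear between the tags, which is what the inner `parse tmp ["123"]` call was checking in the original.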
Oldes 1-Dec-2010 [5336x2] | I know, but that would require complex rules, I'm lazy parser:) Btw.. my real example looks like: some [ thru {<h2><a} thru ">" copy name to {<} copy doc to {^/ </div>} ( parse doc [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} ( printf [" * " 10 " - "] reduce [arg arg-desc] ) ] ] ) ] |
Never mind, I can live with the current way anyway.. I was just wondering whether INTO is intended for such cases. Now I know it isn't. | |
Ladislav 1-Dec-2010 [5338x3] | For comparison, a similar rule can be written as follows: some [ thru {<h2><a} thru ">" copy name to {<} copy doc any [ and {^/ </div>} break | thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] | skip ] ] |
Aha, sorry, that is not similar enough :-( To be similar, it should look as follows, I guess: some [ thru {<h2><a} thru ">" copy name to {<} copy doc any [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] to {^/ </div>} ] ] | |
Still no cigar, third time: some [ thru {<h2><a} thru ">" copy name to {<} copy doc [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] to {^/ </div>} ] ] | |
Oldes 1-Dec-2010 [5341x2] | That's not correct.. there is a reason for the temp parse: without it, thru "<h5" would skip out of the div. |
the DOC is just the temp var for the second parse. | |
Ladislav 1-Dec-2010 [5343] | But, in that case your "inner parse" fails, without you noticing it? |
Oldes 1-Dec-2010 [5344x2] | why? it does not fail.. or maybe it fails, but I have the data from the doc div, that's all.. it's lazy parsing :) |
btw.. I need to parse the source only once so I really don't have to care about some exceptions. | |
Ladislav 1-Dec-2010 [5346] | I have the data - I doubt you get the data if the "inner parse" fails |
Oldes 1-Dec-2010 [5347x2] | believe me I have.. :) the script is already ready.. I was just thinking if there is some special parse keyword, like INTO, so I could do it without the second parse next time, that's all. I use such a lazy parsing very often. |
in your case I would need to jump at least over each tag start, not using thru "<h5". But then there would be the problem that I need to end the doc div only if it's exactly "^/ </div" (to avoid the case where there would be another inner div). I know it's not safe, but I can see what I do by examining the source I want to parse first. (240kB html in my case) | |
Ladislav 1-Dec-2010 [5349] | Aha, that "I can see what I do by examining..." looks substantial. Nevertheless, there is still a way to do a similar thing without calling Parse again |
Oldes 1-Dec-2010 [5350] | I believe you, but what matters is whether it would be easy enough to satisfy my laziness... something like INTO for block parsing. |
Ladislav 1-Dec-2010 [5351x2] | what about this, is it the rule you wanted? some [ thru {<h2><a} thru ">" copy name to {<} to {^/ </div>} doc: [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ here: if (lesser? index? here index? doc) thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] ] :doc ] |
aha, I missed there should be doc-start and doc-end | |