World: r3wp
[Parse] Discussion of PARSE dialect
older newer | first last |
Ladislav 22-Sep-2010 [5294x3] | But, as you said, one of my motivations was to write it as a mezzanine to have some "inspiration"/experiences with it for Carl. |
, since I guess, that this way, he will not have to just go into an "unknown territory" | |
I must say, that I was actually surprised, how people (including me) have struggled to circumvent this problem, while having such an elegant way available to solve it. | |
GrahamC 18-Oct-2010 [5297] | a regex question ... ([0-9]{4})(-([0-9]{2})(-([0-9]{2})(T([0-9]{2}):([0-9]{2})(:([0-9]{2})(\.([0-9]+))?)?(Z|(([-+])([0-9]{2}):([0-9]{2})))))) is apparently failing this string : 2010-10-18T07:06:25.00Z What tool can I use to check this string against this regex ? |
Sunanda 18-Oct-2010 [5298] | Regexlib has a different ISO-8601 date matching regex: http://regexlib.com/REDetails.aspx?regexp_id=2092 And the ability to enter any regex and target strings to test what happens: http://regexlib.com/RETester.aspx? |
GrahamC 18-Oct-2010 [5299x2] | found this one too http://www.fileformat.info/tool/regex.htm |
and it seems my string is passing ... hmm | |
Sunanda 18-Oct-2010 [5301] | The problem with regexes is they are impossible to debug.....Best just to rewrite continually until they work :) |
GrahamC 18-Oct-2010 [5302] | I'm trying to validate some XML against an online validator and it's rejecting my dates :( |
Henrik 18-Oct-2010 [5303] | how do you specify an element to be of the type any-type! except none! ? |
Ladislav 18-Oct-2010 [5304] | I am afraid, that you need to list all types excluding none |
Henrik 18-Oct-2010 [5305] | does R3 solve this? if not, maybe that would be a good problem to solve. |
Ladislav 18-Oct-2010 [5306] | R3 can let you define that typeset and use it any time you like |
Henrik 18-Oct-2010 [5307] | ok, that is possibly good enough for generating specs. |
Gregg 18-Oct-2010 [5308] | I don't remember what all we did Henrik, but some of our test generation stuff on another world had some support for typesets IIRC. |
Henrik 18-Oct-2010 [5309] | Gregg, ok |
Steeve 18-Oct-2010 [5310] | Henrik, with a parse rule ? |
Henrik 18-Oct-2010 [5311] | Steeve, yes. |
Steeve 18-Oct-2010 [5312] | R3 does it |
AdrianS 18-Oct-2010 [5313] | Graham, try http://gskinner.com/RegExrfor working out regexes. It has a really nice UI where you can hover over the components of the regex and see exactly what they do. |
GrahamC 18-Oct-2010 [5314] | Thanks |
Sunanda 4-Nov-2010 [5315] | Question on StackOverflow.....there must be a better answer than mine, and I'd suspect it involves PARSE (better answers usually do:) http://stackoverflow.com/questions/4093714/is-there-finer-granularity-than-load-next-for-reading-structured-data |
GrahamC 4-Nov-2010 [5316x3] | Use fixed length records |
Anyone got a parse rule that strips out everything between tags in an "xml" document | |
whitespace: charset [ "^/^- " ] swsp: [ any whitespace ] result: copy "" parse/all pqri-xml [ some [ copy t thru ">" (append result t) swsp to "<" ]] | |
Ladislav 4-Nov-2010 [5319] | Posted an answer mentioning the test framework, which does almost exactly what Fork asked |
Gabriele 5-Nov-2010 [5320x3] | also, Carl's clean-script and script colorizer use parse + load/next to do the same thing. my Wetan uses the same method. |
http://www.colellachiara.com/soft/MD3/emitters/wetan.html#section-4.2 | |
basically, as long as you skip over [, (, ), and ] you can just use load/next. I'm also skipping over #[ because I want to preserve literal values while formatting (that is, preserve what the user typed) | |
Oldes 1-Dec-2010 [5323] | How to use the new INTO parse keyword? Could it be used to avoid the temp parse like in this (very simplified example)? parse "<a>123</a>" [thru "<a>" copy tmp to "</a>" (probe tmp probe parse tmp ["123"]) to end] Note that I know that in this example it's easy to use just one parse and avoid the temp. |
Ladislav 1-Dec-2010 [5324x3] | INTO is neither new, not it is meant for string parsing |
You can take advantage of using it when parsing a block and needing to parse a subblock (of any-block! type) or a substring | |
(of the said block) | |
Oldes 1-Dec-2010 [5327] | can you give me a simple example, please? |
Ladislav 1-Dec-2010 [5328x2] | >> parse [a b "123" c] [2 word! into [3 skip] word!] == true |
>> parse [a b c/d/e] [2 word! into [3 word!]] == true | |
Oldes 1-Dec-2010 [5330x2] | I understand now, thanks. |
it's very useful, I woder why I've not found it earlier :) | |
Ladislav 1-Dec-2010 [5332] | The substring property is just a recent addition |
Oldes 1-Dec-2010 [5333] | And is there any nice solution for my string parsing above? I can live with the temps, just was thinking if it could be done better.. anyway, at least I know how to use INTO:) |
Ladislav 1-Dec-2010 [5334x2] | That is normally a "job" for a subrule |
it looks, that you could use e.g. the REJECT keyword | |
Oldes 1-Dec-2010 [5336x2] | I know, but that would require complex rules, I'm lazy parser:) Btw.. my real example looks like: some [ thru {<h2><a} thru ">" copy name to {<} copy doc to {^/ </div>} ( parse doc [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} ( printf [" * " 10 " - "] reduce [arg arg-desc] ) ] ] ) ] |
Never mind, I can live with current way anyway.. I was just wondering if the INTO is not intended for such a cases. Now I know it isn't. | |
Ladislav 1-Dec-2010 [5338x3] | For comparison, a similar rule can be written as follows: some [ thru {<h2><a} thru ">" copy name to {<} copy doc any [ and {^/ </div>} break | thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] | skip ] ] |
Aha, sorry, that is not similar enough :-( To be similar, it should look as follows, I guess: some [ thru {<h2><a} thru ">" copy name to {<} copy doc any [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] to {^/ </div>} ] ] | |
Still not cigar, third time: some [ thru {<h2><a} thru ">" copy name to {<} copy doc [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] to {^/ </div>} ] ] | |
Oldes 1-Dec-2010 [5341x2] | That's not correct.. there is a reason for the temp parse and that's here because thru "<h5" would skip out of the div. |
the DOC is just the temp var for the second parse. | |
Ladislav 1-Dec-2010 [5343] | But, in that case your "inner parse" fails, without you noticing it? |
older newer | first last |