Mailing List Archive: 49091 messages

parse, again...

 [1/25] from: hallvard:ystad:helpinhand at: 1-Nov-2001 14:46


I seem to get something wrong. Look here:
>> html-code: [ copy tag ["<" thru ">"] | copy txt to "<" ]
== [copy tag ["<" thru ">"] | copy txt to "<"]
>> parse/all read http://www.rebol.com/ [to "<" some html-code]
connecting to: www.rebol.com
== false
This returns false because the website ends with some whitespace after the last tag. I redefine 'html-code and try again:
>> html-code: [ copy tag ["<" thru ">"] | copy txt [to "<" | thru end]]
== [copy tag ["<" thru ">"] | copy txt [to "<" | thru end]]
>> parse/all read http://www.rebol.com/ [to "<" some html-code]
connecting to: www.rebol.com
== false
Hm. Why does this return false? Another try:
>> html-code: [ copy tag ["<" thru ">"] | copy txt to "<" | copy txt thru end ]
== [copy tag ["<" thru ">"] | copy txt to "<" | copy txt thru end]
>> parse/all read http://www.rebol.com/ [to "<" some html-code]
connecting to: www.rebol.com
== false
Still wrong return value from 'parse. But then, finally:
>> html-code: [ copy tag ["<" thru ">"] | copy txt to "<" | skip]
== [copy tag ["<" thru ">"] | copy txt to "<" | skip]
>> parse/all read http://www.rebol.com/ [to "<" some html-code]
connecting to: www.rebol.com
== true
So I managed, finally. But what if I want to use whatever is written after the last tag? And especially: what's wrong with my second approach? I know this is probably basic, and that it's probably been answered a hundred times already on this mailing list, but I just can't seem to find the solution (and I _have_ searched, yes).
~H

 [2/25] from: ryanc:iesco-dms at: 1-Nov-2001 9:17


Seems like 'thru 'end always returns false. Try it this way...
>> chars: complement charset []
== make bitset! #{ FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF }
>> html-code: [ copy tag ["<" thru ">"] | copy txt to "<" | some chars]
== [copy tag ["<" thru ">"] | copy txt to "<" | some chars]
>> parse/all read http://www.rebol.com [to "<" some html-code]
== true
>>
If you want to confirm that it is HTML and not just a text page, add a 'to <html> in front; otherwise you might get false positives.
--Ryan

Hallvard Ystad wrote:
> I seem to get something wrong. Look here:
> >> html-code: [ copy tag ["<" thru ">"] | copy txt to "<" ]
<<quoted lines omitted: 33>>
-- Ryan Cole Programmer Analyst www.iesco-dms.com 707-468-5400

 [3/25] from: nitsch-lists:netcologne at: 1-Nov-2001 18:37


RE: [REBOL] parse, again...
[hallvard--ystad--helpinhand--com] wrote:
> I seem to get something wrong. Look here:
> >> html-code: [ copy tag ["<" thru ">"] | copy txt to "<" ]
<<quoted lines omitted: 9>>
> connecting to: www.rebol.com
> == false
well,
>> parse "hello" [thru end]
== false
>> parse "hello" [to end]
== true
;-) Volker

 [4/25] from: ryanc:iesco-dms at: 1-Nov-2001 9:48


Haha, good point Volker, one can't go thru the end!
--Ryan

[nitsch-lists--netcologne--de] wrote:
> RE: [REBOL] parse, again...
> [hallvard--ystad--helpinhand--com] wrote:
<<quoted lines omitted: 55>>
-- Ryan Cole Programmer Analyst www.iesco-dms.com 707-468-5400

 [5/25] from: hallvard:ystad:helpinhand at: 1-Nov-2001 19:27


[nitsch-lists--netcologne--de] skrev (18.37 01.11.2001):
>RE: [REBOL] parse, again...
>well,
<<quoted lines omitted: 3>>
>== true
>;-) Volker
But try
html-code: [ copy tag ["<" thru ">"] | copy txt [to "<" | to end] (print txt)]
parse/all read http://www.rebol.com/ [to "<" some html-code]
...and see what you get... (Why?)
~H

 [6/25] from: lmecir::mbox::vol::cz at: 1-Nov-2001 22:43


Hi Hallvard,
> But try
> html-code: [ copy tag ["<" thru ">"] | copy txt [to "<" | to end] (print txt)]
> parse/all read http://www.rebol.com/ [to "<" some html-code]
>
> ...and see what you get... (Why?)
>
> ~H
>
> Praetera censeo Carthaginem esse delendam
Your bug simplified:
rule1: [to end]
parse/all "a" [some rule1]
You cannot write a rule like this, because parse will run forever (hint: rule1 always succeeds, even when it consumes no input). This is why the correct form of your rule should be:
html-code: [ copy tag ["<" thru ">"] | copy txt [to "<" | some skip] (print txt)]
parse/all read http://www.rebol.com/ [to "<" some html-code]
Cheers
Ladislav
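
A minimal sketch of the rule with a fallback that consumes input, assuming REBOL 2 parse behaviour as described in this thread. The `page` string is a made-up stand-in for the downloaded page, and the `print` actions are only there to show what each alternative captured:

```rebol
; Hypothetical stand-in for the page text: whitespace follows the last tag.
page: "<b>hi</b> "

; For this input, every alternative that succeeds also consumes characters:
; a whole tag, the text before the next "<", or the remaining characters.
html-code: [
    copy tag ["<" thru ">"] (print ["tag:" mold tag])
    | copy txt [to "<" | some skip] (print ["txt:" mold txt])
]

print parse/all page [to "<" some html-code]
```

This is not a general guard. If the input held a "<" with no closing ">", the tag alternative would fail and `to "<"` could presumably succeed without moving, which is the same looping situation again.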

 [7/25] from: nitsch-lists:netcologne at: 2-Nov-2001 7:19


RE: [REBOL] Re: parse, again...
[lmecir--mbox--vol--cz] wrote:
> Hi Halvard,
> > But try
<<quoted lines omitted: 16>>
> (print txt)]
> parse/all read http://www.rebol.com/ [to "<" some html-code]
or [skip to end], which says you want at least one char in the string.
html-code: [ copy tag ["<" thru ">"] | copy txt [to "<" | skip to end] (print txt)]
parse/all read http://www.rebol.com/ [to "<" some html-code]
-Volker

 [8/25] from: ingo:2b1 at: 1-Nov-2001 20:52


Hi Hallvard,
I'll answer your question as meaning "how do I get the work done?", so ...
Once upon a time Hallvard Ystad spoketh thus:
> I seem to get something wrong. Look here:
> >> html-code: [ copy tag ["<" thru ">"] | copy txt to "<" ]
> == [copy tag ["<" thru ">"] | copy txt to "<"]
> >> parse/all read http://www.rebol.com/ [to "<" some html-code]
> connecting to: www.rebol.com
> == false
I'd do it this way:
>> html-code: [ some [ set tag tag! | set str string! ]]
== [ some [ set tag tag! | set str string! ]]
>> parse/all load/markup read http://www.rebol.com/ html-code
== true
(load/markup parses the text and returns a block containing tag!s and string!s, so most of your work has already been done.)
kind regards,
Ingo
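
A small sketch of the load/markup approach on a literal string, so it can be tried without a network read. The `page` string and the `print` actions are illustrative assumptions, not part of the original message:

```rebol
; Hypothetical stand-in for the page text.
page: "<p>Hello <b>world</b></p>"

; load/markup splits the string into tag! and string! values.
items: load/markup page

; Block parsing: dispatch on the datatype of each item.
html-code: [
    some [
        set tag tag! (print ["tag:" mold tag])
        | set str string! (print ["txt:" mold str])
    ]
]

print parse items html-code
```

Because the input to parse is now a block, each step consumes one whole value, so the end-of-string question from the earlier messages does not come up in this form.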

 [9/25] from: hallvard:ystad:helpinhand at: 2-Nov-2001 8:33


Thanks, Ingo. Now the fact is, the 'html-code rule I presented here is a pretty simplified one. The one I _really_ use triggers different functions depending on _what_ tag I get. It's 41 lines long. Since both 'load and 'parse are native, I doubt there would be any performance gain in changing to load/markup (and adding the code to trigger actions for different tags elsewhere). Or??
Thanks Volker and Ladislav too. Volker: I can't seem to get your rule and Ladislav's to behave differently. They both react only if there is at least one character in the string.
So it was indeed a bug that bugged me, not only (as it normally is) my lack of rebolism.
~H

 [10/25] from: ingo:2b1 at: 2-Nov-2001 9:26


Hi Hallvard,
Once upon a time Hallvard Ystad spoketh thus:
> Thanks, Ingo. Now the fact is, the 'html-code rule I presented here is a
> pretty simplified one. The one I _really_ use triggers different functions
> depending on _what_ tag I get. It's 41 lines long. Since both 'load and
> 'parse are native, I doubt there would be any performance gain on changing
> to load/markup (and adding the code to trigger actions for different tags
> elsewhere). Or??
I created an HTML viewer this way: I used load/markup, then when I found a tag I parsed it to get a block like this:
[tag-name param1 value1 param2 ... ]
I don't know about runtime performance, but code-time performance seemed pretty good to me ;-)
Anyway, if you're interested, it's at http://www.h-o-h.org/browser.r
kind regards,
Ingo

 [11/25] from: nitsch-lists:netcologne at: 3-Nov-2001 3:46


RE: [REBOL] Re: parse, again...
Hi
[hallvard--ystad--helpinhand--com] wrote:
> Thanks, Ingo. Now the fact is, the 'html-code rule I presented here is a
> pretty simplified one. The one I _really_ use triggers different functions
> depending on _what_ tag I get. It's 41 lines long. Since both 'load and
> 'parse are native, I doubt there would be any performance gain on changing
> to load/markup (and adding the code to trigger actions for different tags
> elsewhere). Or??
>
hmm, if you want to use
.. "<" [to "href" .. | to "font" ..] to ">" ..
you get a problem: parse would look for an href before it looks for font. so with "<font> <href>" it would find the href and return it, with all the font-tag stuff before it. font would only be searched for if there is no "href" anywhere in the text. so it can make sense to pre-parse all tags and analyse them in a separate parse call.
.. copy tag "<" thru ">" (parse tag [to "href" .. | to "font" ..]) ..
in that case load/markup would do the same.
> Thanks Volker and Ladislav too. Volker: I can't seem to get your rule and
> Ladislav's to behave differently. They both react only if there is at least
> one character in the string.
>
Yes, it's just another way to say it. well, i was fighting a while until i solved this, so i was eager to reply too ;-) oh, and it has [to end] in it!!
> So it was indeed a bug that bugged me, not only (as it normally is) my lack
> of rebolism.
>
i think having [thru end] to say "i got all, stop!" would be a feature. our solutions work nicely, but are not that obvious.
> ~H
>
-Volker
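
The two-stage idea from this message (cut out each tag first, then analyse it in a separate parse call) might be sketched as follows. The `page` string and the `analyse-tag` helper are hypothetical names introduced only for this illustration:

```rebol
; Hypothetical stand-in for the page text.
page: {<font size="2">text</font> <a href="x.html">link</a>}

; Second stage: look inside one tag only, so an "href" later in the page
; cannot be found before an earlier "font".
; The result of this inner parse is ignored; only the actions matter.
analyse-tag: func [tag [string!]] [
    parse/all tag [
        to "href" (print ["link tag:" tag])
        | to "font" (print ["font tag:" tag])
    ]
]

; First stage: cut out each tag, then hand it to the second stage.
parse/all page [
    any [to "<" copy tag ["<" thru ">"] (analyse-tag tag)]
    to end
]
```

Each pass of the `any` loop either consumes a whole tag or fails, so the loop should not spin in place. Note that a closing tag such as `</font>` also contains "font"; a real rule would probably want to tell opening and closing tags apart.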

 [12/25] from: lmecir:mbox:vol:cz at: 3-Nov-2001 19:27


Hi,
> > Thanks Volker and Ladislav too. Volker: I can't seem to get your rule and
> > Ladislav's to behave differently. They both react only if there is at least
> > one character in the sitrng.
There may be a speed difference, you can do some benchmarking to find out.
> i think having [thru end] to say "i got all, stop!" would be a feature.
Actually, in http://www.sweb.cz/LMecir/parseen.r I use [to end skip]. [thru end] seems to be shorter while having the same meaning.

 [13/25] from: nitsch-lists:netcologne at: 4-Nov-2001 1:13


RE: [REBOL] Re: parse, again...
[lmecir--mbox--vol--cz] wrote:
> Hi,
> > > Thanks Volker and Ladislav too. Volker: I can't seem to get your rule
<<quoted lines omitted: 6>>
> Actually, in http://www.sweb.cz/LMecir/parseen.r I use [to end skip]. [thru
> end] seems to be shorter while having the same meaning.
difference is
>> parse "hello" [to end skip]
== false
>> parse "hello" [skip to end]
== true
[to end skip] will stop, but gives false. our "workarounds" give true but are not so obvious. i would like to get "stop" and "true", and somehow, [thru end] would make sense to me (it's what one thinks of first?)
-Volker

 [14/25] from: lmecir:mbox:vol:cz at: 4-Nov-2001 15:48


Hi Volker,
> > Actually, in http://www.sweb.cz/LMecir/parseen.r I use [to end skip]. [thru
> > end] seems to be shorter while having the same meaning.
It seems that instead of the [to end skip] I am using, I could use [end skip] or [thru end]; that is all I wanted to say. Your suggestion is not compatible with the current behaviour of PARSE. I am not sure your suggestion is compatible with the PARSE philosophy.

 [15/25] from: robert:muench:robertmuench at: 5-Nov-2001 11:31


> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 6>>
> == false
> [to end skip] will stop, but gives false.
Hi, I'm just jumping in and maybe I miss the point, but 'to end' goes beyond the last character, and a following 'skip' tries to go even beyond this virtual end-tag. Of course parse returns false.
> >> parse "hello" [skip to end]
> == true
> our "workarounds" gives true but are not so obvious.
I think this is very obvious, this tells parse to 'skip to end' that's skip beyond the last consumable character.
> i would like to get "stop" and "true", and somehow,
> [thru end] would make sense to me (its what one thinks of first?)
If we define 'end as the position behind the last consumable character, it doesn't make sense to go thru the end. It's OK to go/skip to the end.
Robert

 [16/25] from: brett:codeconscious at: 5-Nov-2001 23:39


> > >> parse "hello" [skip to end]
> > == true
> > our "workarounds" gives true but are not so obvious.
>
> I think this is very obvious, this tells parse to 'skip to end' that's skip
> beyond the last consumable character.
No I don't think so. The above parse rule has two instructions:
1) SKIP - skips a single character only in this case
2) TO END - Moves the index to the tail of the input stream.
For example:
>> parse "hello" [skip mark: (print mark) to end mark: (probe tail? mark) ]
ello
true
== true
Brett.

 [17/25] from: robert:muench:robertmuench at: 5-Nov-2001 16:38


> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 5>>
> No I don't think so. The above parse rule has two instructions:
> 1) SKIP - skips a single character only in this case
Hi, in what case? IMO skip isn't optional, it's mandatory, so the rule is only satisfied if a skip is possible.
>> parse "a" [skip to end]
== true
>> parse " " [skip to end]
== false
>> parse "" [skip to end]
== false
>> parse/all " " [skip to end]
== true
> 2) TO END - Moves the index to the tail of the input stream.
Yep, and the tail is beyond the last character.
>> test: "test"
== "test"
>> last test
== #"t"
>> tail test
== ""
Robert

 [18/25] from: brett:codeconscious at: 6-Nov-2001 10:49


> > >> parse "hello" [skip to end]
> >
> > No I don't think so. The above parse rule has two instructions:
> > 1) SKIP - skips a single character only in this case
>
> Hi, in what case?
Robert, your email earlier had:
> > >> parse "hello" [skip to end]
> I think this is very obvious, this tells parse to 'skip to end' that's skip
> beyond the last consumable character.
As it reads, it looks as if you are saying that "skip to end" is a single atomic parse instruction, that Rebol understands the English meaning of the phrase 'skip to end' - I disagree with this. I was simply saying that you have two parse instructions here. "skip" is carried out first, then "to end" is carried out.
> IMO skip isn't optional it's mandatory, so the rule is only
> satisfied if a skip is possible.
Perhaps your comment that skip is mandatory is relative to some particular situation (or the stopping condition discussion); if so, I'm sorry, I obviously missed your intention. I just want to ensure that someone learning parse is not confused about the meaning of the rule [skip to end], which is "match one and move to tail".
Brett.

 [19/25] from: nitsch-lists:netcologne at: 6-Nov-2001 4:02


RE: [REBOL] Re: parse, again...
Hi Robert
[robert--muench--robertmuench--de] wrote:
> > -----Original Message-----
> > From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 17>>
> If we define 'end as the position behind the last consumable character it
> doesn't make sense to go thru the end. It's OK to go/skip to the end. Robert
thread started like
parse "this that something some others"[some[thru "something"|thru end]]
this was the way the thread-starter used it intuitively, and wondered why 'parse said "not successfully parsed"
i replied untested
parse "this that something some others"[some[thru "something"|to end]]
but oops, this gives an infinite loop. then solution was to make the [to end] part smarter with [some skip] (Ladislav) or [skip to end] (me).
but because [thru "something"] means "all including this is parsed", [thru end] could well mean "all including end is parsed" IMHO. i would like it.
-Volker

 [20/25] from: hallvard:ystad:helpinhand at: 6-Nov-2001 8:28


[nitsch-lists--netcologne--de] skrev (Tuesday 06.11.2001, kl. 04.02):
>thread started like
> parse "this that something some others"[some[thru "something"|thru end]]
>this was the way the thread-starter used it intuitively,
>and wondered why 'parse said "not successfully parsed"
Actually, I wasn't using "thru end" _intuitively_, although it was among my different approaches. But once I thought about it, I _do_ find it logical to be able to do a parse "thru end", since 'end is indeed defined as a something (even though simply a marker) after the last character in a string. "To end" produces an infinite loop (guess this is a bug), so I ended up using 'skip. Whether the most logical solution is "to end" or "thru end" is a philosophical or computer-technical question that I won't probe into. But the discussion following my question has been interesting, and thank you all for helping to answer my question.
>but because [thru "something"] means "all including this is parsed",
>[thru end] could well mean "all including end is parsed" IMHO.
>i would like it.
I would too, but I'd like just as much that "to end" would simply work...
~H

 [21/25] from: lmecir:mbox:vol:cz at: 6-Nov-2001 10:03


Hi Hallvard,
Volker:
<< ...thread started like
parse "this that something some others"[some[thru "something"|thru end]]
...thread-starter ... wondered why 'parse said "not successfully parsed"...
>>
Hallvard:
<<Actually, I wasn't using "thru end" _intuitively_, although it was among my different approaches. But once I thought about it, I _do_ find it logic to be able to do a parse "thru end", since 'end is indeed defined as a something (even though simply a marker) after the last character in a string. "To end" produces an infinite loop (guess this is a bug)... I'd like just as much that "to end" would simply work...
>>
The problem is that [to end] must work as it does to work correctly. See this:
parse "" [to end to end] ; == true
I am sure that this behaviour is correct regardless of the number of [to end] you put in there. That is why any attempt to write
parse "" rule: [to end rule]
or
parse "" [any [to end]]
correctly ends up in an infinite cycle, because PARSE is unable to finish its work. It's up to the user to write a rule that really parses through INPUT and doesn't try to pretend it's doing anything useful when it actually isn't.
Cheers
Ladislav

 [22/25] from: robert:muench:robertmuench at: 6-Nov-2001 10:04


> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 5>>
> atomic parse instruction, that Rebol understands the English meaning of the
> phrase 'skip to end' - I disagree with this.
Hi, no, that wasn't my intention. Of course these are two instructions, 'skip and 'to, and one terminal symbol with a special meaning, 'end ;-). My sentence just specified the meaning of the whole rule.
> I was simply saying that you
> have two parse instructions here. "skip" is carried out first, then "to end"
> is carried out.
Correct. As said, two instructions and one terminal symbol.
> Perhaps your comment that skip is mandatory is relative to some particular
> situation (or the stopping condition discussion), if so I'm sorry I
> obviously missed your intention.
No problem, I'm not a native English speaker, so I might not write specifically enough. By mandatory I meant that the meaning of skip is 'skip one symbol' and not 'skip one symbol if possible, else ignore the rule'.
Robert

 [23/25] from: hallvard::ystad::helpinhand::com at: 6-Nov-2001 10:24

Re: parse, again... (or rather: still!)


Ladislav Mecir skrev (Tuesday 06.11.2001, kl. 10.03):
>The problem is, that [to end] must work as it does to work correctly. See
>this:
<<quoted lines omitted: 7>>
>Correctly ends up in an infinite cycle, because PARSE is unable to finish
>its work.
Just a sec... Why does
parse "" [to end to end]
yield 'true? What happens when 'parse has done the first [to end]? Where is 'parse at then? And wouldn't it be simple for rebol to know that once 'parse has reached 'end, all further parse instructions must fail or be ignored? (I just feel the infinite loop is unnecessary.)
>It's up to the user to write a rule that really parses through
>INPUT and doesn't try to pretend it's doing anything useful when it
>actually isn't.
That's just my debuggin' everyday life...!
~H

 [24/25] from: joel:neely:fedex at: 6-Nov-2001 0:33


Hi, Hallvard,
Hallvard Ystad wrote:
>
> Just a sec... Why does parse "" [to end to end] yield 'true?
> What happens when 'parse has done the first [to end]? Where is
> 'parse at then? And wouldn't it be simple for rebol to know that
> once 'parse has reached 'end, all further parse instructions
> must fail or be ignored? (I just feel the infinite loop is
> unneccessary).
>
Please! NO! Let's not add one more special case that has to be memorized because it doesn't follow the simple, consistent pattern. Consider this example..
>> parse "x" [to "x" to "x" to "x" to "x" to "x" to "x" skip]
== true
As I understand it (and I'm sure someone will tell me if I'm off target ;-), the meaning of
    to foo
is very simple:
    Is it possible to move forward 0 or more characters and arrive at foo?
(I'm thinking here only in terms of the result of the entire PARSE expression, and ignoring the side-effects for the moment.)
Being able to move forward zero characters is highly useful in parsing (generally speaking, not REBOL specific) and I'd certainly not like to lose that ability.
Of course, the meaning of
    skip
is
    Is it possible to move forward exactly one character?
and the final result of PARSE means
    Did the supplied rule reach the end of the string?
With only these pieces of model, we can accurately predict the behavior both of the previous example and this one:
>> parse "x" [to "x" to end to end to end to end to end to end]
== true
Now, consider that
    some foo
means
    Can we match the sub-pattern FOO one or more times?
or, imperatively
    Match the sub-pattern FOO as many times (>= 1) as possible.
Then it's not surprising that
>> parse "x" [some [to "x"] skip]
will wander off into the weeds and never come back (and we can get the same with ANY = "... zero or more ...").
I suspect that any programming language sufficiently powerful for practical use allows one to write an infinite loop. Complicating it with extra heuristics designed to protect me from myself seems the road to ruin.
Just my $0.02...
-jn-
--
The most important thing in the programming language is the name. A language will not succeed without a good name. I have recently invented a very good name and now I am looking for a suitable language. -- Donald Knuth
joel&dot&FIX&PUNCTUATION&neely&at&fedex&dot&com
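
A short sketch of how the model in this message suggests keeping a loop finite, assuming the REBOL 2 behaviour discussed in the thread. The input string "axbx" is made up for the illustration:

```rebol
; TO can succeed without moving, so inside SOME it is paired with SKIP:
; each successful pass then consumes the "x" it found.
print parse "axbx" [some [to "x" skip]]

; The looping form from the message, left commented out on purpose:
; parse "x" [some [to "x"] skip]
```

Under the model above, the first rule should stop once no further "x" can be reached, because every successful pass moves the position forward by at least one character. The commented-out form never moves past the "x" it finds, which is why it repeats.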

 [25/25] from: hallvard:ystad:helpinhand at: 6-Nov-2001 14:30


Joel Neely skrev (Tuesday 06.11.2001, kl. 07.33):
>[...]
>and the final result of PARSE means
>
> Did the supplied rule reach the end of the string?
Well, but:
>> parse "x" [to "x" to end to end to end to end to end to "x"]
== false
The rule certainly reached the end of the string (but itself couldn't be completed!)...
I missed the fact that [to end] reaches the end, but doesn't thereby stop 'parse. So if I ask for more than one occurrence, there will be a loop. Logical.
Thanks for all help,
~H

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted