World: r3wp

[Core] Discuss core issues

btiffin
7-May-2007
[7822]
I can usually get away with the expression 'utility' when doc'ing 
the library scripts, but
I don't want to slip up too many times.  :)
BrianH
7-May-2007
[7823]
I usually reserve "utility" for standalone scripts, not library functions.
btiffin
7-May-2007
[7824x2]
A lot of the library scripts are "stand alone".  Hopefully that situation 
will change as we
progress.
We (well some smarter guys) are hard at work defining a real library 
function library :)
BrianH
7-May-2007
[7826]
I don't mean scripts from the script library, I mean functions that 
are meant to be part of an API.
btiffin
7-May-2007
[7827x2]
Yeah, understood.  Right now I'm putting up Usage docs for the more 
'fluffy' rebol.org 
entries.  Meaty docs should start soon.
Fluffy may be the wrong term.  There is a lot to learn in the samples.
Chris
8-May-2007
[7829]
>> any-function? :now
== true
TimW
10-May-2007
[7830]
Well, I didn't know where to ask this, so maybe here is a good place.
x: load/markup http://www.xanga.com/timwylie/588202056/item.html

then look at x/211.  It's a big chunk of tags.  Why?  The only issue 
I can see is that the first tag at 211 has two quotes in a row.  
Any help in fixing this problem?
Henrik
10-May-2007
[7831x2]
looks like the double quotes are throwing load/markup off
so it thinks the rest is a big string inside the tags
Gregg
10-May-2007
[7833]
If that's the case, check RAMBO and submit it if it isn't there already.
Anton
10-May-2007
[7834x2]
We should try to isolate the problem first.
A huge bunch of html isn't very useful.  Can you make a small example 
of the problem?
Gregg
10-May-2007
[7836x3]
>> load/markup {<span id="xprofimg"">}
== [{span id="xprofimg"">}]
Looks like it can't load it, so just returns the whole thing, unparsed, 
as a string.
Because of the trailing open quote. Remove that and it parses.
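For illustration (not part of the original exchange), the same call 
presumably loads cleanly once the stray quote is removed:

>> load/markup {<span id="xprofimg">}
== [<span id="xprofimg">]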
Anton
10-May-2007
[7839]
Good Gregg. This issue came up years ago. Load/markup only accepts 
perfect input.
TimW
10-May-2007
[7840]
Well, that's what I was afraid of.  Thanks guys.  I'll just have 
to parse it another way.
Oldes
10-May-2007
[7841]
I never found load/markup useful for getting data from html content... 
using just parse is faster and safer
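As a rough, untested sketch of that approach (page is assumed to hold 
the raw HTML of TimW's page above), pulling one piece of data with 
plain parse needs no load/markup at all:

page: read http://www.xanga.com/timwylie/588202056/item.html
title: copy ""
parse page [thru "<title>" copy title to "</title>" to end]
; title now holds whatever text sits between the title tags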
btiffin
10-May-2007
[7842]
TimW;  Haven't tried it, but check mdlparser.r in the rebol.org library. 
It may work around some of the problems.
Anton
10-May-2007
[7843]
Yes, you can't rely on html to be well-formed. In my web data extractors, 
I found that brutish, direct methods required less code and were 
less brittle than methods that tried to parse each html tag correctly.
TimW
10-May-2007
[7844]
Well, I guess I'll just parse the string.  I'll check out the other 
script as well.  Thanks.
Anton
10-May-2007
[7845x2]
One of the problems is that the format of web pages changes often. 
Web developers don't hand out any guarantee that their data is going 
to be in the same place in an HTML page all the time. E.g. I can't 
always go to the second table inside the fourth row of the third 
outermost table to get my data, because after all that navigation, 
they might just rearrange the data into the third table of the second 
row of the second outermost table. Arg!
So I usually look for short key strings which are unlikely to change 
to jump ever closer to the data I need.
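A rough sketch of that idea (the landmark strings here are made up, 
and page is again assumed to hold the raw HTML): chain a couple of 
thru jumps to land just before the data, then copy up to a closing marker.

data: copy ""
parse page [
    thru "Schedule"               ; a heading unlikely to change
    thru {<td class="score">}     ; the tag right before the value we want
    copy data to "</td>"
    to end
]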
TimW
10-May-2007
[7847]
Yeah, I was actually parsing on the div class sections, but I can 
just as easily parse on the class names.  I just liked being able 
to get to the data by throwing the tags away, like [thru <div class="x"> 
any tag! copy data string! to end].
Brock
10-May-2007
[7848x5]
I do the same as Anton.  Grab the smallest unique text to bracket 
the content you want using parse first.  I then use load/markup to 
get at the other bits.
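A minimal sketch of that two-step approach (the markers are hypothetical, 
and page is assumed to hold the HTML source): parse copies out just the 
bracketed chunk, then load/markup runs on that small piece alone.

chunk: copy ""
parse page [thru {<table class="schedule">} copy chunk to "</table>" to end]
bits: load/markup chunk    ; a short block of tag! and string! values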
I was actually thinking of building a dialect to extract the data 
that I want, but I'm not certain how to proceed with this, as each 
row of the table I need to grab, although similar, is always missing 
the odd element, so using element /x will not always give me the 
same data.  I have found approx 10 variations in the data for the 
tables I'm trying to pull the data out of, so not sure if that is 
the best way or not.  Any advice would be great.
here's an example of the type of content I am trying to effectively 
parse.
http://www.nhl.com/nhl/app?service=page&page=Schedule
I was using the length of the returned block after load/markup, 
but I would likely be better off defining a simple parse statement 
to grab the contents of the rows.
TimW
10-May-2007
[7853]
Oh, that worked great.  I just read the string, found the div tag, 
then loaded it from there, and I didn't have to change my code.
btiffin
10-May-2007
[7854]
Brock;  Check out Daniel's rebol.org submissions...  mdlparser, quickparser 
and rblxelparser.  Might be a few hints and tips.
Brock
10-May-2007
[7855]
will do, thanks Brian.
Terry
12-May-2007
[7856]
I have a question.. 

What's the best way to iterate through a hash/dict! etc. looking for 
values to multiple keys with the same name, i.e.:

n:[one "test1"  two "test2"  one "test3"]

where it returns an array of all 'one' key values?
btiffin
12-May-2007
[7857x5]
remove-each [key item] n [not key = 'one]
Of course, you'd want a copy  :)
iirc, newer REBOLs may have other '-each' native! words.
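A small usage sketch of that suggestion, working on a copy so the 
original block is left intact:

n: [one "test1" two "test2" one "test3"]
vals: copy n
remove-each [key item] vals [not key = 'one]
; vals is now [one "test1" one "test3"]; n is unchanged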
During a Sunanda ML contest, someone came up with a parse solution 
to compacting...it was fairly blazing in speed, but I got busy after 
phase one of the contest and didn't follow it as closely as I would 
have liked to.

Check http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-thread.r?m=rmlPCJC
for some details...watch the entries by Romano, Peter and Christian.
The winning code ended up in

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=rse-ids.r

It's not the same problem set, but parse seems to be a bit of a magic 
bullet, speed-wise.
I'm completely new to parse...haven't timed it...it probably breaks...
out: copy []
key: 'one

parse n [to key  any [thru key (insert tail out first n  insert tail out second n)  skip]]
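For comparison (an untested sketch along the same lines), marking the 
current position avoids reading from the head of n each time:

out: copy []
parse n [some [mark: word! skip (if mark/1 = 'one [append out mark/2])]]
; for the block above, out ends up as ["test1" "test3"]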
Terry
12-May-2007
[7862]
I'm thinking parse is not the way to go... (I'm referring to very large 
hash tables)
Chris
12-May-2007
[7863x3]
; This probably isn't efficient enough either:
remove-dups: func [block [any-block!] /local this-key][
    forskip block 2 [
        this-key: pick block 1
        remove-each [key val] next next block [this-key = key]
    ]
    block
]
; Parse?
remove-dups: func [block [any-block!] /local this mk mk2][
    parse block [
        any [
            set this word! skip mk:
            any [to this mk2: (remove/part mk2 2)]
            :mk
        ]
    ]
    block
]
Sorry, read the problem first :)
btiffin
12-May-2007
[7866]
Chris; :)  I was using the remove-each for its nativeness.

Terry;  Parse has proven itself to be very fast.  It may be faster 
than
remove-each [key item] copy n [not key = 'one] for large sets

Otherwise foreach [key item] n [if key = 'one [insert tail out reduce 
[key item]]]

But I'll bet the parse is faster (assuming it works...that is one 
of my first parses)
Chris
12-May-2007
[7867]
; Parse, this time addressing the spec:
consolidate: func [block [any-block!] /local key val mk dup][
	parse block [
		any [
			set key word! val: skip mk:
			opt [
				to key dup: (change/only val compose [(val/1)])
				any [to key dup: (append val/1 dup/2 remove/part dup 2)]
			]
			:mk
		]
	]
	block
]
btiffin
12-May-2007
[7868]
Terry; go with Chris...He'll not lead you wrong.  :)
Chris
12-May-2007
[7869]
B: I wonder if 'parse or other loops work faster with hashes?
btiffin
12-May-2007
[7870]
Hmm.  Good point.   R3 is going to have other native! '-each' words 
right?
Terry
12-May-2007
[7871]
um... not trying to remove dupes... trying to collect them all (you 
know, like Pokemon cards)
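A minimal sketch of what Terry seems to be after (collect-key is a 
made-up name; plain foreach, so the same code works on a block! or 
a hash!):

collect-key: func [series key /local out][
    out: copy []
    foreach [k v] series [if k = key [append out v]]
    out
]

>> collect-key [one "test1" two "test2" one "test3"] 'one
== ["test1" "test3"]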