web mining
[1/19] from: kevin::wise::totalpartsplus::com at: 12-Apr-2005 13:50
Is rebol good to do web mining? I have a site that does analysis for stocks
and options. I want to automate the lookup and retrieval of the analysis
and put it in a database or text file. Does anyone have a suggestion for
this project? Thanks
Kevin
[2/19] from: SunandaDH::aol::com at: 12-Apr-2005 14:42
Kevin:
> Is rebol good to do web mining? I have a site that does analysis for stocks
> and options. I want to automate the lookup and retrieval of the analysis
> and put it in a database or text file. Does anyone have a suggestion for
> this project? Thanks
Here's an example of a script that does something like that:
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=get-stock.r
Except, of course, it's using an interface to a Yahoo stocks site.
If your site can provide a similar sort of download file, the adaptations
needed may be trivial.
If you can't, and need to screenscrape, then 'parse or 'load/markup are your
friends.
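A minimal sketch of both approaches, assuming REBOL 2 and a made-up page fragment. The markup, the class names and the words `symbol` and `price` are hypothetical; a real script would start from something like `page: read some-url`.

```rebol
REBOL [Title: "Screen-scraping sketch"]

; Hypothetical fragment standing in for the result of READ on a real page.
page: {<tr><td class="sym">ACME</td><td class="px">12.50</td></tr>}

; 1. PARSE: skip THRU a known marker, then COPY everything up TO the next tag.
symbol: price: none
parse/all page [
    thru {<td class="sym">} copy symbol to {</td>}
    thru {<td class="px">}  copy price  to {</td>}
    to end
]
print [symbol price]

; 2. LOAD/MARKUP: split the page into tag! and string! values,
;    then keep only the text found between the tags.
foreach item load/markup page [
    if string? item [print item]
]
```

The parse version is the better fit when you know the markers around the value you want; load/markup is handy when you only need the text stripped of its tags.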
Sunanda.
[3/19] from: kevin:wise:totalpartsplus at: 12-Apr-2005 13:59
Great thanks. I bought the Official Guide book. Is that a good source to
get me started as well? Form submission by the script will be very helpful.
It is an asp based site.
[4/19] from: greggirwin:mindspring at: 12-Apr-2005 13:01
Hi Kevin,
KW> I bought the Official Guide book. Is that a good source to
KW> get me started as well?
The hard-copy books are really out of date, unfortunately. It will
still have some good things in there, to be sure, but the resources
available on line are much more current and cover more areas (like
View).
-- Gregg
[5/19] from: Izkata::Comcast::net at: 12-Apr-2005 20:06
http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=webcrawler.r
^ An old webcrawler, it seems to still work....
[6/19] from: antonr::lexicon::net at: 13-Apr-2005 15:25
Yes, The Official Guide is good for learning about series manipulation.
Also learn about PARSE here:
http://www.rebol.com/docs/core23/rebolcore-15.html
That should cover all you need to know for such work.
Of course ask here for any issues that pop up.
Anton.
[7/19] from: premshree:pillai:g:mail at: 13-Apr-2005 16:59
On 4/13/05, Kevin Wise <[kevin--wise--totalpartsplus--com]> wrote:
> Is rebol good to do web mining? I have a site that does analysis for stocks
Web mining typically would involve unstructured data. That typically
translates into good string manipulation functions. REBOL's parse may
serve your purposes. However, I'm not sure if it's as mature (and
complete) as regexps (Perl, Python, Ruby, etc.).
--
Premshree Pillai
http://www.livejournal.com/users/premshree/
[8/19] from: kevin:wise:totalpartsplus at: 13-Apr-2005 7:00
Great thanks for the reference.
[9/19] from: kevin:wise:totalpartsplus at: 13-Apr-2005 7:02
I agree. Thanks.
[10/19] from: kevin:wise:totalpartsplus at: 13-Apr-2005 7:06
Excellent. Thanks for the help.
[11/19] from: SunandaDH:aol at: 13-Apr-2005 10:02
Kevin:
> Great thanks. I bought the Official Guide book. Is that a good source to
> get me started as well?
The Official Guide will teach you a lot about core programming -- no
REBOL/View, but as you are doing CGI work, that doesn't matter.
It's an annoying book at times as it hides the stuff you want to know by
embedding it into a book-length example of building a character-based application.
If you have money left in the training budget, in many ways REBOL for Dummies
is a better read. Again, no REBOL/View but a much less wordy pass over the
same range of material as the Official Guide.
> Form submission by the script will be very helpful.
> It is an asp based site.
If you literally just want to submit a form, then you just have to build a
URL.
Example -- this searches REBOL.org for any script containing the word "cgi":
    print read http://www.rebol.org/cgi-bin/cgiwrap/rebol/search.r?find=cgi
Then you need to screenscrape the resulting page.
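As a rough sketch of the URL-building idea, assuming REBOL 2: a GET form is submitted by appending the field values to the URL, and a POST form (common on ASP sites) can be sent with `read/custom`. The POST address and field names below are placeholders, not a real site, so that line is left commented out.

```rebol
REBOL [Title: "Form submission sketch"]

; GET: the form fields simply become the query string of the URL.
term: "cgi"
page: read join http://www.rebol.org/cgi-bin/cgiwrap/rebol/search.r?find= term
print length? page

; POST: hypothetical ASP form; the URL and the field names are placeholders.
; page: read/custom http://www.example.com/lookup.asp [post "symbol=ACME&type=options"]
```

Field values containing spaces or other special characters would need to be URL-encoded before being joined onto the URL or placed in the POST string.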
Either that, or have the site provide a programmable interface. E.g.,
REBOL.org has an interface called Library Data Services. Here's the same query using
that:
    do http://www.rebol.org/library/public/lds-local.r  ; initialise the interface
    probe lds/send-server 'find-scripts ["cgi"]
Sunanda.
[12/19] from: andreas:bolka:gmx at: 13-Apr-2005 19:04
Wednesday, April 13, 2005, 1:29:09 PM, Premshree wrote:
> REBOL's parse may serve your purposes. However, I'm not sure if it's
> as mature (and complete) as regexps (Perl, Python, Ruby, etc.).
'parse is not only "as complete as (Perl's) regexps" but even "more
complete", to stay with your wording.
That is, you can do everything with parse that you can do with
regexps, but you cannot do everything with regexps that you can do
with parse.
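One way to illustrate the difference, as a sketch assuming REBOL 2: parse rules can refer to themselves, so they can match arbitrarily nested structures such as balanced parentheses, which classical regular expressions cannot describe (some modern regex engines add recursion as an extension). The rule name `nested` is made up for this example.

```rebol
REBOL [Title: "Recursive parse rule sketch"]

; A rule that refers to itself: zero or more "(" nested ")" groups.
nested: [any [#"(" nested #")"]]

print parse/all "(()())" nested   ; balanced input
print parse/all "(()" nested      ; one closing paren missing
```

The first call should report true and the second false, since parse only succeeds when the rule consumes the whole input.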
--
Best regards,
Andreas
[13/19] from: heg:poczta:onet:pl at: 14-Apr-2005 18:22
> http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=webcrawler.r
> > Is rebol good to do web mining? I have a site that does analysis for
> > stocks
> > and options. I want to automate the lookup and retrieval of the analysis
> > and put it in a database or text file. Does anyone have a suggestion for
> > this project? Thanks
IMHO, Rebol is an excellent choice provided that you don't need statistical
computations to enhance your results. Simple parsing for string patterns
can be easily done with 'parse.
However, there is a problem with the statistical analysis needed to implement
features such as keyword and phrase extraction or n-gram and HMM models,
simply because Rebol lacks proper functions and libraries (e.g. chi-square tests).
Cheers,
PG
--
<< Paweł Gawroński *** [hegemon--sgh--waw--pl] >>
[14/19] from: greggirwin:mindspring at: 14-Apr-2005 16:29
Hi Paweł,
PG> However, there is a problem with the statistical analysis needed to implement
PG> features such as keyword and phrase extraction or n-gram and HMM models,
PG> simply because Rebol lacks proper functions and libraries (e.g. chi-square
PG> tests).
Not my area, but chi-square is easy enough to do. Would it be useful
to anyone, or does there need to be more to it than just the simple
chi-square calculation?
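For reference, the simple calculation could look roughly like this in REBOL 2. This is only a sketch of Pearson's chi-square statistic, the sum of (observed - expected)^2 / expected; the function name and the sample counts are made up, and it does not compute a p-value.

```rebol
REBOL [Title: "Chi-square sketch"]

; Pearson's chi-square statistic for two blocks of equal length.
chi-square: func [observed [block!] expected [block!] /local total] [
    total: 0.0
    repeat i length? observed [
        ; REBOL evaluates operators strictly left to right, hence the parentheses.
        total: total + (((observed/:i - expected/:i) ** 2) / expected/:i)
    ]
    total
]

; Hypothetical observed and expected counts for three categories.
print chi-square [18 22 20] [20 20 20]
```

Turning the statistic into a significance level would additionally need the chi-square distribution, which is the part a library would have to supply.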
-- Gregg
[15/19] from: kevin:wise:totalpartsplus at: 15-Apr-2005 6:17
Not something I really need right now. Thanks.
[16/19] from: lmecir:mbox:vol:cz at: 15-Apr-2005 14:14
Kevin Wise wrote:
>Not something I really need right now. Thanks.
Some statistical functions are available on my site, and I can offer more
implemented in Rebol; I was just too lazy to make them more visible.
-L
[17/19] from: thomas:cr:gmai:l at: 16-Apr-2005 15:20
rebol is in fact quite useful for web mining which is the reason i'm
using it for a project i'm working on called webminutes
(http://www.webminutes.org).
parse grammars are in fact quite powerful for that!
you might be interested in looking at how webminutes works.
unfortunately, the web site is only in french right now. the code
however is in english.
so what is this webminutes concept? it was originally designed for
people like me who take public transport, have access to a printer and
haven't quite found their ideal newspaper. the idea is thus to
generate automatically a newspaper which you print at work with
content retrieved from various websites.
you configure it once on the website using a php interface by writing
rebol code and parsing grammars in the html forms. you then use daily
the rebol/core scripts to generate your webminutes newspaper.
if there is a demand, i might look into making the website
multilingual... let me know.
i believe i put some english documentation on sourceforge a while ago;
it might help, but it is out of date.
tc
[18/19] from: kevin:wise:totalpartsplus at: 18-Apr-2005 7:00
Great Idea! I will give it a look. I hope web minutes has great success.
[19/19] from: hallvard:ystad:oops-as:no at: 18-Apr-2005 14:05
Hey, nice work!
This reminds me of what we did some time back in a firm
called HelpInHand. We built an application that would
fetch a web page and turn it into a WML-page (dividing it
into several pages if necessary). (See
http://web.archive.org/web/20010721134013/http://helpinhand.com/)
Mobile phone users then could surf anything.
Now, mobile phones using WML were never a great success,
and nowadays mobile phones surf HTML, so the application
wasn't used all that much.
I used it a bit myself (along with the "customizer", that
you may find some links to from the above link), for
composing web pages with content from different sites (to
view in an ordinary html browser on a desktop computer).
Rebol did a great job parsing through web pages and
determining their structure (or lack of such).
I reused some of that code when making the distorter
(http://www.oops-as.no/roy/dis), another "funny web
filter".
And some of that old code is still in use for the RIX (a
rebol search engine indexing rebol stuff:
http://www.oops-as.no/rix).
So, as we all seem to agree, rebol is great for web
mining. Of all sorts.
HY