Mailing List Archive: 49091 messages
 

web mining

 [1/19] from: kevin::wise::totalpartsplus::com at: 12-Apr-2005 13:50


Is rebol good to do web mining? I have a site that does analysis for stocks and options. I want to automate the lookup and retrieval of the analysis and put it in a database or text file. Does anyone have a suggestion for this project? Thanks Kevin

 [2/19] from: SunandaDH::aol::com at: 12-Apr-2005 14:42


Kevin:
> Is rebol good to do web mining? I have a site that does analysis for stocks
> and options. I want to automate the lookup and retrieval of the analysis
> and put it in a database or text file. Does anyone have a suggestion for
> this project? Thanks
Here's an example of a script that does something like that:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=get-stock.r

Except, of course, it's using an interface to a Yahoo stocks site. If your site can provide a similar sort of download file, the adaptations needed may be trivial.

If it can't, and you need to screenscrape, then 'parse or 'load/markup are your friends.

Sunanda.
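A minimal sketch of the two screen-scraping approaches named above, assuming REBOL 2. The page content, the table layout, and the word names (page, cells, cell) are hypothetical; a real page would come from READ on the site's URL.

```rebol
REBOL [Title: "Screen-scraping sketch"]

; Hypothetical page content; in practice: page: read http://...
page: {<html><body><table><tr><td>ACME</td><td>12.50</td></tr></table></body></html>}

; Approach 1: load/markup splits the text into tag! and string! values,
; so keeping only the strings leaves the text between the tags.
cells: copy []
foreach item load/markup page [
    if string? item [append cells item]
]
probe cells

; Approach 2: a parse rule that copies whatever sits between <td> and </td>.
; parse/all keeps whitespace significant when parsing a string.
cells: copy []
parse/all page [
    any [thru "<td>" copy cell to "</td>" (append cells cell)]
    to end
]
probe cells
```

Both passes should collect the same two cell values from this sample. On real pages the parse rule usually needs adjusting for attributes inside the tags and for empty cells.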

 [3/19] from: kevin:wise:totalpartsplus at: 12-Apr-2005 13:59


Great thanks. I bought the Official Guide book. Is that a good source to get me started as well? Form submission by the script will be very helpful. It is an asp based site.

 [4/19] from: greggirwin:mindspring at: 12-Apr-2005 13:01


Hi Kevin,

KW> I bought the Official Guide book. Is that a good source to
KW> get me started as well?

The hard-copy books are really out of date, unfortunately. They still have some good things in them, to be sure, but the resources available online are much more current and cover more areas (like View).

-- Gregg

 [5/19] from: Izkata::Comcast::net at: 12-Apr-2005 20:06


http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=webcrawler.r

^ An old webcrawler; it seems to still work.

 [6/19] from: antonr::lexicon::net at: 13-Apr-2005 15:25


Yes, The Official Guide is good for learning about series manipulation. Also learn about PARSE here: http://www.rebol.com/docs/core23/rebolcore-15.html That should cover all you need to know for such work. Of course ask here for any issues that pop up. Anton.

 [7/19] from: premshree:pillai:gmai:l at: 13-Apr-2005 16:59


On 4/13/05, Kevin Wise <[kevin--wise--totalpartsplus--com]> wrote:
> Is rebol good to do web mining? I have a site that does analysis for stocks
Web mining typically would involve unstructured data. That typically translates into good string manipulation functions. REBOL's parse may serve your purposes. However, I'm not sure if it's as mature (and complete) as regexps (Perl, Python, Ruby, etc.).
> and options. I want to automate the lookup and retrieval of the analysis
> and put it in a database or text file. Does anyone have a suggestion for
<<quoted lines omitted: 3>>
-- Premshree Pillai http://www.livejournal.com/users/premshree/

 [8/19] from: kevin:wise:totalpartsplus at: 13-Apr-2005 7:00


Great thanks for the reference.

 [9/19] from: kevin:wise:totalpartsplus at: 13-Apr-2005 7:02


I agree. Thanks.

 [10/19] from: kevin:wise:totalpartsplus at: 13-Apr-2005 7:06


Excellent. Thanks for the help.

 [11/19] from: SunandaDH:aol at: 13-Apr-2005 10:02


Kevin:
> Great thanks. I bought the Official Guide book. Is that a good source to
> get me started as well?
The Official Guide will teach you a lot about core programming -- no REBOL/View, but as you are doing CGI work, that doesn't matter. It's an annoying book at times as it hides the stuff you want to know by embedding it into a book-length example of building a character-based application. If you have money left in the training budget, in many ways REBOL for Dummies is a better read. Again, no REBOL/View but a much less wordy pass over the same range of material as the Official Guide.
> Form submission by the script will be very helpful.
> It is an asp based site.
If you literally just want to submit a form, then you just have to build a URL. Example -- does a search on REBOL.org for any script containing the word "cgi":

    print read http://www.rebol.org/cgi-bin/cgiwrap/rebol/search.r?find=cgi

Then you need to screenscrape the resulting page.

Either that, or have the site provide a programmable interface. E.g., REBOL.org has an interface called Library Data Services. Here's the same query using that:

    do http://www.rebol.org/library/public/lds-local.r  ;; initialise the interface
    probe lds/send-server 'find-scripts ["cgi"]

Sunanda.
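Building a URL covers forms submitted with GET. If the ASP form expects a POST instead, REBOL 2's read/custom can send one. The sketch below is only an illustration: the URL, the field names, and the output file name are all hypothetical and would need to be replaced with the real site's values.

```rebol
REBOL [Title: "Form POST sketch"]

; Hypothetical form handler and field names; substitute the real ones
form-url: http://www.example.com/lookup.asp
form-data: "symbol=ACME&type=options"

; read/custom with a POST block sends form-data as the request body
result: read/custom form-url reduce ['post form-data]

; Append the raw response to a text file for later screen-scraping
write/append %analysis.txt result
```

Saving the raw page first and parsing it in a separate step makes it easier to refine the parse rules without hitting the site repeatedly.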

 [12/19] from: andreas:bolka:gmx at: 13-Apr-2005 19:04


Wednesday, April 13, 2005, 1:29:09 PM, Premshree wrote:
> REBOL's parse may serve your purposes. However, I'm not sure if it's > as mature (and complete) as regexps (Perl, Python, Ruby, etc.).
'parse is not only "as complete as (Perl's) regexps" but even "more complete", to stay in your diction. That means you can do everything with parse that you can do with regexps, but you cannot do everything with regexps that you can do with parse.

-- Best regards, Andreas
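One way to illustrate the claim, assuming REBOL 2: parse rules can refer to themselves, so they can match arbitrarily nested structures, which classical regular expressions cannot express. The rule name nested and the test strings are hypothetical.

```rebol
REBOL [Title: "Recursive parse rule sketch"]

; A rule that refers to itself: zero or more groups, each an opening
; parenthesis, a nested sequence of groups, and a closing parenthesis.
nested: [any ["(" nested ")"]]

; parse returns true only if the rule consumes the whole input
print parse/all "(()())" nested   ; balanced input
print parse/all "(()" nested      ; unbalanced input
```

The first call should report success and the second failure, since the unbalanced string cannot be consumed to its end. The same pattern extends to nested HTML elements when scraping.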

 [13/19] from: heg:poczta:onet:pl at: 14-Apr-2005 18:22


> http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=webcrawler.r

> > Is rebol good to do web mining? I have a site that does analysis for
> > stocks
> > and options. I want to automate the lookup and retrieval of the analysis
> > and put it in a database or text file. Does anyone have a suggestion for
> > this project? Thanks
IMHO, Rebol is an excellent choice provided that you don't need statistical computations to enhance your results. Simple parsing for string patterns can be easily done with 'parse. However, there is a problem with the statistical analysis needed to implement features such as keyword and phrase extraction or n-gram and HMM models, simply because Rebol lacks the proper functions and libraries (e.g. chi-square tests).

Cheers, PG

-- << Paweł Gawroński *** [hegemon--sgh--waw--pl] >>

 [14/19] from: greggirwin:mindspring at: 14-Apr-2005 16:29


Hi Paweł,

PG> However, there is a problem with statistical analysis needed to implement
PG> such features as keyword and phrase extraction or n-gram and HMM models
PG> simply because Rebol lacks proper functions and libraries (e.g. chi-square
PG> tests).

Not my area, but chi-square is easy enough to do. Would it be useful to anyone, or does there need to be more to it than just the simple chi-square calculation?

-- Gregg
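For reference, the simple calculation is Pearson's statistic, the sum of (observed - expected)^2 / expected over all categories. A minimal sketch, assuming REBOL 2; the function name chi-square and the sample counts are hypothetical, and the sketch stops at the statistic without looking up a p-value.

```rebol
REBOL [Title: "Chi-square statistic sketch"]

; Pearson's chi-square statistic for two equal-length blocks of counts
chi-square: func [
    observed [block!] "Observed counts"
    expected [block!] "Expected counts, same length as observed"
    /local total o e
][
    total: 0.0
    repeat i length? observed [
        o: pick observed i
        e: pick expected i
        ; REBOL evaluates operators left to right, hence the parentheses
        total: total + (((o - e) ** 2) / e)
    ]
    total
]

print chi-square [18 22 20] [20 20 20]
```

For these sample counts the statistic works out to (4 + 4 + 0) / 20 = 0.4. Turning that into a significance level would need the chi-square distribution, which is the part that goes beyond the simple calculation.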

 [15/19] from: kevin:wise:totalpartsplus at: 15-Apr-2005 6:17


Not something I really need right now. Thanks.

 [16/19] from: lmecir:mbox:vol:cz at: 15-Apr-2005 14:14


Kevin Wise napsal(a):
>Not something I really need right now. Thanks.
>-----Original Message-----
<<quoted lines omitted: 18>>
Some statistics functions are available at my site, and I can offer more implemented in Rebol; I was just too lazy to make them more visible.

-L

 [17/19] from: thomas:cr::gmail at: 16-Apr-2005 15:20


rebol is in fact quite useful for web mining, which is the reason i'm using it for a project i'm working on called webminutes (http://www.webminutes.org). parse grammars are quite powerful for that! you might be interested in looking at how webminutes works. unfortunately, the web site is only in french right now. the code, however, is in english.

so what is this webminutes concept? it was originally designed for people like me who take public transport, have access to a printer and haven't quite found their ideal newspaper. the idea is thus to automatically generate a newspaper, which you print at work, with content retrieved from various websites. you configure it once on the website, using a php interface, by writing rebol code and parse grammars in the html forms. you then run the rebol/core scripts daily to generate your webminutes newspaper.

if there is demand, i might look into making the website multilingual... let me know. i believe i put some english documentation on sourceforge a while ago; it might help, but it is out-of-date.

tc

 [18/19] from: kevin:wise:totalpartsplus at: 18-Apr-2005 7:00


Great Idea! I will give it a look. I hope web minutes has great success.

 [19/19] from: hallvard:ystad:oops-as:no at: 18-Apr-2005 14:05


Hey, nice work! This reminds me of what we did some time back in a firm called HelpInHand. We built an application that would fetch a web page and turn it into a WML page (dividing it into several pages if necessary). (See http://web.archive.org/web/20010721134013/http://helpinhand.com/) Mobile phone users could then surf anything.

Now, mobile phones using WML were never a great success, and nowadays mobile phones surf HTML, so the application wasn't all that much used. I used it a bit myself (along with the "customizer", which you may find some links to from the above link) for composing web pages with content from different sites (to view in an ordinary HTML browser on a desktop computer). Rebol did a great job parsing through web pages and determining their structure (or lack of such).

I reused some of that code when making the distorter (http://www.oops-as.no/roy/dis), another "funny web filter". And some of that old code is still in use for the RIX (a rebol search engine indexing rebol stuff: http://www.oops-as.no/rix).

So, as we all seem to agree, rebol is great for web mining. Of all sorts.

HY

Notes
  • Quoted lines have been omitted from some messages.