World: r3wp
[Power Mezz] Discussions of the Power Mezz
florin 27-Sep-2010 [59] | Isn't it enough to pass an HTML string (string!) to load-html? |
Gabriele 28-Sep-2010 [60x2] | Yes, an HTML string should be enough.
Script: {Allows "importing" modules at the console} (none)
Script: "Modules for REBOL 2" (none)
>> import %mezz/load-html.r
>> result: load-html "<p>This is a paragraph<p>This is another one"
== [root none [] [html [...] [] [head [...] [] [title [...] []]] [body [...] [] [p [...] [] [text [...] [value "This is a paragraph... |
>> import %mezz/trees.r
>> print mold-tree result
[root [] [html [] [head [] [title []]] [body [] [p [] [text [value "This is a paragraph"]]] [p [] [text [value "This is another one"]]]]]] | |
florin 28-Sep-2010 [62] | Gabriele, thanks! I downloaded the build provided above and it worked. |
Gabriele 28-Sep-2010 [63] | Modules were designed to help with building larger applications. I understand they can get in your way if you are doing something very small. You can still just DO the scripts, but then you need to resolve the dependencies yourself, i.e. DO all the needed scripts in the right order. (There is a function inside tests/run.r that returns a block with all the modules a module depends on.) |
florin 1-Oct-2010 [64x5] | In the following segment: p [...] [id "myId" class "pclass"] [text [...] [value "Some text in the paragraph"]] what does [...] represent? Every HTML element except "root" is followed by this [...]. |
Is there a way to navigate this tree? Something similar to, let's say, XSLT paths (XPath)? | |
If I want to find a tag with a given ID, how is it done? | |
I'll try this: http://www.rebol.org/view-script.r?script=xpath.r | |
How do you eat your words on Altme? | |
Maxim 1-Oct-2010 [69] | hehehe |
Gabriele 3-Oct-2010 [70x3] | If you use MOLD-TREE, you won't see those [...]. They are back-references (each node's reference to its parent) that are needed internally. MOLD won't print them in full, because that would be an infinite cycle. |
You can navigate with GET-NODE. I haven't needed something like XSLT paths yet (I had something closer to that in Temple).
root: load-html ...
p: get-node root/childs/html/childs/body/childs/p ; for example
Most of the logic to do what you want is actually already in trees.r, because of the rewrite-tree function (which I don't use anymore). Anyway, a simple way would be:
get-node-with-id: func [root id /local found] [
    if id = get-node root/prop/id [return root]
    foreach child get-node root/childs [
        ; return the matching descendant itself, not the child that contains it
        if found: get-node-with-id child id [return found]
    ]
    none
] | |
(warning: not tested) | |
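To experiment with the idea without the live tree, here is a minimal sketch of the same depth-first search over a plain block shaped like the MOLD-TREE output above, i.e. [name properties child ...]. It does not use the trees.r API; the function name find-by-id and the sample tree are hypothetical, and it assumes each node's properties are a block such as [id "myId" class "pclass"].

```rebol
REBOL [Title: "find-by-id sketch (hypothetical, not part of Power Mezz)"]

; Assumed node shape, as printed by MOLD-TREE: [name properties child1 child2 ...]
find-by-id: func [
    "Depth-first search for the first node whose properties contain the given id"
    node [block!] "A node in the assumed [name props child ...] shape"
    target [string!] "The id value to look for"
    /local props found
][
    props: second node
    ; SELECT finds the word ID in the properties block and returns the value after it
    if all [block? props  target = select props 'id] [return node]
    ; Children follow the name and the properties block
    foreach child skip node 2 [
        if all [block? child  found: find-by-id child target] [return found]
    ]
    none
]

tree: [root [] [html [] [body [] [p [id "myId" class "pclass"] [text [value "Some text in the paragraph"]]]]]]
probe find-by-id tree "myId"
; prints: [p [id "myId" class "pclass"] [text [value "Some text in the paragraph"]]]
```

Note that the recursive call's result is returned directly, so a match several levels down comes back as the matching node rather than as the top-level child that contains it.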
florin 3-Oct-2010 [73] | Thanks. |
PatrickP61 15-Dec-2010 [74x3] | Hi Gabriele, I'm trying out your Power Mezz for the first time. Do you have any other documentation on how to set it up properly? Here is what I'm doing:
power-mezz-path: to-path e:/Projects/PT/Rebol/power-mezz-built-1.0.0/
print "Starting mezz/module.r"
do power-module: to-url ajoin [power-mezz-path 'mezz/module.r]
print "Returned mezz/module.r"
load-module/from power-mezz-path
module [
    imports: [%mezz/html-to-text.r]
]
--> e:/Projects/PT/Rebol/power-mezz-built-1.0.0/
--> Starting mezz/module.r
** Access Error: Invalid port spec: e:/Projects/PT/Rebol/power-mezz-built-1.0.0/mezz/module.r
** Near: do power-module: to-url ajoin [power-mezz-path 'mezz/module.r]
Any ideas on what I did wrong? |
I've got a meeting to run to -- will check back in couple of hours :-) | |
I'm back now. Does anyone have ideas on how a beginner should use Power Mezz? Also, what is the difference between power-mezz-1.0.0 and power-mezz-built-1.0.0? | |
PatrickP61 17-Dec-2010 [77] | Anyone have info on how to use Power-Mezz? |
Maxim 17-Dec-2010 [78x2] | I installed it yesterday, it worked pretty well for what I needed. |
Do you need help with the install, or with what the mezz code actually does? | |
PatrickP61 17-Dec-2010 [80] | Hi Maxim, I'm still learning Rebol, but I'd like to see how I can use Power-Mezz. How do you install it? |
Gabriele 18-Dec-2010 [81] | Patrick, e:/Projects/... is not a valid REBOL file path. Try something like:
power-mezz-path: %/E/Projects/PT/Rebol/power-mezz-built-1.0.0/
do power-mezz-path/mezz/module.r
load-modules/from power-mezz-path
; etc. |
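A quick way to see the difference is to ask REBOL for the datatypes. As far as we can tell, the lexer reads e:/Projects/... as a url! (with "e" as the scheme), while a file! literal starts with % and puts the drive letter in the first path segment. This sketch only reuses the paths from the messages above; the variable names are illustrative.

```rebol
REBOL [Title: "url! vs file! path check (sketch)"]

bad: e:/Projects/PT/Rebol/power-mezz-built-1.0.0/   ; lexed as a url!, not a file
good: %/E/Projects/PT/Rebol/power-mezz-built-1.0.0/ ; file! in REBOL's portable path syntax
probe type? bad
probe type? good
; Joining a relative file onto the directory gives the script to DO
module-file: join good %mezz/module.r
probe module-file
```

A url! with an unknown scheme is what produces the "Invalid port spec" error when it is passed to DO.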
PatrickP61 18-Dec-2010 [82x2] | Oops -- Didn't see the malformed file path!!! Where can I find examples of how to use Power Mezz? |
The particular script I am writing is called GET ADDRESS. It takes a CSV file called contacts, which has the first and last name, city, and state of each friend whose address I'd like for Christmas cards but have forgotten or misplaced. So far, the script sends each entry to SUPERPAGES.com, and the HTML sent back contains the information. Right now, I'm simply saving the HTML as a file for each entry in my CSV. What I would like to do is parse that HTML and extract the address lines, zip code, phone number, etc. But I admit that parsing through HTML is daunting to me. After looking around on the internet, I discovered HTML-TO-TEXT in your Power Mezz. That is where I am now, trying to figure out how it works. I've read some of your documentation, but I'm still in the dark as to how it works, at least for my application. Any advice you have is welcome. Thanks in advance. | |
Kaj 18-Dec-2010 [84x2] | I don't think that's a good function to use for that. It seems to me it's meant for producing human-readable text, not text you can process further |
Use "5.10 Parse HTML text into a tree" instead | |
PatrickP61 18-Dec-2010 [86] | Thank you Kaj. I'll check that out! |
Kaj 18-Dec-2010 [87] | There's no usage documentation, though, only code documentation |
Kaj 19-Dec-2010 [88] | There's also "7.4 [X][HT]ML Parser" so it's not clear to me which one is preferred |
Oldes 19-Dec-2010 [89x2] | To be honest, if you just want to parse an HTML page to get some parts of it, you don't need Power Mezz at all. I've been using REBOL for more than 10 years and still consider PM too complex for me. If you are a REBOL newbie, it's better to start by reading the REBOL docs, in your case the parts about parsing. |
There are a lot of pages about 'parse' on the net, for example this one: http://www.rebol.com/docs/core23/rebolcore-15.html | |
PatrickP61 19-Dec-2010 [91] | Thanks Oldes -- Will check into that |
Kaj 19-Dec-2010 [92x2] | Yeah, there's some tipping point in parsing web pages and such. When the pages are consistent and the data you want to scrape is simple, I use PARSE, too, or even just string processing |
But when the HTML and the data become more complex, there are so many exceptions you have to program, that a real HTML parser becomes more convenient | |
Henrik 19-Dec-2010 [94x2] | would a real HTML parser convert the data to a REBOL object? |
hmm... not sure that is possible. | |
Kaj 19-Dec-2010 [96] | I don't know what the two PowerMezz ones do, but I figure they just produce blocks. It's static data, so no need for bindings |
Henrik 19-Dec-2010 [97x2] | a good one would be to convert R3 rich text to HTML and vice versa. |
but that is of course not related to parsing... | |
Kaj 19-Dec-2010 [99] | Yeah, I think someone will have to do that :-) |
Anton 20-Dec-2010 [100x2] | Kaj, I think it's the other way around! I found when the HTML and the data become more complex, then a simpler "hack" parse job is more likely to survive changes to the source. This happened to me several times with a weather forecast and television guide scraper etc. that I made (and remade, and remade..). |
(back when I used to care about television, that is) | |
Kaj 20-Dec-2010 [102x2] | That's true when you have to write the parser yourself, but I'm assuming the PowerMezz parsers handle all of HTML :-) |
Also, it's probably not as much within reach for novice programmers | |
Oldes 20-Dec-2010 [104x2] | I was using REBOL for data mining a few years ago, and I can say it was easier to do string-based parsing to get what I needed. |
It's always easier to do parse html [thru "<title>" copy title to "<"] than to parse the complete HTML into something like a block structure and dig the title out of it. | |
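Oldes's one-liner can be run as-is on a small page. This is a sketch for REBOL 2; the sample page is made up, and /ALL is added so PARSE does not skip whitespace.

```rebol
REBOL [Title: "Title scraping with PARSE (sketch)"]

html: {<html><head><title>Christmas card list</title></head><body></body></html>}
title: none
; PARSE itself returns false here because the rule stops before the end of the input,
; but COPY has already set TITLE by then
parse/all html [thru "<title>" copy title to "<"]
print title
; prints: Christmas card list
```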
Kaj 20-Dec-2010 [106] | For you, but my business partner wants to scrape web pages, and I don't think he would understand how to do it with parse |
Oldes 20-Dec-2010 [107] | I believe that if he doesn't understand simple parse, then he won't understand PowerMezz either, but maybe I'm wrong. It also depends very much on which page you parse. |
Kaj 20-Dec-2010 [108] | Scraping a title is the simplest example. In reality, you get all sorts of tags with extra attributes that you need to skip, and values with extraneous newlines. He wouldn't understand how to normalise that, so his data would be left as a mess |
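For the case Kaj describes, a string-based approach needs two extra steps: skip the tag's attributes, then normalise the whitespace in the value. A minimal REBOL 2 sketch, assuming a hypothetical address fragment and no ">" inside attribute values:

```rebol
REBOL [Title: "Skipping attributes and normalising whitespace (sketch)"]

; Hypothetical fragment with extra attributes and stray newlines
html: {<p class="street"
    id="a1">123 Main St,
        Springfield</p>}
address: none
; Skip the attributes by going past the end of the opening tag
parse/all html [thru "<p" thru ">" copy address to "</p>"]
; TRIM/LINES removes line breaks and collapses extra spaces
trim/lines address
print address
; prints: 123 Main St, Springfield
```

Each new kind of irregularity in the page needs another rule like these, which is where a full HTML parser starts to pay off.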