AltME groups: search
world-name: r3wp
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
btiffin: 16-Mar-2008 | context [ ] is just a shortcut for make object! [ ] and it's great. The more we hide in objects the easier it will be to share, or at the least, easier to use code from a variety of developer sources. Programming in the Many is important in our context as there are relatively few of us in the "many" - so far. So when even our small stuff is shareable we all win. | |
BrianH: 17-Mar-2008 | Does a bind/copy on its code block every time it is used. | |
Oldes: 17-Mar-2008 | I should probably not use the code evaluation so much directly in the parse rule block, and rather call a function if I need a lot of temp variables to process the action. | |
Henrik: 28-Apr-2008 | (note this block can only be made without a space at the end in rebol 2.7) | |
Henrik: 11-May-2008 | if I have a rule-block that does not exist in the same context as the main parse block, is there a simple way to rebind it without composing it into the main parse block? my current solution is to bind it to a temp block and use the temp block as a rule in the main parse block, which is less than optimal, I think. | |
Chris: 11-May-2008 | Assuming you want to assign values to function locals from the external parse rules, you can a) bind as you are doing, b) create a larger context for the function encompassing your rules or c) compile the parse rule, either on creation of the function or for each instance. a) rule: [set tag tag!] test: func [data /local tag][bind rule 'data parse data rule tag] b) test: use [tag][ rule: [set tag tag!] func [data][parse data rule tag] ] c) rule: [set tag tag!] test: func [data /local tag] compose/only [parse data (rule) tag] Also, note that when you bind, it alters the original block -- no need to reassign to a new word. | |
Chris: 11-May-2008 | When it comes to complex rules, I opt for b). At that, I'd go for context [] where there are a lot of associated words... | |
Henrik: 12-May-2008 | the function is recursive, so that may put a twist on b). I forgot that detail with BIND on a) so thanks for that. c) seems to work best. | |
amacleod: 15-May-2008 | I'm trying to parse a text document that I've formatted into lines of text with blank lines between, similar to make doc format | |
amacleod: 15-May-2008 | Most lines begin with a section number (2.), or a sub-section (2.3) or a sub-sub-section (2.3.5). | |
BrianH: 16-May-2008 | If the section numbers always end with a period, you can do this: some [some digits "."] If the section numbers don't end with period you can do this: some digits any ["." some digits] | |
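The two rules above can be tried directly at the console. This is a minimal sketch, assuming REBOL 2 string parsing and assuming `digits` is a charset of 0 to 9 (the message does not define it); the sample strings are made up for illustration.

```rebol
REBOL []

; Assumption: DIGITS is a charset, as the rules in the message imply.
digits: charset "0123456789"

; Form 1: every number is followed by a period, e.g. "2." or "2.3."
dotted: [some [some digits "."]]

; Form 2: periods appear only between numbers, e.g. "2" or "2.3.5"
plain: [some digits any ["." some digits]]

print parse/all "2.3."  dotted   ; true
print parse/all "2.3.5" plain    ; true
print parse/all "2.3.5" dotted   ; false: the final "5" has no trailing period
```

PARSE returns true only when the rule reaches the end of the input, which is why the third call fails even though the rule matched the first two components.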
BrianH: 16-May-2008 | Look up recursive descent parsing, and take note of the difference between left recursion and right recursion. | |
Chris: 16-May-2008 | Don't want to add too much, but with parse you can really build up a vocabulary based on the patterns you know: section: [integer! ["." | 1 4 ["." integer!]]] ; -- or whatever rule covers all permutations chars-sp: charset " " space: [some chars-sp] parse/all [copy sn section space [to newline | to end]] Vocabularies are easy to wrap in their own context too. Note also that [integer!] is a shorthand for [some digit] -- very useful : ) | |
amacleod: 16-May-2008 | Oldes, thanks for your suggestion. It works when I do a simple one line rule as you suggested but when I try to use multiple rules it fails. Example of what I'm trying to do: Example of the text document: | |
amacleod: 16-May-2008 | 3. CONSTRUCTION OF PORTABLE ALUMINUM LADDERS 3.1 Aluminum ladders are divided into two basic types of construction, viz:, solid beam and truss. 3.1.1 Solid Beam Aluminum Construction- This type of ladder has a solid side rail construction with aluminum rungs connecting with the side rails at fourteen inch intervals. The connection is generally either by a welded joint between rung and side rails, or by an expansion plug pinching the rung tightly to the side rails and internal backup plates. (Figure 2 A) 3.1.2 Aluminum Truss Construction- In the aluminum truss design, the top and bottom rails are connected to rung assemblies or rung blocks by rivets. The rungs are either welded or expansion plugged to the rung plate assemblies, which are supported by the top and bottom rails. (Figure 2B) 3.2 The base of the portable aluminum ladder is provided with either steel spikes or swiveling rubber safety shoes and aluminum spikes. For ladders equipped with the swiveling device, the rubber pads should be utilized when the ladder is to be raised and used on hard surfaces. (Figure 2A, 2B) 3. CONSTRUCTION OF PORTABLE ALUMINUM LADDERS | |
BrianH: 16-May-2008 | Any reason that the headings with one number have a trailing period and the rest don't? | |
amacleod: 16-May-2008 | BrianH, sorry Brian, the text above is just from a random and simpler section of the document. If I copied from the beginning, the first line would not have a number at all. | |
BrianH: 16-May-2008 | But I made a mistake. | |
amacleod: 16-May-2008 | This will give me a hit on any section or sub or sub sub? I may want to do something different depending on each. does this allow me to ? | |
BrianH: 16-May-2008 | If you are making your decisions on a per-line basis, you might consider doing a read/lines and parsing each line individually, maintaining your own state to tell you where you are in the greater document. It's the only way to parse documents greater than memory in size. | |
Anton: 17-May-2008 | BrianH, eh? read/lines would still try to read the whole document wouldn't it ? Or are you just suggesting that as a way which is then easily modified to allow larger than memory documents? | |
Chris: 17-May-2008 | That would suck -- I use it. Seems like a common enough scenario.... | |
BrianH: 19-May-2008 | I mean you can do open/lines/direct and stream - then you would only need the memory for one line and a state machine. | |
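The per-line approach described above, with a different action per section level, might look like the sketch below. It assumes REBOL 2; the `lines` block stands in for lines read from a file, and the names `section`, `depth` and `title` are hypothetical, not taken from the discussion.

```rebol
REBOL []

digit: charset "0123456789"

; Counts the numeric components of a leading section number:
; "3." gives 1, "3.1" gives 2, "3.1.1" gives 3.
section: [
    some digit (depth: 1)
    any ["." some digit (depth: depth + 1)]
    opt "."
]

; Stand-in for the lines of the real document.
lines: [
    "3. CONSTRUCTION OF PORTABLE ALUMINUM LADDERS"
    "3.1 Aluminum ladders are divided into two basic types"
    "3.1.1 Solid Beam Aluminum Construction"
    "a continuation line with no number"
]

foreach line lines [
    depth: 0
    either parse/all line [section " " copy title to end] [
        ; depth is 1, 2 and 3 for the three numbered lines
        print [depth mold title]
    ][
        ; no leading section number: treat as part of the previous section
        print ["continuation:" mold line]
    ]
]
```

Because each line is parsed on its own, the only state carried between lines is whatever the caller keeps (here nothing beyond `depth`), which is what makes the streaming variant possible.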
Josh: 3-Jun-2008 | I'm finally digging into parse now, but I have a question about HTML. Big idea: pulling the data out of an HTML table (made in Word--ugh!). Where I am stuck: Is there a way to create a rule for opening tags such as <tr> that include a lot of formatting: i.e. <tr style="mso........> ? I want to pull the info in between the opening and closing tags. | |
Josh: 3-Jun-2008 | I came up with a rule: [some [thru "<td" thru ">" y: to "</td>" (a: remove-each tag load/markup y [tag? tag])]] but it seems to not be as efficient as it could be. | |
Geomol: 3-Jun-2008 | Josh, if you do a load/markup on the whole string, you get a block with tags and strings. You can then pick the string from the block, maybe doing TRIM on them to sort out newlines and spaces. Like: blk: load/markup your-data foreach f blk [if all [string? f "" <> trim f] [print f]] | |
Chris: 3-Jun-2008 | I've been toying with this to obtain a very parsable "dialect" -- my goal being to scrape live game updates from a certain sports web site (for personal use, natch). It's reliant on 'parse-xml though, so ymmv.... do http://www.ross-gill.com/r/scrape.r probe load-xml some-xml | |
Chris: 3-Jun-2008 | Result is a little like: from -- <tag attr="attribute">Content</tag> to -- <tag> /attr attribute "Content" | |
Anton: 4-Jun-2008 | Josh, using the REMOVE-EACH very often is what makes your parse slow. A remove operation in the middle of a large string is slow, and you are doing many removes. That's why the others suggested using copy. | |
Josh: 6-Jun-2008 | Thanks for the input. I will have to play around with those later as I am trying to get this finished up and then I can go back and clean up the code. The data is minimal enough for the script to finish in under a second anyway. Parse is pretty sweet. Makes this much neater than the alternative | |
amacleod: 30-Jun-2008 | I'm trying to copy some text from the position found while parsing a document. I'm using something like: rule: [some digit copy text to newline] (where "digit" has been defined as all digits 0 to 9). This copies everything after the digit. How would I copy the digit itself as well? | |
amacleod: 30-Jun-2008 | Is there a difference between using "to" and "thru" | |
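For the TO versus THRU question, a small sketch (assuming REBOL 2, with a made-up input string) shows the difference: TO stops just before the match, THRU stops just after it.

```rebol
REBOL []

; TO leaves the position in front of the comma, so the copy excludes it.
parse/all "abc,def" [copy a to "," to end]

; THRU moves past the comma, so the copy includes it.
parse/all "abc,def" [copy b thru "," to end]

print mold a   ; "abc"
print mold b   ; "abc,"
```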
amacleod: 30-Jun-2008 | No, I have a text document with section numbers in front: 2. Hello 2.1 Hello Again 2.1.1 Hello already 3. Goodbye I want the section number included in the copy | |
amacleod: 30-Jun-2008 | Well it gets a little more complicated. Some parts of the document will be multilined. | |
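One way to get the section number into the copy is to wrap the number and the rest of the line in a single sub-rule, so COPY spans both. A minimal sketch, assuming REBOL 2 and single-line entries; the `entries` block is a hypothetical name used only to collect the results.

```rebol
REBOL []

digit: charset "0123456789"
entries: copy []

parse/all "2. Hello^/2.1 Hello Again^/3. Goodbye^/" [
    any [
        ; COPY captures everything the bracketed sub-rule matches,
        ; including the leading digit.
        copy text [some digit to newline] newline
        (append entries text)
    ]
]

probe entries   ; ["2. Hello" "2.1 Hello Again" "3. Goodbye"]
```

An alternative with the same effect is to mark positions by hand, as in `[start: some digit to newline finish: (text: copy/part start finish)]`.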
amacleod: 30-Jun-2008 | I thought it would be a simple thing that I was missing. I may need to re-think the formatting of the document. | |
[unknown: 5]: 30-Jun-2008 | Or do you mean a multiline might look something like this: 2.1 Hello Goodbye Where the second line doesn't have the preceding number? | |
[unknown: 5]: 30-Jun-2008 | Ahhh yes that gets a bit more complicated. | |
amacleod: 30-Jun-2008 | Let me briefly explain where I'm going to see if you think it's workable or perhaps there is a better solution | |
amacleod: 30-Jun-2008 | I'm trying to put a set of Fire department related materials online. They are now in PDF | |
amacleod: 30-Jun-2008 | I want to hold each section in a separate database record | |
[unknown: 5]: 30-Jun-2008 | Well, TRETBASE 1.0 is the only finished product right now. So the only available TRETBASE app is 1.0 which is really not a multi-user solution. | |
amacleod: 30-Jun-2008 | I'm using mysql for the online component but I need a local storage method too for offline use | |
amacleod: 30-Jun-2008 | What I would need is a simple method to sync them | |
amacleod: 18-Jul-2008 | Is there a difference between a "space" and a "tab"? Can you parse for tab and not space? | |
Graham: 18-Jul-2008 | I would think you would have to parse/all .. and a space is #" " and a tab is #"^-" | |
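A short sketch of that suggestion, assuming REBOL 2: with the /all refinement whitespace is treated as ordinary input, so tab and space can be matched as distinct characters. The counters and the sample string are made up for illustration.

```rebol
REBOL []

tabs: 0
spaces: 0

; The input contains two tabs (^-) and one space.
parse/all "a^-b c^-d" [
    any [
        #"^-" (tabs: tabs + 1)
        | #" " (spaces: spaces + 1)
        | skip
    ]
]

print [tabs spaces]   ; 2 1
```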
btiffin: 21-Aug-2008 | A long time ago, I offered to try a lecture. Don't feel worthy. So I thought I'd throw out a few (mis)understandings and have them corrected to build up a level of comfort that I wouldn't be leading a group of high potential rebols down a garden path. So; one of the critical mistakes in PARSE can be remembered as "so many", or a butchery of some [ any [ , so many. SOME asks for a truth among alternatives and ANY says "yep, got zero of the thing I was looking for", but doesn't consume anything. SOME says, great and then asks for a truth. ANY says "yep, got zero of the thing I was looking for", and still doesn't move, ready to answer yes to every question SOME can ask. An infinite PARSE loop. Aside: to protect against infinite loops always start a fresh PARSE block with [() the "immediate block" of the paren! will allow for a keyboard escape, and not the more drastic Ctrl-C. So, I'd like to ask the audience; what other PARSE command sequences can cause infinite loops? end? and is it only "end", "to end" but "thru end" will alleviate that one? end end end end being true? >> parse "" [some [() end end end]] (escape) >> parse "" [some [() thru end end end]] == false >> parse "" [some [() to end end end]] (escape) >> Ok, but thru end is false. Is there an idiom to avoid looping on end, but still being true on the first hit? Other trip ups? | |
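The "so many" hazard described above can be avoided by making sure the repeated rule consumes at least one character on every pass. A minimal sketch, assuming REBOL 2 behaviour as described in the message; the looping form is left commented out on purpose, and the charset `x` is a made-up example.

```rebol
REBOL []

x: charset "x"

; Hazard (do not run): the inner ANY succeeds without consuming input,
; so the outer SOME can repeat forever.
; parse/all "abc" [some [any x]]

; Safer shape: the repeated rule must consume something each time,
; and the "zero or more" part moves to the outer loop.
print parse/all "xxabc" [any [some x] "abc"]   ; true
print parse/all "abc"   [any [some x] "abc"]   ; true
```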
Henrik: 28-Sep-2008 | parse [a] ['a] ;== true parse ['a] reduce [to-lit-word 'a] ; == false (why?) | |
Henrik: 28-Sep-2008 | forget it. I was confused for a second, but is there a way to parse that 'a correctly? The same goes for get-word! and set-word!. | |
Henrik: 28-Sep-2008 | I should clarify: I would like to parse a specific get-word!, lit-word! or set-word! as opposed to parsing on the type and then checking the value in some kind of action afterwards: parse ['a 'b 'c] ['a 'b 'c] ;== true (I know this is the wrong parser block, but it's something to that effect I would like to see) | |
Anton: 28-Sep-2008 | If I remember correctly, this was a problem of parse (and may still be)... | |
Anton: 28-Sep-2008 | You may have to use a workaround. | |
Geomol: 28-Sep-2008 | If you can go with a reduced block, this can work: parse reduce ['a 'b 'c] ['a 'b 'c] | |
Henrik: 28-Sep-2008 | what if there are set-words in it? I wanted to parse the content of an object, which can be a mixture of word types. | |
BrianH: 28-Sep-2008 | In general that restriction of parse is part of an overall pattern in REBOL of encouraging you to use lit-words as lit-words rather than some other kind of datatype. Lit-words in REBOL are generally used to express literal expressions of words, rather than being used as a distinct datatype. In general you convert them to words before use. | |
BrianH: 28-Sep-2008 | It's usually a bad idea to use lit-words as keywords - they make better values. If you are comparing to a particular lit-word value, that is using it as a keyword. If any lit-word value would do and their meaning is semantic rather than syntactic, that works. In general, PARSE is better for determining syntactic stuff - use the DO dialect code in the parens for semantic stuff. | |
BrianH: 28-Sep-2008 | Not that I don't want a LIT or LITERAL directive in PARSE that would turn off the PARSE-dialect treatment of the next value in the spec. | |
Anton: 10-Oct-2008 | term: [word! | into term] parse [a b [c]] [some term] ;== true parse [a b [c d]] [some term] ;== false | |
Anton: 10-Oct-2008 | I'm a bit confused by that. I need to parse recursively. | |
Anton: 10-Oct-2008 | terms: [some [word! | into terms]] parse [a b [c d]] terms ;== true | |
Terry: 12-Oct-2008 | blk: [aa "test" bb "two" cc "#block"] rules: [some [cc set cc string! ]] parse blk rules no go? I have a more complicated rule set that chokes on the "#block" string.. does it think it's an issue! ? | |
sqlab: 30-Oct-2008 | Yes, this is an old bug. It does not work, if " is next to your delimiter. Insert a blank, and it works again. | |
Graham: 3-Nov-2008 | This is a result of using parse-xml and some cleanup [document [soapenv:Envelope [soapenv:Body [ns1:getSpellingSuggestionsResponse [getSpellingSuggestionsReturn [getSpellingSuggestionsReturn "Penicillin G"] [getSpellingSuggestionsReturn "Penicillin V"] [getSpellingSuggestionsReturn "Penicillamine"] [getSpellingSuggestionsReturn "Polycillin"] ] ] ] ] ] | |
Graham: 3-Nov-2008 | drugs: [set drugblock into [ 'getSpellingSuggestionsReturn set drugname string! ( print drugname) ]] parse a [ 'document set envelope into [ 'soapEnv:envelope set body into [ 'soapEnv:body set response into [ 'ns1:GetSpellingsuggestionsresponse set returns into ['getspellingsuggestionsreturn some drugs to end ]]]]] works but is very long winded | |
Gregg: 4-Nov-2008 | It's not so bad Graham. And whether you can shorten things depends on how exact you need to be. rule: [ 'getspellingsuggestionsreturn some drugs | url! into rule ] parse a ['document into rule] | |
PeterWood: 4-Nov-2008 | This is a bit shorter but recursive: pr: [any [ [set b block! (parse b pr)] | ['getSpellingSuggestionsReturn set s string! ( insert drug-names s ) | skip ] ] ] | |
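The recursive idea can also be written with INTO, which descends into nested blocks without a separate PARSE call. This is a sketch only, assuming REBOL 2 block parsing; the `data` block is a cut-down, made-up stand-in for the parse-xml output quoted earlier, and `names` and `name` are hypothetical.

```rebol
REBOL []

names: copy []

rule: [
    any [
        ; a marker word directly followed by a string is a hit
        'getSpellingSuggestionsReturn set name string! (append names name)
        | word!        ; step over any other word
        | into rule    ; descend into a nested block
        | skip         ; step over anything else
    ]
]

data: [
    document [
        getSpellingSuggestionsReturn
        [getSpellingSuggestionsReturn "Penicillin G"]
        [getSpellingSuggestionsReturn "Penicillin V"]
    ]
]

parse data rule
probe names   ; ["Penicillin G" "Penicillin V"]
```

The rule does not check the envelope structure at all, so it trades exactness for brevity, which is the trade-off raised in the messages above.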
Graham: 4-Nov-2008 | the output I presented looks so close to being a rebol object .. and then I can use paths to access the data | |
PeterWood: 4-Nov-2008 | Sorry about the formatting ... can't cut and paste in AltME on a Mac without reformatting. | |
PeterWood: 4-Nov-2008 | If it's not fast enough you can speed it up by adding a rule to consume the unwanted parts. | |
PeterWood: 4-Nov-2008 | gxs is a string of your xml listed above. | |
BrianH: 5-Nov-2008 | So far we have been accepting proposals in these categories: - Recognition: LIT, NOT, OF, TO and THRU extensions - Modification: CHANGE, INSERT, REMOVE - Structural and control flow: FAIL (may not be the final name), USE, CHECK (still debate here), REVERSE There is still some debate even within these proposals (name of FAIL for example) and some of them might not make it. Some of the old PARSE REPs have been definitively rejected or changed, and some are still under debate and won't make it in without a lot more thought. | |
BrianH: 5-Nov-2008 | These changes to PARSE are another example of changes to the R3 core happening as a side effect of the new GUI work :) | |
BrianH: 5-Nov-2008 | Yup. We've been working on the Parse Project article a lot today. The last 2 things from the REP that might make it are the THROW and INTO-STRING proposals, though both will need some changes first. The rest are covered or rejected. | |
BrianH: 5-Nov-2008 | Peter Wood's RETURN proposal is really interesting. I have been thinking about how to make a variant of it work. | |
Anton: 5-Nov-2008 | I'd like to understand Peter Wood's START command a bit better. It's not clear to me from the example why it's needed. (or even how the example works..) | |
Anton: 5-Nov-2008 | Peter's example, from the blog: parse [a b c d] [ any [ start (acc: 0) | set inc integer! (acc: acc + inc) | end ] ] | |
BrianH: 5-Nov-2008 | Here's a working version of that example: parse [1 2 3 4] [ (acc: 0) any [set inc integer! (acc: acc + inc)] ] | |
BrianH: 5-Nov-2008 | Perhaps he thought a paren could only follow a rule. | |
BrianH: 5-Nov-2008 | I like the RETURN proposal as this: RETURN rule Match the rule and return a copy of the value from the PARSE function. Like COPY then BREAK, but without the temporary variable. | |
Anton: 5-Nov-2008 | I vaguely remember suggesting PARSE dialect be extended into parens with a few commands. Parens are executed as normal rebol dialect (not parse dialected in any way). If I remember correctly, it was thought better to keep the parens 'pure' rebol. If that is to be maintained, then I think Peter's RETURN command ought to be morphed into a parse command, as you suggest above, Brian. | |
Anton: 5-Nov-2008 | -- ie. that's a good idea. | |
BrianH: 5-Nov-2008 | More importantly it will override the meaning of the RETURN function at a point where you would expect it to work. | |
PeterWood: 5-Nov-2008 | Clearly my proposal for START is based on my ignorance and inability to search the documents properly :-) It wouldn't hurt as a form of self-documenting code, though. | |
BrianH: 5-Nov-2008 | Actually, I think it would hurt (no offence). The word start is a common name for parse rules and every keyword we add can't be used as a parse rule name. Something to consider when making proposals. | |
Anton: 5-Nov-2008 | Perhaps, Peter, you could post a withdrawal for START on the blog. | |
Chris: 5-Nov-2008 | Other side of the coin, if 'end is a keyword, 'start is an intuitive companion. | |
BrianH: 5-Nov-2008 | HEAD would be a better name for a directive to reset the position to the beginning of the data. That behavior would be more consistent with the series accessors :) | |
BrianH: 5-Nov-2008 | It was an initialization proposal. Nonetheless, your HEAD? proposal sounds interesting. What problem are you solving that would need such a directive? | |
Pekr: 5-Nov-2008 | Anton - but there is some point in time we should start to make rebol bigger by adding unnecessary things, or we will never reach 100MB executable size and outer world might not consider us being a relevant alternative :-) | |
Anton: 5-Nov-2008 | One NOP keyword at a time :) | |
BrianH: 5-Nov-2008 | In particular, it would return a copy, like the COPY directive, not the SET directive. | |
Chris: 5-Nov-2008 | Like! Would that work for values? -- [to "<" copy a thru ">" "<" return a] ; - returns a if there is a < next? | |
Anton: 5-Nov-2008 | What would you do when you need to process the data a bit first ? eg. You return tags from different places in a rule, and to distinguish them you need to also return something extra, by prepending a code to the beginning, for example. | |
BrianH: 5-Nov-2008 | Carl was kinda weirded out by the modifying operations, but I pointed out that people do this anyway and get it wrong a lot. | |
BrianH: 5-Nov-2008 | Everything in that Parse Proposals page has already been discussed with Carl and could go in, barring insurmountable problems with implementation. I stopped putting stuff in when he stopped working for the day. There will likely be a couple going in tonight, but Carl is actively involved in this process. | |
BrianH: 5-Nov-2008 | The main thing that Carl is concerned about now is that some of the proposals make use of the value calculated in a paren on occasion. I don't know why this would be a problem, but I'm sure it will be worked out or around. | |
Chris: 5-Nov-2008 | Using 'remove -- a) removing a bracket only at the end of a string (as per Graham's example): parse "[this]" [remove "[" to "]" remove ["]" end]] b) where you go down a false path: parse "abcdef123" [remove "abc" "123" | remove "abcd" "ef123"] | |
Chris: 5-Nov-2008 | Would a) work? Would b) reset the string as the first rule didn't match? | |
BrianH: 5-Nov-2008 | a) would work. b) would not likely reset the string, just like code blocks don't undo. | |
BrianH: 6-Nov-2008 | You might be able to do b) like this: parse "abcdef123" [use [a] [remove ["abc" a: "123" :a] | remove ["abcd" a: "ef123" :a] to end]] or like this: parse "abcdef123" [use [a] [remove ["abc" a: "123" :a | "abcd" a: "ef123" :a] to end]] | |
Chris: 6-Nov-2008 | How about this? parse "abc" ["a" to end reverse "bc"] |