AltME groups: search
Help · search scripts · search articles · search mailing list · results summary
world | hits |
r4wp | 5907 |
r3wp | 58701 |
total: | 64608 |
results window for this page: [start: 17201 end: 17300]
world-name: r3wp
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
Graham: 1-Jul-2006 | there is no whitespace inside a macro name | |
Tomc: 1-Jul-2006 | so there is a separate extendable file with the macro-expansion | |
Graham: 1-Jul-2006 | actually the file will be saved in a database and loaded when the program starts | |
Tomc: 1-Jul-2006 | as a model organism | |
Tomc: 1-Jul-2006 | that the macro-expansion file needs to self-check for incidental occurrences of a "macro" in an "expansion" and protect against them | |
Tomc: 1-Jul-2006 | I would still sort the macros from longest to shortest so it can't glob onto part of a macro .. | |
Graham: 1-Jul-2006 | so, basically you created a single parse rule from the macro list and then parsed the text in one go. | |
BrianH: 1-Jul-2006 | Tomc, that is a good point - I'll fix it. Graham, that's right. | |
Graham: 1-Jul-2006 | We need a masterclass in parse .... | |
Graham: 1-Jul-2006 | it's a local so memory will be released anyway .. | |
BrianH: 1-Jul-2006 | It's a speed optimization. This might change with REBOL 3. | |
Graham: 1-Jul-2006 | memory use is large with Rebol. | |
BrianH: 1-Jul-2006 | Most of the excessive memory overhead of REBOL is just sloppy (no offense Carl). It's not much of a problem for most, but I have run into memory limits when running on embedded or handheld platforms, or running hundreds of instances on servers. | |
Tomc: 1-Jul-2006 | but that can just be a static rule outside of compose | |
Graham: 1-Jul-2006 | the above macro is supposed to expand into a multiline statement. | |
BrianH: 1-Jul-2006 | Then it is a good thing that the ^/ is in the expansion. | |
Graham: 1-Jul-2006 | No, as it ends up on screen showing ^/ instead of a visual newline. | |
BrianH: 1-Jul-2006 | Are they writing ^/ in the expansion text source data to indicate a newline? | |
Graham: 1-Jul-2006 | They're using ^/ as the macros are being read in from a text file using read/lines | |
Henrik: 9-Jul-2006 | how "local" are variables that are set during a parse? I was looking at Geomol's postscript.r and looked at:
coords: [
    err: (pos: none)
    [set pos pair! | set x number! set y number!] (
        either pos [
            append output compose [(pos/x) " " (pos/y) " lineto^/"]
        ][
            append output compose [(x) " " (y) " lineto^/"]
        ]
    )
] | |
Anton: 9-Jul-2006 | Mmm let me make a few tests. | |
Henrik: 9-Jul-2006 | actually, there is a difference between my code and this, which may be causing it: I need to loop the block with 'any. I suspect the contents is lost after the first run. | |
Oldes: 9-Jul-2006 | if the parse is inside function and you set pos in the function as a local - it will be local | |
Henrik: 9-Jul-2006 | I want to assign a variable to each element so I can process them later | |
Anton: 9-Jul-2006 | add a block to control the evaluation. | |
Anton: 9-Jul-2006 | I'm trying to figure out a simple example to show why. | |
Henrik: 9-Jul-2006 | I wonder what the difference is? If it's only for controlling how global a variable is, it seems a little backwards to me | |
Henrik: 9-Jul-2006 | the brackets would make it a "real" rule, wouldn't it? it would be possible to replace the rule with a variable and have the rule block placed elsewhere in your code | |
Anton: 9-Jul-2006 | You have to think of a rule like this: [ integer! | ] as equivalent to [ integer! | none ] or opt [ integer! ] | |
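A minimal sketch of the empty-alternative equivalence Anton describes (hypothetical inputs, R2 parse semantics assumed):

```rebol
; An empty alternative matches without consuming input,
; so all of these rules accept both [1] and []:
parse [1] [integer! | ]      ; the integer! branch matches
parse []  [integer! | ]      ; the empty branch matches
parse []  [integer! | none]  ; 'none also matches nothing
parse []  [opt [integer!]]   ; opt makes the match optional
```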
Anton: 9-Jul-2006 | I think he might be using 'test-image in place of a real image! for this example ? | |
Henrik: 9-Jul-2006 | It's also a good thing with these discussions. I've never really grown 100% comfortable with parse. | |
DideC: 10-Jul-2006 | About Layout : parse handles only the layout words (origin, space, at...), see source layout. The face description is handled by a loop, not by parse. See system/view/vid/grow-facets | |
Pekr: 19-Jul-2006 | Hi, need a bit of help .... | |
Pekr: 19-Jul-2006 | I can now simply create a func which will accept a mark name and do some code block accordingly - an SQL query, a simple replacement of a value, whatever (well, it will not work for cases like img tags, so it is not as flexible as the full HTML parser in temple, for example, but hey, it is meant to be simple) | |
Chris: 19-Jul-2006 | Petr, I have a copy with some notes here: http://www.ross-gill.com/techniques/rsp/ | |
JaimeVargas: 31-Aug-2006 | Very nice comments. But comparing a parser with a regex is a bit unfair ;-) | |
Volker: 31-Aug-2006 | That scoping is the difference between a closure and doing a "string" here. | |
BrianH: 31-Aug-2006 | REBOL blocks don't reference a context, but they may contain words that reference a context. Still, this distinction makes no difference to the argument that Peters was making - REBOL text processing is more powerful than regex and easier to use. It would be easier to replicate REBOL-style parsing in Python using closures and generators anyway (Peters' real subject), since that is the closest Python gets to Icon-style backtracking. | |
Volker: 31-Aug-2006 | it's not important what references the context, but that a variable can find one. | |
Volker: 31-Aug-2006 | result := a > b ifTrue:[ 'greater' ] ifFalse:[ 'less' ] | |
Ladislav: 31-Aug-2006 | besides, Tim was a REBOL 1.x user | |
Oldes: 15-Sep-2006 | Maybe someone will find it useful:
remove-tags: func [html /except allowed-tags /local new x tag name tagchars] [
    new: make string! length? html
    tagchars: charset [#"a" - #"z" #"A" - #"Z"]
    parse/all html [
        any [
            copy x to {<} copy tag thru {>} (
                if not none? x [insert tail new x]
                if all [
                    except
                    parse tag ["<" opt #"/" copy name some tagchars to end]
                    find allowed-tags name
                ][
                    insert tail new tag
                ]
            )
        ]
        copy x to end (if not none? x [insert tail new x])
    ]
    new
] | |
Gregg: 25-Sep-2006 | If it were a safe and easy thing to change, I can see some value in it as an option but, since words--and REBOL--are case insensitive, I'm inclined to live with things as they are, and use string parsing if case sensitivity is needed. I think it's Oldes or Rebolek that sometimes requests the ability to parse non-loadable strings, using percentage values as an example. I think loading percentages would be awesome, but then there are other values we might want to load as well; where do you draw the line? I'm waiting to see what R3 holds with custom datatypes and such. | |
Gregg: 25-Sep-2006 | And didn't you suggest that values throwing errors could be coerced to string! or another type? e.g. add an /any refinement to load, and any value in the string that can't be loaded would become a string (or maybe you could say you want them to be tags for easy identification). | |
Oldes: 25-Sep-2006 | I think load/next can be used to handle invalid datatypes now:
>> b: {1 2 3 'x' ,}
== "1 2 3 'x' ,"
>> while [v: load/next b  not empty? second v] [probe v  b: v/2]
[1 " 2 3 'x' ,"]
[2 " 3 'x' ,"]
[3 " 'x' ,"]
['x' " ,"]
** Syntax Error: Invalid word -- ,
** Near: (line 1) ,
Just add some handler to convert the invalid datatype to something else that is loadable, and then parse it as a block | |
Oldes: 25-Sep-2006 | But such a preloader will slow down:( | |
Oldes: 26-Sep-2006 | (it should be a question - is there such an example?) | |
Rebolek: 26-Sep-2006 | Words should be case-insensitive, but is it always the case? I found this today accidentally:
>> a: [small Small]
== [small Small]
>> find/case a to word! "small"
== [small Small]
>> find/case a to word! "Small"
== [Small] | |
Gabriele: 26-Sep-2006 | well... case insensitivity for words is done via automatic aliasing of words that differ in case only. (i know this because we found a bug related to this :) | |
Anton: 27-Sep-2006 | Here's an idea to toss into the mix: I am thinking of a new notation for strings using underscore (eg. _"hello"_ ) in a parse block, which allows you to specify whether they are delimited by whitespace or not. This would allow you to enable/disable the necessity for delimiters per-string. eg:
parse input [
    _"house"_  ; a complete word, surrounded both sides by whitespace
    _"hous"    ; this would match "house", "housing", "housed" or even "housopoly" etc.. but the left side must be whitespace
    "ad"_      ; this would match "ad", "fad", "glad" and the right side must be whitespace
]
But this would need the string datatype to change. On the other hand, I could just set underscore _ to a charset of whitespace, then use that with parse/all eg:
_: charset " ^-^/"
parse/all input [ [ _ "house" _ ] ]
though that wouldn't be as comfortable. Maybe I can create parse rules from a simpler dialect which understands the underscore _. Just an idea... | |
MikeL: 27-Sep-2006 | Anton, Andrew had defined whitespace patterns in his patterns.r script, which seems usable; then you can use [ws* "house" ws*] or other combinations as needed, without underscore. Andrew's solution for this and a lot of other things has given me some good mileage over the past few years. WS*: [some WS] and WS?: [any WS]. It makes for clean, clear parse scripts once you adopt it. | |
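The whitespace patterns MikeL mentions can be sketched like this (names per his message; the sample input and the ws charset are assumptions):

```rebol
ws:  charset " ^-^/"   ; space, tab, newline
ws*: [some ws]         ; one or more whitespace chars
ws?: [any ws]          ; zero or more whitespace chars

parse/all "  house  " [ws? "house" ws?]   ; succeeds: optional whitespace each side
```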
Anton: 27-Sep-2006 | Oh yes, I've seen Andrew's patterns.r. I was just musing how to make it more concise without even using a short word like WS. Actually the use case which sparked this idea was more of a "regex-level" pattern matcher, just a simple pattern matcher where the user writes the pattern to match filenames and to match strings appearing in file contents. | |
Anton: 27-Sep-2006 | Gregg, + * ? could be a good idea. I'll throw that into my mix-bowl. | |
Gregg: 28-Sep-2006 | I also have a naming convention I've been playing with for a while, where parse rule words have an "=" at the end (e.g. date=) and parse variables--values set during the parse process--have it at the beginning (e.g. =date). The idea is that it's sort of a cross between BNF syntax for production rules and set-word/get-word syntax; the goal being to easily distinguish parse-related words. By using the same word for a rule and an associated variable, with the equal sign at the head or tail, respectively, it also makes it easier to keep track of what gets set where, when you have a lot of rules. | |
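Gregg's naming convention might look like this in practice (a hypothetical date rule; the digit charset and sample input are made up for illustration):

```rebol
digit:  charset "0123456789"
=year:  =month: none            ; parse variables: "=" at the head
year=:  [copy =year 4 digit]    ; parse rules: "=" at the tail
month=: [copy =month 2 digit]
date=:  [year= "-" month=]

parse/all "2006-09" date=       ; sets =year to "2006", =month to "09"
```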
Graham: 29-Sep-2006 | This was I thought a simple task .. to parse a csv file.... | |
Graham: 29-Sep-2006 | this seems to be a difficult line as there is an embedded quote viz "123 "c" Avenue" | |
Graham: 29-Sep-2006 | this is Gabriele's published parser
CSV-parser: make object! [
    line-rule: [field any [separator field]]
    field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
    string: [copy f-val any str-char]
    quoted-string: [{"} copy f-val any qstr-char {"} (replace/all f-val {""} {"})]
    str-char: none
    qstr-char: [{""} | separator | str-char]
    fields: []
    f-val: none
    separator: #";"
    set 'parse-csv-line func [
        "Parses a CSV line (returns a block of strings)"
        line [string!] /with sep [char!] "The separator between fields"
    ] [
        clear fields
        separator: any [sep #";"]
        str-char: complement charset join {"} separator
        parse/all line line-rule
        copy fields
    ]
] | |
Graham: 29-Sep-2006 | this might fix Gabriele's parser ..
CSV-parser: make object! [
    line-rule: [field any [separator field]]
    field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
    string: [copy f-val any str-char]
    quoted-string: [{"} copy f-val any qstr-char {"} (if found? f-val [replace/all f-val {""} {"}])]
    str-char: none
    qstr-char: [{""} | separator | str-char]
    fields: []
    f-val: none
    separator: #";"
    set 'parse-csv-line func [
        "Parses a CSV line (returns a block of strings)"
        line [string!] /with sep [char!] "The separator between fields"
    ] [
        clear fields
        separator: any [sep #";"]
        str-char: complement charset join {"} separator
        parse/all line line-rule
        copy fields
    ]
] | |
Izkata: 3-Oct-2006 | That's a ~very~ good example, Oldes... it should be put in the docs somewhere (if it isn't already.) I didn't understand how get-words and set-words worked in parse, either, before.. | |
Anton: 4-Oct-2006 | string: "<good tag><bad tag><other tag><good tag>"
entity: "<ENTITY>"
parse/all string [
    any [
        to "<" start: skip to ">" end: skip (
            if not find copy/part start end "good tag" [
                change/part start entity 1
                ; fix up END (for when your entity is other than a 1-character long string)
                end: skip end (length? entity) - 1
                change/part end entity 1
                ; fix up END again
                end: skip end (length? entity) - 1
            ]
        ) :end skip
    ]
    to end
]
string
;== {<good tag><ENTITY>bad tag<ENTITY><ENTITY>other tag<ENTITY><good tag>} | |
Anton: 4-Oct-2006 | Such unmatched tags cause a headache for any parser. | |
Anton: 4-Oct-2006 | Ok, give this a burl. | |
Anton: 4-Oct-2006 | string: "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"
string: " > >> < <<good tag><bad tag> 3 > 5 <other tag><good tag etc> >> > "
; (1) search for end tags >, they are erroneous so replace them
; (2) search for start tags <, if there is more than one, replace all except the last one
; (3) search for end tag >, check tag body and replace if necessary
entity: "&entity;"
ntag: complement charset "<>" ; non tag
parse/all result: copy string [
    any [
        ; (1)
        any [
            any ntag start: ">" end: (
                change/part start entity 1
                end: skip start length? entity
                ;print [1 index? start]
            ) :end
        ]
        ; (2)
        (start: none stop?: none)
        any [
            any ntag start: "<" end:
            ;(print [2 mold start])
            any ntag "<" (
                ;print "found a second start tag"
                change/part start entity 1
                end: skip start length? entity
                ;(print [2.1 mold copy/part start end])
                start: none
            ) :end
        ]
        (if none? start [stop?: 'break]) stop?
        ; ok, we found at least one start tag
        ;(print ["OK we found at least one start tag" mold start])
        :start skip
        ; (3)
        any ntag end: ">"
        ;(print [3 mold copy/part start end])
        (if not find copy/part start end "good tag" [
            ;print ["found a bad tag" mold copy/part start end]
            change/part start entity 1
            ; fix up END (for when your entity is other than a 1-character long string)
            end: skip end (length? entity) - 1
            change/part end entity 1
            ; fix up END again
            end: skip end (length? entity) - 1
        ]) :end skip
    ]
    to end
]
result | |
Anton: 4-Oct-2006 | Holy ---- ! where did two and a half hours go ? | |
Anton: 4-Oct-2006 | oh no.. maybe I only spent one and a half hours on it, but still...! | |
Oldes: 5-Oct-2006 | And Rebolek, you can use this code of mine to remove unwanted tags (it's already here - posted a few days before - but with a little bug - this version should be OK as I'm using it)
remove-tags: func [html /except allowed-tags /local new x tag name tagchars] [
    if not string? html [return html]
    new: make string! length? html
    tagchars: charset [#"a" - #"z" #"A" - #"Z"]
    parse/all html [
        any [
            copy x to {<} copy tag thru {>} (
                if not none? x [insert tail new x]
                if all [
                    except
                    parse/all tag ["<" opt #"/" copy name some tagchars to end]
                    find allowed-tags name
                ][
                    insert tail new tag
                ]
            )
        ]
        copy x to end (if not none? x [insert tail new x])
    ]
    new
] | |
Oldes: 5-Oct-2006 | With such a converter we should theoretically be able to easily parse any language | |
Oldes: 5-Oct-2006 | ...There are actually lots of programs that can be given (E)BNF grammars as input and automatically produce code for parsers for the given grammar. In fact, this is the most common way to produce a compiler: by using a so-called compiler-compiler that takes a grammar as input and produces parser code in some programming language.... | |
Anton: 5-Oct-2006 | Well, I just spent two days making a matching algorithm for searching file contents, and I was considering making a "compile-rules" function (possibly similar to Gabriele or someone else's). Looks like I don't have to make that for now, but my mind is in this place at the moment. I long for the day when I don't have to use filesystems at all (which obviates the need for file search programs) - hopefully we can stick all our info in a database soon. Probably an associative database. | |
Anton: 5-Oct-2006 | While on this topic - Was it Gregg or Sunanda who made a mini dialect for a file contents matcher ? That's the algorithm I just made, and I'm now interested to review other implementations. While developing I also came to an apparent cross-roads, a choice between a simple, "digital", logical algorithm or a more "fuzzy" algorithm with a ranking system like Google. This reminded me of a discussion a while back where this point was made. | |
Gregg: 5-Oct-2006 | WRT BNF, it should be possible. I think Brett Handley did it, or the reverse, at one point; might be on codeconscious.com, not sure. I've also done something similar, for ABNF. It was built for a client, so I'd have to ask if it could be released. ABNF is what is used in a lot of RFCs, so it could be used on a lot of things for standards interop. | |
Robert: 9-Oct-2006 | The main problem I see is that a "normal" BNF parser checks all rules in parallel and uses the first match. Whereas PARSE uses a sequential approach using the first match. So, the rule to use PARSE is, always have the maximum width matching rule at the beginning. For example you want to parse for: . .. ... You need to put the ... as first. Otherwise the rule will match for a single . first and be fired three times. | |
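Robert's ordering point, sketched with a hypothetical dot-matching rule (R2 parse semantics assumed):

```rebol
; With the longest alternative first, "..." matches in one go:
dots-long-first:  [some ["..." (print "ellipsis") | ".." (print "2 dots") | "." (print "1 dot")]]
; With "." first, the same input fires the single-dot branch three times:
dots-short-first: [some ["." (print "1 dot") | ".." (print "2 dots") | "..." (print "ellipsis")]]

parse/all "..." dots-long-first    ; prints "ellipsis" once
parse/all "..." dots-short-first   ; prints "1 dot" three times
```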
BrianH: 10-Oct-2006 | Actually Robert, "normal" BNF parsers usually have similar restrictions to the parse dialect, only more so. Shift-reduce parsers like yacc need the maximum width rule first; recursive-descent parsers need to be refactored extensively (in a way that is too complicated to go into now). The parse dialect is recursive-descent with backtracking, which in theory is less restricted than either LR (shift-reduce) or LL (recursive-descent). I tend to do LL refactoring on my parse rules just because that makes them faster, but it's nice that it is not always required, that I can do LR-style rules if I need to. | |
BrianH: 10-Oct-2006 | Perhaps you are thinking of lexers that convert a source syntax with restrictions similar to those of regular expressions into a state machine. Those could be thought to operate in parallel (not really, but close enough), but the languages they accept are quite restricted compared to full parsers, let alone the parse dialect. | |
BrianH: 10-Oct-2006 | Sorry, I came to the parse dialect from a history of using and making parser generators. It's annoying that the behavior of parse and the tricks you can use to optimize your parse rules have all of these arcane CS terms referring to them. At least the parse dialect is a lot more flexible than most of those parser generators, and easier to write, use and debug too. | |
james_nak: 10-Oct-2006 | I have an easy one for you gurus. Let's say I want to parse a file and get all the "www..." out of it. The thing is that they end in either a space or a linefeed. How do I do a (written in pseudo parse to give you an idea) "to "www" copy tag to 'either a linefeed or a space'"? I've tried charsets, vars, blocks but the best I can do is one or the other. Note, finding the "www" is the easy part, it's ending the string that is giving me fits. Thanks in advance. | |
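One common answer to james_nak's question (a sketch with made-up input; R2's to/thru don't take charsets, so the trick is to copy some of the complement set, which stops at the first space or linefeed):

```rebol
text: "see www.example.com for details^/also www.rebol.com here"
term:     charset " ^/"      ; characters that end a URL: space or linefeed
non-term: complement term    ; everything else
urls: copy []
parse/all text [
    any [to "www" copy url some non-term (append urls url)]
    to end
]
urls  ;== ["www.example.com" "www.rebol.com"]
```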
Maxim: 27-Oct-2006 | I am almost sure this question has been asked many times before... it's my turn :-) Is there a way for a parse rule to detect situations in which it should fail, because we have a succeeding rule which we know will match? | |
Maxim: 27-Oct-2006 | I have rules to parse ABC explicitly and a fallback which can parse anything. | |
Maxim: 27-Oct-2006 | note... the example is simple and consider each character a different matching condition. | |
Maxim: 27-Oct-2006 | also, in reality, each letter in the above over-simplification is a word... not just one char (and there is overlap) so I can't just match charsets. | |
Maxim: 28-Oct-2006 | the break seems to be what I am looking for,I'll test something out and if its not conclusive I will come back with a better example :-) thanks guys. | |
Graham: 25-Nov-2006 | Posted on reboltalk ...
>> parse/case "AAABBBaaaBBBAAAaaa" "A"
== ["" "" "" "BBBaaaBBB" "" "" "aaa"]
how come there are only two "" after the BBBaaaBBB ? | |
Henrik: 25-Nov-2006 | >> parse/case "AAABBBaaaAAA" "A"
== ["" "" "" "BBBaaa" "" ""]
>> parse/case "BAAABBBaaaAAA" "A"
== ["B" "" "" "BBBaaa" "" ""]
>> parse/case "BA" "A"
== ["B"]
hmmm... | |
Ladislav: 25-Nov-2006 | it's OK, because every A means one closing #"^"". The first A was used to close the "...a" string | |
Ingo: 26-Nov-2006 | This may make it easier for some, just exchange the "A"s for "," and mentally read it like you would read a csv file:
>> parse/case ",,,BBBaaaBBB,,,aaa" ","
== ["" "" "" "BBBaaaBBB" "" "" "aaa"] | |
Anton: 26-Nov-2006 | It's like cutting a piece of wood. You only cut twice but you end up with three pieces. | |
Maxim: 26-Nov-2006 | huh? not sure I get what you mean... how can the above be desired? It mangles the symmetry of the data and of tokenizing - for example, it strips the trailing / of a dir... | |
Maxim: 27-Nov-2006 | the function's doc string doesn't even mention it! It's a special mode ... in the dictionary it says: "There is also a simple parse mode that does not require rules, but takes a string of characters to use for splitting up the input string." So not very explicit. | |
Anton: 27-Nov-2006 | So the problem might be that we don't know how it's supposed to work. Maybe the implementor wasn't too clear how it should work either. From memory there was an "inconsistent case" which actually had a use - for something like splitting command-line args. But anyway, a clearer definition would be good. | |
Anton: 27-Nov-2006 | Better to have a simple and consistent core and enable particular modes for specific uses with refinements. | |
Pekr: 5-Dec-2006 | Just asking, because today I read a bit about ODF and OpenXML (two document formats for office apps). There is probably open space for small apps, parsing some info from inside the documents etc. (meta-data programming) ... just curious ... or will it be better to wait for full-spec XML MLs libs, doing the job given, and link to those libraries? | |
BrianH: 5-Dec-2006 | Such a thing has been on my todo list for a while, but I've been a little busy lately with non-REBOL projects :( | |
Maxim: 8-Dec-2006 | geomol's xml2rebxml handles XML pretty well. one might want to change the parse rules a little to adapt the output, but it actually loads all the xml tags, empty tags and attributes. it even handles utf-8, CDATA chunks, and converts some of the & chars. | |
BrianH: 11-Dec-2006 | You really have to trust your source when using JSON to a browser though. Standard usage is to load with eval - only safe to use on https sites because of script injection. | |
Maxim: 11-Dec-2006 | is there a way to make block parsing case sensitive? this doesn't seem to work: parse/case [A a] [some ['A (print "upper") | 'a (print "lower")]] | |
Gabriele: 11-Dec-2006 | >> strict-equal? 'A 'a == true | |
Gabriele: 11-Dec-2006 | >> alias 'a "aa" == aa >> strict-equal? 'A 'a == false | |
Maxim: 11-Dec-2006 | hehe... I would not want the bug to get too comfortable, lest it become a feature ;-) | |
Joe: 24-Dec-2006 | i run the above on core 2.6 and it loops forever . This was a bug fixed in 2.3 but it looks like the bug still exists | |
Joe: 24-Dec-2006 | sorry, not a bug. I was inspired by the example in the changes page and it is missing the thru "^/" after the to "^/" |