world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
Graham:
1-Jul-2006
there is no whitespace inside a macro name
Tomc:
1-Jul-2006
so there is a separate extendable file with the macro = expansion mapping
Graham:
1-Jul-2006
actually  the file will be saved in a database and loaded when the 
program starts
Tomc:
1-Jul-2006
as a model organism
Tomc:
1-Jul-2006
that the macro-expansion file needs to self-check for incidental occurrences of a "macro" in an "expansion" and protect against them
Tomc:
1-Jul-2006
I would still sort the macros by longest to shortest so one can't glob onto part of a macro ..
Graham:
1-Jul-2006
so, basically you created a single parse rule from the macro list 
and then parsed the text in one go.
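A minimal sketch of that approach (the macro table, names, and sample text here are hypothetical, not from Graham's application):

macros: [["fbs" "fasting blood sugar"] ["fb" "followed by"]]
; sort longest first so a short macro can't glob onto part of a longer one
sort/compare macros func [a b] [(length? a/1) > (length? b/1)]

rule: copy []
foreach m macros [
	append rule compose [mark: (m/1)]
	append/only rule to paren! compose [mark: change/part mark (m/2) (length? m/1)]
	append rule [:mark |]
]
append rule 'skip   ; fall through one character when no macro matches

text: "fbs normal, fb in 2 weeks"
parse/all text [any rule]
probe text
;== "fasting blood sugar normal, followed by in 2 weeks"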
BrianH:
1-Jul-2006
Tomc, that is a good point - I'll fix it. Graham, that's right.
Graham:
1-Jul-2006
We need a masterclass in parse ....
Graham:
1-Jul-2006
it's a local so memory will be released anyway ..
BrianH:
1-Jul-2006
It's a speed optimization. This might change with REBOL 3.
Graham:
1-Jul-2006
memory use is large with Rebol.
BrianH:
1-Jul-2006
Most of the excessive memory overhead of REBOL is just sloppy (no 
offense Carl). It's not much of a problem for most, but I have run 
into memory limits when running on embedded or handheld platforms, 
or running hundreds of instances on servers.
Tomc:
1-Jul-2006
but that can just be a static rule outside of compose
Graham:
1-Jul-2006
the above macro is supposed to expand into a multiline statement.
BrianH:
1-Jul-2006
Then it is a good thing that the ^/ is in the expansion.
Graham:
1-Jul-2006
No, as it ends up on screen showing ^/ instead of a visual newline.
BrianH:
1-Jul-2006
Are they writing ^/ in the expansion text source data to indicate 
a newline?
Graham:
1-Jul-2006
They're using ^/ as the macros are being read in from a text file 
using read/lines
Henrik:
9-Jul-2006
how "local" are variables that are set during a parse? I was looking 
at Geomol's postscript.r and looked at:

coords: [err:
	(pos: none) [set pos pair! | set x number! set y number!] (
		either pos [
			append output compose [(pos/x) " " (pos/y) " lineto^/"]
		][
			append output compose [(x) " " (y) " lineto^/"]
		]
	)
]
Anton:
9-Jul-2006
Mmm let me make a few tests.
Henrik:
9-Jul-2006
actually, there is a difference between my code and this, which may 
be causing it:


I need to loop the block with 'any. I suspect the contents are lost after the first run.
Oldes:
9-Jul-2006
if the parse is inside a function and you set pos as a local in that function - it will be local
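A minimal sketch of that point (the function is hypothetical; standard REBOL scoping):

f: func [data /local pos] [
	parse data [set pos pair!]
	pos
]
pos: 'untouched
f [2x3]   ;== 2x3
pos       ;== untouched - the pos set during the parse was the function's local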
Henrik:
9-Jul-2006
I want to assign a variable to each element so I can process them 
later
Anton:
9-Jul-2006
add a block to control the evaluation.
Anton:
9-Jul-2006
I'm trying to figure out a simple example to show why.
Henrik:
9-Jul-2006
I wonder what the difference is? If it's only for controlling how 
global a variable is, it seems a little backwards to me
Henrik:
9-Jul-2006
the brackets would make it a "real" rule, wouldn't it? it would be 
possible to replace the rule with a variable and have the rule block 
placed elsewhere in your code
Anton:
9-Jul-2006
You have to think of a rule like this:
	[ integer! | ]
as equivalent to
	[ integer! | none ]
or
	opt [ integer! ]
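Checked at the console (my own example):

>> parse [] [integer! |]
== true
>> parse [] [integer!]
== false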
Anton:
9-Jul-2006
I think he might be using 'test-image in place of a real image! for 
this example ?
Henrik:
9-Jul-2006
It's also a good thing with these discussions. I've never really 
grown 100% comfortable with parse.
DideC:
10-Jul-2006
About Layout : parse handles only the layout words (origin, space, 
at...), see source layout.

The face description is handled by a loop, not by parse. See system/view/vid/grow-facets
Pekr:
19-Jul-2006
Hi, need a bit of help ....
Pekr:
19-Jul-2006
I can now simply create a func which will accept a mark name and run some code block accordingly - an SQL query, a simple replacement of a value, whatever (well, it will not work for cases like img tags, so it is not as flexible as the full HTML parser in Temple, for example, but hey, it is meant to be simple)
Chris:
19-Jul-2006
Petr, I have a copy with some notes here:
http://www.ross-gill.com/techniques/rsp/
JaimeVargas:
31-Aug-2006
Very nice comments. But comparing a parser with a regex is a bit 
unfair ;-)
Volker:
31-Aug-2006
That scoping is the difference between a closure and doing a "string" 
here.
BrianH:
31-Aug-2006
REBOL blocks don't reference a context, but they may contain words 
that reference a context. Still, this distinction makes no difference 
to the argument that Peters was making - REBOL text processing is 
more powerful than regex and easier to use. It would be easier to 
replicate REBOL-style parsing in Python using closures and generators 
anyway (Peters' real subject), since that is the closest Python gets 
to Icon-style backtracking.
Volker:
31-Aug-2006
it's not important what references the context, but that a variable can find one.
Volker:
31-Aug-2006
result := a > b
    ifTrue:[ 'greater' ]
    ifFalse:[ 'less' ]
Ladislav:
31-Aug-2006
besides, Tim was a REBOL 1.x user
Oldes:
15-Sep-2006
Maybe someone will find it useful:

remove-tags: func[html /except allowed-tags /local new x tag name 
tagchars][
	new: make string! length? html
	tagchars: charset [#"a" - #"z" #"A" - #"Z"]
	parse/all html [
		any [
			copy x to {<} copy tag thru {>}  (
				if not none? x [insert tail new x]
				if all [
					except
					parse tag ["<" opt #"/" copy name some tagchars to end]
					find allowed-tags name
				][	insert tail new tag ]
			)
		]
		copy x to end (if not none? x [insert tail new x])
	]
	new
]
Gregg:
25-Sep-2006
If it were a safe and easy thing to change, I can see some value 
in it as an option but, since words--and REBOL--are case insensitive, 
I'm inclined to live with things as they are, and use string parsing 
if case sensitivity is needed. I think it's Oldes or Rebolek that 
sometimes requests the ability to parse non-loadable strings, using 
percentage values as an example. I think loading percentages would 
be awesome, but then there are other values we might want to load 
as well; where do you draw the line? I'm waiting to see what R3 holds 
with custom datatypes and such.
Gregg:
25-Sep-2006
And didn't you suggest that values throwing errors could be coerced 
to string! or another type? e.g. add an /any refinement to load, 
and any value in the string that can't be loaded would become a string 
(or maybe you could say you want them to be tags for easy identification).
Oldes:
25-Sep-2006
I think load/next can be used to handle invalid datatypes now:
>> b: {1 2 3 'x' ,}
== "1 2 3 'x' ,"
>> while [v: load/next b not empty? second v][probe v b: v/2]
[1 " 2 3 'x' ,"]
[2 " 3 'x' ,"]
[3 " 'x' ,"]
['x' " ,"]
** Syntax Error: Invalid word -- ,
** Near: (line 1) ,

Just add some handler to convert the invalid datatype to something else that is loadable, and then parse it as a block
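For instance, a crude preloader for the sample above (assuming the comma and the quoted 'x' are the only offenders):

b: {1 2 3 'x' ,}
replace/all b "," { "," }   ; make the stray comma loadable as a string
replace/all b "'" {"}       ; turn 'x' into the string "x"
probe load b
;== [1 2 3 "x" ","]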
Oldes:
25-Sep-2006
But such a preloader will slow things down :(
Oldes:
26-Sep-2006
(it should be a question - is there such an example?)
Rebolek:
26-Sep-2006
Words should be case insensitive, but is it always the case? I've found this today accidentally:

>> a: [small Small]
== [small Small]
>> find/case a to word! "small"
== [small Small]
>> find/case a to word! "Small"
== [Small]
Gabriele:
26-Sep-2006
well... case insensitivity for words is done via automatic aliasing 
of words that differ in case only. (i know this because we found 
a bug related to this :)
Anton:
27-Sep-2006
Here's an idea to toss into the mix:

I am thinking of a new notation for strings using underscore (eg. _"hello"_ ) in a parse block, which allows you to specify whether they are delimited by whitespace or not. This would allow you to enable/disable the necessity for delimiters per-string. eg:

parse input [
	_"house"_   ; a complete word surrounded both sides by whitespace
	_"hous"     ; this would match "house", "housing", "housed" or even "housopoly" etc.. but left side must be whitespace
	"ad"_       ; this would match "ad", "fad", "glad" and right side must be whitespace
]

But this would need string datatype to change.

On the other hand, I could just set underscore _ to a charset of 
whitespace, then use that with parse/all eg:

	_: charset " ^-^/"

parse/all input [
	[ _ "house" _ ]
]


though that wouldn't be as comfortable. Maybe I can create parse 
rules from a simpler dialect which understands the underscore _.
Just an idea...
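For what it's worth, one sketch of how such a simpler dialect could be compiled into plain parse rules (compile-pattern is a hypothetical name):

ws: charset " ^-^/"
compile-pattern: func [spec [block!] /local rule] [
	rule: copy []
	foreach item spec [
		either item = '_ [append rule [some ws]] [append rule item]
	]
	rule
]
parse/all "a  house " compile-pattern ["a" _ "house" _]
;== true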
MikeL:
27-Sep-2006
Anton, Andrew had defined whitespace patterns in his patterns.r script, which seem usable; then you can use [ws* "house" ws*] or other combinations as needed, without the underscore. Andrew's solutions for this and a lot of other things have given me some good mileage over the past few years. WS*: [some WS] and WS?: [any WS]. It makes for clean, clear parse scripts once you adopt it.
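For reference, a self-contained version of that convention (the charset is my assumption; Andrew's patterns.r has its own definitions):

ws:  charset " ^-^/"
ws*: [some ws]   ; one or more whitespace characters
ws?: [any ws]    ; optional whitespace
parse/all "  house  boat" [ws? "house" ws* "boat"]
;== true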
Anton:
27-Sep-2006
Oh yes, I've seen Andrew's patterns.r. I was just musing how to make 
it more concise without even using a short word like WS.  Actually 
the use case which sparked this idea was more of a "regex-level" 
pattern matcher, just a simple pattern matcher where the user writes 
the pattern to match filenames and to match strings appearing in 
file contents.
Anton:
27-Sep-2006
Gregg, + * ? could be a good idea. I'll throw that into my mix-bowl.
Gregg:
28-Sep-2006
I also have a naming convention I've been playing with for a while, 
where parse rule words have an "=" at the end (e.g. date=) and parse 
variables--values set during the parse process--have it at the beginning 
(e.g. =date). The idea is that it's sort of a cross between BNF syntax 
for production rules and set-word/get-word syntax; the goal being 
to easily distinguish parse-related words. By using the same word 
for a rule and an associated variable, with the equal sign at the 
head or tail, respectively, it also makes it easier to keep track 
of what gets set where, when you have a lot of rules.
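A tiny illustration of the convention (the date rule is hypothetical, not from Gregg's code):

digit: charset "0123456789"
date=: [copy =date [4 digit "-" 2 digit "-" 2 digit]]   ; the rule word ends in =
parse/all "2006-09-28" date=
print =date   ; the variable set during the parse begins with =
; prints 2006-09-28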
Graham:
29-Sep-2006
This was, I thought, a simple task .. to parse a CSV file....
Graham:
29-Sep-2006
this seems to be a difficult line as there is an embedded quote viz 
"123 "c" Avenue"
Graham:
29-Sep-2006
this is Gabriele's published parser 


CSV-parser: make object! [
	line-rule: [field any [separator field]]
	field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
	string: [copy f-val any str-char]
	quoted-string: [{"} copy f-val any qstr-char {"} (replace/all f-val {""} {"})]
	str-char: none
	qstr-char: [{""} | separator | str-char]
	fields: []
	f-val: none
	separator: #";"
	set 'parse-csv-line func [
		"Parses a CSV line (returns a block of strings)"
		line [string!]
		/with sep [char!] "The separator between fields"
	] [
		clear fields
		separator: any [sep #";"]
		str-char: complement charset join {"} separator
		parse/all line line-rule
		copy fields
	]
]
Graham:
29-Sep-2006
this might fix Gabriele's parser ..

CSV-parser: make object! [
	line-rule: [field any [separator field]]
	field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
	string: [copy f-val any str-char]
	quoted-string: [{"} copy f-val any qstr-char {"} (if found? f-val [replace/all f-val {""} {"}])]
	str-char: none
	qstr-char: [{""} | separator | str-char]
	fields: []
	f-val: none
	separator: #";"
	set 'parse-csv-line func [
		"Parses a CSV line (returns a block of strings)"
		line [string!]
		/with sep [char!] "The separator between fields"
	] [
		clear fields
		separator: any [sep #";"]
		str-char: complement charset join {"} separator
		parse/all line line-rule
		copy fields
	]
]
Izkata:
3-Oct-2006
That's a ~very~ good example, Oldes... it should be put in the docs 
somewhere (if it isn't already.)  I didn't understand how get-words 
and set-words worked in parse, either, before..
Anton:
4-Oct-2006
string: "<good tag><bad tag><other tag><good tag>"
entity: "<ENTITY>"
parse/all string [
	any [
		to "<" start: skip
		to ">" end: skip
		(if not find copy/part start end "good tag" [
			change/part start entity 1
			; fix up END (for when your entity is other than a 1-character long string)
			end: skip end (length? entity) - 1
			change/part end entity 1
			; fix up END again
			end: skip end (length? entity) - 1
		])
		:end skip
	]
	to end
]
string
;== {<good tag><ENTITY>bad tag<ENTITY><ENTITY>other tag<ENTITY><good tag>}
Anton:
4-Oct-2006
Such unmatched tags cause a headache for any parser.
Anton:
4-Oct-2006
Ok, give this a burl.
Anton:
4-Oct-2006
string: "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"

string: " > >> < <<good tag><bad tag> 3 > 5 <other tag><good tag etc> >> > "

; (1) search for end tags >, they are erroneous so replace them
; (2) search for start tags <, if there is more than one, replace all except the last one
; (3) search for end tag >, check tag body and replace if necessary

entity: "&entity;"
ntag: complement charset "<>" ; non tag
parse/all result: copy string [
	any [
		; (1)
		any [
			any ntag start: ">" end: (
				change/part start entity 1 end: skip start length? entity  ;print [1 index? start]
			)
			:end
		]

		; (2)
		(start: none stop?: none)
		any [
			any ntag start: "<" end:   ;(print [2 mold start])
			any ntag "<" (  ;print "found a second start tag"
				change/part start entity 1 end: skip start length? entity  ;(print [2.1 mold copy/part start end])
				start: none
			) :end
		]
		(if none? start [stop?: 'break]) stop?

		; ok, we found at least one start tag
		;(print ["OK we found at least one start tag" mold start])
		:start skip

		; (3)
		any ntag end: ">"   ;(print [3 mold copy/part start end])
		(if not find copy/part start end "good tag" [
			;print ["found a bad tag" mold copy/part start end]
			change/part start entity 1
			; fix up END (for when your entity is other than a 1-character long string)
			end: skip end (length? entity) - 1
			change/part end entity 1
			; fix up END again
			end: skip end (length? entity) - 1
		])
		:end skip
	]
	to end
]
result
Anton:
4-Oct-2006
Holy ---- ! where did two and a half hours go ?
Anton:
4-Oct-2006
oh no.. maybe I only spent one and a half hours on it, but still...!
Oldes:
5-Oct-2006
And Rebolek, you can use this code of mine to remove unwanted tags (it's already here - posted a few days before - but with a little bug; this version should be OK, as I'm using it)

remove-tags: func[html /except allowed-tags /local new x tag name 
tagchars][
	if not string? html [return html]
	new: make string! length? html
	tagchars: charset [#"a" - #"z" #"A" - #"Z"]
	parse/all html [
		any [
			copy x to {<} copy tag thru {>}  (
				if not none? x [insert tail new x]
				if all [
					except
					parse/all tag ["<" opt #"/" copy name some tagchars to end]
					find allowed-tags name
				][	insert tail new tag ]
			)
		]
		copy x to end (if not none? x [insert tail new x])
	]
	new
]
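A quick usage sketch of this corrected version:

probe remove-tags "a <b>bold</b> move"
;== "a bold move"
probe remove-tags/except "<p>a <b>bold</b> move</p>" ["b"]
;== "a <b>bold</b> move"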
Oldes:
5-Oct-2006
With such a converter we should theoretically be able to easily parse 
any language
Oldes:
5-Oct-2006
...There are actually lots of programs that can be given (E)BNF grammars 
as input and automatically produce code for parsers for the given 
grammar. In fact, this is the most common way to produce a compiler: 
by using a so-called compiler-compiler that takes a grammar as input 
and produces parser code in some programming language....
Anton:
5-Oct-2006
Well, I just spent two days making a matching algorithm for searching 
file contents, and I was considering making a "compile-rules" function 
(possibly similar to Gabriele or someone else's). Looks like I don't 
have to make that for now, but my mind is in this place at the moment. 
I long for the day when I don't have to use filesystems at all (which 
obviates the need for file search programs) - hopefully we can stick 
all our info in a database soon. Probably an associative database.
Anton:
5-Oct-2006
While on this topic - Was it Gregg or Sunanda who made a mini dialect 
for a file contents matcher ? That's the algorithm I just made, and 
I'm now interested to review other implementations. While developing 
I also came to an apparent cross-roads, a choice between a simple, 
"digital", logical algorithm or a more "fuzzy" algorithm with a ranking 
system like Google. This reminded me of a discussion a while back 
where this point was made.
Gregg:
5-Oct-2006
WRT BNF, it should be possible. I think Brett Handley did it, or 
the reverse, at one point; might be on codeconscious.com, not sure. 
I've also done something similar, for ABNF. It was built for a client, 
so I'd have to ask if it could be released. ABNF is what is used 
in a lot of RFCs, so it could be used on a lot of things for standards 
interop.
Robert:
9-Oct-2006
The main problem I see is that a "normal" BNF parser checks all rules in parallel and uses the first match, whereas PARSE takes a sequential approach, using the first rule that matches. So the rule for using PARSE is: always put the maximum-width matching rule at the beginning.

For example you want to parse for:
.
..
...


You need to put the ... first. Otherwise the rule will match a single . first and fire three times.
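A runnable illustration of that ordering rule:

parse/all "..." [some ["..." (print "ellipsis") | ".." (print 2) | "." (print 1)]]
; prints: ellipsis
parse/all "..." [some ["." (print 1) | ".." (print 2) | "..." (print "ellipsis")]]
; prints 1 three times - the single . fires on each dot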
BrianH:
10-Oct-2006
Actually Robert, "normal" BNF parsers usually have similar restrictions 
to the parse dialect, only more so. Shift-reduce parsers like yacc 
need the maximum width rule first; recursive-descent parsers need 
to be refactored extensively (in a way that is too complicated to 
go into now). The parse dialect is recursive-descent with backtracking, 
which in theory is less restricted than either LR (shift-reduce) 
or LL (recursive-descent). I tend to do LL refactoring on my parse 
rules just because that makes them faster, but it's nice that it 
is not always required, that I can do LR-style rules if I need to.
BrianH:
10-Oct-2006
Perhaps you are thinking of lexers that convert a source syntax with 
restrictions similar to those of regular expressions into a state 
machine. Those could be thought to operate in parallel (not really, 
but close enough), but the languages they accept are quite restricted 
compared to full parsers, let alone the parse dialect.
BrianH:
10-Oct-2006
Sorry, I came to the parse dialect from a history of using and making 
parser generators. It's annoying that the behavior of parse and the 
tricks you can use to optimize your parse rules have all of these 
arcane CS terms referring to them. At least the parse dialect is 
a lot more flexible than most of those parser generators, and easier 
to write, use and debug too.
james_nak:
10-Oct-2006
I have an easy one for you gurus. Let's say I want to parse a file 
and get all the "www..." out of it. The thing is that they end in 
either a space or a linefeed. How do I do a (written in pseudo parse 
to give you an idea) "to "www" copy tag to 'either a linefeed or 
a space'"? I've tried charsets, vars, blocks but the best I can do 
is one or the other. Note, finding the "www" is the easy part, it's 
ending the string that is giving me fits. Thanks in advance.
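One way to do it (a sketch, not from the thread; text and urls are placeholder names):

delim: charset " ^/"
non-delim: complement delim
urls: copy []
parse/all text [
	any [to "www" copy url some non-delim (append urls url)]
	to end
]
; urls now holds each "www..." run, terminated by a space, a linefeed, or the end of input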
Maxim:
27-Oct-2006
I am almost sure this question has been asked many times before... it's my turn :-)


is there a way for a parse rule to detect situations in which it should fail, because we have a succeeding rule which we know will match?
Maxim:
27-Oct-2006
I have rules to parse ABC explicitly, and a fallback which can parse anything.
Maxim:
27-Oct-2006
note... the example is simple; consider each character a different matching condition.
Maxim:
27-Oct-2006
also, in reality, each letter in the above over-simplification is 
a word... not just one char (and there is overlap) so I can't just 
match charsets.
Maxim:
28-Oct-2006
the break seems to be what I am looking for; I'll test something out and if it's not conclusive I will come back with a better example :-)  thanks guys.
Graham:
25-Nov-2006
Posted on reboltalk ...

>> parse/case "AAABBBaaaBBBAAAaaa" "A"
== ["" "" "" "BBBaaaBBB" "" "" "aaa"]

how come there are only two "" after the BBBaaaBBB ?
Henrik:
25-Nov-2006
>> parse/case "AAABBBaaaAAA" "A"
== ["" "" "" "BBBaaa" "" ""]
>> parse/case "BAAABBBaaaAAA" "A"
== ["B" "" "" "BBBaaa" "" ""]
>> parse/case "BA" "A"
== ["B"]

hmmm...
Ladislav:
25-Nov-2006
it's OK, because every A means one closing #"^"" (each A closes one string). The first A was used to close the "...a" string
Ingo:
26-Nov-2006
This may make it easier for some, just exchange the "A"s for "," 
and mentally read it like you would read a csv file:

>> parse/case ",,,BBBaaaBBB,,,aaa" ","
== ["" "" "" "BBBaaaBBB" "" "" "aaa"]
Anton:
26-Nov-2006
It's like cutting a piece of wood. You only cut twice but you end 
up with three pieces.
Maxim:
26-Nov-2006
huh? not sure I get what you mean... how can the above be desired? it mangles the symmetry of the data when tokenizing - for example it strips the trailing / of a dir...
Maxim:
27-Nov-2006
the function's doc string doesn't even mention it! it's a special mode ... in the dictionary it says:

There is also a simple parse mode that does not require rules, but takes a string of characters to use for splitting up the input string.

so it's not very explicit.
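The simple mode is easier to see at the console (examples mine, not from the dictionary):

>> parse "The fox jumped" none   ; none: split on whitespace
== ["The" "fox" "jumped"]
>> parse "1;2;3" ";"             ; or split on the characters you supply
== ["1" "2" "3"]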
Anton:
27-Nov-2006
So the problem might be that we don't know how it's supposed to work. 
Maybe the implementor wasn't too clear how it should work either. 
From memory there was an "inconsistent case" which actually had a 
use - for something like splitting command-line args. But anyway, 
a clearer definition would be good.
Anton:
27-Nov-2006
Better to have a simple and consistent core and enable particular 
modes for specific uses with refinements.
Pekr:
5-Dec-2006
Just asking, because today I read a bit about ODF and OpenXML (two document formats for office apps). There is probably open space for small apps parsing some info from inside the documents etc. (meta-data programming) ... just curious ... or would it be better to wait for full-spec XML / markup-language libs that do the given job, and link to those libraries?
BrianH:
5-Dec-2006
Such a thing has been on my todo list for a while, but I've been 
a little busy lately with non-REBOL projects :(
Maxim:
8-Dec-2006
geomol's xml2rebxml handles XML pretty well.  one might want to change 
the parse rules a little to adapt the output, but it actually loads 
all the xml tags, empty tags and attributes.  it even handles utf-8, 
CDATA chunks, and converts some of the & chars.
BrianH:
11-Dec-2006
You really have to trust your source when using JSON to a browser 
though. Standard usage is to load with eval - only safe to use on 
https sites because of script injection.
Maxim:
11-Dec-2006
is there a way to make block parsing case sensitive?

this doesn't seem to work:
parse/case [A a] [some ['A (print "upper") | 'a (print "lower")]]
Gabriele:
11-Dec-2006
>> strict-equal? 'A 'a
== true
Gabriele:
11-Dec-2006
>> alias 'a "aa"
== aa
>> strict-equal? 'A 'a
== false
Maxim:
11-Dec-2006
hehe... I would not want the bug to get too comfortable, lest it becomes a feature ;-)
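Without relying on the aliasing quirk, one workaround sketch is to match any word and compare its molded spelling case-sensitively:

parse [A a] [
	some [set w word! (print either strict-equal? "A" mold w ["upper"] ["lower"])]
]
; prints upper, then lower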
Joe:
24-Dec-2006
I ran the above on Core 2.6 and it loops forever. This was a bug fixed in 2.3, but it looks like the bug still exists
Joe:
24-Dec-2006
sorry, not a bug. I was inspired by the example in the changes page, and it is missing the thru "^/" after the to "^/"