AltME groups: search


world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
BrianH: 1-May-2011	It seems like an error that is improperly not triggered. SET is supposed to set to a single value, not a series of values - an embedded block is a single value.
Ladislav: 1-May-2011	This is a simple rule: set w rule sets the word 'w to the first value matched. No error.
onetom: 1-May-2011	so, no way to match a complex rule?
BrianH: 1-May-2011	That is not the error I was talking about. This is the error: >> parse [a b] [set w ['a 'b]] == true >> ? w W is a word of value: a It is the attempt to set the value to a complex rule that is the error. It wouldn't be an error to do this: parse [a b] [set w 'a 'b] If we keep the current behavior, there needs to be a lot of strongly worded warnings about the potential gotcha in the PARSE SET docs.
Ladislav: 1-May-2011	You cannot find anywhere a formulation like "set the value to a rule". That is not what can happen.
onetom: 1-May-2011	address! [string! tuple! hash!] parse ["cat" 1.2.3 #4] [set addr1 address!] if im reading this address! looks just like a value reference...
BrianH: 1-May-2011	It could be considered a useful feature, since the whole match needs to match before the SET is performed. However, the docs need to be extremely precise about this because the "set the value to a rule" interpretation is a common misconception among newbie PARSE writers. It would be good for the docs to give an example of the type of code this allows you to do, explaining the difference.
BrianH: 1-May-2011	onetom, It's not "set the variable to a rule", it is instead "match a rule then set the variable to the value at the current position in the data".
Ladislav: 1-May-2011	as stated in the "Idioms" section, I think, that a: [set b c] shall be equivalent to: f: [(set/any [b] if lesser? index? e index? d [e])] a: [and [c d:] e: f :d]
BrianH: 1-May-2011	By "It could be considered a useful feature" I mean that if this were not allowed, you would have to write this: parse data [set x ['a 'b]] like this instead: parse data [and ['a 'b] set x skip skip]
Ladislav: 1-May-2011	Yes, Brian, you just wrote it in a simpler way
BrianH: 1-May-2011	Those full equivalences are great for someone who really needs to know how things work internally (such as when they need to clone PARSE), but you need simple examples first in docs for people who just want to use PARSE properly. Btw, has anyone started a set of full PARSE docs in DocBase? The parse project page could be raided for information, but it really doesn't serve as a full parse manual.
Ladislav: 1-May-2011	This is the formulation used to document the SET directive: If the subexpression match succeeds, the set operation sets the given variable to the first matched value, while the copy operation copies the whole part of the input matched by the given subexpression. For a more detailed description see the Parse idioms section.
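The SET/COPY distinction quoted above can be modelled outside REBOL. The following is only a Python sketch under stated assumptions: the names `match_literals`, `set_rule` and `copy_rule` are hypothetical helpers, not part of PARSE, and the sub-rule is restricted to a plain sequence of literal values. It shows that the whole sub-rule must match first, after which SET takes only the first matched value while COPY takes the whole matched part.

```python
def match_literals(data, pos, literals):
    """Return the position after matching `literals` at `pos`, or None on failure."""
    end = pos + len(literals)
    if data[pos:end] == list(literals):
        return end
    return None


def set_rule(data, pos, literals):
    """Model of [set var rule]: on success return (first matched value, new position)."""
    end = match_literals(data, pos, literals)
    if end is None:
        return None  # the sub-rule failed, so the variable is left untouched
    # An empty match yields None, mirroring "[set var end] sets the var to none".
    value = data[pos] if end > pos else None
    return value, end


def copy_rule(data, pos, literals):
    """Model of [copy var rule]: on success return (whole matched part, new position)."""
    end = match_literals(data, pos, literals)
    if end is None:
        return None
    return data[pos:end], end


print(set_rule(["a", "b"], 0, ["a", "b"]))   # first matched value only
print(copy_rule(["a", "b"], 0, ["a", "b"]))  # the whole matched part
```

In this model `set_rule` never stores the rule itself; it stores data found at the position where the rule started matching, which is the reading BrianH and Ladislav settle on.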
BrianH: 1-May-2011	Sounds accurate, if a little intimidating to newbies. I wish we had a really good PARSE manual that could turn newbies into experts.
Geomol: 1-May-2011	PARSE in R2 seems to have less support for combined keywords than R3, as can be seen in this example: >> parse [] [opt some 'a] ** Script Error: Invalid argument: some But there is no error, when combining OPT and THRU: >> parse [] [opt thru 'a] == false Should that trigger an error? If no error, it should return true, right?
Ladislav: 1-May-2011	opt thru 'a means the same as [opt thru] 'a
BrianH: 1-May-2011	Ladislav, why would [opt thru 'a] not mean [opt [thru 'a]]? Isn't that the definition of opt?
Geomol: 1-May-2011	Ladislav, but shouldn't then [opt some 'a] mean [[opt some] 'a] ? It gives an error in R2.
Geomol: 1-May-2011	Kinda same problem, when combining them this way (still in R2): >> parse [] [thru opt 'a] == false >> parse [a] [thru opt 'a] == false
BrianH: 1-May-2011	In R3 that would be an improperly untriggered error, since TO and THRU are defined to not take the full gamut of rules, only a subset. Probably the same for R2, but a different subset.
Geomol: 1-May-2011	Ok, version 1.0.0 of BPARSE is found here: http://www.fys.ku.dk/~niclasen/rebol/libs/bparse.r It's a function version of PARSE, and can only parse blocks for now, not strings. It can do more or less all PARSE in R2 can do when parsing blocks. I've tried to trigger errors, R2 PARSE doesn't. The purpose is to play around with parsing to maybe make a better version than the native version and without bugs.
Geomol: 1-May-2011	It's not as fast as the timings, I gave here earlier with a very early version.
Geomol: 1-May-2011	I've thought some more about [thru end], which return false in the R2 version, but return true in R3. My version return false as R2, but I better understand the R3 way, now I've programmed it. It can be seen as, how THRU should be understood (, as also Ladislav said something about)? Do we think of [thru rule] as [to rule rule] or [to rule skip] ? If the TO keyword can handle complex rules like: parse [a b] [to ['a 'b] ['a 'b]] then the first might make better sense, and [thru end] should return true. But we can't parse like that in R2, so maybe we think more of it as the second, and then [thru end] should return false. But if you look in my version, I have to catch this special case, where END follows THRU, so it takes more code, which isn't good. In any case, Ladislav's suggestion to use [end skip] as a fail rule is much better. If you're not at the end, the first word (end) will give false, else the next will fail, as you can't skip past end.
BrianH: 1-May-2011	END is a zero-length repeatable rule, like NONE, so TO END and THRU END should be equivalent.
BrianH: 1-May-2011	I'd consider that an error in R2's PARSE, but not a fixable one because it would change the semantics.
Geomol: 1-May-2011	Now you have your own function version of parse, that you can make work exactly as you wish. :-) And then maybe, when you're satisfied, give it to Carl. It should now also be easier to make C versions of parse for those, who make alternatives to REBOL. At least you have a REBOL function to start with.
Geomol: 1-May-2011	Yes, parsing a dialect I have to produce PDF output.
Maxim: 1-May-2011	did you start work on a string parser?
Maxim: 1-May-2011	bah, I'd just stick with R3 parsing for Red. it'll be a good incentive for some to upgrade.
Geomol: 1-May-2011	Having skip as a keyword mean, you can't use that word as a variable.
BrianH: 1-May-2011	Most people tend to not use 'skip as a variable anyways, because of the SKIP function.
BrianH: 1-May-2011	For the mezzanine version, two functions might be better, though they can share code in the same module. Maybe just have one exported word for a dispatch function though.
Geomol: 1-May-2011	parse [a b c] ['a AND 'b AND 'c END] hmm, yeah, you've got a point.
BrianH: 1-May-2011	We're probably fine with the wording we got. Though strangely enough, | is the ELSE of the IF operation. ELSE is a more descriptive name for | than OR in general.
Ladislav: 1-May-2011	Geomol: [to rule skip] does not mean the same as [thru rule] , as can be demonstrated when comparing the behaviour of thru rule for rule = "abc" It is quite a surprise for me, that you don't see the difference.
Geomol: 2-May-2011	In R2 parsing a block: >> parse ["abc"] [to "abc" skip] == true >> parse ["abc"] [thru "abc"] == true I know, it's different when parsing a string instead of a block. My comparison of [thru rule] to the alternatives was meant as a loose comparison, not to be taken literally. So it's easy to think of THRU to work this way, because it does in many cases, therefore the confusion.
Ladislav: 2-May-2011	But, the recursive description: a: [b | skip a] is quite natural.
Geomol: 2-May-2011	Yes, and that should work in all cases, if the b rule is found, complex or not. And this will return true, if b is END, because END is a repeatable rule (you can't go past it with SKIP). NONE is also repeatable, and if you look in the code, I have to take care of this too separately. This means we can't parse none of datatype none! by using the NONE keyword, but we can using a datatype: >> parse reduce [none] [none] == false >> parse reduce [none] [none!] == true So it raises the question, if the NONE keyword should be there? What are the consequences, if we drop NONE as a keyword? And are there other repeatable rules beside END and NONE? In R2 or R3.
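The recursive reading of THRU as a: [b | skip a] can be written out as a small Python model. This is a sketch only: `thru`, `literal` and `end_rule` are hypothetical names, rules are modelled as functions returning the new position or None, and the recursion is unrolled into a loop.

```python
def literal(value):
    """Rule that matches one item equal to `value`."""
    def rule(data, pos):
        if pos < len(data) and data[pos] == value:
            return pos + 1
        return None
    return rule


def end_rule(data, pos):
    """Zero-length rule that succeeds only at the end of the input."""
    return pos if pos == len(data) else None


def thru(data, pos, rule):
    """Model of a: [b | skip a]: try b here, otherwise skip one item and retry."""
    while True:
        end = rule(data, pos)
        if end is not None:
            return end
        if pos >= len(data):
            return None  # SKIP cannot move past the end, so the alternative fails
        pos += 1


print(thru(["a", "b"], 0, literal("b")))
print(thru(["a", "b"], 0, end_rule))
```

Under this definition `thru end_rule` succeeds at the tail of the input, which corresponds to the R3 result reported in the thread; the R2 result (false) needs the special case Geomol mentions.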
Geomol: 2-May-2011	Ok, what is a good source of information to read about parsing in general? The Top Down Parsing Language family etc.?
Geomol: 2-May-2011	Is the "empty string rule" covered by putting a | without anything after it? Like in: >> parse [] ['a |] == true >> parse [] ['a | none] == true
Geomol: 2-May-2011	It could be interesting to create an absolutely minimal PARSE function, that can handle all we expect from such a function but with as little code as possible (as few keywords as possible).
Ladislav: 2-May-2011	BTW (this looks unlucky to me), do you know that in REBOL the NONE rule can fail?
Geomol: 2-May-2011	With bparse, this hangs: bparse [a b c] [some [none]] but it can be stopped by hitting <Esc>.
Geomol: 2-May-2011	In parse, NONE is a keyword unless it comes after TO or THRU, then it's looked up. >> parse [#[none!]] [none] ; as a keyword == false >> parse [#[none!]] [thru none] ; looked up == true Same behaviour in R2 and R3.
Geomol: 2-May-2011	Maybe it would be a good idea to make all these combination trigger an invalid argument error? any end some end opt end into end set end ... copy end ... thru end and then only let to end be valid.
BrianH: 2-May-2011	[set var end] sets the var to none; [copy var end] sets to none in R2, the empty string/block in R3; [thru end] doesn't match, so it should just get a warning in case the rules were written to expect that; [opt end] is definitely legit; perhaps [any end] and [some end] should get warnings for R2, but keep in mind that rules like [any [end]] and [some [end]] are much more common, have the same effect, and are more difficult to detect; [into end] properly triggers an error in R2 and R3 because the end is not in a block, while [into [end]] is legit and safe.
BrianH: 2-May-2011	So you want to allow COPY, SET and OPT. Warn about THRU (because of the bug), ANY and SOME, because of R3 compatibility. Trigger an error for INTO if its argument rule isn't a block or a word referring to a block, but nothing special if that rule is END.
Geomol: 4-May-2011	[any end] and [some end] As we don't have warnings, I suggest these to produce errors. They can produce endless loops, and that should be pointed out in the docs, if they don't produce errors. [opt end] Yes, it's legit, but what's the point of this combination? At best, the programmer knows what she does, and the combination will do nothing else than slowing the program down. At worst, the programmer misinterprets this combination, and since it doesn't produce an error or anything, it's a source of confusion. I suggest to make it produce an error. [into end] Produces an error today, so fine. [set end ...] and [copy end ...] I wasn't thinking of [set var end], but about setting a var named end to something, like [set end integer!]. Problem with this is, that now the var, end, can be used and looks exactly like the keyword, end, maybe leading to confusion. But after a second thought, maybe this being allowed is ok. [thru end] Making this produce an error will solve the problem with the confusion around what this combination means. And in the first place, it's a bad way to produce a 'fail' rule (in R2, in R3 it has the value true, and parsing continues). It's slow compared to e.g. [end skip].
Geomol: 4-May-2011	These are just suggestions to make a better PARSE. I've learnt, it's a good idea to not allow most combinations of keywords in R2 parse. Another example: >> parse [] [opt into ['a]] == false >> bparse [] [opt into ['a]] ** User Error: Invalid argument: into The PARSE result is wrong, as I see it. My BPARSE produce an error. Better?
Ladislav: 4-May-2011	What you suggest is just a bunch of exceptions in the behaviour, which is always bad
Geomol: 4-May-2011	Here: http://www.rebol.com/r3/docs/concepts/parsing-summary.html#section-11 Input position must change . And the solution was to invent a new keyword, WHILE. Hm...
BrianH: 4-May-2011	If you're going to make a better parse, it might be good to take into account the efforts that have already started to improve it in R3. The R3 improvements need a little work in some cases, but the thought that went into the process is quite valuable. [set end ...] or [copy end ...]: In R3, using any PARSE keyword (not just 'end) in a rule for other reasons triggers an error. >> parse [a] [set end skip] ** Script error: PARSE - command cannot be used as variable: end [any end] or [some end]: What Ladislav said. [opt end]: The point of the combination is [opt [end (do something)]]. [opt anything] is no more useless than [opt end]. Don't exclude something that has no effect just for that reason. Remember, [none] has no effect as well, but it's still valuable for making rules more readable.
Geomol: 13-May-2011	Maxim, you asked for a function version of string parse. Was that because of situations like this?
Maxim: 13-May-2011	it's because I do A LOT more parsing on strings than on blocks.... one of the reasons is that Carl won't allow us to ignore commas in string data. so the vast majority of data which could be read directly by rebol is incompatible. this is still one of my pet peeves in rebol. trying to be pure, sometimes, just isn't useful in real life. PARSE is about managing external data, I hate the fact that PARSE isn't trying to be friendly with the vast majority of data out there.
Maxim: 13-May-2011	so a comma would be an exact alias for a space, when its not within a string.
Geomol: 13-May-2011	I almost agree. Here we use comma as decimal point. A few countries do that. So all data with money amounts have numbers with comma as decimal point here.
Geomol: 13-May-2011	But it should be possible to take care of those numbers with commas, and ignore all other commas, I think. As we don't ever write 42, but always something like 42,00 if it's a decimal. So if 42, is seen, it can just be read as integer 42 and ignore the comma (if using load/all for example).
onetom: 13-May-2011	this is exactly the reason why CSV was a really fucked up idea. commas are there in sentences and multivalued fields, not just numbers. i always use TSV.
onetom: 13-May-2011	it would make sense to settle w some CSV parser, but not as a default behaviour. i was already surprised that parse handles double quotes too...
onetom: 13-May-2011	>> parse/all {"asd qwe" zxc} none == ["asd qwe" " zxc"] >> parse/all {"asd qwe" zxc} " " == ["asd qwe" "zxc"] it's nice, but it also means there is no plain "split-by-a-character" function in rebol, which is just as annoying as missing a join-by-a-character
Tomc: 14-May-2011	Although generally happy with the default parse separators, I find it negligent not to permit overriding them. And like Max finds, block parsing is a rarity when working with real world data streams.
Maxim: 15-May-2011	parse/all string none actually is a CSV loader. it's not a split function. I always found this dumb, but it's the way Carl implemented it.
Maxim: 15-May-2011	rule, when given as a string is used to specify the CSV separator.
Steeve: 18-Jun-2011	only the second string is checked. Should be: ['apple some [and string! into ["a" some "b" ]]]
onetom: 4-Aug-2011	Parse (YC S11): A Heroku For Mobile Apps. Great name for a startup... http://techcrunch.com/2011/08/04/yc-funded-parse-a-heroku-for-mobile-apps/
Sunanda: 31-Oct-2011	Can anyone gift me an efficient R2 'parse solution for this problem (I am assuming 'parse will out-perform any other approach): SET UP I have a huge list of HTML named character entities, eg (a very short example): named-entities: ["nbsp" "cent" "agrave" "larr" "rarr" "crarr" ] ;; etc And I have some text that may contain some named entities, eg: text: "To send, press the &larr; arrow & then press &crarr;." PROBLEM I want to escape every "&" in the text, unless it is part of a named entity, eg (assuming a function called escape-amps): probe escape-amps text entities == "To send, press the &larr; arrow &amp; then press &crarr;." TO MAKE IT EASY.... You can assume a different set up for the named-entities block if you want; eg, this may be better for you: named-entities: ["&nbsp;" "&cent;" "&agrave;" "&larr;" "&rarr;" "&crarr;" ] ;; etc Any help on this would be much appreciated!
Geomol: 31-Oct-2011	That's strange. My 2nd suggestion gives a different result: ne: ["larr;" | "crarr;"] s: "To send, press the &larr; arrow & then press &crarr;." parse s [ any [ thru #"&" [ne | mark: (insert mark "amp;")] ] ] s == {To send, press the &larr; arrow & amp;then press &crarr;.} Seems like a bug, or am I just tired?
Sunanda: 31-Oct-2011	Thanks for the quick contributions, geomol. I see a different result too -- a space between the "&" and the "amp"
Ladislav: 31-Oct-2011	'I want to escape every "&" in the text, unless it is part of a named entity' - just to make sure: if the entity is not in the ENTITIES list, like e.g. &quot; and it is encountered in the given TEXT, what exactly should happen?
Sunanda: 31-Oct-2011	Ladislav -- if it is not in the list, then I'd like it escaped, please. Think of it as a whitelist of acceptable named entities. All others are suspect :)
Ladislav: 31-Oct-2011	I guess, that this should be efficient:
    alpha: make bitset! [#"a" - #"z" #"A" - #"Z"]
    escape-amps: func [
        text [string!]
        entities [hash!]
        /local result pos1 pos2
    ][
        result: copy ""
        parse/all text [
            pos1:
            any [
                ; find the next amp
                thru #"&" pos2:
                [
                    ; entity check
                    some alpha pos3: #";" (
                        ; entity candidate
                        unless find entities copy/part pos2 pos3 [
                            ; not an entity
                            insert insert tail result copy/part pos1 pos2 "amp;"
                            pos1: pos2
                        ]
                    )
                    | (
                        ; not an entity
                        insert insert tail result copy/part pos1 pos2 "amp;"
                        pos1: pos2
                    )
                ]
                | (insert tail result pos1) end skip ; no amp found
            ]
        ]
        result
    ]
Ladislav: 31-Oct-2011	Err: pos3 should be added as a local
Sunanda: 31-Oct-2011	Thanks Ladislav and Geomol. Both your solutions work with my test data -- that's always a good sign :) I'll do some timing tests with large entity lists ..... But I won't be able to do that for 24 hours. Other approaches still welcome!
Andreas: 31-Oct-2011	Two suggestions: - store your named entities as a hash! (order of magnitude speedup for FIND) - if you have loooong "words", restrict Ladislav's `some alpha` to the maximum length of a valid entity
Ladislav: 31-Oct-2011	This alternative does not use the COPY call, so, it has to be faster:
    alpha: make bitset! [#"a" - #"z" #"A" - #"Z"]
    escape-amps: func [
        text [string!]
        entities [hash!]
        /local result pos1 pos2 pos3
    ][
        result: copy ""
        parse/all text [
            pos1:
            any [
                ; find the next amp
                thru #"&" pos2:
                [
                    ; entity check
                    some alpha pos3: #";" (
                        ; entity candidate
                        unless find entities copy/part pos2 pos3 [
                            ; not an entity
                            insert insert/part tail result pos1 pos2 "amp;"
                            pos1: pos2
                        ]
                    )
                    | (
                        ; not an entity
                        insert insert/part tail result pos1 pos2 "amp;"
                        pos1: pos2
                    )
                ]
                | (insert tail result pos1) end skip ; no amp found
            ]
        ]
        result
    ]
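For readers who do not run REBOL, the whitelist behaviour Sunanda asked for can be sketched in Python. This is not a port of either REBOL solution above; it is an independent sketch using a regular expression, and the names `AMP_RE` and `escape_amps` are chosen here for illustration. It assumes entity names consist of ASCII letters and digits, and it looks names up case-sensitively in a `frozenset`, which plays the role of the hash! in the thread.

```python
import re

# An ampersand, optionally followed by a candidate entity name and a semicolon.
AMP_RE = re.compile(r"&(?:([A-Za-z0-9]+);)?")


def escape_amps(text, entities):
    """Escape every '&' in text unless it starts a whitelisted named entity."""
    allowed = frozenset(entities)  # hashed, case-sensitive lookup

    def fix(match):
        name = match.group(1)
        if name is not None and name in allowed:
            return match.group(0)  # whitelisted entity: keep it unchanged
        # Bare '&' or unknown entity: escape the ampersand, keep what follows it.
        return "&amp;" + match.group(0)[1:]

    return AMP_RE.sub(fix, text)


print(escape_amps("To send, press the &larr; arrow & then press &crarr;.",
                  ["larr", "crarr"]))
```

An entity that is not on the list, such as `&quot;` here, comes out as `&amp;quot;`, matching the "all others are suspect" rule stated earlier in the thread.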
PeterWood: 1-Nov-2011	Perhaps building a parse rule from the list of entities may be faster if there is a lot of text to process: This assumes the entities are provided as strings in a block.
    escape-amps: func [
        text [string!]
        entities [block!]
    ][
        skip-it: complement charset [#"&"]
        entity: copy []
        foreach ent entities [
            insert entity compose [(ent) |]
        ]
        head remove back tail entity
        parse/all text [
            any [
                entity
                | "&" pos: (insert pos "amp;" pos: skip pos 4) :pos
                | some skip-it
            ]
        ]
        head text
    ]
PeterWood: 1-Nov-2011	Also I feel using skip could be very slow if the text contains a lot of "non-matching text". The "skip-it" technique could also be applied to Ladislav's code.
Sunanda: 1-Nov-2011	Wow -- thanks Gabriele. For me, your powermezz is a much overlooked gem. I fear I have, in effect, badly implemented chunks of your functionality over the past few months while I've worked on an application that takes unconstrained text and constrains it to look okay in a web page and when printed via LaTeX. I should have read the documentation first!
Sunanda: 1-Nov-2011	I've put aside looking at the powermezz for now, and simply decided to use one of the three case-specific solutions offered here. I made some tweaks to ensure the comparisons I was making were fair (and met a previously unstated condition).
    -- each in a func
    -- each works case sensitively (as previously unstated)
    -- use the complete entity set as defined by the W3C
    -- changed Ladislav's Charset as some named entities have digits in their names
    -- moved Peter's set-up of his entity list out of the function and into one-off init code.
    It's been a fun hour of twiddling other people's code.....If you want your modified code -- please just ask. Timing results next .....
Sunanda: 1-Nov-2011	My test data was heavily weighted towards the live conditions I expect to encounter (average text length 2000. Most texts are unlikely to have more than 1 named entity). All three scripts produced the same results -- so top marks for meeting the spec! Under my test conditions, Ladislav was fastest, followed by Geomol, followed by Peter. Other test conditions changed those rankings....So nothing is absolute. Using a Hash! contributed a lot to Ladislav's speed -- when I tried it as a Block! it was only slightly faster than Geomol's.....What a pity R3 removes hash! Thanks for contributing these solutions -- I've enjoyed looking at your code and marvelling at the different approaches REBOL makes possible.
Ladislav: 1-Nov-2011	Using a Hash! contributed a lot to Ladislav's speed -- when I tried it as a Block! it was only slightly faster than Geomol's.....What a pity R3 removes hash! - no problem, in R3 you can use map!
Sunanda: 1-Nov-2011	That's true, but map! is a bit awkward for just looking up an item in a list.....Map! is optimised for retrieving a value associated with a key.
Ladislav: 1-Nov-2011	Another solution is to use a sorted block and a binary search, which should be about the same speed as hash
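The sorted-block idea can be illustrated in Python with the standard `bisect` module. This is a sketch of the general technique only; `contains_sorted` is a hypothetical helper name, and no claim is made here about how its speed compares with a hash in REBOL.

```python
from bisect import bisect_left


def contains_sorted(sorted_items, item):
    """Binary search membership test on an already sorted list."""
    i = bisect_left(sorted_items, item)
    return i < len(sorted_items) and sorted_items[i] == item


entities = sorted(["nbsp", "cent", "agrave", "larr", "rarr", "crarr"])
print(contains_sorted(entities, "larr"))
print(contains_sorted(entities, "LARR"))  # comparison is case-sensitive
```

The list has to be sorted once up front with the same ordering that the search uses; after that each lookup takes a logarithmic number of comparisons.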
Sunanda: 1-Nov-2011	Yes, it is doable with map! -- but, as I said awkward. Another issue (or perhaps just unfixed bug) is the lack of case sensitivity with map! select/case make map! ["A" true] "a" == true The current work-around is to use binary rather than string data: select make map! reduce [to-binary "A" true] to-binary "a" == none
Ladislav: 1-Nov-2011	BTW, I think, that there is a possible optimization not using the charset you mention
Ladislav: 14-Nov-2011	Sorry for not continuing with it, Sunanda, but when I gave it a second thought, it did not look like a possible speed-up could be worth the source code complication.
Ladislav: 14-Nov-2011	Another Parse discussion subject: It looked to me like a good idea to be able in one Parse pass to sometimes match some strings in a case-sensitive way and other strings in a case-insensitive way. This is not possible using the /CASE refinement, since the refinement makes all comparison case sensitive, or if not used, all comparisons are case insensitive. Wouldn't it be good to be able to adjust the comparison sensitivity on-the-fly during parsing?
Ladislav: 14-Nov-2011	I think, that it should not be overly complicated to achieve the goal e.g. by using a CASE keyword in PARSE.
Ladislav: 14-Nov-2011	(for switching to case-sensitive mode, and e.g. a NO-CASE for switching to case-insensitive mode)
BrianH: 14-Nov-2011	How about a CASE operation that applies to the next rule, which could be a block? No NO-CASE operation required, and better to integrate with backtracking.
BrianH: 14-Nov-2011	It would be a modifier, like OPT or 1.
BrianH: 14-Nov-2011	While we're at it, the KEEP operation from Topaz would be useful. I use PARSE wrapped in COLLECT, calling KEEP in parens, quite a bit.
Ladislav: 14-Nov-2011	How about a CASE operation that applies to the next rule, which could be a block? No NO-CASE operation required - that is an error, even in that case you would need NO-CASE
BrianH: 14-Nov-2011	OK, but you wouldn't need NO-CASE to end a CASE. It would be another modifier, not a mode. Modes like that don't work with backtracking very well. So it would be like this: case ["a" no-case "b" "c"] not like this: case "a" no-case "b" case "c" no-case The two directives would be implemented as flags, like NOT.
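The difference between a modifier and a mode can be sketched with a toy matcher in Python. Everything here is hypothetical: CASE and NO-CASE are only proposals in this discussion, and `lit`, `seq`, `alt`, `case` and `no_case` are illustrative combinators, not PARSE features. The point of the sketch is that case sensitivity is passed down as an argument to the next rule, so nothing has to be undone when an alternative fails and the matcher backtracks.

```python
def lit(s):
    """Match the literal string s, honouring the case flag passed in."""
    def rule(text, pos, case_sensitive):
        chunk = text[pos:pos + len(s)]
        ok = chunk == s if case_sensitive else chunk.lower() == s.lower()
        return pos + len(s) if ok else None
    return rule


def seq(*rules):
    """Match all rules one after another with the same case flag."""
    def rule(text, pos, case_sensitive):
        for r in rules:
            pos = r(text, pos, case_sensitive)
            if pos is None:
                return None
        return pos
    return rule


def alt(*rules):
    """Try each alternative from the same position (backtracking)."""
    def rule(text, pos, case_sensitive):
        for r in rules:
            end = r(text, pos, case_sensitive)
            if end is not None:
                return end
        return None
    return rule


def case(inner):
    """Modifier: run the next rule case-sensitively."""
    return lambda text, pos, case_sensitive: inner(text, pos, True)


def no_case(inner):
    """Modifier: run the next rule case-insensitively."""
    return lambda text, pos, case_sensitive: inner(text, pos, False)


# Counterpart of: case ["a" no-case "b" "c"]
rule = case(seq(lit("a"), no_case(lit("b")), lit("c")))
print(rule("aBc", 0, False))
print(rule("aBC", 0, False))
```

Because the flag is scoped to the wrapped rule, a failed `case` alternative inside `alt` leaves the following alternative running with the caller's original setting.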
Ladislav: 14-Nov-2011	OK, but you wouldn't need NO-CASE to end a CASE. - What I did propose was just the existence of such keywords, the exact implementation should be the one that is the simplest to implement, which may well be the one you mention.
Ladislav: 14-Nov-2011	But, CASE should be a simpler case ;-)
Ladislav: 14-Nov-2011	Will have a look, and, will also use one ticket to let Carl know.
BrianH: 14-Nov-2011	What do you think of the KEEP operation from Topaz? A good idea, or out of scope for PARSE?
Ladislav: 14-Nov-2011	Regarding a KEEP keyword: may be a reasonable addition. I surely prefer KEEP, when choosing between KEEP and CHANGE.
BrianH: 14-Nov-2011	I would definitely not make that choice. I need CHANGE too, and the full version with the value you're changing to be an expression in a paren - the last part of the proposal that isn't implemented yet. That's at the top of my list.
