AltME groups: search
world-name: r3wp
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
BrianH: 1-May-2011 | It seems like an error that is improperly not triggered. SET is supposed to set to a single value, not a series of values - an embedded block is a single value. | |
Ladislav: 1-May-2011 | This is a simple rule: set w rule sets the word 'w to the first value matched. No error. | |
onetom: 1-May-2011 | so, no way to match a complex rule? | |
BrianH: 1-May-2011 | That is not the error I was talking about. This is the error:
    >> parse [a b] [set w ['a 'b]]
    == true
    >> ? w
    W is a word of value: a
It is the attempt to set the value to a complex rule that is the error. It wouldn't be an error to do this:
    parse [a b] [set w 'a 'b]
If we keep the current behavior, there needs to be a lot of strongly worded warnings about the potential gotcha in the PARSE SET docs. | |
Ladislav: 1-May-2011 | You cannot find anywhere a formulation like "set the value to a rule". That is not what can happen. | |
onetom: 1-May-2011 |
    address! [string! tuple! hash!]
    parse ["cat" 1.2.3 #4] [set addr1 address!]
if I'm reading this, address! looks just like a value reference... | |
BrianH: 1-May-2011 | It could be considered a useful feature, since the whole match needs to match before the SET is performed. However, the docs need to be *extremely* precise about this because the "set the value to a rule" interpretation is a common misconception among newbie PARSE writers. It would be good for the docs to give an example of the type of code this allows you to do, explaining the difference. | |
BrianH: 1-May-2011 | onetom, It's not "set the variable to a rule", it is instead "match a rule then set the variable to the value at the current position in the data". | |
Ladislav: 1-May-2011 | as stated in the "Idioms" section, I think, that
    a: [set b c]
shall be equivalent to:
    f: [(set/any [b] if lesser? index? e index? d [e])]
    a: [and [c d:] e: f :d] | |
BrianH: 1-May-2011 | By "It could be considered a useful feature" I mean that if this were not allowed, you would have to write this:
    parse data [set x ['a 'b]]
like this instead:
    parse data [and ['a 'b] set x skip skip] | |
Ladislav: 1-May-2011 | Yes, Brian, you just wrote it in a simpler way | |
BrianH: 1-May-2011 | Those full equivalences are great for someone who really needs to know how things work internally (such as when they need to clone PARSE), but you need simple examples first in docs for people who just want to use PARSE properly. Btw, has anyone started a set of full PARSE docs in DocBase? The parse project page could be raided for information, but it really doesn't serve as a full parse manual. | |
Ladislav: 1-May-2011 | This is the formulation used to document the SET directive: If the subexpression match succeeds, the set operation sets the given variable to the first matched value, while the copy operation copies the whole part of the input matched by the given subexpression. For a more detailed description see the Parse idioms section. | |
BrianH: 1-May-2011 | Sounds accurate, if a little intimidating to newbies. I wish we had a really good PARSE manual that could turn newbies into experts. | |
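The SET/COPY distinction in the documentation quoted above can be tried in a console. This is a minimal sketch, assuming block parsing as in the session shown earlier in the thread; the word c is an illustrative name and not part of the quoted docs.

```rebol
; SET stores only the first value of the input matched by the subrule
parse [a b] [set w ['a 'b]]    ; the subrule matches both words
probe w                        ; a

; COPY stores the whole part of the input matched by the subrule
parse [a b] [copy c ['a 'b]]
probe c                        ; [a b]
```

In both cases the variable is only set once the whole subrule has matched, which is the point BrianH makes about SET being usable with a complex rule.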
Geomol: 1-May-2011 | PARSE in R2 seems to have less support for combined keywords than R3, as can be seen in this example:
    >> parse [] [opt some 'a]
    ** Script Error: Invalid argument: some
But there is no error when combining OPT and THRU:
    >> parse [] [opt thru 'a]
    == false
Should that trigger an error? If no error, it should return true, right? | |
Ladislav: 1-May-2011 | opt thru 'a means the same as [opt thru] 'a | |
BrianH: 1-May-2011 | Ladislav, why would [opt thru 'a] not mean [opt [thru 'a]]? Isn't that the definition of opt? | |
Geomol: 1-May-2011 | Ladislav, but shouldn't then [opt some 'a] mean [[opt some] 'a] ? It gives an error in R2. | |
Geomol: 1-May-2011 | Kinda same problem, when combining them this way (still in R2):
    >> parse [] [thru opt 'a]
    == false
    >> parse [a] [thru opt 'a]
    == false | |
BrianH: 1-May-2011 | In R3 that would be an improperly untriggered error, since TO and THRU are defined to not take the full gamut of rules, only a subset. Probably the same for R2, but a different subset. | |
Geomol: 1-May-2011 | Ok, version 1.0.0 of BPARSE is found here: http://www.fys.ku.dk/~niclasen/rebol/libs/bparse.r It's a function version of PARSE, and can only parse blocks for now, not strings. It can do more or less everything PARSE in R2 can do when parsing blocks. I've tried to trigger errors in cases where R2 PARSE doesn't. The purpose is to play around with parsing to maybe make a better version than the native version, and without bugs. | |
Geomol: 1-May-2011 | It's not as fast as the timings, I gave here earlier with a very early version. | |
Geomol: 1-May-2011 | I've thought some more about [thru end], which returns false in R2 but true in R3. My version returns false like R2, but I understand the R3 way better now that I've programmed it. It comes down to how THRU should be understood (Ladislav also said something about this): do we think of [thru rule] as [to rule rule] or as [to rule skip]? If the TO keyword can handle complex rules like:
    parse [a b] [to ['a 'b] ['a 'b]]
then the first might make better sense, and [thru end] should return true. But we can't parse like that in R2, so maybe we think of it more as the second, and then [thru end] should return false. In my version I have to catch the special case where END follows THRU, so it takes more code, which isn't good. In any case, Ladislav's suggestion to use [end skip] as a fail rule is much better. If you're not at the end, the first word (end) will give false; otherwise the skip will fail, as you can't skip past the end. | |
BrianH: 1-May-2011 | END is a zero-length repeatable rule, like NONE, so TO END and THRU END should be equivalent. | |
BrianH: 1-May-2011 | I'd consider that an error in R2's PARSE, but not a fixable one because it would change the semantics. | |
Geomol: 1-May-2011 | Now you have your own function version of parse, that you can make work exactly as you wish. :-) And then maybe, when you're satisfied, give it to Carl. It should now also be easier to make C versions of parse for those, who make alternatives to REBOL. At least you have a REBOL function to start with. | |
Geomol: 1-May-2011 | Yes, parsing a dialect I have to produce PDF output. | |
Maxim: 1-May-2011 | did you start work on a string parser? | |
Maxim: 1-May-2011 | bah, I'd just stick with R3 parsing for Red. it'll be a good incentive for some to upgrade. | |
Geomol: 1-May-2011 | Having skip as a keyword means you can't use that word as a variable. | |
BrianH: 1-May-2011 | Most people tend to not use 'skip as a variable anyways, because of the SKIP function. | |
BrianH: 1-May-2011 | For the mezzanine version, two functions might be better, though they can share code in the same module. Maybe just have one exported word for a dispatch function though. | |
Geomol: 1-May-2011 | parse [a b c] ['a AND 'b AND 'c END] hmm, yeah, you've got a point. | |
BrianH: 1-May-2011 | We're probably fine with the wording we got. Though strangely enough, | is the ELSE of the IF operation. ELSE is a more descriptive name for | than OR in general. | |
Ladislav: 1-May-2011 | Geomol: [to rule skip] does not mean the same as [thru rule], as can be demonstrated by comparing the behaviour of thru rule for rule = "abc". It is quite a surprise to me that you don't see the difference. | |
Geomol: 2-May-2011 | In R2 parsing a block:
    >> parse ["abc"] [to "abc" skip]
    == true
    >> parse ["abc"] [thru "abc"]
    == true
I know it's different when parsing a string instead of a block. My comparison of [thru rule] to the alternatives was meant as a loose comparison, not to be taken literally. It's easy to think of THRU as working this way, because it does in many cases; hence the confusion. | |
Ladislav: 2-May-2011 | But, the recursive description: a: [b | skip a] is quite natural. | |
Geomol: 2-May-2011 | Yes, and that should work in all cases, if the b rule is found, complex or not. And this will return true if b is END, because END is a repeatable rule (you can't go past it with SKIP). NONE is also repeatable, and if you look in the code, I have to take care of this separately too. This means we can't parse a none value (of datatype none!) by using the NONE keyword, but we can by using a datatype:
    >> parse reduce [none] [none]
    == false
    >> parse reduce [none] [none!]
    == true
So it raises the question of whether the NONE keyword should be there. What are the consequences if we drop NONE as a keyword? And are there other repeatable rules besides END and NONE, in R2 or R3? | |
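Ladislav's recursive description a: [b | skip a] can be written out directly as a block rule. The following is a minimal sketch, assuming block parsing; the rule names thru-b and thru-end and the sample data are illustrative only.

```rebol
; b is the rule to look for; thru-b tries b at the current position,
; otherwise it skips one value and tries again
b: ['x 'y]
thru-b: [b | skip thru-b]
probe parse [p q x y] [thru-b]         ; true
probe parse [p q x y z] [thru-b 'z]    ; true

; with END as the target, the same recursion succeeds at the tail,
; because SKIP cannot go past the end of the input
thru-end: [end | skip thru-end]
probe parse [a b] [thru-end]           ; true
```

Read this way, THRU with a complex rule behaves like [to rule rule] rather than [to rule skip], which is the interpretation under which [thru end] returns true.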
Geomol: 2-May-2011 | Ok, what is a good source of information to read about parsing in general? The Top Down Parsing Language family etc.? | |
Geomol: 2-May-2011 | Is the "empty string rule" covered by putting a | without anything after it? Like in:
    >> parse [] ['a |]
    == true
    >> parse [] ['a | none]
    == true | |
Geomol: 2-May-2011 | It could be interesting to create an absolutely minimal PARSE function, that can handle all we expect from such a function but with as little code as possible (as few keywords as possible). | |
Ladislav: 2-May-2011 | BTW (looks a bit unlucky to me), do you know that in REBOL the NONE rule can fail? | |
Geomol: 2-May-2011 | With bparse, this hangs: bparse [a b c] [some [none]] but it can be stopped by hitting <Esc>. | |
Geomol: 2-May-2011 | In parse, NONE is a keyword unless it comes after TO or THRU, then it's looked up.
    >> parse [#[none!]] [none] ; as a keyword
    == false
    >> parse [#[none!]] [thru none] ; looked up
    == true
Same behaviour in R2 and R3. | |
Geomol: 2-May-2011 | Maybe it would be a good idea to make all these combinations trigger an invalid argument error?
    any end
    some end
    opt end
    into end
    set end ...
    copy end ...
    thru end
and then only let
    to end
be valid. | |
BrianH: 2-May-2011 | [set var end] sets the var to none; [copy var end] sets to none in R2, the empty string/block in R3; [thru end] doesn't match, so it should just get a warning in case the rules were written to expect that; [opt end] is definitely legit; perhaps [any end] and [some end] should get warnings for R2, but keep in mind that rules like [any [end]] and [some [end]] are much more common, have the same effect, and are more difficult to detect; [into end] properly triggers an error in R2 and R3 because the end is not in a block, while [into [end]] is legit and safe. | |
BrianH: 2-May-2011 | So you want to allow COPY, SET and OPT. Warn about THRU (because of the bug), ANY and SOME, because of R3 compatibility. Trigger an error for INTO if its argument rule isn't a block or a word referring to a block, but nothing special if that rule is END. | |
Geomol: 4-May-2011 | [any end] and [some end]
As we don't have warnings, I suggest these produce errors. They can produce endless loops, and that should be pointed out in the docs if they don't produce errors.
[opt end]
Yes, it's legit, but what's the point of this combination? At best, the programmer knows what she is doing, and the combination does nothing but slow the program down. At worst, the programmer misinterprets this combination, and since it doesn't produce an error or anything, it's a source of confusion. I suggest making it produce an error.
[into end]
Produces an error today, so fine.
[set end ...] and [copy end ...]
I wasn't thinking of [set var end], but of setting a var named end to something, like [set end integer!]. The problem with this is that the var, end, can now be used and looks exactly like the keyword, end, maybe leading to confusion. But on second thought, maybe allowing this is ok.
[thru end]
Making this produce an error would resolve the confusion about what this combination means. And in the first place, it's a bad way to produce a 'fail' rule (in R2; in R3 it has the value true, and parsing continues). It's slow compared to e.g. [end skip]. | |
Geomol: 4-May-2011 | These are just suggestions to make a better PARSE. I've learnt it's a good idea not to allow most combinations of keywords in R2 parse. Another example:
    >> parse [] [opt into ['a]]
    == false
    >> bparse [] [opt into ['a]]
    ** User Error: Invalid argument: into
The PARSE result is wrong, as I see it. My BPARSE produces an error. Better? | |
Ladislav: 4-May-2011 | What you suggest is just a bunch of exceptions in the behaviour, which is always bad | |
Geomol: 4-May-2011 | Here: http://www.rebol.com/r3/docs/concepts/parsing-summary.html#section-11 "Input position must change". And the solution was to invent a new keyword, WHILE. Hm... | |
BrianH: 4-May-2011 | If you're going to make a better parse, it might be good to take into account the efforts that have already started to improve it in R3. The R3 improvements need a little work in some cases, but the thought that went into the process is quite valuable.
[set end ...] or [copy end ...]: In R3, using any PARSE keyword (not just 'end) in a rule for other reasons triggers an error.
    >> parse [a] [set end skip]
    ** Script error: PARSE - command cannot be used as variable: end
[any end] or [some end]: What Ladislav said.
[opt end]: The point of the combination is [opt [end (do something)]]. [opt anything] is no more useless than [opt end]. Don't exclude something that has no effect just for that reason. Remember, [none] has no effect as well, but it's still valuable for making rules more readable. | |
Geomol: 13-May-2011 | Maxim, you asked for a function version of string parse. Was that because of situations like this? | |
Maxim: 13-May-2011 | It's because I do A LOT more parsing on strings than on blocks.... One of the reasons is that Carl won't allow us to ignore commas in string data, so the vast majority of data which could be read directly by rebol is incompatible. This is still one of my pet peeves in rebol. Trying to be pure, sometimes, just isn't useful in real life. PARSE is about managing external data; I hate the fact that PARSE isn't trying to be friendly with the vast majority of data out there. | |
Maxim: 13-May-2011 | so a comma would be an exact alias for a space, when its not within a string. | |
Geomol: 13-May-2011 | I almost agree. Here we use comma as decimal point. A few countries do that. So all data with money amounts have numbers with comma as decimal point here. | |
Geomol: 13-May-2011 | But it should be possible to take care of those numbers with commas, and ignore all other commas, I think. As we don't ever write 42, but always something like 42,00 if it's a decimal. So if 42, is seen, it can just be read as integer 42 and ignore the comma (if using load/all for example). | |
onetom: 13-May-2011 | this is exactly the reason why CSV was a really fucked up idea. commas are there in sentences and multivalued fields, not just numbers. i always use TSV. | |
onetom: 13-May-2011 | it would make sense to settle w some CSV parser, but not as a default behaviour. i was already surprised that parse handles double quotes too... | |
onetom: 13-May-2011 |
    >> parse/all {"asd qwe" zxc} none
    == ["asd qwe" " zxc"]
    >> parse/all {"asd qwe" zxc} " "
    == ["asd qwe" "zxc"]
it's nice, but it also means there is no plain "split-by-a-character" function in rebol, which is just as annoying as missing a join-by-a-character | |
Tomc: 14-May-2011 | Although generally happy with the default parse separators, I find it negligent not to permit overriding them. And like Max finds, block parsing is a rarity when working with real-world data streams. | |
Maxim: 15-May-2011 | parse/all string none actually is a CSV loader. It's not a split function. I always found this dumb, but it's the way Carl implemented it. | |
Maxim: 15-May-2011 | rule, when given as a string is used to specify the CSV separator. | |
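A plain split on a single character, with no special handling of quotes, can be written with PARSE rules. The following is a minimal sketch, assuming R2 string parsing with the /ALL refinement; split-on is a hypothetical helper name, not an existing REBOL function.

```rebol
; split-on is a hypothetical helper: split str at every occurrence of ch,
; keeping empty fields and treating double quotes as ordinary characters
split-on: func [
    str [string!]
    ch [char!]
    /local out part
][
    out: copy []
    parse/all str [
        ; each pass copies the text up to the next separator, then skips it;
        ; COPY may yield none for an empty field, hence the ANY fallback
        any [copy part to ch skip (append out any [part copy ""])]
        ; whatever remains after the last separator is the final field
        copy part to end (append out any [part copy ""])
    ]
    out
]

probe split-on "a b,c,,d" #","    ; ["a b" "c" "" "d"]
```

Because the rules are given explicitly, the quote-aware behaviour of parse/all with a none or string rule argument does not apply here.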
Steeve: 18-Jun-2011 | only the second string is checked. Should be: ['apple some [and string! into ["a" some "b" ]]] | |
onetom: 4-Aug-2011 | Parse (YC S11): A Heroku For Mobile Apps. Great name for a startup... http://techcrunch.com/2011/08/04/yc-funded-parse-a-heroku-for-mobile-apps/ | |
Sunanda: 31-Oct-2011 | Can anyone gift me an efficient R2 'parse solution for this problem (I am assuming 'parse will out-perform any other approach):
SET UP
I have a huge list of HTML named character entities, eg (a very short example):
    named-entities: ["nbsp" "cent" "agrave" "larr" "rarr" "crarr"] ;; etc
And I have some text that may contain some named entities, eg:
    text: "To send, press the &larr; arrow & then press &crarr;."
PROBLEM
I want to escape every "&" in the text, unless it is part of a named entity, eg (assuming a function called escape-amps):
    probe escape-amps text entities
    == "To send, press the &larr; arrow &amp; then press &crarr;."
TO MAKE IT EASY....
You can assume a different set up for the named-entities block if you want; eg, this may be better for you:
    named-entities: ["&nbsp;" "&cent;" "&agrave;" "&larr;" "&rarr;" "&crarr;"] ;; etc
Any help on this would be much appreciated! | |
Geomol: 31-Oct-2011 | That's strange. My 2nd suggestion gives a different result:
    ne: ["larr;" | "crarr;"]
    s: "To send, press the &larr; arrow & then press &crarr;."
    parse s [
        any [
            thru #"&" [ne | mark: (insert mark "amp;")]
        ]
    ]
    s
    == {To send, press the &larr; arrow & amp;then press &crarr;.}
Seems like a bug, or am I just tired? | |
Sunanda: 31-Oct-2011 | Thanks for the quick contributions, geomol. I see a different result too -- a space between the "&" and the "amp" | |
Ladislav: 31-Oct-2011 | 'I want to escape every "&" in the text, unless it is part of a named entity' - just to make sure: if the entity is not in the ENTITIES list, like e.g. &quot; and it is encountered in the given TEXT, what exactly should happen? | |
Sunanda: 31-Oct-2011 | Ladislav -- if it is not in the list, then I'd like it escaped, please. Think of it as a whitelist of acceptable named entities. All others are suspect :) | |
Ladislav: 31-Oct-2011 | I guess, that this should be efficient:
    alpha: make bitset! [#"a" - #"z" #"A" - #"Z"]
    escape-amps: func [
        text [string!]
        entities [hash!]
        /local result pos1 pos2
    ][
        result: copy ""
        parse/all text [
            pos1:
            any [
                ; find the next amp
                thru #"&" pos2:
                [
                    ; entity check
                    some alpha pos3: #";" (
                        ; entity candidate
                        unless find entities copy/part pos2 pos3 [
                            ; not an entity
                            insert insert tail result copy/part pos1 pos2 "amp;"
                            pos1: pos2
                        ]
                    )
                    | (
                        ; not an entity
                        insert insert tail result copy/part pos1 pos2 "amp;"
                        pos1: pos2
                    )
                ]
                | (insert tail result pos1) end skip ; no amp found
            ]
        ]
        result
    ] | |
Ladislav: 31-Oct-2011 | Err: pos3 should be added as a local | |
Sunanda: 31-Oct-2011 | Thanks Ladislav and Geomol. Both your solutions work with my test data -- that's always a good sign :) I'll do some timing tests with large entity lists ..... But I won't be able to do that for 24 hours. Other approaches still welcome! | |
Andreas: 31-Oct-2011 | Two suggestions: - store your named entities as a hash! (order of magnitude speedup for FIND) - if you have loooong "words", restrict Ladislav's `some alpha` to the maximum length of a valid entity | |
Ladislav: 31-Oct-2011 | This alternative does not use the COPY call, so, it has to be faster:
    alpha: make bitset! [#"a" - #"z" #"A" - #"Z"]
    escape-amps: func [
        text [string!]
        entities [hash!]
        /local result pos1 pos2 pos3
    ][
        result: copy ""
        parse/all text [
            pos1:
            any [
                ; find the next amp
                thru #"&" pos2:
                [
                    ; entity check
                    some alpha pos3: #";" (
                        ; entity candidate
                        unless find entities copy/part pos2 pos3 [
                            ; not an entity
                            insert insert/part tail result pos1 pos2 "amp;"
                            pos1: pos2
                        ]
                    )
                    | (
                        ; not an entity
                        insert insert/part tail result pos1 pos2 "amp;"
                        pos1: pos2
                    )
                ]
                | (insert tail result pos1) end skip ; no amp found
            ]
        ]
        result
    ] | |
PeterWood: 1-Nov-2011 | Perhaps building a parse rule from the list of entities may be faster if there is a lot of text to process. This assumes the entities are provided as strings in a block.
    escape-amps: func [
        text [string!]
        entities [block!]
    ][
        skip-it: complement charset [#"&"]
        entity: copy []
        foreach ent entities [
            insert entity compose [(ent) |]
        ]
        head remove back tail entity
        parse/all text [
            any [
                entity | "&" pos: (insert pos "amp;" pos: skip pos 4) :pos | some skip-it
            ]
        ]
        head text
    ] | |
PeterWood: 1-Nov-2011 | Also I feel using skip could be very slow if the text contains a lot of "non-matching text". The "skip-it" technique could also be applied to Ladislav's code. | |
Sunanda: 1-Nov-2011 | Wow -- thanks Gabriele. For me, your powermezz is a much overlooked gem. I fear I have, in effect, badly implemented chunks of your functionality over the past few months while I've worked on an application that takes unconstrained text and constrains it to look okay in a web page and when printed via LaTeX. I should have read the documentation first! | |
Sunanda: 1-Nov-2011 | I've put aside looking at the powermezz for now, and simply decided to use one of the three case-specific solutions offered here. I made some tweaks to ensure the comparisons I was making were fair (and met a previously unstated condition).
-- each in a func
-- each works case sensitively (as previously unstated)
-- use the complete entity set as defined by the W3C
-- changed Ladislav's Charset as some named entities have digits in their names
-- moved Peter's set-up of his entity list out of the function and into one-off init code.
It's been a fun hour of twiddling other people's code.....If you want your modified code -- please just ask. Timing results next ..... | |
Sunanda: 1-Nov-2011 | My test data was heavily weighted towards the live conditions I expect to encounter (average text length 2000. Most texts are unlikely to have more than 1 named entity). All three scripts produced the same results -- so top marks for meeting the spec! Under my test conditions, Ladislav was fastest, followed by Geomol, followed by Peter. Other test conditions changed those rankings....So nothing is absolute. Using a Hash! contributed a lot to Ladislav's speed -- when I tried it as a Block! it was only slightly faster than Geomol's.....What a pity R3 removes hash! Thanks for contributing these solutions -- I've enjoyed looking at your code and marvelling at the different approaches REBOL makes possible. | |
Ladislav: 1-Nov-2011 | Using a Hash! contributed a lot to Ladislav's speed -- when I tried it as a Block! it was only slightly faster than Geomol's.....What a pity R3 removes hash! - no problem, in R3 you can use map! | |
Sunanda: 1-Nov-2011 | That's true, but map! is a bit awkward for just looking up an item in a list.....Map! is optimised for retrieving a value associated with a key. | |
Ladislav: 1-Nov-2011 | Another solution is to use a sorted block and a binary search, which should be about the same speed as hash | |
Sunanda: 1-Nov-2011 | Yes, it is doable with map! -- but, as I said awkward. Another issue (or perhaps just unfixed bug) is the lack of case sensitivity with map!
    select/case make map! ["A" true] "a"
    == true
The current work-around is to use binary rather than string data:
    select make map! reduce [to-binary "A" true] to-binary "a"
    == none | |
Ladislav: 1-Nov-2011 | BTW, I think, that there is a possible optimization not using the charset you mention | |
Ladislav: 14-Nov-2011 | Sorry for not continuing with it, Sunanda, but when I gave it a second thought, it did not look like a possible speed-up could be worth the source code complication. | |
Ladislav: 14-Nov-2011 | Another Parse discussion subject: It looked to me like a good idea to be able in one Parse pass to sometimes match some strings in a case-sensitive way and other strings in a case-insensitive way. This is not possible using the /CASE refinement, since the refinement makes all comparison case sensitive, or if not used, all comparisons are case insensitive. Wouldn't it be good to be able to adjust the comparison sensitivity on-the-fly during parsing? | |
Ladislav: 14-Nov-2011 | I think, that it should not be overly complicated to achieve the goal e.g. by using a CASE keyword in PARSE. | |
Ladislav: 14-Nov-2011 | (for switching to case-sensitive mode, and e.g. a NO-CASE for switching to case-insensitive mode) | |
BrianH: 14-Nov-2011 | How about a CASE operation that applies to the next rule, which could be a block? No NO-CASE operation required, and better to integrate with backtracking. | |
BrianH: 14-Nov-2011 | It would be a modifier, like OPT or 1. | |
BrianH: 14-Nov-2011 | While we're at it, the KEEP operation from Topaz would be useful. I use PARSE wrapped in COLLECT, calling KEEP in parens, quite a bit. | |
Ladislav: 14-Nov-2011 | How about a CASE operation that applies to the next rule, which could be a block? No NO-CASE operation required - that is an error, even in that case you *would* need NO-CASE | |
BrianH: 14-Nov-2011 | OK, but you wouldn't need NO-CASE to end a CASE. It would be another modifier, not a mode. Modes like that don't work with backtracking very well. So it would be like this:
    case ["a" no-case "b" "c"]
not like this:
    case "a" no-case "b" case "c" no-case
The two directives would be implemented as flags, like NOT. | |
Ladislav: 14-Nov-2011 | OK, but you wouldn't need NO-CASE to end a CASE. - What I did propose was just the existence of such keywords, the exact implementation should be the one that is the simplest to implement, which may well be the one you mention. | |
Ladislav: 14-Nov-2011 | But, CASE should be a simpler case ;-) | |
Ladislav: 14-Nov-2011 | Will have a look, and, will also use one ticket to let Carl know. | |
BrianH: 14-Nov-2011 | What do you think of the KEEP operation from Topaz? A good idea, or out of scope for PARSE? | |
Ladislav: 14-Nov-2011 | Regarding a KEEP keyword: may be a reasonable addition. I surely prefer KEEP, when choosing between KEEP and CHANGE. | |
BrianH: 14-Nov-2011 | I would definitely not make that choice. I need CHANGE too, and the full version with the value you're changing to be an expression in a paren - the last part of the proposal that isn't implemented yet. That's at the top of my list. |