AltME groups: search
world-name: r3wp
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
Gregg: 28-Apr-2006 | I think that's where string parsing comes in, and where having rules for REBOL datatypes would ease the pain. | |
Volker: 1-May-2006 | How about another way: integrate datatypes in string-parser. Basically a load/next and check for type. Then we can write (note i parse a string): parse "1 a , #2" [ integer! word! "," issue! ] | |
Ashley: 24-May-2006 | Quick question for the parse experts. If I have a parse rule that looks like this: parse spec [ any [ set arg string! (...) | set arg tuple! (...) | ... ] ] How would I add a rule like: set arg paren! (reduce arg) that always occurred prior to the any rule and evaluated parenthesized expressions (i.e. I want parenthesized expressions to be reduced to a REBOL value that can be handled by the remainder of the parse rule). | |
Graham: 27-Jun-2006 | My brain is still asleep. How to go thru a document and add <strong> </strong> around every word that is in capitals and is more than a few characters long? | |
Graham: 27-Jun-2006 | pattern search on capitals, mark, copy to space, mark, count length of copy, if long, insert at mark2, and then at mark1, continue ?? | |
Graham: 27-Jun-2006 | that won't work because file is just text and not a block. | |
Graham: 27-Jun-2006 | search for a series of capitalised words and strong them | |
Graham: 27-Jun-2006 | Actually I would like to add a parse problem to the weeklyblog and get people to submit answers :) | |
Graham: 27-Jun-2006 | And give a prize for the shortest answer | |
Graham: 27-Jun-2006 | shortest .. I mean the least number of words, and operators - not in length | |
BrianH: 27-Jun-2006 | Seriously though, three charsets and two temporary variables, there's got to be a more efficient way. | |
Volker: 27-Jun-2006 | and "g" is none | |
Volker: 27-Jun-2006 | because " a: 5 capitals any capitals b:" stops at "g" and friends. | |
BrianH: 27-Jun-2006 | More importantly, it fails at "g" and friends, backtracks and proceeds to the next alternate action, some alpha. | |
BrianH: 27-Jun-2006 | No, that would break out of the enclosing all loop. The end skip will always fail and proceed to the next alternate. | |
BrianH: 28-Jun-2006 | Of course mine doesn't handle words with apostrophes or hyphens in them either. Easy fix though, just add ' and - to the capitals charset. | |
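BrianH's actual rule is not part of this excerpt, so the following is only a minimal sketch of the approach the thread describes: mark the start of a run of capitals, require a minimum length, mark the end, and insert the tags at both marks. The threshold of five comes from the quoted fragment "a: 5 capitals any capitals b:"; the sample text and everything else are assumptions.

```rebol
REBOL [Title: "Wrap runs of capitals in <strong> tags (sketch)"]

capitals: charset [#"A" - #"Z"]
alpha:    charset [#"A" - #"Z" #"a" - #"z"]

; Hypothetical sample text
text: copy "Heart: no rubs. ABDOMEN: soft, nontender."

parse/all/case text [
    any [
        ; a run of at least five capitals, marked at both ends
        a: 5 capitals any capitals b: (
            b: insert b "</strong>"    ; closing tag first; B now sits after it
            insert a "<strong>"        ; opening tag shifts the tail by 8 chars
            b: skip b 8
        ) :b                           ; resume parsing after the closing tag
        | some alpha                   ; any other word: step over it whole
        | skip                         ; spaces, punctuation, digits
    ]
]

print text
```

Inserting at the end mark before the start mark keeps the bookkeeping simple: only one offset (the length of the opening tag) has to be added before the parse position is reset with :b.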
BrianH: 28-Jun-2006 | And do what? | |
Graham: 28-Jun-2006 | A person is writing a text file. It has headings which are denoted by caps, and terminating in ":". | |
BrianH: 29-Jun-2006 | To use the simpler of the CS terms: Parse is a rule-based, recursive-descent string and structure parser with backtracking. It is not a parser generator (like Lex/Yacc) or compiler (like most regex engines) - the engine follows the rules directly. Since Parse is recursive-descent it can handle patterns that regular expressions wouldn't be able to. Since Parse backtracks it can handle patterns that ordinary recursive-descent parsers can't. Basically, it puts the text and structure processing abilities of Perl 5 to shame, let alone those of the lesser regex engines. In theory, Perl 6 has caught up with REBOL, but Perl 6 only exists in theory for now. By the time it becomes actual REBOL should surpass it (especially if I have anything to say about it). | |
BrianH: 29-Jun-2006 | It's pretty easy to demonstrate patterns that regular expressions can't handle. It's only somewhat difficult to demonstrate patterns that can't be handled by a recursive descent parser without backtracking or unlimited lookahead. I have never run into a pattern that can't be handled by Parse in theory - its only limits are in implementation (available memory and recursion depth). I am not qualified to describe its limits. Still, you have to be careful about how you write the rules or they will trip you up. | |
BrianH: 29-Jun-2006 | Volker, it's more like it can do what a compiler-compiler can do without needing to compile :) And backtracking is about the same as unlimited lookahead, but more powerful. | |
[unknown: 9]: 29-Jun-2006 | Thanks Brian, but as is the theme with questions I ask, I don't ask for myself, but rather that the "world" can learn what "we" know. So perhaps you should add your 2 cents to Henriks, and Tom's in a public forum of the Wikibook. | |
Volker: 29-Jun-2006 | the compiling is no big argument as compiler-compilers are for compiled languages anyway ;) the point is, you can mix a grammar and actions for semantics easy. | |
BrianH: 29-Jun-2006 | Reichart, I figured as much (hence the "dry" comment). I'll look over the Wikibook and see if I can help. | |
Volker: 29-Jun-2006 | i guess that depends on the coco. the point is, a bnf by default, and code inside the rules, instead of putting things in vars and process later. IMHO. | |
BrianH: 29-Jun-2006 | Volker, I've used a lot of compiler-compilers before and reviewed many more, and unlimited lookahead or backtracking are rare. | |
Volker: 29-Jun-2006 | then the advantages of parse are being like a compiler-compiler and having unlimited lookahead etc? | |
Volker: 29-Jun-2006 | and you can use parse to tokenize first? | |
BrianH: 29-Jun-2006 | Two rounds of parsing, one for tokenizing and one to parse? Interesting. That would work if you don't have control over the source syntax - otherwise load works pretty well for simple languages. | |
Volker: 29-Jun-2006 | Thats where i got the idea: tokenize first and use block-parser :) | |
BrianH: 29-Jun-2006 | My next personal project is to go through the XML/XSL/REST specs and create exactly that. I already have an efficient structure, I just need to fill out the semantics to support the complete logical model of XML. | |
BrianH: 29-Jun-2006 | Still, "run away" is a common and sensible reaction to XML. | |
Gordon: 29-Jun-2006 | I'm a bit stuck because this parse stops after the first iteration. Can anyone give me a hint as to why it stops after one line? Here is some code: data: read to-file Readfile print length? data 224921 d: parse/all data [thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr copy Category to QuoteStr thru QuoteStr thru quotestr copy Flag to QuoteStr thru newline (print index? data)] 1 == false Data contains hundreds of "memos" in a csv file with three fields: Memo, Category and Flag ("0"|"1"). All fields are enclosed in quotes and separated by commas. It would be real simple if the Memo field didn't contain double quoted words; then parse data none would even work; but alas many memos contain other "words". It would even be simple if the memos didn't contain commas, then parse data "," or parse/all data "," would work; but alas many memos contain commas in the body. | |
Gordon: 29-Jun-2006 | I'm pretty sure that you are right in that I have to loop through the "Data". That was my big stumbling block and the rest is just logic to figure out. Thanks a bunch. | |
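The looping fix Gordon refers to can be sketched as follows: the per-record rule is wrapped in ANY so it repeats for every line instead of matching once and stopping. This is a minimal sketch, not the rule anyone in the thread posted. It assumes the three-character sequence "," (quote, comma, quote) never occurs inside a memo, and the sample data and the word names sep and records are hypothetical.

```rebol
REBOL [Title: "Looping over quoted CSV records (sketch)"]

; Hypothetical sample in the shape Gordon describes:
; "memo","category","flag" with quotes and commas inside the memo
data: rejoin [
    {"Call "Bob", then Ann","Work","0"} newline
    {"Buy milk","Home","1"} newline
]

sep: {","}          ; field separator: quote, comma, quote
records: copy []

parse/all data [
    any [                                   ; repeat once per record
        {"} copy note to sep sep            ; memo may hold quotes and commas
        copy category to sep sep
        copy flag to {"} {"}
        [newline | end]
        (append/only records reduce [note category flag])
    ]
]

probe records
```

Matching on the full separator rather than on a bare quote is what lets embedded "words" pass through as part of the memo.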
Izkata: 29-Jun-2006 | Not sure - I remember seeing it in others' parse rules, so I just put it there and it worked '^^ Take it out and see what happens lol | |
Gordon: 29-Jun-2006 | Hi BrianH; Yes I did try that and the problem was that even though I specified the "," as the delimiter, it came across an embedded quote #"^"" and split the input at the quote. Rebol Shouldn't have split it up that way, to my understanding. I will post some simple data to test. | |
Gordon: 29-Jun-2006 | Tomc: Do I understand that :word would be like "get word" and needed in a parse sentence but you can just use the shortcut 'word' most everywhere else? | |
BrianH: 29-Jun-2006 | The colon before the word prevents the interpreter from evaluating active values like functions and parens. It's a safety thing. | |
Tomc: 29-Jun-2006 | and that would be get 'word not get word | |
Gordon: 29-Jun-2006 | Thanks Tomc and BrianH. I'll chew on it for a while. Meanwhile I'm working on building some test data for the first problem. | |
BrianH: 30-Jun-2006 | That's interesting. Parens and paths used to be active - oh yeah, that was changed a while ago. Still, there are some value types that are active (function types, lit-path, lit-word) and if you think you will get one of these you should disable their evaluation by referencing them with a get-word, unless you intend them to be evaluated. | |
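A small sketch of the get-word point, under the assumption that the parsed block may contain a function value. The words f, blk, v and types are hypothetical names used only for illustration.

```rebol
REBOL [Title: "Get-words and active values in parse actions (sketch)"]

f: does [print "f was called"]

; A block holding an integer, the function F itself, and a string
blk: reduce [1 :f "x"]

types: copy []
parse blk [
    any [
        set v skip
        ; :v fetches the value without evaluating it;
        ; a plain v would call the function when v holds one
        (append types type? :v)
    ]
]

probe types
```

With a plain v in the action, the function would run as a side effect and type? would see its result instead of the function itself.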
Anton: 30-Jun-2006 | both parens and paths changed between View 1.2.1 and 1.2.5, actually. | |
Gordon: 30-Jun-2006 | DideC: Thanks. I've copied and pasted it for review and added it to my local public library. This script should be useful especially with the html help page. Documentation on a script is very rare and much appreciated. Graham: Did a search using "librarian" and search term of "sql cvs" and didn't come up with anything. Although, I think we've got it covered now anyhow. | |
Graham: 1-Jul-2006 | What I was trying to do above is to look for the macro text preceded by a space or newline, and ending in a space or newline. | |
Graham: 1-Jul-2006 | and then replace in situ. | |
Tomc: 1-Jul-2006 | are the macros and expansions single tokens? | |
BrianH: 1-Jul-2006 | HTML/XML entities begin with & and end with ; for just this reason. What kind of text? can you give us an example? | |
Graham: 1-Jul-2006 | Heart: Heart regular rate and rhythm, no rubs, murmurs, or gallops noted. A: Abdomen: soft, nontender, no mass, no hernia, no guarding, no rebound tenderness, no ascites, non obese Hbp Hypertension (high blood pressure) #401.9. Ii Type II Diabetes #250.00 | |
Graham: 1-Jul-2006 | I would have to intercept the keyboard handler to do this .. so I want to try and just do the replacement after he's finished typing. | |
Tomc: 1-Jul-2006 | hmm I am in the business of sharing biological information and I got to say please strongly consider creating an ontology if one does not exist already | |
Graham: 1-Jul-2006 | and there's the proprietary MEDCIN | |
Tomc: 1-Jul-2006 | and the macros should also be part of that ontology | |
Graham: 1-Jul-2006 | actually the file will be saved in a database and loaded when the program starts | |
Graham: 1-Jul-2006 | and the ontologically controlled programs are very expensive due to licensing fees | |
Graham: 1-Jul-2006 | The AMA charge to use their codes, the American College of Pathologists charge to use their SNOMED-CT codes .. and so it goes on. | |
Tomc: 1-Jul-2006 | that the macro-expansion file needs to self check for incidental occurrences of a "macro" in an "expansion" and protect against | |
BrianH: 1-Jul-2006 | And it won't have the problem you mention Tomc. | |
Graham: 1-Jul-2006 | so, basically you created a single parse rule from the macro list and then parsed the text in one go. | |
Tomc: 1-Jul-2006 | I am glad to see someone else using here and there ;) | |
BrianH: 1-Jul-2006 | Gabriele and I have worked extensively on such submissions. | |
Graham: 1-Jul-2006 | Yes. So, somehow I need to force the area field to recognise them as newlines and reformat the screen. | |
Henrik: 9-Jul-2006 | how "local" are variables that are set during a parse? I was looking at Geomol's postscript.r and looked at: coords: [err: (pos: none) [set pos pair! | set x number! set y number!] ( either pos [ append output compose [(pos/x) " " (pos/y) " lineto^/"] ][ append output compose [(x) " " (y) " lineto^/"] ] ) ] | |
Henrik: 9-Jul-2006 | yes, I try to print the variable and it just returns none. | |
Henrik: 9-Jul-2006 | actually, there is a difference between my code and this, which may be causing it: I need to loop the block with 'any. I suspect the contents is lost after the first run. | |
Anton: 9-Jul-2006 | And to answer your question, the variables are just regular rebol words, so they are as local as you make them. | |
Oldes: 9-Jul-2006 | and how looks the code you parse? | |
Oldes: 9-Jul-2006 | if the parse is inside function and you set pos in the function as a local - it will be local | |
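What Anton and Oldes describe can be sketched like this: the words set by the parse rule are ordinary words, so declaring them after /local in the enclosing function keeps them out of the global context. The rule body follows the shape of the coords rule Henrik quoted; the function name to-lineto and the word out are hypothetical.

```rebol
REBOL [Title: "Keeping parse variables local to a function (sketch)"]

to-lineto: func [blk [block!] /local pos x y out] [
    out: copy ""
    parse blk [
        any [
            (pos: none)                         ; reset before each item
            [set pos pair! | set x number! set y number!]
            (
                either pos [
                    append out rejoin [pos/x " " pos/y " lineto^/"]
                ][
                    append out rejoin [x " " y " lineto^/"]
                ]
            )
        ]
    ]
    out
]

; Accepts a pair or two separate numbers per point
print to-lineto [10x20 30 40]
```

Because the rule block is written inside the function body, its words are bound to the function's locals. A rule block defined elsewhere and referenced by word would keep whatever binding it had where it was created.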
Henrik: 9-Jul-2006 | where 'image is always first and the remaining items may come in random order | |
Oldes: 9-Jul-2006 | there is no pair and no numbers - the pos must be none | |
Oldes: 9-Jul-2006 | and what exactly do you want? | |
Henrik: 9-Jul-2006 | it works the exact opposite :-) Only the outer 'txt is set, and I can't reach the variable inside the block | |
Henrik: 9-Jul-2006 | the brackets would make it a "real" rule, wouldn't it? it would be possible to replace the rule with a variable and have the rule block placed elsewhere in your code | |
Henrik: 9-Jul-2006 | and it makes the parse scalable, so I can add options later | |
Pekr: 19-Jul-2006 | I tried doing myself small template "engine", which will simply look-up for some marks, and replace values accordingly. I decided to look for the end of the marks and my friend suggested me, that I should name even ending marks, to be clear there is not an error. My parse looks like this: | |
Pekr: 19-Jul-2006 | I now can create simply a func, which will accept mark name, and do some code-block accordingly - sql query, simple replace of value, whatever (well, it will not work for cases like img tags, so it is not as flexible as full html parser in temple for e.g., but hey, it is meant being simple) | |
Pekr: 19-Jul-2006 | ... but should not be simpler, so I wonder - so far, as you can see, mark-x is not finished, so it is ignored. How to catch this case properly and eventually generate error, send email, write to log, whatever? | |
Pekr: 19-Jul-2006 | Maarten - now looking into build-markup - sorry, it is just a strange way of doing things .... no one will place rebol code into a template, that will not work ... btw - the code is 'done? What happens if someone uploads a template with its own code? I want presentation and code separation. | |
Pekr: 19-Jul-2006 | I looked into rsp some time ago, and I liked it, especially as it was complete, with session support etc., but later on I found shlik.org being unavailable ... | |
BrianH: 31-Aug-2006 | Hey, locals and arguments (practically the same thing in REBOL) are the most important difference between closures and plain blocks. The difference is significant but Peters' background with Smalltalk made him miss it - Smalltalk "blocks" look like REBOL blocks but act like functions. | |
Volker: 31-Aug-2006 | No, the main point is, easy definitions of code and referencing the original context. Rebol-blocks do that. | |
Volker: 31-Aug-2006 | The highlights he mentions is: lexically-scoped, code and data, freely mix computations in | |
Volker: 31-Aug-2006 | That scoping is the difference between a closure and doing a "string" here. | |
BrianH: 31-Aug-2006 | REBOL blocks don't reference a context, but they may contain words that reference a context. Still, this distinction makes no difference to the argument that Peters was making - REBOL text processing is more powerful than regex and easier to use. It would be easier to replicate REBOL-style parsing in Python using closures and generators anyway (Peters' real subject), since that is the closest Python gets to Icon-style backtracking. | |
Geomol: 25-Sep-2006 | I would like the functionality, when parsing things like TeX. There the greek letter gamma is called gamma, and the same in capital is called Gamma. Now I have to invent the word capgamma or something. | |
Gregg: 25-Sep-2006 | If it were a safe and easy thing to change, I can see some value in it as an option but, since words--and REBOL--are case insensitive, I'm inclined to live with things as they are, and use string parsing if case sensitivity is needed. I think it's Oldes or Rebolek that sometimes requests the ability to parse non-loadable strings, using percentage values as an example. I think loading percentages would be awesome, but then there are other values we might want to load as well; where do you draw the line? I'm waiting to see what R3 holds with custom datatypes and such. | |
Gregg: 25-Sep-2006 | And didn't you suggest that values throwing errors could be coerced to string! or another type? e.g. add an /any refinement to load, and any value in the string that can't be loaded would become a string (or maybe you could say you want them to be tags for easy identification). | |
Oldes: 25-Sep-2006 | I think, load/next can be used to handle invalid datatypes now: >> b: {1 2 3 'x' ,} == "1 2 3 'x' ," >> while [v: load/next b not empty? second v][probe v b: v/2] [1 " 2 3 'x' ,"] [2 " 3 'x' ,"] [3 " 'x' ,"] ['x' " ,"] ** Syntax Error: Invalid word -- , ** Near: (line 1) , Just add some handler to convert the invalid datatype to something else that is loadable and then parse as a block | |
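One way the handler Oldes mentions might look, as a rough sketch only: each load/next call is wrapped in TRY, and when a token cannot be loaded its raw text is kept as a string, which is the fallback Gregg suggests for a load/any refinement. The function name load-any and the helper words are hypothetical, and the sketch assumes tokens are separated by spaces, tabs or newlines.

```rebol
REBOL [Title: "load/next with a fallback for unloadable tokens (sketch)"]

ws:     charset " ^-^/"
non-ws: complement ws

load-any: func [str [string!] /local out res mark] [
    out: copy []
    forever [
        parse/all str [any ws str: to end]        ; step over whitespace
        if empty? str [break]
        either error? try [res: load/next str] [
            ; unloadable token: keep its raw text as a string
            parse/all str [some non-ws mark: to end]
            append out copy/part str mark
            str: mark
        ][
            append/only out first res             ; the loaded value
            str: second res                       ; the unparsed remainder
        ]
    ]
    out
]

probe load-any {1 2 3 'x' ,}
```

The resulting block can then be handed to the block parser, where datatype rules such as integer! or string! tell loaded values apart from the string fallbacks.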
Geomol: 25-Sep-2006 | Gabriele, yes it works with strings. But I have words! Thing is, I parse the string input from the user and produce words in an internal format. Then I parse those words for the final output, which can be different formats. I would expect parse/case to be case-sensitive, when parsing words, but parse/case is only for strings, therefore my suggestion. | |
Oldes: 26-Sep-2006 | And there is some parse example how to deal with recursions while parsing strings? If you parse block, it's easy detect, what is string! and what is other type, but if you need to parse string, it's not so easy to detect for example strings like {some text {other "text"}} | |
Anton: 27-Sep-2006 | Here's an idea to toss into the mix: I am thinking of a new notation for strings using underscore (eg. _"hello"_ ) in a parse block, which allows to specify whether they are delimited by whitespace or not. This would allow you to enable/disable the necessity for delimiters per-string. eg: parse input [ _"house"_ ; a complete word surrounded both sides by whitespace _"hous" ; this would match "house", "housing", "housed" or even "housopoly" etc.. but left side must be whitespace "ad"_ ; this would match "ad", "fad", "glad" and right side must be whitespace ] But this would need string datatype to change. On the other hand, I could just set underscore _ to a charset of whitespace, then use that with parse/all eg: _: charset " ^-^/" parse/all input [ [ _ "house" _ ] ] though that wouldn't be as comfortable. Maybe I can create parse rules from a simpler dialect which understands the underscore _. Just an idea... | |
MikeL: 27-Sep-2006 | Anton, Andrew had defined white space patterns in his patterns.r script which seems usable then you can use [ ws* "house" ws*] or other combinations as needed without underscore. Andrew's solution for this and a lot of other things have given me some good mileage over the past few years. WS*: [some WS] and WS?: [any WS]. It makes for clean, clear parse scripts once you adopt it. | |
Gregg: 27-Sep-2006 | I think either approach above can work well. I like the "look" of the underscore, and have done similar things with standard function names. For SOME, ANY, and OPT, the tag chars I prefer are +, *, and ? respectively; which are EBNF standard. | |
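A minimal sketch of whitespace helper rules using the suffixes Gregg lists (+ for SOME, * for ANY, ? for OPT). These definitions are assumptions written for illustration and are not copied from Andrew's patterns.r, whose actual contents are not shown in this excerpt.

```rebol
REBOL [Title: "Whitespace helper rules (sketch)"]

ws:  charset " ^-^/"
ws+: [some ws]      ; one or more
ws*: [any ws]       ; zero or more
ws?: [opt ws]       ; zero or one

; Look for "house" as a whole word: whitespace before it,
; whitespace or end of input after it
found: false
parse/all "my house is old" [
    some [
        ws+ "house" [ws+ | end] (found: true) to end
        | skip
    ]
]

probe found
```

As written the rule does not match "house" at the very start of the input, since it insists on leading whitespace; relaxing that is a matter of adding another alternative.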
Anton: 27-Sep-2006 | Oh yes, I've seen Andrew's patterns.r. I was just musing how to make it more concise without even using a short word like WS. Actually the use case which sparked this idea was more of a "regex-level" pattern matcher, just a simple pattern matcher where the user writes the pattern to match filenames and to match strings appearing in file contents. | |
Gregg: 28-Sep-2006 | I also have a naming convention I've been playing with for a while, where parse rule words have an "=" at the end (e.g. date=) and parse variables--values set during the parse process--have it at the beginning (e.g. =date). The idea is that it's sort of a cross between BNF syntax for production rules and set-word/get-word syntax; the goal being to easily distinguish parse-related words. By using the same word for a rule and an associated variable, with the equal sign at the head or tail, respectively, it also makes it easier to keep track of what gets set where, when you have a lot of rules. | |
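Gregg's convention can be illustrated with a small date rule. This is a sketch under the assumption of a fixed YYYY-MM-DD input; the particular rule and variable names are made up for the example, only the placement of the equal sign follows his description.

```rebol
REBOL [Title: "rule= and =variable naming (sketch)"]

; Rules end in "=", values captured during the parse start with "="
digit=: charset "0123456789"
year=:  [copy =year  4 digit=]
month=: [copy =month 2 digit=]
day=:   [copy =day   2 digit=]
date=:  [year= "-" month= "-" day=]

if parse/all "2006-09-28" date= [
    print [=year =month =day]
]
```

Reading the rules, it is immediately visible that year= is something to match and =year is something that gets set, which is the distinction the convention is after.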
Maxim: 28-Sep-2006 | simple and clean, good idea! | |
Maxim: 28-Sep-2006 | so many years of reboling (since core 1.2), and still parse remains largely untamed by myself. | |
Izkata: 3-Oct-2006 | That's a ~very~ good example, Oldes... it should be put in the docs somewhere (if it isn't already.) I didn't understand how get-words and set-words worked in parse, either, before.. | |
Rebolek: 4-Oct-2006 | I've got the following PARSE problem: I've got a string - "<good tag><bad tag><other tag><good tag>" and I want to keep "good tag" and change "<>" in other tags to, let's say, "X" (I need to change it to HTML entities but that doesn't matter now). So the result will look like: "<good tag>Xbad tagXXother tagX<good tag>" I've been working on it for the last few hours but still haven't found a solution. Is there any? | |
Rebolek: 4-Oct-2006 | I'll probably replace everything and then just revert the "good tag" back. It's not very elegant, but... | |
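A single-pass alternative to replace-then-revert, as a minimal sketch: the allowed tag is tried first so it is consumed whole, and only angle brackets that were not part of it get changed. It uses Rebolek's "X" stand-in, which keeps the string length unchanged; a replacement of a different length (such as an HTML entity) would also need the parse position adjusted after each change.

```rebol
REBOL [Title: "Replace angle brackets outside an allowed tag (sketch)"]

text: copy "<good tag><bad tag><other tag><good tag>"

parse/all text [
    any [
        "<good tag>"                    ; allowed tag: skip it untouched
        | a: "<" (change a "X")         ; stray opening bracket
        | a: ">" (change a "X")         ; stray closing bracket
        | skip                          ; any other character
    ]
]

print text
```

The order of the alternatives matters: if the bracket rules came first, the "<" of the good tag would be replaced before the literal had a chance to match.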
Anton: 4-Oct-2006 | <, and > ? |