World: r3wp
[Parse] Discussion of PARSE dialect
Geomol 25-Sep-2006 [1412] | I would like the functionality, when parsing things like TeX. There the greek letter gamma is called gamma, and the same in capital is called Gamma. Now I have to invent the word capgamma or something. |
Gabriele 25-Sep-2006 [1413] |
    >> parse ["Gamma"] ["gamma"]
    == true
    >> parse/case ["Gamma"] ["gamma"]
    == false
|
Gregg 25-Sep-2006 [1414] | If it were a safe and easy thing to change, I can see some value in it as an option but, since words--and REBOL--are case insensitive, I'm inclined to live with things as they are, and use string parsing if case sensitivity is needed. I think it's Oldes or Rebolek that sometimes requests the ability to parse non-loadable strings, using percentage values as an example. I think loading percentages would be awesome, but then there are other values we might want to load as well; where do you draw the line? I'm waiting to see what R3 holds with custom datatypes and such. |
Oldes 25-Sep-2006 [1415] | Yes, it's me who keeps asking for the possibility to load anything that currently throws an invalid datatype error. |
Gregg 25-Sep-2006 [1416x2] | And didn't you suggest that values throwing errors could be coerced to string! or another type? e.g. add an /any refinement to load, and any value in the string that can't be loaded would become a string (or maybe you could say you want them to be tags for easy identification). |
I'm not sure how custom datatype lexing would work, unless it did something similar, calling custom lexers when running up against values the standard lexer doesn't understand. I can't remember how Gabriele's custom type mezz code works either; need to look at that. | |
Oldes 25-Sep-2006 [1418x3] | I think load/next can be used to handle invalid datatypes now:
    >> b: {1 2 3 'x' ,}
    == "1 2 3 'x' ,"
    >> while [v: load/next b not empty? second v][probe v b: v/2]
    [1 " 2 3 'x' ,"]
    [2 " 3 'x' ,"]
    [3 " 'x' ,"]
    ['x' " ,"]
    ** Syntax Error: Invalid word -- ,
    ** Near: (line 1) ,
Just add some handler to convert the invalid datatype into something that is loadable, and then parse as a block |
But such a preloader will slow things down :( | |
I would like to know whether string-based parsing that handles all current REBOL datatypes could be faster than, or as fast as, block parsing | |
Geomol 25-Sep-2006 [1421] | Gabriele, yes it works with strings. But I have words! Thing is, I parse the string input from the user and produce words in an internal format. Then I parse those words for the final output, which can be different formats. I would expect parse/case to be case-sensitive, when parsing words, but parse/case is only for strings, therefore my suggestion. |
Gabriele 25-Sep-2006 [1422] | what i'd suggest is - if case is important, don't make them into words :) |
Geomol 25-Sep-2006 [1423] | :D But it makes so much sense to work with words. |
Gabriele 26-Sep-2006 [1424] | sure, but you can only have 8k of them (unless you make sure they never end up in system/words), so if you also counted case... |
Maxim 26-Sep-2006 [1425] | another way to counter the word limit is to use #issue datatype. |
Oldes 26-Sep-2006 [1426x2] | And there is some parse example of how to deal with recursion while parsing strings? If you parse a block, it's easy to detect what is a string! and what is another type, but if you need to parse a string, it's not so easy to detect, for example, strings like {some text {other "text"}} |
(it should be a question - is there such an example?) | |
Rebolek 26-Sep-2006 [1428x2] | Words should be non-case sensitive, but is it always the case? I've found this today accidentally:
    >> a: [small Small]
    == [small Small]
    >> find/case a to word! "small"
    == [small Small]
    >> find/case a to word! "Small"
    == [Small]
|
so /case with words works, at least in 'find | |
Oldes 26-Sep-2006 [1430] | if it's working in find, it should be working on parse as well |
Gabriele 26-Sep-2006 [1431] | well... case insensitivity for words is done via automatic aliasing of words that differ in case only. (i know this because we found a bug related to this :) |
Rebolek 26-Sep-2006 [1432] | so internally, words are case-sensitive? |
Ladislav 26-Sep-2006 [1433] | yes |
Anton 27-Sep-2006 [1434] | Here's an idea to toss into the mix: I am thinking of a new notation for strings using underscore (eg. _"hello"_ ) in a parse block, which lets you specify whether they are delimited by whitespace or not. This would allow you to enable/disable the necessity for delimiters per-string. eg:
    parse input [
        _"house"_   ; a complete word surrounded both sides by whitespace
        _"hous"     ; this would match "house", "housing", "housed" or even "housopoly" etc.. but left side must be whitespace
        "ad"_       ; this would match "ad", "fad", "glad" and right side must be whitespace
    ]
But this would need the string datatype to change. On the other hand, I could just set underscore _ to a charset of whitespace, then use that with parse/all eg:
    _: charset " ^-^/"
    parse/all input [ [ _ "house" _ ] ]
though that wouldn't be as comfortable. Maybe I can create parse rules from a simpler dialect which understands the underscore _. Just an idea... |
MikeL 27-Sep-2006 [1435] | Anton, Andrew had defined white space patterns in his patterns.r script which seem usable; then you can use [ ws* "house" ws*] or other combinations as needed without underscore. Andrew's solutions for this and a lot of other things have given me some good mileage over the past few years. WS*: [some WS] and WS?: [any WS]. It makes for clean, clear parse scripts once you adopt it. |
Gregg 27-Sep-2006 [1436] | I think either approach above can work well. I like the "look" of the underscore, and have done similar things with standard function names. For SOME, ANY, and OPT, the tag chars I prefer are +, *, and ? respectively; which are EBNF standard. |
Anton 27-Sep-2006 [1437x2] | Oh yes, I've seen Andrew's patterns.r. I was just musing how to make it more concise without even using a short word like WS. Actually the use case which sparked this idea was more of a "regex-level" pattern matcher, just a simple pattern matcher where the user writes the pattern to match filenames and to match strings appearing in file contents. |
Gregg, + * ? could be a good idea. I'll throw that into my mix-bowl. | |
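[Editor's sketch] The whitespace-rule approach MikeL describes could look roughly like this. The WS* and WS? definitions are the ones quoted in the thread; the WS charset (space, tab, newline, as in Anton's example) is an assumption, since patterns.r itself is not shown here.

```rebol
; Assumption: WS is a charset of space, tab and newline (patterns.r may define it differently)
ws: charset " ^-^/"
ws*: [some ws]    ; one or more whitespace chars, as quoted by MikeL
ws?: [any ws]     ; zero or more whitespace chars, as quoted by MikeL

; parse/all stops PARSE from skipping whitespace on its own,
; so the rules above decide where delimiters are required
parse/all " house " [ws* "house" ws*]    ; whitespace required on both sides
parse/all "house" [ws? "house" ws?]      ; whitespace optional on both sides
```

Note that Gregg's preferred EBNF-style suffixes would name these differently (* for ANY, ? for OPT), so it is worth settling on one convention per project.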
Gregg 28-Sep-2006 [1439] | I also have a naming convention I've been playing with for a while, where parse rule words have an "=" at the end (e.g. date=) and parse variables--values set during the parse process--have it at the beginning (e.g. =date). The idea is that it's sort of a cross between BNF syntax for production rules and set-word/get-word syntax; the goal being to easily distinguish parse-related words. By using the same word for a rule and an associated variable, with the equal sign at the head or tail, respectively, it also makes it easier to keep track of what gets set where, when you have a lot of rules. |
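[Editor's sketch] A small illustration of the naming convention Gregg describes, with rule words ending in "=" and parse variables starting with "=". The rule names and the date format are made up for the example and are not taken from Gregg's code.

```rebol
; Hypothetical rules for a YYYY-MM-DD string; all names are illustrative only
digit: charset "0123456789"

year=:  [copy =year 4 digit]     ; rule year=  sets variable =year
month=: [copy =month 2 digit]    ; rule month= sets variable =month
day=:   [copy =day 2 digit]      ; rule day=   sets variable =day
date=:  [year= "-" month= "-" day=]

if parse/all "2006-09-28" date= [
    print [=year =month =day]
]
```

Reading the rules, anything with a trailing "=" is a production and anything with a leading "=" is a value captured during the parse, which is the distinction the convention is meant to make visible.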
Maxim 28-Sep-2006 [1440x3] | simple and clean, good idea! |
I'm just starting to be able to actually USE parse for dialecting. So far I've been almost solely using it to replace regexp functionality. | |
so many years of reboling (since core 1.2), and still parse remains largely untamed by myself. | |
Graham 29-Sep-2006 [1443x9] | This was I thought a simple task .. to parse a csv file.... |
COHEN ,"WILLIAM ",""," 305782","123 "C" AVENUE","CORONADO ","CA","92118","560456788","(619)555-2730","( ) - 0","08/22/1927","M","SHARP CORONADO/MISSI","","","","","POLLICK","JAMES ","","MOUNTAIN","RODERICK ","", | |
this seems to be a difficult line as there is an embedded quote viz "123 "c" Avenue" | |
this is Gabriele's published parser
    CSV-parser: make object! [
        line-rule: [field any [separator field]]
        field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
        string: [copy f-val any str-char]
        quoted-string: [{"} copy f-val any qstr-char {"} (replace/all f-val {""} {"})]
        str-char: none
        qstr-char: [{""} | separator | str-char]
        fields: []
        f-val: none
        separator: #";"
        set 'parse-csv-line func [
            "Parses a CSV line (returns a block of strings)"
            line [string!]
            /with sep [char!] "The separator between fields"
        ] [
            clear fields
            separator: any [sep #";"]
            str-char: complement charset join {"} separator
            parse/all line line-rule
            copy fields
        ]
    ]
| |
which was written to cope with embedded quotes, but fails where there is an empty field eg , "" , | |
This is Joel Neely's from the same day ...
    readcsv: make object! [
        all-records: copy []
        one-record: copy []
        one-segment: copy ""
        one-field: copy ""
        noncomma: complement charset ","
        nonquote: complement charset {"}
        segment: [
            copy one-segment any nonquote
            (if found? one-segment [append one-field one-segment])
        ]
        quoted: [
            {"} (one-field: copy "")
            segment
            any [{""} (append one-field {"}) segment]
            {"}
        ]
        unquoted: [copy one-field any noncomma]
        field: [[quoted | unquoted] (append one-record one-field)]
        record: [field any ["," field]]
        run: func [f [file!] /local line] [
            all-records: copy []
            foreach line read/lines f [
                one-record: copy []
                either parse/all line record [
                    append/only all-records one-record
                ][
                    print ["parse failed:" line]
                ]
            ]
            all-records
        ]
    ]
| |
which reports an error with this line. | |
this might fix Gabriele's parser ..
    CSV-parser: make object! [
        line-rule: [field any [separator field]]
        field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
        string: [copy f-val any str-char]
        quoted-string: [{"} copy f-val any qstr-char {"} (if found? f-val [replace/all f-val {""} {"}])]
        str-char: none
        qstr-char: [{""} | separator | str-char]
        fields: []
        f-val: none
        separator: #";"
        set 'parse-csv-line func [
            "Parses a CSV line (returns a block of strings)"
            line [string!]
            /with sep [char!] "The separator between fields"
        ] [
            clear fields
            separator: any [sep #";"]
            str-char: complement charset join {"} separator
            parse/all line line-rule
            copy fields
        ]
    ]
| |
perhaps not. | |
sqlab 29-Sep-2006 [1452] | Why don't you use split? |
Gabriele 29-Sep-2006 [1453x2] | graham, iirc my version is meant to handle embedded quotes when properly escaped, i.e. you should have "123 ""C"" AVENUE" there for it to work. |
i actually wonder why quotes are used in that line. they are only needed if the field contains the separator. | |
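[Editor's sketch] Gabriele's point about escaping can be seen with a stripped-down quoted-field rule. This is not his parser; the names qchar, quoted and val are hypothetical, and the rule only handles a single quoted field.

```rebol
; Hypothetical minimal rule: a quoted field in which an embedded quote is written as ""
qchar: complement charset {"}
val: none
quoted: [
    {"} copy val any [{""} | qchar] {"}
    (if val [replace/all val {""} {"}])    ; COPY can give none for an empty field, so guard it
]

; Properly escaped: the rule consumes the whole field
parse/all {"123 ""C"" AVENUE"} quoted
print val

; Unescaped, as in Graham's line: the rule treats the quote before C as the
; closing quote, input is left over, and PARSE fails
parse/all {"123 "C" AVENUE"} quoted
```

The guard on val is the same idea as the "if found? f-val" change Graham tried above.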
Graham 29-Sep-2006 [1455] | split will work if there are no embedded commas I guess |
Anton 3-Oct-2006 [1456] | What's the parse rule to go backwards ? -1 skip ? |
Oldes 3-Oct-2006 [1457x2] | maybe this will help:
    x: [1 2 3 4 5]
    parse x [any [x: set d number! (probe x probe d x: next x) :x]]
|
you can set the x to another position if you need | |
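[Editor's sketch] The same position-marking trick can be used to step backwards, which is what Anton was asking about. The word "here" is just an illustrative name; a set-word in a rule records the current position, the paren changes it, and the get-word moves the parse there.

```rebol
parse/all "abc" [
    2 skip                  ; now positioned at "c"
    here:                   ; remember the current position
    (here: back here)       ; move the marker one step back, to "bc"
    :here                   ; continue parsing from the marker
    "bc"                    ; matches from the rewound position to the end
]
```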
Anton 3-Oct-2006 [1459] | Ah yes - very good :) |
Maxim 3-Oct-2006 [1460x2] | my god, I think I finally -get- Parse... call me the village idiot. I used to use parse, now I also understand subconsciously it ;-) |
that should read "... I also understand it subconsciously" | |