AltME groups: search
world-name: r3wp
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
Anton: 5-Mar-2006 | And do you want to avoid putting them into a block first? | |
Geomol: 6-Mar-2006 | 'parse' is the path to great explorations and inventions, and also to great confusion and maybe despair. ;-) No, really, it can be a bit confusing at times, but I guess it can't be done any other way if you want such great functionality. There's no shortcut with 'parse'. Learning by doing is the way to go. And it's a brilliant tool! | |
Oldes: 7-Mar-2006 | count-word-frequency: func [
    "Counts word frequency from the given text"
    text [string!] "text to analyse"
    /exclude ex [block!] "words which should not be counted"
    /local counts f word wordchars nonwordchars
][
    counts: make hash! 100000
    wordchars: charset [#"a" - #"z" #"A" - #"Z" "̊؎ύѪ"]
    nonwordchars: complement wordchars
    parse/all text [
        any nonwordchars
        any [
            copy word some wordchars (
                ;probe word
                if any [not exclude none? find ex word] [
                    either none? f: find/tail counts word [
                        repend counts [word 1]
                    ][
                        change f (f/1 + 1)
                    ]
                ]
            )
            any nonwordchars
        ]
    ]
    counts: to-block counts
    sort/skip/compare/reverse counts 2 2
    new-line/skip counts true 2
] | |
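For comparison, here is a rough Python sketch of the same word-frequency idea: skip non-word characters, treat each run of word characters as a word, count, then sort by count descending. It is not a translation of Oldes' code; the Czech letters in WORD_CHARS are an assumed set, and count_word_frequency is a hypothetical name.

```python
import re
from collections import Counter

# Characters that may appear inside a word: ASCII letters plus an assumed
# set of Czech accented letters. Adjust to the language being analysed.
WORD_CHARS = "a-zA-Z" + "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"
WORD_RE = re.compile("[" + WORD_CHARS + "]+")


def count_word_frequency(text, exclude=()):
    """Return (word, count) pairs, most frequent first.

    Runs of non-word characters are skipped, each run of word characters
    is one word, and words listed in `exclude` are not counted.
    """
    excluded = set(exclude)
    counts = Counter(w for w in WORD_RE.findall(text) if w not in excluded)
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return counts.most_common()
```

This sketch counts case-sensitively; lowercase the text first if "The" and "the" should be merged.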
Oldes: 7-Mar-2006 | Found the missing Czech chars -> wordchars: charset [#"a" - #"z" #"A" - #"Z" "̊؎ύѪ"] | |
Oldes: 13-Mar-2006 | Is this a bug?
parse/all {"some words"} {" }      ;== ["some words"]
parse/all {and "some words"} {" }  ;== ["and" "some words"]
parse {and "some words"} {" }      ;== ["and" "some" "words"]
parse {"some words"} {" }          ;== ["some words"] | |
Geomol: 13-Mar-2006 | Good question! It's in a tough corner of REBOL: parsing. REBOL is in many ways more like a human language than a computer language. Strictly speaking, you could argue that those examples show a bug or two, but can you live with it? The behaviour might make it difficult to parse input strings written by humans, because people write all sorts of things. (If it can go wrong, it will.) Try changing the quotation marks to something else and watch the results change, like: >> parse/all {Xsome wordsX}{X } == ["" "some" "words"] | |
Gabriele: 13-Mar-2006 | parse, without a rule, treats quotes specially. this is to allow parse to be used directly with things like csv data. | |
Oldes: 14-Mar-2006 | I think it's a bug! I was trying to use this to divide a large string into words and found that whole sentences ended up in the result instead of just words. It's only a problem when the divider is at the edge. | |
Gabriele: 14-Mar-2006 | this behavior is the one intended by Carl. so, it's so by design, and not a bug. but, you may try to convince Carl that you don't like it. ;) | |
Oldes: 14-Mar-2006 | I still think it's a bug: I can't see the difference between parse and parse/all in this example. If Carl doesn't want to fix it, no problem for me; I used a more complicated rule to do the same thing. I just still think it's a bug, and it will confuse more people in the future as well. | |
Oldes: 14-Mar-2006 | And parse {,"a b, d" ,d} {,} == ["" "a b, d" "d"] (so Carl is probably right ;-) | |
Oldes: 14-Mar-2006 | But it should be documented that quotes are very special characters for this type of parsing! | |
JaimeVargas: 28-Apr-2006 | Oldes, a regex context would be a good addition: one where the regexes are the basic rules for numbers, white space, *words*, and their negations. | |
Oldes: 28-Apr-2006 | anton: I think a parse rule shouldn't have to be a global variable, yet you could still use its name in a parse block. But that would probably be a security issue. | |
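A simple quote-aware splitter shows the idea Gabriele describes: a double-quoted run is kept as one field, so CSV-like data keeps its embedded delimiters. This is a hypothetical Python sketch (split_quoted is an illustrative name), not a reimplementation of PARSE, whose exact edge-case results differ, as Oldes' examples show; for instance, this version drops empty fields.

```python
def split_quoted(text, delimiters):
    """Split `text` on any character in `delimiters`, keeping "..." runs whole.

    The double quotes themselves are dropped, and empty fields are skipped.
    """
    fields, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes          # open or close a quoted run
        elif ch in delimiters and not in_quotes:
            if current:                        # end of a non-empty field
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)                 # ordinary char, or delimiter inside quotes
    if current:
        fields.append("".join(current))
    return fields
```

Here `"` is always treated as a quote, even when it is also listed as a delimiter, which is roughly the behaviour Oldes ran into.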
Gregg: 28-Apr-2006 | I've thought about that as well. There are some base charsets we could probably standardize on, and that would be good (IMO). Beyond a few basics, though, consensus gets tough. | |
Gregg: 28-Apr-2006 | The singular/plural argument seems easy, but isn't (IMO); DIGITS could be done as SOME DIGIT, and you could argue that things like 2 DIGITS reads better, though 1 DIGITS does not. You could double-define it, but that gets ugly too. So, what about DIG? That doesn't imply any singularity, though it's a bit terse, and not a full word (or, rather, the wrong full word). | |
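Gregg's point that a plural rule can be built as SOME DIGIT, rather than defined twice, can be sketched with a few tiny rule combinators. These Python names (charset, some, times, full_match) are hypothetical stand-ins for the PARSE keywords, not an existing library.

```python
import string

# A rule takes (text, pos) and returns the position after a match, or None.


def charset(chars):
    """Match one character from `chars`, like a PARSE charset."""
    allowed = frozenset(chars)

    def rule(text, pos):
        if pos < len(text) and text[pos] in allowed:
            return pos + 1
        return None
    return rule


def some(rule):
    """Match `rule` one or more times, like SOME."""
    def repeated(text, pos):
        pos = rule(text, pos)
        if pos is None:
            return None
        while True:
            nxt = rule(text, pos)
            if nxt is None or nxt == pos:      # stop on failure or no progress
                return pos
            pos = nxt
    return repeated


def times(n, rule):
    """Match `rule` exactly n times, like `2 digit`."""
    def repeated(text, pos):
        for _ in range(n):
            pos = rule(text, pos)
            if pos is None:
                return None
        return pos
    return repeated


def full_match(rule, text):
    return rule(text, 0) == len(text)


# Define the singular charsets once; build "plural" rules from them.
digit = charset(string.digits)
alpha = charset(string.ascii_letters)
digits = some(digit)
```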
Sunanda: 28-Apr-2006 | (I was sure I'd posted this just after Oldes' message... but it ain't there now... maybe it's in the wrong group.) Andrew has a nice starter set: http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=common-parse-values.r And I know he has extended that list considerably to include things like email addresses and URLs. | |
Graham: 28-Apr-2006 | Maybe there should be no invalid datatypes .... everything can be converted to a datatype | |
Graham: 28-Apr-2006 | if the parser thinks a datatype is invalid, well, let's call it an invalid! datatype!! | |
Graham: 28-Apr-2006 | have a catchall for stuff it thinks is wrong | |
Volker: 1-May-2006 | How about another way: integrate datatypes into the string parser. Basically, do a load/next and check the type. Then we could write (note that I parse a string): parse "1 a , #2" [ integer! word! "," issue! ] | |
Volker: 1-May-2006 | 'invalid! has a problem: it's easy to recognize where the wrong part starts, but harder to recognize where it ends. | |
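Volker's idea can be sketched in Python: tokenize the string first, then match each token's type rather than its characters. This is a hypothetical illustration; splitting on whitespace stands in for load/next, and the three regex-based types only loosely resemble REBOL's integer!, issue! and word!.

```python
import re

# Hypothetical token types, loosely modelled on REBOL's integer!, issue!, word!.
TOKEN_TYPES = [
    ("integer!", re.compile(r"[+-]?\d+")),
    ("issue!", re.compile(r"#\S+")),
    ("word!", re.compile(r"[A-Za-z][A-Za-z0-9?!*_-]*")),
]


def classify(token):
    """Return the type name of `token`, or None if no type fits."""
    for name, pattern in TOKEN_TYPES:
        if pattern.fullmatch(token):
            return name
    return None


def match_types(text, pattern):
    """Check the whitespace-separated tokens of `text` against `pattern`.

    Entries ending in "!" name a type; any other entry must equal the
    token literally.
    """
    tokens = text.split()
    if len(tokens) != len(pattern):
        return False
    for token, expected in zip(tokens, pattern):
        if expected.endswith("!"):
            if classify(token) != expected:
                return False
        elif token != expected:
            return False
    return True
```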
Ashley: 24-May-2006 | Quick question for the parse experts. If I have a parse rule that looks like this: parse spec [ any [ set arg string! (...) | set arg tuple! (...) | ... ] ] How would I add a rule like: set arg paren! (reduce arg) that always occurred prior to the any rule and evaluated parenthesized expressions (i.e. I want parenthesized expressions to be reduced to a REBOL value that can be handled by the remainder of the parse rule). | |
Ashley: 25-May-2006 | Thanks both, works a treat. | |
Graham: 27-Jun-2006 | My brain is still asleep. How to go thru a document and add <strong> </strong> around every word that is in capitals and is more than a few characters long? | |
Pekr: 27-Jun-2006 | hmm, quite a challenge ... | |
Gordon: 27-Jun-2006 | I agree, a bit much to ask. A more specific question would get a more specific answer :) Something like:
file: read filename2parse
newfile: ""
foreach word file [
    if Is-Capitals word [
        newfile: join newfile ["<strong> " word " </strong> "]
    ]
]
The Is-Capitals function would have to be defined:
Is-Capitals: func [Word2Check] [ some code here ] | |
Graham: 27-Jun-2006 | that won't work because file is just text and not a block. | |
Volker: 27-Jun-2006 | ;thinking aloud: capitals: charset["#"A" - #"Z"] capital: [5 capitals any capitals] | |
BrianH: 27-Jun-2006 | Yes, give me a minute... | |
JaimeVargas: 27-Jun-2006 | capitalize-word: func [
    s [string!] /local len
][
    either 5 < len: length? s [
        s: rejoin ["<strong>" uppercase s/1 next s </strong>]
    ][
        s
    ]
]
capitalize-text: func [
    s [string!] /local result word-rule other-rule alpha non-alpha w c
][
    result: copy {}
    alpha: charset [#"A" - #"Z" #"a" - #"z"]
    non-alpha: complement alpha
    word-rule: [copy w [some alpha] (insert tail result capitalize-word w)]
    other-rule: [copy c non-alpha (insert tail result c)]
    parse/all s [some [word-rule | other-rule] end]
    result
] | |
Graham: 27-Jun-2006 | search for a series of capitalised words and strong them | |
Graham: 27-Jun-2006 | bolden-word: func [
    s [string!] /local len
][
    either 5 < len: length? s [
        s: rejoin ["<strong>" s </strong>]
    ][
        s
    ]
]
enhance-text: func [
    s [string!] /local result word-rule other-rule alpha non-alpha w c
][
    result: copy {}
    alpha: charset [#"A" - #"Z"]
    non-alpha: complement alpha
    word-rule: [copy w [some alpha] (insert tail result bolden-word w)]
    other-rule: [copy c non-alpha (insert tail result c)]
    parse/all s [some [word-rule | other-rule] end]
    result
] | |
BrianH: 27-Jun-2006 | capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
; text: the string being marked up
parse/all/case text [
    any non-alpha
    any [
        a: 5 capitals any capitals b: non-alpha (
            b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
        ) :b
        | some alpha any non-alpha
    ]
    to end
] | |
BrianH: 27-Jun-2006 | ; A few fixes
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
; text: the string being marked up
parse/all/case text [
    any non-alpha
    any [
        a: 5 capitals any capitals b: [non-alpha | end] (
            b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
        ) :b
        | some alpha any non-alpha
    ]
    to end
] | |
Graham: 27-Jun-2006 | capitals: charset ["#"A" - #"Z"] ... remove leading " | |
Graham: 27-Jun-2006 | Yeah ... it was a way to mark up text wherever a sequence of CAPS occurs | |
BrianH: 27-Jun-2006 | ; Sorry, more fixes
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
; text: the string being marked up
parse/all/case text [
    any [
        any non-alpha
        a: 5 capitals any capitals b: [non-alpha | end] (
            b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
        ) :b
        | some alpha
    ]
    to end
] | |
Graham: 27-Jun-2006 | Actually I would like to add a parse problem to the weeklyblog and get people to submit answers :) | |
BrianH: 27-Jun-2006 | I use parse quite a bit. It's funny, I've never needed the GUI of View, but I use parse daily. | |
Graham: 27-Jun-2006 | And give a prize for the shortest answer | |
Graham: 27-Jun-2006 | say a copy of Microsoft VB :) | |
BrianH: 27-Jun-2006 | Seriously though, three charsets and two temporary variables, there's got to be a more efficient way. | |
BrianH: 27-Jun-2006 | ; Sorry, more fixes
capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
; text: the string being marked up
parse/all/case text [
    any [
        to alpha [
            a: 5 capitals any capitals b: [non-alpha | end] (
                b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
            ) :b
            | some alpha
        ]
    ]
    to end
] | |
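Here is a Python transliteration of the final rule above, to make the role of the position markers explicit: `a` marks the start of a run of capitals, the splice replaces a..b, and resuming after the inserted markup plays the part of `:b`. It is a sketch only; strong_capitals, CAPITALS and ALPHA are illustrative names, and like the REBOL charsets it treats only ASCII letters as alphabetic.

```python
import string

CAPITALS = frozenset(string.ascii_uppercase)
ALPHA = frozenset(string.ascii_letters)


def strong_capitals(text, min_len=5):
    """Wrap every all-capitals word of at least `min_len` letters in <strong>."""
    pos = 0
    while True:
        # `to alpha`: skip to the start of the next word, or finish.
        while pos < len(text) and text[pos] not in ALPHA:
            pos += 1
        if pos == len(text):
            return text
        a = b = pos                                   # `a:` marks the word start
        while b < len(text) and text[b] in CAPITALS:  # `5 capitals any capitals`
            b += 1
        # `b: [non-alpha | end]`: the capitals must be the whole word.
        if b - a >= min_len and (b == len(text) or text[b] not in ALPHA):
            marked = "<strong>" + text[a:b] + "</strong>"
            text = text[:a] + marked + text[b:]       # like change/part a ... b
            pos = a + len(marked)                     # `:b`: resume after the markup
        else:
            # `some alpha`: skip the rest of an ordinary word.
            while pos < len(text) and text[pos] in ALPHA:
                pos += 1
```

Rebuilding the string on each match is quadratic in the worst case, which is fine for notes-sized text but worth replacing with a list of pieces for large documents.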
Volker: 27-Jun-2006 | because " a: 5 capitals any capitals b:" stops at "g" and friends. | |
BrianH: 27-Jun-2006 | The inserts are a nice touch though. | |
Graham: 28-Jun-2006 | I think that punctuation is part of a word | |
Graham: 28-Jun-2006 | A person is writing a text file. It has headings which are written in caps and terminated by ":". | |
Graham: 28-Jun-2006 | Anyway, i have a working version now :) | |
[unknown: 9]: 28-Jun-2006 | Since you wrote one, do you know of a better one? This is not a reflection on yours, but it is a great way to know what you considered the next best thing. | |
BrianH: 29-Jun-2006 | To use the simpler of the CS terms: Parse is a rule-based, recursive-descent string and structure parser with backtracking. It is not a parser generator (like Lex/Yacc) or a compiler (like most regex engines); the engine follows the rules directly. Since Parse is recursive-descent, it can handle patterns that regular expressions can't. Since Parse backtracks, it can handle patterns that ordinary recursive-descent parsers can't. Basically, it puts the text and structure processing abilities of Perl 5 to shame, let alone those of the lesser regex engines. In theory, Perl 6 has caught up with REBOL, but for now Perl 6 only exists in theory. By the time it becomes real, REBOL should surpass it (especially if I have anything to say about it). | |
BrianH: 29-Jun-2006 | It's pretty easy to demonstrate patterns that regular expressions can't handle. It's only somewhat difficult to demonstrate patterns that can't be handled by a recursive descent parser without backtracking or unlimited lookahead. I have never run into a pattern that can't be handled by Parse in theory - its only limits are in implementation (available memory and recursion depth). I am not qualified to describe its limits. Still, you have to be careful about how you write the rules or they will trip you up. | |
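Balanced parentheses are the classic example of a pattern that classical regular expressions cannot describe but a recursive-descent rule can. The hypothetical Python combinators below mimic two Parse traits BrianH mentions: rules may refer to themselves, and each `|` alternative restarts from the same saved position (backtracking). Deep nesting is bounded by Python's recursion limit, much as Parse is bounded by recursion depth.

```python
def lit(s):
    """Match the literal string `s`."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule


def seq(*rules):
    """Match each rule in turn (a plain PARSE block)."""
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None
        return pos
    return rule


def alt(*rules):
    """Ordered choice: every alternative starts again from the same `pos`,
    which is the backtracking PARSE does on `|`."""
    def rule(text, pos):
        for r in rules:
            end = r(text, pos)
            if end is not None:
                return end
        return None
    return rule


def empty(text, pos):
    """Always succeed without consuming input."""
    return pos


def balanced(text, pos):
    """balanced: ["(" balanced ")" balanced | none]  (a recursive rule)"""
    return alt(seq(lit("("), balanced, lit(")"), balanced), empty)(text, pos)


def matches(rule, text):
    return rule(text, 0) == len(text)


# The first alternative reads "a" and then fails on "b"; the second restarts at 0.
ab_or_ac = alt(seq(lit("a"), lit("b")), seq(lit("a"), lit("c")))
```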
BrianH: 29-Jun-2006 | A little dry as explanations go, I suppose. You may have better luck showing some magic parse code tricks :) | |
Volker: 29-Jun-2006 | Somewhat buzzy: it's a simplified compiler-compiler. It could be used to build a Java compiler (i.e. something with syntax that complex), but it's also as easy as regex for simpler things, and still readable. (Less buzzy: not always that easy, due to the poorer lookahead.) | |
BrianH: 29-Jun-2006 | Volker, it's more like it can do what a compiler-compiler can do without needing to compile :) And backtracking is about the same as unlimited lookahead, but more powerful. | |
[unknown: 9]: 29-Jun-2006 | Thanks Brian, but as is the theme with questions I ask, I don't ask for myself, but rather so that the "world" can learn what "we" know. So perhaps you should add your 2 cents to Henrik's and Tom's in a public forum of the Wikibook. | |
Volker: 29-Jun-2006 | The compiling is no big argument, as compiler-compilers are for compiled languages anyway ;) The point is, you can easily mix a grammar with actions for semantics. | |
BrianH: 29-Jun-2006 | Volker, it still might be a good point that you can skip a step with parse, depending on the listener. Parse is more of a compiler-interpreter really. The real point I was making was about the lookahead. | |
Volker: 29-Jun-2006 | aah. a compiler-compiler produces sourcecode to be compiled, but you can interpret data with it. | |
Volker: 29-Jun-2006 | I guess that depends on the coco. The point is: a BNF by default, and code inside the rules, instead of putting things in vars and processing them later. IMHO. | |
BrianH: 29-Jun-2006 | Jaime, I meant that parse is itself an interpreter, not a compiler. It interprets compiler specs (or interpreter specs, etc.). | |
BrianH: 29-Jun-2006 | Volker, I've used a lot of compiler-compilers before and reviewed many more, and unlimited lookup or backtracking are rare. | |
Volker: 29-Jun-2006 | Then the advantages of parse are being like a compiler-compiler and having unlimited lookup etc.? | |
BrianH: 29-Jun-2006 | I'm not sure whether not having a separate tokenizer is a plus or a minus, though. | |
BrianH: 29-Jun-2006 | I guess you could think of block parsing as using load as a tokenizer. | |
Volker: 29-Jun-2006 | sounds good. if one finds a good tokenized representation. I am not an xml-guru :( | |
BrianH: 29-Jun-2006 | Still, "run away" is a common and sensible reaction to XML. | |
Gordon: 29-Jun-2006 | I'm a bit stuck because this parse stops after the first iteration. Can anyone give me a hint as to why it stops after one line? Here is some code:
data: read to-file Readfile
print length? data
224921
d: parse/all data [
    thru QuoteStr copy Note to QuoteStr thru QuoteStr
    thru quotestr copy Category to QuoteStr thru QuoteStr
    thru quotestr copy Flag to QuoteStr thru newline
    (print index? data)
]
1
== false
Data contains hundreds of "memos" in a CSV file with three fields: Memo, Category and Flag ("0"|"1"). All fields are enclosed in quotes and separated by commas. It would be really simple if the Memo field didn't contain double-quoted words; then parse data none would even work, but alas many memos contain other "words". It would even be simple if the memos didn't contain commas; then parse data "," or parse/all data "," would work, but alas many memos contain commas in the body. | |
MikeL: 29-Jun-2006 | Gordon, can you post a copy of short lines of the data? | |
Izkata: 29-Jun-2006 | if QuoteStr = "\"", then this looks like it to me:
Note , "Category", "Flag"
Note , "Category", "Flag"
But you don't have a loop or anything; try this:
d: parse/all data [
    some [
        thru QuoteStr copy Note to QuoteStr thru QuoteStr
        thru quotestr copy Category to QuoteStr thru QuoteStr
        thru quotestr copy Flag to QuoteStr thru newline
        (print index? data)
    ]
] | |
Gordon: 29-Jun-2006 | Okay, trying it now. I see that the phrase: "print index? data" stays stuck on "1". I see that you have posted a new example. I'll try that. Be right back. | |
Gordon: 29-Jun-2006 | I'm pretty sure you are right that I have to loop through the "Data". That was my big stumbling block, and the rest is just logic to figure out. Thanks a bunch. | |
Gordon: 29-Jun-2006 | In the phrase. "Print index :x", what does putting a colon before a variable do again? | |
Gordon: 29-Jun-2006 | This data was exported by PalmOS. I like the Palm desktop for keeping track of notes, memos and addresses, but the search engine sucks badly, so I wanted to export the data to allow a nice REBOL search on it. Note that the PalmOS export function "escapes" an embedded quote by doubling it. Ex: Press the "Home" button becomes Press the ""Home"" button. | |
Tomc: 29-Jun-2006 | truth (as far as I know) is: word is a shortcut for :word, but there are a few places, such as inside parse, where the shortcut does not work, so you need to make it explicit | |
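For comparison, Python's standard csv module handles both of Gordon's problems: it loops over every record by itself, and with its default doublequote=True setting it reads "" inside a quoted field as a single ". A minimal sketch, assuming exactly the three fields Gordon describes; read_palm_memos is a hypothetical name.

```python
import csv
import io


def read_palm_memos(data):
    """Yield (note, category, flag) for every record in `data`.

    csv.reader walks all the records itself, keeps commas and newlines that
    sit inside quoted fields, and (doublequote=True by default) reads "" as ".
    """
    for row in csv.reader(io.StringIO(data)):
        if not row:            # skip blank lines
            continue
        note, category, flag = row
        yield note, category, flag
```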
Gordon: 29-Jun-2006 | I will get some troubleshooting data posted in a minute. | |
Gordon: 29-Jun-2006 | Tomc: Do I understand that :word would be like "get word" except in a parse sentence? | |
Gordon: 29-Jun-2006 | Tomc: Do I understand that :word would be like "get word" and needed in a parse sentence but you can just use the shortcut 'word' most everywhere else? | |
BrianH: 29-Jun-2006 | The colon before the word prevents the interpreter from evaluating active values like functions and parens. It's a safety thing. | |
BrianH: 29-Jun-2006 | Except when you want an active value assigned to the word to be evaluated, like when you are calling a function. | |
Gordon: 29-Jun-2006 | Thanks Tomc and BrianH. I'll chew on it for a while. Meanwhile I'm working on building some test data for the first problem. | |
Gordon: 29-Jun-2006 | okay so in the parse rules (except in a parenthesized code block) it means "be here now"? | |
BrianH: 30-Jun-2006 | That's interesting. Parens and paths used to be active; oh yeah, that was changed a while ago. Still, there are some value types that are active (function types, lit-path, lit-word), and if you think you will get one of these you should disable their evaluation by referencing them with a get-word, unless you intend them to be evaluated. | |
DideC: 30-Jun-2006 | Gordon: I didn't read this whole thread, but for converting CSV strings to/from REBOL blocks, here are some fully functional functions: | |
DideC: 30-Jun-2006 | ;***** Conversion function from/to CSV format
csv-to-block: func [
    "Convert a string of CSV formatted data to a Rebol block. First line is header."
    csv-data [string!] "CSV data."
    /separator separ [char!] "Separator to use if other than comma (,)."
    /without-header "Do not include header in the result."
    /local out line start end this-string header record value data chars spaces chars-but-space
    ; CSV format information http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
] [
    out: copy []
    separ: any [separ #","]
    ; This function handles replacement of dual double-quote by quote while copying substring
    this-string: func [s e] [replace/all copy/part s e {""} {"}]
    ; CSV parsing rules
    header: [(line: copy []) value any [separ value] (if not without-header [append/only out line])]
    record: [(line: copy []) value any [separ value] (append/only out line)]
    value: [any spaces data any spaces (append line this-string start end)]
    ; a quoted value may contain the separator (whatever it is) and newlines
    data: [start: some chars-but-space end: | #"^"" start: any [some chars | {""} | separ | newline] end: #"^""]
    chars: complement charset rejoin [{"} separ newline]
    spaces: charset exclude { ^-} form separ
    chars-but-space: exclude chars spaces
    parse/all csv-data [header any [newline record] any newline end]
    out
]
block-to-csv: func [
    "Convert a block of blocks to a CSV formatted string."
    blk-data [block!] "block of data to convert"
    /separator separ "Separator to use if other than comma (,)."
    /local out csv-string record value v
] [
    out: copy ""
    separ: any [separ #","]
    ; This function converts a string to a CSV formatted one
    csv-string: func [val] [head insert next copy {""} replace/all copy val {"} {""}]
    ; append the chosen separator, not a hard-coded comma
    record: [into [some [value (append out separ)]]]
    value: [set v string! (append out csv-string v) | set v any-type! (append out form v)]
    parse/all blk-data [any [record (remove back tail out append out newline)]]
    out
] | |
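The same grammar can be written by hand in Python to show what DideC's rules do step by step: optional blanks, then either a bare value or a quoted one in which "" stands for ", then optional blanks, with records ending at a newline. A sketch only (csv_to_rows and rows_to_csv are hypothetical names); it is a little more lenient than the REBOL rules, since bare values may contain inner spaces, empty fields are allowed, and blank lines between records are skipped.

```python
def csv_to_rows(text, sep=","):
    """Parse CSV text into a list of rows, each a list of strings."""
    if not text:
        return []
    blanks = " \t".replace(sep, "")            # like DideC's `spaces`, minus the separator
    rows, row, i, n = [], [], 0, len(text)
    while True:
        while i < n and text[i] in blanks:     # any spaces
            i += 1
        if i < n and text[i] == '"':           # quoted data: "" stands for "
            i += 1
            chunks = []
            while True:
                j = text.find('"', i)
                if j == -1:
                    raise ValueError("unterminated quoted field")
                chunks.append(text[i:j])
                if text.startswith('""', j):
                    chunks.append('"')
                    i = j + 2
                else:
                    i = j + 1
                    break
            value = "".join(chunks)
        else:                                  # bare data up to separator or newline
            start = i
            while i < n and text[i] not in (sep, "\n"):
                i += 1
            value = text[start:i].rstrip(blanks)
        while i < n and text[i] in blanks:     # any spaces
            i += 1
        row.append(value)
        if i < n and text[i] == sep:           # another value in this record
            i += 1
            continue
        rows.append(row)                       # newline or end of text: record done
        row = []
        while i < n and text[i] == "\n":       # skip newlines (and blank lines)
            i += 1
        if i >= n:
            return rows


def rows_to_csv(rows, sep=","):
    """Inverse direction, as in block-to-csv: quote strings, form other values."""
    def field(value):
        if isinstance(value, str):
            return '"' + value.replace('"', '""') + '"'
        return str(value)
    return "".join(sep.join(field(v) for v in row) + "\n" for row in rows)
```

As in csv-to-block, the first row is simply the header; drop `rows[0]` to get the /without-header behaviour.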
Gordon: 30-Jun-2006 | DideC: Thanks. I've copied and pasted it for review and added it to my local public library. This script should be useful especially with the html help page. Documentation on a script is very rare and much appreciated. Graham: Did a search using "librarian" and search term of "sql cvs" and didn't come up with anything. Although, I think we've got it covered now anyhow. | |
Graham: 1-Jul-2006 | What I was trying to do above is to look for the macro text preceded by a space or newline, and ending in a space or newline. | |
Graham: 1-Jul-2006 | Heart: Heart regular rate and rhythm, no rubs, murmurs, or gallops noted. A: Abdomen: soft, nontender, no mass, no hernia, no guarding, no rebound tenderness, no ascites, non obese Hbp Hypertension (high blood pressure) #401.9. Ii Type II Diabetes #250.00 | |
Graham: 1-Jul-2006 | So, someone might type heart: A: with striae | |
BrianH: 1-Jul-2006 | That A: isn't delimited by whitespace. | |
Graham: 1-Jul-2006 | it is .. it's preceded by a newline character | |
Graham: 1-Jul-2006 | so, a: is a macro, whereas "a" is not. | |
Tomc: 1-Jul-2006 | isn't there a controlled vocabulary for this sort of thing | |
BrianH: 1-Jul-2006 | Is a macro always the first word in a line? | |
Graham: 1-Jul-2006 | no, it can be anywhere in a line. | |
BrianH: 1-Jul-2006 | Is there a separate syntax for defining macros? | |
Graham: 1-Jul-2006 | no, it's just a text file which is read in at start up. | |
BrianH: 1-Jul-2006 | So in use, a macro a: will always be a: in the text. Will it be A: sometimes, or "a:"? | |
Graham: 1-Jul-2006 | so, personal shorthand should expand into a controlled vocab ideally | |
Graham: 1-Jul-2006 | Ii Type II Diabetes #250.00 here the macro expansion includes a code (CPT) from the AMA. |
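A minimal sketch of the expansion step in Python, assuming macros are matched as whole whitespace-delimited tokens and compared case-sensitively (whether matching should be case-sensitive is BrianH's open question, so that choice is an assumption). expand_macros and the sample macro table are hypothetical.

```python
import re


def expand_macros(text, macros):
    """Replace each whitespace-delimited token that is a macro name.

    Tokens are maximal runs of non-space characters, so a macro only fires
    when it is preceded by start-of-text or whitespace and followed by
    whitespace or end-of-text. Expansions are not re-scanned.
    """
    return re.sub(r"\S+", lambda m: macros.get(m.group(0), m.group(0)), text)


# Hypothetical macro table, modelled on the examples above.
MACROS = {
    "A:": "Abdomen: soft, nontender, no mass",
    "Hbp": "Hypertension (high blood pressure) #401.9.",
    "Ii": "Type II Diabetes #250.00",
}
```

Because only whole tokens are looked up, "a:" and "AA:" are left alone while "A:" at the start of a line expands, which matches Graham's rule that the macro must be delimited by a space or newline.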