world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
Anton:
5-Mar-2006
And do you want to avoid putting them into a block first?
Geomol:
6-Mar-2006
'parse' is the path to great explorations and inventions - and also 
to great confusion and maybe despair. ;-)


No really, it can be a bit confusing at times, but I guess that's the
price of such great functionality. There's no shortcut with 'parse'.
Learning by doing is the way to go. And it's a brilliant tool!
Oldes:
7-Mar-2006
count-word-frequency: func[
	"Counts word frequency from the given text"
	text [string!] "text to analyse"
	/exclude ex [block!] "words which should not be counted"
	/local counts f word wordchars nonwordchars
][
	counts: make hash! 100000

	wordchars: charset [#"a" - #"z" #"A" - #"Z" "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"]
	nonwordchars: complement wordchars
	parse/all text [
		any nonwordchars
		any [
			copy word some wordchars (
				;probe word
				if any [not exclude none? find ex word][
					either none? f: find/tail counts word [
						repend counts [ word 1 ]
					][
						change f (f/1 + 1)
					]
				]
			)
			any nonwordchars
		]
	]
	counts: to-block counts
	sort/skip/compare/reverse counts 2 2
	new-line/skip counts true 2
]
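
A quick call of the function above might look like this (my example, not from the thread; the ordering of words with equal counts depends on the sort):

	probe count-word-frequency "the cat sat on the mat"
	; expect word/count pairs, most frequent first, e.g.:
	; ["the" 2 "cat" 1 "sat" 1 "on" 1 "mat" 1]
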
Oldes:
7-Mar-2006
found missing czech chars ->  wordchars: charset [#"a" - #"z" #"A" - #"Z" "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"]
Oldes:
13-Mar-2006
Is this a bug?
parse/all {"some words"} {" }
;== ["some words"]
parse/all {and "some words"} {" }
;== ["and" "some words"]
parse {and "some words"} {" }
;== ["and" "some" "words"]
parse {"some words"} {" }
;== ["some words"]
Geomol:
13-Mar-2006
Good question! It's in a tough corner of REBOL - parsing. REBOL is
in many ways more like a human language than a computer language.
Strictly speaking, you can argue that those examples have a bug or
two, but can you live with it? The behaviour might make it difficult
to parse input strings written by humans, because people write all
sorts of things. (If it can go wrong, it will.)


Try changing the quotation marks to something else and see the results
change, like:

>> parse/all {Xsome wordsX}{X }
== ["" "some" "words"]
Gabriele:
13-Mar-2006
parse, without a rule, treats quotes specially. this is to allow 
parse to be used directly with things like csv data.
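
For example (my illustration, not from the thread; output shown as I would expect it), the rule-less form keeps a quoted field together even when it contains the delimiter:

parse {a,"b, c",d} ","
; == ["a" "b, c" "d"]
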
Oldes:
14-Mar-2006
I think it's a bug! I was trying to use this to divide a large string
into words and found that I got whole sentences in the result instead
of just words. It's a problem only if you have the divider at the edge.
Gabriele:
14-Mar-2006
this behavior is the one intended by Carl. so, it's so by design, 
and not a bug. but, you may try to convince Carl that you don't like 
it. ;)
Oldes:
14-Mar-2006
I still think it's a bug - I cannot see the difference between parse
and parse/all in this example. If Carl doesn't want to fix it, no problem
for me, I used a more complicated rule to do the same thing; I just
still think it's a bug, and it will confuse more people in the future
as well.
Oldes:
14-Mar-2006
and parse {,"a b, d"  ,d} {,} == ["" "a b, d" "d"]  (so probably
Carl is right ;-)
Oldes:
14-Mar-2006
But it should be in the documentation that quotes are very special
characters for this type of parsing!
JaimeVargas:
28-Apr-2006
Oldes, a regex context would be a good addition, where regexes are the
basic rules for numbers, whitespace, *words* and their negations.
Oldes:
28-Apr-2006
anton: I think that a parse rule shouldn't have to be a global variable,
but you could still use its name in the parse block. But probably it
would be a security issue.
Gregg:
28-Apr-2006
I've thought about that as well. There are some base charsets we 
could probably standardize on, and that would be good (IMO). Beyond 
a few basics, though, consensus gets tough.
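
Something along these lines is probably what is meant by base charsets (the names and membership below are only a suggestion, not an agreed standard):

digit:      charset "0123456789"
alpha:      charset [#"a" - #"z" #"A" - #"Z"]
alphanum:   union alpha digit
whitespace: charset " ^-^/^M"
non-digit:  complement digit
non-space:  complement whitespace
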
Gregg:
28-Apr-2006
The singular/plural argument seems easy, but isn't (IMO); DIGITS 
could be done as SOME DIGIT, and you could argue that things like 
2 DIGITS reads better, though 1 DIGITS does not. You could double-define 
it, but that gets ugly too. So, what about DIG? That doesn't imply 
any singularity, though it's a bit terse, and not a full word (or, 
rather, the wrong full word).
Sunanda:
28-Apr-2006
I was sure I'd posted this just after Oldes' message... but it
ain't there now. (Maybe it's in the wrong group.)
Andrew has a nice starter set:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=common-parse-values.r

And I know he has extended that list extensively to include things
like email addresses and URLs.
Graham:
28-Apr-2006
Maybe there should be no invalid datatypes .... everything can be 
converted to a datatype
Graham:
28-Apr-2006
if the parser thinks a datatype is invalid, well, let's call it an 
invalid! datatype!!
Graham:
28-Apr-2006
have a catchall for stuff it thinks is wrong
Volker:
1-May-2006
How about another way: integrate datatypes into the string parser. Basically
a load/next and a check of the type.
Then we could write (note I parse a string):
parse "1 a , #2" [ integer! word! "," issue! ]
Volker:
1-May-2006
'invalid! has a problem: it's easy to recognize where the wrong part
starts, but harder to recognize where it ends.
Ashley:
24-May-2006
Quick question for the parse experts. If I have a parse rule that 
looks like this:

parse spec [
	any [
		set arg string! (...) | set arg tuple! (...) | ...
	]
]

How would I add a rule like:

	set arg paren! (reduce arg)


that is always tried prior to the any rule and evaluates parenthesized
expressions (i.e. I want parenthesized expressions to be reduced
to a REBOL value that can be handled by the remainder of the parse
rule).
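
One possible approach (a sketch only, not necessarily the answer Ashley received; the literal spec block below is mine) is to evaluate each paren in place and then seek back so the existing rules can match the result:

spec: [size (10 + 2) color (0.0.255)]
parse spec [
    any [
        ; evaluate a paren in place, then seek back so the value it
        ; produced is matched by the rules that follow
        p: paren! (change/only p do first p) :p
        | set arg integer! (print ["integer:" arg])
        | set arg tuple!   (print ["tuple:" arg])
        | skip
    ]
]
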
Ashley:
25-May-2006
Thanks both, works a treat.
Graham:
27-Jun-2006
My brain is still asleep.  How to go thru a document and add <strong> 
</strong> around every word that is in capitals and is more than 
a few characters long?
Pekr:
27-Jun-2006
hmm, quite a challenge ...
Gordon:
27-Jun-2006
I agree - a bit much to ask.  A more specific question would get 
a more specific answer :)

Something like:

file: read filename2parse
newfile: ""
foreach word file [
   if Is-Capitals word [
      newfile: join newfile ["<strong> " word " </strong> "]
   ]
]

The Is-Capitals function would have to be defined:
Is-Capitals: func [Word2Check] [
   some code here
]
Graham:
27-Jun-2006
that won't work because file is just text and not a block.
Volker:
27-Jun-2006
;thinking loud:
capitals: charset["#"A" - #"Z"]
capital: [5 capitals any capitals]
BrianH:
27-Jun-2006
Yes, give me a minute...
JaimeVargas:
27-Jun-2006
capitalize-word: func [
    s [string!]
    /local len
][
    either 5 < len: length? s [
        s: rejoin ["<strong>" uppercase s/1 next s </strong>]
    ][
        s
    ]
]

capitalize-text: func [
    s [string!]
    /local result word-rule other-rule alpha non-alpha w c
][
    result: copy {}
    alpha: charset [#"A" - #"Z" #"a" - #"z"]
    non-alpha: complement alpha

    word-rule: [copy w [some alpha] (insert tail result capitalize-word w)]
    other-rule: [copy c non-alpha (insert tail result c)]

    parse/all s [some [word-rule | other-rule] end]
    result
]
Graham:
27-Jun-2006
search for a series of capitalised words and strong them
Graham:
27-Jun-2006
bolden-word: func [
    s [string!]
    /local len
][
    either 5 < len: length? s [
        s: rejoin ["<strong>" s </strong>]
    ][
        s
    ]
 ]

enhance-text: func [
    s [string!]
    /local result word-rule alpha non-alpha w c
][
    result: copy {}
    alpha: charset [#"A" - #"Z"]
    non-alpha: complement alpha

    word-rule: [copy w [some alpha] (insert tail result bolden-word w)]
    other-rule: [copy c non-alpha (insert tail result c)]
    parse/all s [some [word-rule | other-rule] end]
    result
]
BrianH:
27-Jun-2006
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any non-alpha any [
    a: 5 capitals any capitals b: non-alpha (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha any non-alpha
] to end]
BrianH:
27-Jun-2006
; A few fixes
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any non-alpha any [
    a: 5 capitals any capitals b: [non-alpha | end] (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha any non-alpha
] to end]
Graham:
27-Jun-2006
capitals: charset ["#"A" - #"Z"] ... remove  leading "
Graham:
27-Jun-2006
Yeah ... it was a way to mark up text wherever a sequence of CAPS 
occurs
BrianH:
27-Jun-2006
; Sorry, more fixes
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any [

    any non-alpha a: 5 capitals any capitals b: [non-alpha | end] (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha
] to end]
Graham:
27-Jun-2006
Actually I would like to add a parse problem to the weeklyblog and 
get people to submit answers :)
BrianH:
27-Jun-2006
I use parse quite a bit. It's funny, I've never needed the GUI of 
View, but I use parse daily.
Graham:
27-Jun-2006
And give a prize for the shortest answer
Graham:
27-Jun-2006
say a copy of Microsoft VB :)
BrianH:
27-Jun-2006
Seriously though, three charsets and two temporary variables, there's 
got to be a more efficient way.
BrianH:
27-Jun-2006
; Sorry, more fixes
capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any [to alpha [
    a: 5 capitals any capitals b: [non-alpha | end] (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha
]] to end]
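
For reference, a runnable form of this last version, with the input string supplied in a variable (here called text, my addition), might be:

capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha

text: copy "See the README file and the LICENSE before use"
parse/all/case text [
    any [
        to alpha [
            a: 5 capitals any capitals b: [non-alpha | end] (
                b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
            ) :b
            | some alpha
        ]
    ]
    to end
]
probe text  ; runs of five or more capitals are now wrapped in <strong> tags
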
Volker:
27-Jun-2006
because " a: 5 capitals any capitals b:" stops at "g" and friends.
BrianH:
27-Jun-2006
The inserts are a nice touch though.
Graham:
28-Jun-2006
I think that punctuation is part of a word
Graham:
28-Jun-2006
A person is writing a text file.  It has headings which are denoted
by caps and terminated by ":".
Graham:
28-Jun-2006
Anyway, i have a working version now :)
[unknown: 9]:
28-Jun-2006
Since you wrote one, do you know of a better one?  This is not a 
reflection on yours, but it is a great way to know what you considered 
the next best thing.
BrianH:
29-Jun-2006
To use the simpler of the CS terms:


Parse is a rule-based, recursive-descent string and structure parser 
with backtracking. It is not a parser generator (like Lex/Yacc) or 
compiler (like most regex engines) - the engine follows the rules 
directly. Since Parse is recursive-descent it can handle patterns 
that regular expressions wouldn't be able to. Since Parse backtracks 
it can handle patterns that ordinary recursive-descent parsers can't.


Basically, it puts the text and structure processing abilities of 
Perl 5 to shame, let alone those of the lesser regex engines.


In theory, Perl 6 has caught up with REBOL, but Perl 6 only exists
in theory for now. By the time it becomes actual, REBOL should surpass
it (especially if I have anything to say about it).
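
A small illustration of that recursion (my example): a rule that refers to itself matches strings of n "a"s followed by n "b"s, a textbook pattern that no regular expression can express.

ab: [#"a" opt ab #"b"]   ; one "a", optionally the whole rule again, then one "b"
parse/all "aaabbb" ab    ; == true
parse/all "aaabb" ab     ; == false
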
BrianH:
29-Jun-2006
It's pretty easy to demonstrate patterns that regular expressions 
can't handle. It's only somewhat difficult to demonstrate patterns 
that can't be handled by a recursive descent parser without backtracking 
or unlimited lookahead.


I have never run into a pattern that can't be handled by Parse in 
theory - its only limits are in implementation (available memory 
and recursion depth). I am not qualified to describe its limits. 
Still, you have to be careful about how you write the rules or they 
will trip you up.
BrianH:
29-Jun-2006
A little dry as explanations go, I suppose. You may have better luck
showing some magic parse code tricks :)
Volker:
29-Jun-2006
Somewhat buzzy: It's a simplified compiler-compiler. Could be used
to build a Java compiler (i.e. syntax that complex), but it's also as
easy as regex for simpler things. But still readable. (Less buzzy:
not always that easy due to the poorer lookahead.)
BrianH:
29-Jun-2006
Volker, it's more like it can do what a compiler-compiler can do 
without needing to compile :)

And backtracking is about the same as unlimited lookahead, but more 
powerful.
[unknown: 9]:
29-Jun-2006
Thanks Brian, but as is the theme with questions I ask, I don't ask
for myself, but rather so that the "world" can learn what "we" know.
So perhaps you should add your 2 cents to Henrik's and Tom's in
a public forum like the Wikibook.
Volker:
29-Jun-2006
the compiling is no big argument, as compiler-compilers are for compiled
languages anyway ;) the point is, you can mix a grammar and actions
for semantics easily.
BrianH:
29-Jun-2006
Volker, it still might be a good point that you can skip a step with 
parse, depending on the listener. Parse is more of a compiler-interpreter 
really. The real point I was making was about the lookahead.
Volker:
29-Jun-2006
aah. a compiler-compiler produces sourcecode to be compiled, but 
you can interpret data with it.
Volker:
29-Jun-2006
i guess that depends on the coco. the point is, a bnf by default,
and code inside the rules, instead of putting things in vars and
processing later. IMHO.
BrianH:
29-Jun-2006
Jaimie, I meant that parse is itself an interpreter, not a compiler. 
It interprets compiler specs (or interpreter specs, etc.).
BrianH:
29-Jun-2006
Volker, I've used a lot of compiler-compilers before and reviewed
many more, and unlimited lookahead or backtracking are rare.
Volker:
29-Jun-2006
then the advantages of parse are being like a compiler-compiler
and having unlimited lookahead etc.?
BrianH:
29-Jun-2006
I'm not sure whether not having a separate tokenizer is a plus or
a minus, though.
BrianH:
29-Jun-2006
I guess you could think of block parsing as using load as a tokenizer.
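
Which gives, more or less, what Volker asked for earlier, as long as you LOAD first (my example; a bare comma would not LOAD, so the tokens are limited to ones LOAD accepts):

parse load "1 a #2 http://rebol.com" [integer! word! issue! url!]
; == true
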
Volker:
29-Jun-2006
sounds good. if one finds a good tokenized representation. I am not 
an xml-guru :(
BrianH:
29-Jun-2006
Still, "run away" is a common and sensible reaction to XML.
Gordon:
29-Jun-2006
I'm a bit stuck because this parse stops after the first iteration.
Can anyone give me a hint as to why it stops after one line?

Here is some code:

data: read to-file Readfile

print length? data
224921


d: parse/all data [
    thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr
    copy Category to QuoteStr thru QuoteStr thru quotestr copy Flag to QuoteStr
    thru newline (print index? data)
]
1
== false


Data contains hundreds of "memos" in a CSV file with three fields:
Memo, Category and Flag ("0"|"1"). All fields are enclosed in quotes
and separated by commas.

It would be real simple if the Memo field didn't contain double quoted 
words; then 
parse data none
would even work; but alas many memos contain other "words".
It would even be simple if the memos didn't contain commas, then
parse data "," or parse/all data ","
would work; but alas many memos contain commas in the body.
MikeL:
29-Jun-2006
Gordon, can you post a copy of short lines of the data?
Izkata:
29-Jun-2006
if QuoteStr = "\"", then this looks like it to me:
"Note", "Category", "Flag"
"Note", "Category", "Flag"

But you don't have a loop or anything - try this:
d: parse/all data [
   some [
      thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr
      copy Category to QuoteStr thru QuoteStr thru quotestr copy Flag to QuoteStr
      thru newline (print index? data)
   ]
]
Gordon:
29-Jun-2006
Okay, trying it now.  I see that the phrase: "print index? data" 
stays stuck on "1".  


I see that you have posted a new example.  I'll try that.  Be right 
back.
Gordon:
29-Jun-2006
I'm pretty sure that you are right in that I have to loop through
the "Data".  That was my big stumbling block and the rest is just
logic to figure out.  Thanks a bunch.
Gordon:
29-Jun-2006
In the phrase "print index? :x", what does putting a colon before
a variable do again?
Gordon:
29-Jun-2006
This data was exported by PalmOS.  I like the Palm desktop for keeping
track of notes/memos/addresses, but the search engine sucks badly.
Therefore I wanted to export the data to allow a nice REBOL search
on it.  Note that the PalmOS export function does "escape" an embedded
quote by quoting it again.  Ex:
Press the "Home" button
becomes
Press the ""Home"" button.
Tomc:
29-Jun-2006
truth (as far as i know) is: word is a shortcut for :word, but
there are a few places, such as inside parse, where the shortcut
does not work, so you need to make it explicit
Gordon:
29-Jun-2006
I will get some troubleshooting data posted in a minute.
Gordon:
29-Jun-2006
Tomc:  Do I understand that :word would be like "get word" except 
in a parse sentence?
Gordon:
29-Jun-2006
Tomc:  Do I understand that :word would be like "get word" and needed 
in a parse sentence but you can just use the shortcut 'word' most 
everywhere else?
BrianH:
29-Jun-2006
The colon before the word prevents the interpreter from evaluating 
active values like functions and parens. It's a safety thing.
BrianH:
29-Jun-2006
Except when you want an active value assigned to the word to be evaluated, 
like when you are calling a function.
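
A tiny illustration of that (my example):

f: does [print "called"]
x: :f    ; get-word: fetches the function value without calling it
f        ; plain word: evaluates the active value, so this prints "called"
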
Gordon:
29-Jun-2006
Thanks Tomc and BrianH.  I'll chew on it for a while.  Meanwhile 
I'm working on building some test data for the first problem.
Gordon:
29-Jun-2006
okay so in the parse rules (except in a parenthesized code block) 
it means "be here now"?
BrianH:
30-Jun-2006
That's interesting. Parens and paths used to be active - oh yeah,
that was changed a while ago. Still, there are some value types that
are active (function types, lit-path, lit-word) and if you think
you will get one of these you should disable their evaluation by
referencing them with a get-word, unless you intend them to be evaluated.
DideC:
30-Jun-2006
Gordon: I did not read this whole thread, but as for converting
CSV strings to/from REBOL blocks, here are some fully functional
functions:
DideC:
30-Jun-2006
;***** Conversion functions from/to CSV format
csv-to-block: func [
	"Convert a string of CSV formatted data to a REBOL block. First line is the header."
	csv-data [string!] "CSV data."
	/separator separ [char!] "Separator to use if different from comma (,)."
	/without-header "Do not include the header in the result."
	/local out line start end this-string header record value data chars spaces chars-but-space
	; CSV format information: http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
] [
	out: copy []
	separ: any [separ #","]

	; This function handles replacement of doubled double-quotes by a single quote while copying the substring
	this-string: func [s e] [replace/all copy/part s e {""} {"}]
	; CSV parsing rules
	header: [(line: copy []) value any [separ value] (if not without-header [append/only out line])]
	record: [(line: copy []) value any [separ value] (append/only out line)]
	value: [any spaces data any spaces (append line this-string start end)]
	data: [start: some chars-but-space end: | #"^"" start: any [some chars | {""} | #"," | newline] end: #"^""]
	chars: complement charset rejoin [{"} separ newline]
	spaces: charset exclude { ^-} form separ
	chars-but-space: exclude chars spaces

	parse/all csv-data [header any [newline record] any newline end]
	out
]

block-to-csv: func [
	"Convert a block of blocks to a CSV formatted string."
	blk-data [block!] "Block of data to convert."
	/separator separ "Separator to use if different from comma (,)."
	/local out csv-string record value v
] [
	out: copy ""
	separ: any [separ #","]
	; This function converts a string to a CSV formatted one
	csv-string: func [val] [head insert next copy {""} replace/all copy val {"} {""}]
	record: [into [some [value (append out #",")]]]
	value: [set v string! (append out csv-string v) | set v any-type! (append out form v)]

	parse/all blk-data [any [record (remove back tail out append out newline)]]
	out
]
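
A quick round-trip check of these functions might look like this (my example; the expected output is inferred from the code, not verified):

data: {name,note^/Carl,"likes ""parse"", a lot"}
probe csv-to-block data
; expect: [["name" "note"] ["Carl" {likes "parse", a lot}]]

probe block-to-csv [["name" "note"] ["Carl" {likes "parse", a lot}]]
; expect: the same CSV text back, with the embedded quotes doubled inside the quoted field
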
Gordon:
30-Jun-2006
DideC: Thanks.  I've copied and pasted it for review and added it 
to my local public library.  This script should be useful especially 
with the html help page.   Documentation on a script is very rare 
and much appreciated.


Graham: Did a search using  "librarian" and search term of  "sql 
cvs" and didn't come up with anything.  Although, I think we've got 
it covered now anyhow.
Graham:
1-Jul-2006
What I was trying to do above is to look for the macro text preceded 
by a space or newline, and ending in a space or newline.
Graham:
1-Jul-2006
Heart: Heart regular rate and rhythm, no rubs, murmurs, or gallops 
noted. 

A: Abdomen:  soft, nontender, no mass, no hernia, no guarding, no 
rebound tenderness, no ascites, non obese 
Hbp Hypertension (high blood pressure) #401.9.  
Ii Type II Diabetes #250.00
Graham:
1-Jul-2006
So, someone might type

heart:
A: with striae
BrianH:
1-Jul-2006
That A: isn't delimited by whitespace.
Graham:
1-Jul-2006
it is .. it's preceded by a newline character
Graham:
1-Jul-2006
so, a: is a macro, whereas "a" is not.
Tomc:
1-Jul-2006
isn't there a controlled vocabulary for this sort of thing
BrianH:
1-Jul-2006
Is a macro always the first word in a line?
Graham:
1-Jul-2006
no, it can be anywhere in a line.
BrianH:
1-Jul-2006
Is there a separate syntax for defining macros?
Graham:
1-Jul-2006
no, it's just a text file which is read in at start up.
BrianH:
1-Jul-2006
So in use, a macro a: will always be a: in the text. Will it be A: 
sometimes, or "a:"?
Graham:
1-Jul-2006
so, personal shorthand should expand into a controlled vocab ideally
Graham:
1-Jul-2006
Ii Type II Diabetes #250.00
here the macro expansion includes a code (CPT) from the AMA.