world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
Anton:
5-Mar-2006
And do you want to avoid putting them into a block first?
Geomol:
6-Mar-2006
'parse' is the path to great explorations and inventions - and also 
to great confusion and maybe despair. ;-)


No really, it can be a bit confusing at times, but I guess that's the
price of such great functionality. There's no shortcut with 'parse'.
Learning by doing is the way to go. And it's a brilliant tool!
Oldes:
7-Mar-2006
count-word-frequency: func[
	"Counts word frequency from the given text"
	text [string!] "text to analyse"
	/exclude ex [block!] "words which should not be counted"
	/local counts f word wordchars nonwordchars
][
	counts: make hash! 100000

	wordchars: charset [#"a" - #"z" #"A" - #"Z" "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"]
	nonwordchars: complement wordchars
	parse/all text [
		any nonwordchars
		any [
			copy word some wordchars (
				;probe word
				if any [not exclude none? find ex word][
					either none? f: find/tail counts word [
						repend counts [ word 1 ]
					][
						change f (f/1 + 1)
					]
				]
			)
			any nonwordchars
		]
	]
	counts: to-block counts
	sort/skip/compare/reverse counts 2 2
	new-line/skip counts true 2
]
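
A quick call of the function above might look like this (my example, not from the thread; the ordering of words with equal counts depends on the sort):

	probe count-word-frequency "the cat sat on the mat"
	; expect word/count pairs, most frequent first, e.g.:
	; ["the" 2 "cat" 1 "sat" 1 "on" 1 "mat" 1]
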
Oldes:
7-Mar-2006
found missing czech chars ->  wordchars: charset [#"a" - #"z" #"A" - #"Z" "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"]
Oldes:
13-Mar-2006
Is this a bug?
parse/all {"some words"} {" }
;== ["some words"]
parse/all {and "some words"} {" }
;== ["and" "some words"]
parse {and "some words"} {" }
;== ["and" "some" "words"]
parse {"some words"} {" }
;== ["some words"]
Geomol:
13-Mar-2006
Good question! It's in a tough corner of REBOL - parsing. REBOL is
in many ways more like a human language than a computer language.
Strictly speaking, you can argue that those examples have a bug or
two, but can you live with it? The behaviour might make it difficult
to parse input strings written by humans, because people write all
sorts of things. (If it can go wrong, it will.)


Try changing the quotation marks to something else and see the results
change, like:

>> parse/all {Xsome wordsX}{X }
== ["" "some" "words"]
Gabriele:
13-Mar-2006
parse, without a rule, treats quotes specially. this is to allow 
parse to be used directly with things like csv data.
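
For example (my illustration, not from the thread; output shown as I would expect it), the rule-less form keeps a quoted field together even when it contains the delimiter:

parse {a,"b, c",d} ","
; == ["a" "b, c" "d"]
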
Oldes:
14-Mar-2006
I think it's a bug! I was trying to use this to divide a large string
into words and found that I got whole sentences in the result instead
of just words. It's a problem only if you have the divider at the edge.
Gabriele:
14-Mar-2006
this behavior is the one intended by Carl. so, it's so by design, 
and not a bug. but, you may try to convince Carl that you don't like 
it. ;)
Oldes:
14-Mar-2006
I still think it's a bug - I cannot see the difference between parse
and parse/all in this example. If Carl doesn't want to fix it, no problem
for me, I used a more complicated rule to do the same thing; I just
still think it's a bug, and it will confuse more people in the future
as well.
Oldes:
14-Mar-2006
and parse {,"a b, d"  ,d} {,} == ["" "a b, d" "d"]  (so probably
Carl is right ;-)
Oldes:
14-Mar-2006
But it should be in the documentation that quotes are very special
characters for this type of parsing!
JaimeVargas:
28-Apr-2006
Oldes, a regex context would be a good addition, where regexes are the
basic rules for numbers, whitespace, *words* and their negations.
Oldes:
28-Apr-2006
anton: I think that a parse rule shouldn't have to be a global variable,
but you could still use its name in the parse block. But probably it
would be a security issue.
Gregg:
28-Apr-2006
I've thought about that as well. There are some base charsets we 
could probably standardize on, and that would be good (IMO). Beyond 
a few basics, though, consensus gets tough.
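
Something along these lines is probably what is meant by base charsets (the names and membership below are only a suggestion, not an agreed standard):

digit:      charset "0123456789"
alpha:      charset [#"a" - #"z" #"A" - #"Z"]
alphanum:   union alpha digit
whitespace: charset " ^-^/^M"
non-digit:  complement digit
non-space:  complement whitespace
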
Gregg:
28-Apr-2006
The singular/plural argument seems easy, but isn't (IMO); DIGITS 
could be done as SOME DIGIT, and you could argue that things like 
2 DIGITS reads better, though 1 DIGITS does not. You could double-define 
it, but that gets ugly too. So, what about DIG? That doesn't imply 
any singularity, though it's a bit terse, and not a full word (or, 
rather, the wrong full word).
Sunanda:
28-Apr-2006
I was sure I'd posted this just after Oldes' message... but it
ain't there now. (Maybe it's in the wrong group.)
Andrew has a nice starter set:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=common-parse-values.r

And I know he has extended that list extensively to include things
like email addresses and URLs.
Graham:
28-Apr-2006
Maybe there should be no invalid datatypes .... everything can be 
converted to a datatype
Graham:
28-Apr-2006
if the parser thinks a datatype is invalid, well, let's call it an 
invalid! datatype!!
Graham:
28-Apr-2006
have a catchall for stuff it thinks is wrong
Volker:
1-May-2006
How about another way: integrate datatypes into the string parser. Basically
a load/next and a check of the type.
Then we could write (note I parse a string):
parse "1 a , #2" [ integer! word! "," issue! ]
Volker:
1-May-2006
'invalid! has a problem: it's easy to recognize where the wrong part
starts, but harder to recognize where it ends.
Ashley:
24-May-2006
Quick question for the parse experts. If I have a parse rule that 
looks like this:

parse spec [
	any [
		set arg string! (...) | set arg tuple! (...) | ...
	]
]

How would I add a rule like:

	set arg paren! (reduce arg)


that is always tried prior to the any rule and evaluates parenthesized
expressions (i.e. I want parenthesized expressions to be reduced
to a REBOL value that can be handled by the remainder of the parse
rule).
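
One possible approach (a sketch only, not necessarily the answer Ashley received; the literal spec block below is mine) is to evaluate each paren in place and then seek back so the existing rules can match the result:

spec: [size (10 + 2) color (0.0.255)]
parse spec [
    any [
        ; evaluate a paren in place, then seek back so the value it
        ; produced is matched by the rules that follow
        p: paren! (change/only p do first p) :p
        | set arg integer! (print ["integer:" arg])
        | set arg tuple!   (print ["tuple:" arg])
        | skip
    ]
]
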
Ashley:
25-May-2006
Thanks both, works a treat.
Graham:
27-Jun-2006
My brain is still asleep.  How to go thru a document and add <strong> 
</strong> around every word that is in capitals and is more than 
a few characters long?
Pekr:
27-Jun-2006
hmm, quite a challenge ...
Gordon:
27-Jun-2006
I agree - a bit much to ask.  A more specific question would get 
a more specific answer :)

Something like:

file: read filename2parse
newfile: ""
foreach word file [
   if Is-Capitals word [
      newfile: join newfile ["<strong> " word " </strong> "]
   ]
]

The Is-Capitals function would have to be defined:
Is-Capitals: func [Word2Check] [
   some code here
]
Graham:
27-Jun-2006
that won't work because file is just text and not a block.
Volker:
27-Jun-2006
;thinking loud:
capitals: charset["#"A" - #"Z"]
capital: [5 capitals any capitals]
BrianH:
27-Jun-2006
Yes, give me a minute...
JaimeVargas:
27-Jun-2006
capitalize-word: func [
    s [string!]
    /local len
][
    either 5 < len: length? s [
        s: rejoin ["<strong>" uppercase s/1 next s </strong>]
    ][
        s
    ]
]

capitalize-text: func [
    s [string!]
    /local result word-rule other-rule alpha non-alpha w c
][
    result: copy {}
    alpha: charset [#"A" - #"Z" #"a" - #"z"]
    non-alpha: complement alpha

    word-rule: [copy w [some alpha] (insert tail result capitalize-word w)]
    other-rule: [copy c non-alpha (insert tail result c)]

    parse/all s [some [word-rule | other-rule] end]
    result
]
Graham:
27-Jun-2006
search for a series of capitalised words and strong them
Graham:
27-Jun-2006
bolden-word: func [
    s [string!]
    /local len
][
    either 5 < len: length? s [
        s: rejoin ["<strong>" s </strong>]
    ][
        s
    ]
 ]

enhance-text: func [
    s [string!]
    /local result word-rule alpha non-alpha w c
][
    result: copy {}
    alpha: charset [#"A" - #"Z"]
    non-alpha: complement alpha

    word-rule: [copy w [some alpha] (insert tail result bolden-word w)]
    other-rule: [copy c non-alpha (insert tail result c)]
    parse/all s [some [word-rule | other-rule] end]
    result
]
BrianH:
27-Jun-2006
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any non-alpha any [
    a: 5 capitals any capitals b: non-alpha (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha any non-alpha
] to end]
BrianH:
27-Jun-2006
; A few fixes
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any non-alpha any [
    a: 5 capitals any capitals b: [non-alpha | end] (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha any non-alpha
] to end]
Graham:
27-Jun-2006
capitals: charset ["#"A" - #"Z"] ... remove  leading "
Graham:
27-Jun-2006
Yeah ... it was a way to mark up text wherever a sequence of CAPS 
occurs
BrianH:
27-Jun-2006
; Sorry, more fixes
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any [

    any non-alpha a: 5 capitals any capitals b: [non-alpha | end] (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha
] to end]
Graham:
27-Jun-2006
Actually I would like to add a parse problem to the weeklyblog and 
get people to submit answers :)
BrianH:
27-Jun-2006
I use parse quite a bit. It's funny, I've never needed the GUI of 
View, but I use parse daily.
Graham:
27-Jun-2006
And give a prize for the shortest answer
Graham:
27-Jun-2006
say a copy of Microsoft VB :)
BrianH:
27-Jun-2006
Seriously though, three charsets and two temporary variables, there's 
got to be a more efficient way.
BrianH:
27-Jun-2006
; Sorry, more fixes
capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any [to alpha [
    a: 5 capitals any capitals b: [non-alpha | end] (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha
]] to end]
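
For reference, a runnable form of this last version, with the input string supplied in a variable (here called text, my addition), might be:

capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha

text: copy "See the README file and the LICENSE before use"
parse/all/case text [
    any [
        to alpha [
            a: 5 capitals any capitals b: [non-alpha | end] (
                b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
            ) :b
            | some alpha
        ]
    ]
    to end
]
probe text  ; runs of five or more capitals are now wrapped in <strong> tags
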
Volker:
27-Jun-2006
because " a: 5 capitals any capitals b:" stops at "g" and friends.
BrianH:
27-Jun-2006
The inserts are a nice touch though.
Graham:
28-Jun-2006
I think that punctuation is part of a word
Graham:
28-Jun-2006
A person is writing a text file.  It has headings which are denoted
by caps and terminated by ":".
Graham:
28-Jun-2006
Anyway, i have a working version now :)
[unknown: 9]:
28-Jun-2006
Since you wrote one, do you know of a better one?  This is not a 
reflection on yours, but it is a great way to know what you considered 
the next best thing.
BrianH:
29-Jun-2006
To use the simpler of the CS terms:


Parse is a rule-based, recursive-descent string and structure parser 
with backtracking. It is not a parser generator (like Lex/Yacc) or 
compiler (like most regex engines) - the engine follows the rules 
directly. Since Parse is recursive-descent it can handle patterns 
that regular expressions wouldn't be able to. Since Parse backtracks 
it can handle patterns that ordinary recursive-descent parsers can't.


Basically, it puts the text and structure processing abilities of 
Perl 5 to shame, let alone those of the lesser regex engines.


In theory, Perl 6 has caught up with REBOL, but Perl 6 only exists
in theory for now. By the time it becomes actual, REBOL should surpass
it (especially if I have anything to say about it).
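
A small illustration of that recursion (my example): a rule that refers to itself matches strings of n "a"s followed by n "b"s, a textbook pattern that no regular expression can express.

ab: [#"a" opt ab #"b"]   ; one "a", optionally the whole rule again, then one "b"
parse/all "aaabbb" ab    ; == true
parse/all "aaabb" ab     ; == false
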
BrianH:
29-Jun-2006
It's pretty easy to demonstrate patterns that regular expressions 
can't handle. It's only somewhat difficult to demonstrate patterns 
that can't be handled by a recursive descent parser without backtracking 
or unlimited lookahead.


I have never run into a pattern that can't be handled by Parse in 
theory - its only limits are in implementation (available memory 
and recursion depth). I am not qualified to describe its limits. 
Still, you have to be careful about how you write the rules or they 
will trip you up.
BrianH:
29-Jun-2006
A little dry as explanations go, I suppose. You may have better luck
showing some magic parse code tricks :)
Volker:
29-Jun-2006
Somewhat buzzy: It's a simplified compiler-compiler. Could be used
to build a Java compiler (i.e. syntax that complex), but it's also as
easy as regex for simpler things. But still readable. (Less buzzy:
not always that easy due to the poorer lookahead.)
BrianH:
29-Jun-2006
Volker, it's more like it can do what a compiler-compiler can do 
without needing to compile :)

And backtracking is about the same as unlimited lookahead, but more 
powerful.
[unknown: 9]:
29-Jun-2006
Thanks Brian, but as is the theme with questions I ask, I don't ask
for myself, but rather so that the "world" can learn what "we" know.
So perhaps you should add your 2 cents to Henrik's and Tom's in
a public forum like the Wikibook.
Volker:
29-Jun-2006
the compiling is no big argument, as compiler-compilers are for compiled
languages anyway ;) the point is, you can mix a grammar and actions
for semantics easily.
BrianH:
29-Jun-2006
Volker, it still might be a good point that you can skip a step with 
parse, depending on the listener. Parse is more of a compiler-interpreter 
really. The real point I was making was about the lookahead.
Volker:
29-Jun-2006
aah. a compiler-compiler produces sourcecode to be compiled, but 
you can interpret data with it.
Volker:
29-Jun-2006
i guess that depends on the coco. the point is, a bnf by default,
and code inside the rules, instead of putting things in vars and
processing later. IMHO.
BrianH:
29-Jun-2006
Jaimie, I meant that parse is itself an interpreter, not a compiler. 
It interprets compiler specs (or interpreter specs, etc.).
BrianH:
29-Jun-2006
Volker, I've used a lot of compiler-compilers before and reviewed
many more, and unlimited lookahead or backtracking are rare.
Volker:
29-Jun-2006
then the advantages of parse are being like a compiler-compiler
and having unlimited lookahead etc.?
BrianH:
29-Jun-2006
I'm not sure whether not having a separate tokenizer is a plus or
a minus, though.
BrianH:
29-Jun-2006
I guess you could think of block parsing as using load as a tokenizer.
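
Which gives, more or less, what Volker asked for earlier, as long as you LOAD first (my example; a bare comma would not LOAD, so the tokens are limited to ones LOAD accepts):

parse load "1 a #2 http://rebol.com" [integer! word! issue! url!]
; == true
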
Volker:
29-Jun-2006
sounds good. if one finds a good tokenized representation. I am not 
an xml-guru :(
BrianH:
29-Jun-2006
Still, "run away" is a common and sensible reaction to XML.
Gordon:
29-Jun-2006
I'm a bit stuck because this parse stops after the first iteration.
Can anyone give me a hint as to why it stops after one line?

Here is some code:

data: read to-file Readfile

print length? data
224921


d: parse/all data [
    thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr
    copy Category to QuoteStr thru QuoteStr thru quotestr copy Flag to QuoteStr
    thru newline (print index? data)
]
1
== false


Data contains hundreds of "memos" in a CSV file with three fields:
Memo, Category and Flag ("0"|"1"). All fields are enclosed in quotes
and separated by commas.

It would be real simple if the Memo field didn't contain double quoted 
words; then 
parse data none
would even work; but alas many memos contain other "words".
It would even be simple if the memos didn't contain commas, then
parse data "," or parse/all data ","
would work; but alas many memos contain commas in the body.
MikeL:
29-Jun-2006
Gordon, can you post a copy of short lines of the data?
Izkata:
29-Jun-2006
if QuoteStr = "\"", then this looks like it to me:
"Note", "Category", "Flag"
"Note", "Category", "Flag"

But you don't have a loop or anything - try this:
d: parse/all data [
   some [
      thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr
      copy Category to QuoteStr thru QuoteStr thru quotestr copy Flag to QuoteStr
      thru newline (print index? data)
   ]
]
Gordon:
29-Jun-2006
Okay, trying it now.  I see that the phrase: "print index? data" 
stays stuck on "1".  


I see that you have posted a new example.  I'll try that.  Be right 
back.
Gordon:
29-Jun-2006
I'm pretty sure that you are right in that I have to loop through
the "Data".  That was my big stumbling block and the rest is just
logic to figure out.  Thanks a bunch.
Gordon:
29-Jun-2006
In the phrase "print index? :x", what does putting a colon before
a variable do again?
Gordon:
29-Jun-2006
This data was exported by PalmOS.  I like the Palm desktop for keeping
track of notes/memos/addresses, but the search engine sucks badly.
Therefore I wanted to export the data to allow a nice REBOL search
on it.  Note that the PalmOS export function does "escape" an embedded
quote by quoting it again.  Ex:
Press the "Home" button
becomes
Press the ""Home"" button.
Tomc:
29-Jun-2006
truth (as far as i know) is: word is a shortcut for :word, but
there are a few places, such as inside parse, where the shortcut
does not work, so you need to make it explicit
Gordon:
29-Jun-2006
I will get some troubleshooting data posted in a minute.
Gordon:
29-Jun-2006
Tomc:  Do I understand that :word would be like "get word" except 
in a parse sentence?
Gordon:
29-Jun-2006
Tomc:  Do I understand that :word would be like "get word" and needed 
in a parse sentence but you can just use the shortcut 'word' most 
everywhere else?
BrianH:
29-Jun-2006
The colon before the word prevents the interpreter from evaluating 
active values like functions and parens. It's a safety thing.
BrianH:
29-Jun-2006
Except when you want an active value assigned to the word to be evaluated, 
like when you are calling a function.
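
A tiny illustration of that (my example):

f: does [print "called"]
x: :f    ; get-word: fetches the function value without calling it
f        ; plain word: evaluates the active value, so this prints "called"
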
Gordon:
29-Jun-2006
Thanks Tomc and BrianH.  I'll chew on it for a while.  Meanwhile 
I'm working on building some test data for the first problem.
Gordon:
29-Jun-2006
okay so in the parse rules (except in a parenthesized code block) 
it means "be here now"?
BrianH:
30-Jun-2006
That's interesting. Parens and paths used to be active - oh yeah,
that was changed a while ago. Still, there are some value types that
are active (function types, lit-path, lit-word) and if you think
you will get one of these you should disable their evaluation by
referencing them with a get-word, unless you intend them to be evaluated.
DideC:
30-Jun-2006
Gordon: I did not read this whole thread, but as for converting
CSV strings to/from REBOL blocks, here are some fully functional
functions:
DideC:
30-Jun-2006
;***** Conversion functions from/to CSV format
csv-to-block: func [
	"Convert a string of CSV formatted data to a REBOL block. First line is the header."
	csv-data [string!] "CSV data."
	/separator separ [char!] "Separator to use if different from comma (,)."
	/without-header "Do not include the header in the result."
	/local out line start end this-string header record value data chars spaces chars-but-space
	; CSV format information: http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
] [
	out: copy []
	separ: any [separ #","]

	; This function handles replacement of doubled double-quotes by a single quote while copying the substring
	this-string: func [s e] [replace/all copy/part s e {""} {"}]
	; CSV parsing rules
	header: [(line: copy []) value any [separ value] (if not without-header [append/only out line])]
	record: [(line: copy []) value any [separ value] (append/only out line)]
	value: [any spaces data any spaces (append line this-string start end)]
	data: [start: some chars-but-space end: | #"^"" start: any [some chars | {""} | #"," | newline] end: #"^""]
	chars: complement charset rejoin [{"} separ newline]
	spaces: charset exclude { ^-} form separ
	chars-but-space: exclude chars spaces

	parse/all csv-data [header any [newline record] any newline end]
	out
]

block-to-csv: func [
	"Convert a block of blocks to a CSV formatted string."
	blk-data [block!] "Block of data to convert."
	/separator separ "Separator to use if different from comma (,)."
	/local out csv-string record value v
] [
	out: copy ""
	separ: any [separ #","]
	; This function converts a string to a CSV formatted one
	csv-string: func [val] [head insert next copy {""} replace/all copy val {"} {""}]
	record: [into [some [value (append out #",")]]]
	value: [set v string! (append out csv-string v) | set v any-type! (append out form v)]

	parse/all blk-data [any [record (remove back tail out append out newline)]]
	out
]
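
A quick round-trip check of these functions might look like this (my example; the expected output is inferred from the code, not verified):

data: {name,note^/Carl,"likes ""parse"", a lot"}
probe csv-to-block data
; expect: [["name" "note"] ["Carl" {likes "parse", a lot}]]

probe block-to-csv [["name" "note"] ["Carl" {likes "parse", a lot}]]
; expect: the same CSV text back, with the embedded quotes doubled inside the quoted field
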
Gordon:
30-Jun-2006
DideC: Thanks.  I've copied and pasted it for review and added it 
to my local public library.  This script should be useful especially 
with the html help page.   Documentation on a script is very rare 
and much appreciated.


Graham: Did a search using  "librarian" and search term of  "sql 
cvs" and didn't come up with anything.  Although, I think we've got 
it covered now anyhow.
Graham:
1-Jul-2006
What I was trying to do above is to look for the macro text preceded 
by a space or newline, and ending in a space or newline.
Graham:
1-Jul-2006
Heart: Heart regular rate and rhythm, no rubs, murmurs, or gallops 
noted. 

A: Abdomen:  soft, nontender, no mass, no hernia, no guarding, no 
rebound tenderness, no ascites, non obese 
Hbp Hypertension (high blood pressure) #401.9.  
Ii Type II Diabetes #250.00
Graham:
1-Jul-2006
So, someone might type

heart:
A: with striae
BrianH:
1-Jul-2006
That A: isn't delimited by whitespace.
Graham:
1-Jul-2006
it is .. it's preceded by a newline character
Graham:
1-Jul-2006
so, a: is a macro, whereas "a" is not.
Tomc:
1-Jul-2006
isn't there a controlled vocabulary for this sort of thing
BrianH:
1-Jul-2006
Is a macro always the first word in a line?
Graham:
1-Jul-2006
no, it can be anywhere in a line.
BrianH:
1-Jul-2006
Is there a separate syntax for defining macros?
Graham:
1-Jul-2006
no, it's just a text file which is read in at start up.
BrianH:
1-Jul-2006
So in use, a macro a: will always be a: in the text. Will it be A: 
sometimes, or "a:"?
Graham:
1-Jul-2006
so, personal shorthand should expand into a controlled vocab ideally
Graham:
1-Jul-2006
Ii Type II Diabetes #250.00
here the macro expansion includes a code (CPT) from the AMA.