AltME groups: search


world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
Gregg:
28-Apr-2006
I think that's where string parsing comes in, and where having rules 
for REBOL datatypes would ease the pain.
Volker:
1-May-2006
How about another way: integrate datatypes into the string parser. Basically 
a load/next and a check for the type.
Then we could write (note: I parse a string): 
parse "1 a , #2" [ integer! word! "," issue! ]
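A rough Python analogue of this idea, for illustration only: read one value at a time from the string (in the spirit of load/next) and compare each value's type, or its literal text, with the next rule item. The type names and token patterns are assumptions loosely modelled on REBOL's integer!, word! and issue!; they are not REBOL's actual lexical rules.

```python
import re

# Hypothetical token types, loosely modelled on REBOL's integer!, issue! and word!.
TOKEN_TYPES = [
    ("integer!", re.compile(r"[+-]?\d+")),
    ("issue!", re.compile(r"#\S+")),
    ("word!", re.compile(r"[A-Za-z][A-Za-z0-9\-?!]*")),
]

def load_next(text, pos):
    """Skip whitespace, then read one value (a rough stand-in for load/next).

    Returns (type_name, value, new_pos), or None at the end of the input.
    Anything that is not a known type comes back as a one-character string!.
    """
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if pos >= len(text):
        return None
    for type_name, pattern in TOKEN_TYPES:
        m = pattern.match(text, pos)
        if m:
            return type_name, m.group(), m.end()
    return "string!", text[pos], pos + 1

def parse_typed(text, rule):
    """Match text against a rule made of type names and literal strings."""
    pos = 0
    for item in rule:
        token = load_next(text, pos)
        if token is None:
            return False
        type_name, value, pos = token
        # A rule item matches either the value's type or its literal text
        if item != type_name and item != value:
            return False
    # Succeed only if nothing but whitespace is left
    return load_next(text, pos) is None
```

For example, `parse_typed("1 a , #2", ["integer!", "word!", ",", "issue!"])` succeeds, while dropping the comma from the input makes it fail.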
Ashley:
24-May-2006
Quick question for the parse experts. If I have a parse rule that 
looks like this:

parse spec [
	any [
		set arg string! (...) | set arg tuple! (...) | ...
	]
]

How would I add a rule like:

	set arg paren! (reduce arg)


that always occurred prior to the any rule and evaluated parenthesized 
expressions (i.e. I want parenthesized expressions to be reduced 
to a REBOL value that can be handled by the remainder of the parse 
rule).
Graham:
27-Jun-2006
My brain is still asleep.  How to go thru a document and add <strong> 
</strong> around every word that is in capitals and is more than 
a few characters long?
Graham:
27-Jun-2006
pattern search on capitals, mark, copy to space, mark, count length 
of copy, if long, insert at mark2, and then at mark1, continue ??
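Graham's steps can be sketched in Python roughly as follows. "More than a few characters" is assumed to mean at least four letters (the hypothetical `min_len` parameter), and a word is taken to be a run of letters, so apostrophes and hyphens are not handled.

```python
def strong_capitals(text, min_len=4):
    """Wrap every all-capitals word of at least min_len letters in <strong> tags."""
    open_tag, close_tag = "<strong>", "</strong>"
    i = 0
    while i < len(text):
        if not text[i].isalpha():
            i += 1
            continue
        # mark1: start of the word; scan to its end to find mark2
        mark1 = i
        while i < len(text) and text[i].isalpha():
            i += 1
        mark2 = i
        word = text[mark1:mark2]
        if len(word) >= min_len and word.isupper():
            # Insert at mark2 first, so that mark1 is still valid afterwards
            text = text[:mark2] + close_tag + text[mark2:]
            text = text[:mark1] + open_tag + text[mark1:]
            # Continue scanning after the word and both inserted tags
            i = mark2 + len(open_tag) + len(close_tag)
    return text
```

Inserting at mark2 before mark1 follows the order in the message: inserting at the earlier mark first would shift the later one.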
Graham:
27-Jun-2006
that won't work because file is just text and not a block.
Graham:
27-Jun-2006
search for a series of capitalised words and strong them
Graham:
27-Jun-2006
Actually I would like to add a parse problem to the weeklyblog and 
get people to submit answers :)
Graham:
27-Jun-2006
And give a prize for the shortest answer
Graham:
27-Jun-2006
shortest .. I mean the least number of words, and operators - not 
in length
BrianH:
27-Jun-2006
Seriously though, three charsets and two temporary variables, there's 
got to be a more efficient way.
Volker:
27-Jun-2006
and "g" is none
Volker:
27-Jun-2006
because " a: 5 capitals any capitals b:" stops at "g" and friends.
BrianH:
27-Jun-2006
More importantly, it fails at "g" and friends, backtracks and proceeds 
to the next alternate action, some alpha.
BrianH:
27-Jun-2006
No, that would break out of the enclosing all loop. The end skip 
will always fail and proceed to the next alternate.
BrianH:
28-Jun-2006
Of course mine doesn't handle words with apostrophes or hyphens in 
them either. Easy fix though, just add ' and - to the capitals charset.
BrianH:
28-Jun-2006
And do what?
Graham:
28-Jun-2006
A person is writing a text file. It has headings which are written 
in caps and terminated by ":".
BrianH:
29-Jun-2006
To use the simpler of the CS terms:


Parse is a rule-based, recursive-descent string and structure parser 
with backtracking. It is not a parser generator (like Lex/Yacc) or 
compiler (like most regex engines) - the engine follows the rules 
directly. Since Parse is recursive-descent it can handle patterns 
that regular expressions wouldn't be able to. Since Parse backtracks 
it can handle patterns that ordinary recursive-descent parsers can't.


Basically, it puts the text and structure processing abilities of 
Perl 5 to shame, let alone those of the lesser regex engines.


In theory, Perl 6 has caught up with REBOL, but Perl 6 only exists 
in theory for now. By the time it becomes actual, REBOL should surpass 
it (especially if I have anything to say about it).
BrianH:
29-Jun-2006
It's pretty easy to demonstrate patterns that regular expressions 
can't handle. It's only somewhat difficult to demonstrate patterns 
that can't be handled by a recursive descent parser without backtracking 
or unlimited lookahead.


I have never run into a pattern that can't be handled by Parse in 
theory - its only limits are in implementation (available memory 
and recursion depth). I am not qualified to describe its limits. 
Still, you have to be careful about how you write the rules or they 
will trip you up.
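To make the first claim concrete: n "a"s followed by exactly n "b"s is a textbook language that no regular expression in the formal sense can match, yet a two-alternative recursive rule handles it. The Python sketch below hand-codes that rule, S := "a" S "b" | empty, trying the alternatives in order and falling back (backtracking) to the empty match when the first fails. It illustrates the technique, not how PARSE is implemented; Python's recursion limit plays the role of the recursion-depth limit mentioned above.

```python
def match_s(s, pos=0):
    """Match S := "a" S "b" | <empty> starting at pos; return the end position."""
    # First alternative: "a" S "b"
    if pos < len(s) and s[pos] == "a":
        inner = match_s(s, pos + 1)
        if inner < len(s) and s[inner] == "b":
            return inner + 1
    # First alternative failed: backtrack to pos and take the empty match
    return pos

def is_anbn(s):
    """True if s is n "a"s followed by exactly n "b"s."""
    return match_s(s) == len(s)
```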
BrianH:
29-Jun-2006
Volker, it's more like it can do what a compiler-compiler can do 
without needing to compile :)

And backtracking is about the same as unlimited lookahead, but more 
powerful.
[unknown: 9]:
29-Jun-2006
Thanks Brian, but as is the theme with questions I ask, I don't ask 
for myself, but rather so that the "world" can learn what "we" know. 
So perhaps you should add your 2 cents to Henrik's and Tom's in 
a public forum of the Wikibook.
Volker:
29-Jun-2006
the compiling is no big argument, as compiler-compilers are for compiled 
languages anyway ;) the point is, you can mix a grammar and actions 
for semantics easily.
BrianH:
29-Jun-2006
Reichart, I figured as much (hence the "dry" comment). I'll look 
over the Wikibook and see if I can help.
Volker:
29-Jun-2006
I guess that depends on the coco. the point is, a bnf by default, 
and code inside the rules, instead of putting things in vars and processing 
them later. IMHO.
BrianH:
29-Jun-2006
Volker, I've used a lot of compiler-compilers before and reviewed 
many more, and unlimited lookup or backtracking are rare.
Volker:
29-Jun-2006
then the advantages of parse are being like a compiler-compiler 
and having unlimited lookup etc?
Volker:
29-Jun-2006
and you can use parse to tokenize first?
BrianH:
29-Jun-2006
Two rounds of parsing, one for tokenizing and one to parse? Interesting. 
That would work if you don't have control over the source syntax 
- otherwise load works pretty well for simple languages.
Volker:
29-Jun-2006
That's where I got the idea: tokenize first and use the block parser :)
BrianH:
29-Jun-2006
My next personal project is to go through the XML/XSL/REST specs 
and create exactly that. I already have an efficient structure, I 
just need to fill out the semantics to support the complete logical 
model of XML.
BrianH:
29-Jun-2006
Still, "run away" is a common and sensible reaction to XML.
Gordon:
29-Jun-2006
I'm a bit stuck because this parse stops after the first iteration. 
Can anyone give me a hint as to why it stops after one line?

Here is some code:

data: read to-file Readfile

print length? data
224921


d: parse/all data [
    thru QuoteStr copy Note to QuoteStr thru QuoteStr thru quotestr
    copy Category to QuoteStr thru QuoteStr thru quotestr
    copy Flag to QuoteStr
    thru newline (print index? data)
]
1
== false


Data contains hundreds of "memos" in a csv file with three fields: 
Memo, Category and Flag ("0"|"1"). All fields are enclosed in quotes 
and separated by commas.

It would be really simple if the Memo field didn't contain double-quoted 
words; then 
parse data none
would even work; but alas, many memos contain other "words".
It would even be simple if the memos didn't contain commas; then
parse data "," or parse/all data ","
would work; but alas, many memos contain commas in the body.
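Because a memo may contain both commas and quotes, one hedged workaround (sketched in Python rather than PARSE) is to anchor on the two trailing fields instead. It assumes one record per line and that a memo never contains the three-character sequence "," (quote, comma, quote); each line is then split at its last two such separators. Looping over every line, rather than matching a single record, is the fix the thread arrives at just below.

```python
def parse_memo_line(line):
    """Split one '"memo","category","flag"' record into its three fields.

    Assumes the memo itself never contains the sequence '","'.
    """
    line = line.rstrip("\r\n")
    if len(line) < 2 or line[0] != '"' or line[-1] != '"':
        raise ValueError("record is not enclosed in quotes: " + repr(line))
    # Split from the right, at most twice, so commas and quotes inside the memo survive
    parts = line[1:-1].rsplit('","', 2)
    if len(parts) != 3:
        raise ValueError("expected three fields: " + repr(line))
    return tuple(parts)

def parse_memos(data):
    """Loop over every non-empty line instead of stopping after the first."""
    return [parse_memo_line(line) for line in data.splitlines() if line.strip()]
```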
Gordon:
29-Jun-2006
I'm pretty sure that you are right in that I have to loop through 
the "Data". That was my big stumbling block and the rest is just 
logic to figure out. Thanks a bunch.
Izkata:
29-Jun-2006
Not sure - I remember seeing it in others' parse rules, so I just 
put it there and it worked  '^^
Take it out and see what happens lol
Gordon:
29-Jun-2006
Hi BrianH;

  Yes, I did try that and the problem was that even though I specified 
  the "," as the delimiter, it came across an embedded quote #"^"" 
  and split the input at the quote. REBOL shouldn't have split it 
  up that way, to my understanding. I will post some simple data to 
  test.
Gordon:
29-Jun-2006
Tomc:  Do I understand that :word would be like "get word" and needed 
in a parse sentence but you can just use the shortcut 'word' most 
everywhere else?
BrianH:
29-Jun-2006
The colon before the word prevents the interpreter from evaluating 
active values like functions and parens. It's a safety thing.
Tomc:
29-Jun-2006
and that would be get 'word  not get word
Gordon:
29-Jun-2006
Thanks Tomc and BrianH.  I'll chew on it for a while.  Meanwhile 
I'm working on building some test data for the first problem.
BrianH:
30-Jun-2006
That's interesting. Parens and paths used to be active - oh yeah, 
that was changed a while ago. Still, there are some value types that 
are active (function types, lit-path, lit-word) and if you think 
you will get one of these you should disable their evaluation by 
referencing them with a get-word, unless you intend them to be evaluated.
Anton:
30-Jun-2006
both parens and paths changed between View 1.2.1 and 1.2.5, actually.
Gordon:
30-Jun-2006
DideC: Thanks. I've copied and pasted it for review and added it 
to my local public library. This script should be useful, especially 
with the html help page. Documentation on a script is very rare 
and much appreciated.


Graham: Did a search using "librarian" and a search term of "sql 
cvs" and didn't come up with anything. Although, I think we've got 
it covered now anyhow.
Graham:
1-Jul-2006
What I was trying to do above is to look for the macro text preceded 
by a space or newline, and ending in a space or newline.
Graham:
1-Jul-2006
and then replace in situ.
Tomc:
1-Jul-2006
are the macros and expansions single tokens?
BrianH:
1-Jul-2006
HTML/XML entities begin with & and end with ; for just this reason. 
What kind of text? can you give us an example?
Graham:
1-Jul-2006
Heart: Heart regular rate and rhythm, no rubs, murmurs, or gallops 
noted. 

A: Abdomen:  soft, nontender, no mass, no hernia, no guarding, no 
rebound tenderness, no ascites, non obese 
Hbp Hypertension (high blood pressure) #401.9.  
Ii Type II Diabetes #250.00
Graham:
1-Jul-2006
I would have to intercept the keyboard handler to do this .. so I 
want to try and just do the replacement after he's finished typing.
Tomc:
1-Jul-2006
hmm I am in the business of sharing biological information and I 
got to say: please strongly consider creating an ontology if one 
does not exist already
Graham:
1-Jul-2006
and there's the proprietary MEDCIN
Tomc:
1-Jul-2006
and the macros should also be part of that ontology
Graham:
1-Jul-2006
actually  the file will be saved in a database and loaded when the 
program starts
Graham:
1-Jul-2006
and the ontologically controlled programs are very expensive due 
to licensing fees
Graham:
1-Jul-2006
The AMA charge to use their codes, the College of American Pathologists 
charge to use their SNOMED CT codes .. and so it goes on.
Tomc:
1-Jul-2006
that the macro-expansion file needs to self-check for incidental 
occurrences of a "macro" in an "expansion" and protect against them
BrianH:
1-Jul-2006
And it won't have the problem you mention Tomc.
Graham:
1-Jul-2006
so, basically you created a single parse rule from the macro list 
and then parsed the text in one go.
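Graham's summary can be sketched in Python by building one regular-expression alternation from a macro table and substituting in a single left-to-right pass; the regex stands in for the single PARSE rule. Macros match only when bounded by whitespace or the start/end of the text (an assumption extending "a space or newline"), longer names are tried first, and because replacement text is never rescanned, a macro name inside an expansion is not expanded again, which is the issue Tomc raised. The macro tables in the usage are hypothetical.

```python
import re

def expand_macros(text, macros):
    """Replace every whitespace-delimited macro name with its expansion."""
    if not macros:
        return text
    # Longest names first, so a name that is a prefix of a longer
    # multi-word macro name does not win
    names = sorted(macros, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    # (?<!\S) / (?!\S): preceded and followed by whitespace or a text boundary
    pattern = re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")
    # One pass over the text; the expansions are not scanned again
    return pattern.sub(lambda m: macros[m.group(0)], text)
```

For instance, with the table `{"a": "b", "b": "c"}` the text "a b" becomes "b c" rather than "c c".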
Tomc:
1-Jul-2006
I am glad to see someone else using here and there ;)
BrianH:
1-Jul-2006
Gabriele and I have worked extensively on such submissions.
Graham:
1-Jul-2006
Yes.  So, somehow I need to force the area field to recognise them 
as newlines and reformat the screen.
Henrik:
9-Jul-2006
how "local" are variables that are set during a parse? I was looking 
at Geomol's postscript.r and looked at:

coords: [err:
	(pos: none) [set pos pair! | set x number! set y number!] (
		either pos [
			append output compose [(pos/x) " " (pos/y) " lineto^/"]
		][
			append output compose [(x) " " (y) " lineto^/"]
		]
	)
]
Henrik:
9-Jul-2006
yes, I try to print the variable and it just returns none.
Henrik:
9-Jul-2006
actually, there is a difference between my code and this, which may 
be causing it:


I need to loop the block with 'any. I suspect the contents are lost 
after the first run.
Anton:
9-Jul-2006
And to answer your question, the variables are just regular rebol 
words, so they are as local as you make them.
Oldes:
9-Jul-2006
and what does the code you parse look like?
Oldes:
9-Jul-2006
if the parse is inside function and you set pos in the function as 
a local - it will be local
Henrik:
9-Jul-2006
where 'image is always first and the remaining items may come in 
random order
Oldes:
9-Jul-2006
there is no pair and no numbers - the pos must be none
Oldes:
9-Jul-2006
and what exactly do you want?
Henrik:
9-Jul-2006
it works the exact opposite :-) Only the outer 'txt is set, and I 
can't reach the variable inside the block
Henrik:
9-Jul-2006
the brackets would make it a "real" rule, wouldn't it? it would be 
possible to replace the rule with a variable and have the rule block 
placed elsewhere in your code
Henrik:
9-Jul-2006
and it makes the parse scalable, so I can add options later
Pekr:
19-Jul-2006
I tried making myself a small template "engine", which simply looks up 
some marks and replaces values accordingly. I decided to look 
for the end of the marks, and my friend suggested that I should 
name the ending marks too, to make it clear there is no error. My parse 
looks like this:
Pekr:
19-Jul-2006
I can now simply create a func which accepts a mark name and 
runs some code block accordingly - sql query, simple replacement of a value, 
whatever (well, it will not work for cases like img tags, so it is 
not as flexible as the full html parser in temple, for example, but hey, 
it is meant to be simple)
Pekr:
19-Jul-2006
... but it should not be simpler, so I wonder: so far, as you can see, 
mark-x is not finished, so it is ignored. How do I catch this case 
properly and possibly generate an error, send an email, write to a log, 
whatever?
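One way to catch an unfinished mark, sketched in Python with a hypothetical {{name}} ... {{/name}} syntax (Pekr's actual mark syntax isn't shown above): when a start mark is found, look for its named end mark and raise an error if there is none, instead of silently skipping it. The `handlers` mapping from mark names to functions is also an assumption; a handler could run an SQL query or replace a value, and the error could just as well be logged or emailed.

```python
import re

# Hypothetical start-mark syntax: {{name}}; the matching end mark is {{/name}}
START_MARK = re.compile(r"\{\{([\w-]+)\}\}")

def fill_template(text, handlers):
    """Replace each {{name}}...{{/name}} block with handlers[name](body)."""
    out = []
    pos = 0
    while True:
        start = START_MARK.search(text, pos)
        if start is None:
            out.append(text[pos:])
            return "".join(out)
        name = start.group(1)
        end_mark = "{{/" + name + "}}"
        end = text.find(end_mark, start.end())
        if end == -1:
            # Unfinished mark: report it instead of silently ignoring it
            raise ValueError(f"mark {name!r} at offset {start.start()} has no {end_mark}")
        if name not in handlers:
            raise ValueError(f"no handler for mark {name!r}")
        out.append(text[pos:start.start()])
        out.append(handlers[name](text[start.end():end]))
        pos = end + len(end_mark)
```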
Pekr:
19-Jul-2006
Maarten - now looking into build-markup - sorry, it is just a strange 
way of doing things .... no one will place rebol code into a template, 
that will not work ... btw - the code is 'done? What happens if someone 
uploads a template with its own code? I want presentation and code 
separation.
Pekr:
19-Jul-2006
I looked into rsp some time ago, and I liked it, especially as it 
was complete, with session support etc., but later on I found shlik.org 
being unavailable ...
BrianH:
31-Aug-2006
Hey, locals and arguments (practically the same thing in REBOL) are 
the most important difference between closures and plain blocks. 
The difference is significant but Peters' background with Smalltalk 
made him miss it - Smalltalk "blocks" look like REBOL blocks but 
act like functions.
Volker:
31-Aug-2006
No, the main point is, easy definitions of code and referencing the 
original context. Rebol-blocks do that.
Volker:
31-Aug-2006
The highlights he mentions are: lexically-scoped, code and data, 
freely mix computations in
Volker:
31-Aug-2006
That scoping is the difference between a closure and doing a "string" 
here.
BrianH:
31-Aug-2006
REBOL blocks don't reference a context, but they may contain words 
that reference a context. Still, this distinction makes no difference 
to the argument that Peters was making - REBOL text processing is 
more powerful than regex and easier to use. It would be easier to 
replicate REBOL-style parsing in Python using closures and generators 
anyway (Peters' real subject), since that is the closest Python gets 
to Icon-style backtracking.
Geomol:
25-Sep-2006
I would like the functionality, when parsing things like TeX. There 
the greek letter gamma is called gamma, and the same in capital is 
called Gamma. Now I have to invent the word capgamma or something.
Gregg:
25-Sep-2006
If it were a safe and easy thing to change, I can see some value 
in it as an option but, since words--and REBOL--are case insensitive, 
I'm inclined to live with things as they are, and use string parsing 
if case sensitivity is needed. I think it's Oldes or Rebolek that 
sometimes requests the ability to parse non-loadable strings, using 
percentage values as an example. I think loading percentages would 
be awesome, but then there are other values we might want to load 
as well; where do you draw the line? I'm waiting to see what R3 holds 
with custom datatypes and such.
Gregg:
25-Sep-2006
And didn't you suggest that values throwing errors could be coerced 
to string! or another type? e.g. add an /any refinement to load, 
and any value in the string that can't be loaded would become a string 
(or maybe you could say you want them to be tags for easy identification).
Oldes:
25-Sep-2006
I think, load/next can be used to handle invalid datatypes now:
>> b: {1 2 3 'x' ,}
== "1 2 3 'x' ,"
>> while [v: load/next b not empty? second v][probe v b: v/2]
[1 " 2 3 'x' ,"]
[2 " 3 'x' ,"]
[3 " 'x' ,"]
['x' " ,"]
** Syntax Error: Invalid word -- ,
** Near: (line 1) ,

Just add some handler to convert the invalid datatype to something 
else that is loadable, and then parse it as a block
Geomol:
25-Sep-2006
Gabriele, yes it works with strings. But I have words! Thing is, 
I parse the string input from the user and produce words in an internal 
format. Then I parse those words for the final output, which can 
be different formats. I would expect parse/case to be case-sensitive, 
when parsing words, but parse/case is only for strings, therefore 
my suggestion.
Oldes:
26-Sep-2006
And is there some parse example of how to deal with recursion while 
parsing strings? If you parse a block, it's easy to detect what is a string! 
and what is another type, but if you need to parse a string, it's not 
so easy to detect, for example, strings like {some text {other "text"}}
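A hedged Python sketch of one answer: treat a braced string as a recursive rule, "{" followed by any mix of nested braced strings and ordinary characters, then "}". The caret escapes (^{ and ^}) follow REBOL's string escape convention; double quotes get no special treatment, since they are ordinary characters inside a braced string.

```python
def match_braced(s, pos=0):
    """Match a {...} string that starts at s[pos].

    Rule: "{" any [nested-braced-string | "^" skip | other-char] "}"
    Returns the index just past the closing brace, or -1 if it is unbalanced.
    """
    if pos >= len(s) or s[pos] != "{":
        return -1
    pos += 1
    while pos < len(s):
        c = s[pos]
        if c == "}":
            return pos + 1
        if c == "{":
            # Recurse into the nested string and continue after it
            pos = match_braced(s, pos)
            if pos == -1:
                return -1
        elif c == "^":
            # Caret escape: the next character is taken literally
            pos += 2
        else:
            pos += 1
    return -1
```

Slicing up to the returned index extracts the whole nested string, e.g. `{some text {other "text"}}` from the front of a longer input.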
Anton:
27-Sep-2006
Here's an idea to toss into the mix:

I am thinking of a new notation for strings using underscore (eg. 
 _"hello"_  ) in a parse block, which allows to specify whether they 
are delimited by whitespace or not. This would allow you to enable/disable 
the necessity for delimiters per-string. eg:

parse input [

 _"house"_   ; a complete word surrounded both sides by whitespace

 _"hous"   ;  this would match "house", "housing", "housed" or even 
 "housopoly" etc.. but left side must be whitespace

 "ad"_ ; this would match "ad", "fad", "glad" and right side must 
 be whitespace
]

But this would need string datatype to change.

On the other hand, I could just set underscore _ to a charset of 
whitespace, then use that with parse/all eg:

	_: charset " ^-^/"

parse/all input [
	[ _ "house" _ ]
]


though that wouldn't be as comfortable. Maybe I can create parse 
rules from a simpler dialect which understands the underscore _.
Just an idea...
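The underscore notation can be prototyped outside PARSE. The Python sketch below (a hypothetical `underscore_pattern` helper) turns "_house_", "_hous" and "ad_" into regular expressions, where a leading or trailing underscore means "whitespace or the edge of the text on that side". Here the underscores sit inside the string rather than outside the quotes, and `\s` accepts a few more whitespace characters than the " ^-^/" charset above.

```python
import re

def underscore_pattern(spec):
    """Compile "_word_"-style notation into a regex.

    A leading "_" requires whitespace (or the start of the text) before the
    word; a trailing "_" requires whitespace (or the end of the text) after it.
    """
    left = spec.startswith("_")
    right = spec.endswith("_")
    core = re.escape(spec.strip("_"))
    return re.compile((r"(?<!\S)" if left else "") + core + (r"(?!\S)" if right else ""))
```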
MikeL:
27-Sep-2006
Anton, Andrew had defined white space patterns in his patterns.r 
script which seem usable; then you can use [ ws* "house" ws*] or 
other combinations as needed without underscore. Andrew's solutions 
for this and a lot of other things have given me some good mileage 
over the past few years. WS*: [some WS] and WS?: [any WS]. 
It makes for clean, clear parse scripts once you adopt it.
Gregg:
27-Sep-2006
I think either approach above can work well. I like the "look" of 
the underscore, and have done similar things with standard function 
names. For SOME, ANY, and OPT, the tag chars I prefer are +, *, and 
? respectively, which are EBNF standard.
Anton:
27-Sep-2006
Oh yes, I've seen Andrew's patterns.r. I was just musing how to make 
it more concise without even using a short word like WS.  Actually 
the use case which sparked this idea was more of a "regex-level" 
pattern matcher, just a simple pattern matcher where the user writes 
the pattern to match filenames and to match strings appearing in 
file contents.
Gregg:
28-Sep-2006
I also have a naming convention I've been playing with for a while, 
where parse rule words have an "=" at the end (e.g. date=) and parse 
variables--values set during the parse process--have it at the beginning 
(e.g. =date). The idea is that it's sort of a cross between BNF syntax 
for production rules and set-word/get-word syntax; the goal being 
to easily distinguish parse-related words. By using the same word 
for a rule and an associated variable, with the equal sign at the 
head or tail, respectively, it also makes it easier to keep track 
of what gets set where, when you have a lot of rules.
Maxim:
28-Sep-2006
simple and clean, good idea!
Maxim:
28-Sep-2006
so many years of reboling (since core 1.2), and still parse remains 
largely untamed by me.
Izkata:
3-Oct-2006
That's a ~very~ good example, Oldes... it should be put in the docs 
somewhere (if it isn't already.)  I didn't understand how get-words 
and set-words worked in parse, either, before..
Rebolek:
4-Oct-2006
I've got following PARSE problem:


I've got string - "<good tag><bad tag><other tag><good tag>" and 
I want to keep "good tag" and "<>" in other tags change to let's 
say "X" (I need to change it to HTML entities but that doesn't matter 
now). So result will look like: "<good tag>Xbad tagXXother tagX<good 
tag>"


I've been working on it for the last few hours but still haven't found a solution. 
Is there any?
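The ordered-alternatives approach a PARSE rule would take can be sketched in Python: at each position, first try the tag to keep, then a lone "<" or ">", otherwise copy one character. The kept tag and the "X" replacement come from the example; the hypothetical `replace` mapping lets "X" be swapped for HTML entities such as "&lt;" and "&gt;".

```python
def mask_tags(text, keep="<good tag>", replace=None):
    """Replace "<" and ">" outside the kept tag, leaving every kept tag intact."""
    if replace is None:
        replace = {"<": "X", ">": "X"}
    out = []
    i = 0
    while i < len(text):
        if text.startswith(keep, i):
            # First alternative: copy the tag we want to keep as a whole
            out.append(keep)
            i += len(keep)
        elif text[i] in replace:
            # Second alternative: a bracket belonging to any other tag
            out.append(replace[text[i]])
            i += 1
        else:
            # Otherwise copy one character and move on
            out.append(text[i])
            i += 1
    return "".join(out)
```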
Rebolek:
4-Oct-2006
I'll probably replace everything and then just revert the "good tag" 
back. It's not very elegant, but...
Anton:
4-Oct-2006
&lt; and &gt;?