world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
Graham:
1-Jul-2006
there is no whitespace inside a macro name
Tomc:
1-Jul-2006
so there is a separate extendable file with the macro = expansion mapping
Graham:
1-Jul-2006
actually  the file will be saved in a database and loaded when the 
program starts
Tomc:
1-Jul-2006
as a model organism
Tomc:
1-Jul-2006
that the macro-expansion file needs to self-check for incidental occurrences of a "macro" in an "expansion" and protect against them
Tomc:
1-Jul-2006
I would still sort the macros by longest to shortest so one can't glob onto part of a macro ..
Graham:
1-Jul-2006
so, basically you created a single parse rule from the macro list 
and then parsed the text in one go.
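A minimal sketch of that approach (the macro table, names, and sample text here are hypothetical, not from Graham's application):

macros: [["fbs" "fasting blood sugar"] ["fb" "followed by"]]
; sort longest first so a short macro can't glob onto part of a longer one
sort/compare macros func [a b] [(length? a/1) > (length? b/1)]

rule: copy []
foreach m macros [
	append rule compose [mark: (m/1)]
	append/only rule to paren! compose [mark: change/part mark (m/2) (length? m/1)]
	append rule [:mark |]
]
append rule 'skip   ; fall through one character when no macro matches

text: "fbs normal, fb in 2 weeks"
parse/all text [any rule]
probe text
;== "fasting blood sugar normal, followed by in 2 weeks"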
BrianH:
1-Jul-2006
Tomc, that is a good point - I'll fix it. Graham, that's right.
Graham:
1-Jul-2006
We need a masterclass in parse ....
Graham:
1-Jul-2006
it's a local so memory will be released anyway ..
BrianH:
1-Jul-2006
It's a speed optimization. This might change with REBOL 3.
Graham:
1-Jul-2006
memory use is large with Rebol.
BrianH:
1-Jul-2006
Most of the excessive memory overhead of REBOL is just sloppy (no 
offense Carl). It's not much of a problem for most, but I have run 
into memory limits when running on embedded or handheld platforms, 
or running hundreds of instances on servers.
Tomc:
1-Jul-2006
but that can just be a static rule outside of compose
Graham:
1-Jul-2006
the above macro is supposed to expand into a multiline statement.
BrianH:
1-Jul-2006
Then it is a good thing that the ^/ is in the expansion.
Graham:
1-Jul-2006
No, as it ends up on screen showing ^/ instead of a visual newline.
BrianH:
1-Jul-2006
Are they writing ^/ in the expansion text source data to indicate 
a newline?
Graham:
1-Jul-2006
They're using ^/ as the macros are being read in from a text file 
using read/lines
Henrik:
9-Jul-2006
how "local" are variables that are set during a parse? I was looking 
at Geomol's postscript.r and looked at:

coords: [err:
	(pos: none) [set pos pair! | set x number! set y number!] (
		either pos [
			append output compose [(pos/x) " " (pos/y) " lineto^/"]
		][
			append output compose [(x) " " (y) " lineto^/"]
		]
	)
]
Anton:
9-Jul-2006
Mmm let me make a few tests.
Henrik:
9-Jul-2006
actually, there is a difference between my code and this, which may 
be causing it:


I need to loop the block with 'any. I suspect the contents are lost after the first run.
Oldes:
9-Jul-2006
if the parse is inside a function and you set pos as a local in that function - it will be local
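A minimal sketch of that point (the function is hypothetical; standard REBOL scoping):

f: func [data /local pos] [
	parse data [set pos pair!]
	pos
]
pos: 'untouched
f [2x3]   ;== 2x3
pos       ;== untouched - the pos set during the parse was the function's local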
Henrik:
9-Jul-2006
I want to assign a variable to each element so I can process them 
later
Anton:
9-Jul-2006
add a block to control the evaluation.
Anton:
9-Jul-2006
I'm trying to figure out a simple example to show why.
Henrik:
9-Jul-2006
I wonder what the difference is? If it's only for controlling how 
global a variable is, it seems a little backwards to me
Henrik:
9-Jul-2006
the brackets would make it a "real" rule, wouldn't it? it would be 
possible to replace the rule with a variable and have the rule block 
placed elsewhere in your code
Anton:
9-Jul-2006
You have to think of a rule like this:
	[ integer! | ]
as equivalent to
	[ integer! | none ]
or
	opt [ integer! ]
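Checked at the console (my own example):

>> parse [] [integer! |]
== true
>> parse [] [integer!]
== false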
Anton:
9-Jul-2006
I think he might be using 'test-image in place of a real image! for 
this example ?
Henrik:
9-Jul-2006
It's also a good thing with these discussions. I've never really 
grown 100% comfortable with parse.
DideC:
10-Jul-2006
About Layout : parse handles only the layout words (origin, space, 
at...), see source layout.

The face description is handled by a loop, not by parse. See system/view/vid/grow-facets
Pekr:
19-Jul-2006
Hi, need a bit of help ....
Pekr:
19-Jul-2006
I can now simply create a func which will accept a mark name and run some code block accordingly - an SQL query, a simple replacement of a value, whatever (well, it will not work for cases like img tags, so it is not as flexible as the full HTML parser in Temple, for example, but hey, it is meant to be simple)
Chris:
19-Jul-2006
Petr, I have a copy with some notes here:
http://www.ross-gill.com/techniques/rsp/
JaimeVargas:
31-Aug-2006
Very nice comments. But comparing a parser with a regex is a bit 
unfair ;-)
Volker:
31-Aug-2006
That scoping is the difference between a closure and doing a "string" 
here.
BrianH:
31-Aug-2006
REBOL blocks don't reference a context, but they may contain words 
that reference a context. Still, this distinction makes no difference 
to the argument that Peters was making - REBOL text processing is 
more powerful than regex and easier to use. It would be easier to 
replicate REBOL-style parsing in Python using closures and generators 
anyway (Peters' real subject), since that is the closest Python gets 
to Icon-style backtracking.
Volker:
31-Aug-2006
it's not important what references the context, but that a variable can find one.
Volker:
31-Aug-2006
result := a > b
    ifTrue:[ 'greater' ]
    ifFalse:[ 'less' ]
Ladislav:
31-Aug-2006
besides, Tim was a REBOL 1.x user
Oldes:
15-Sep-2006
Maybe someone will find it useful:

remove-tags: func[html /except allowed-tags /local new x tag name 
tagchars][
	new: make string! length? html
	tagchars: charset [#"a" - #"z" #"A" - #"Z"]
	parse/all html [
		any [
			copy x to {<} copy tag thru {>}  (
				if not none? x [insert tail new x]
				if all [
					except
					parse tag ["<" opt #"/" copy name some tagchars to end]
					find allowed-tags name
				][	insert tail new tag ]
			)
		]
		copy x to end (if not none? x [insert tail new x])
	]
	new
]
Gregg:
25-Sep-2006
If it were a safe and easy thing to change, I can see some value 
in it as an option but, since words--and REBOL--are case insensitive, 
I'm inclined to live with things as they are, and use string parsing 
if case sensitivity is needed. I think it's Oldes or Rebolek that 
sometimes requests the ability to parse non-loadable strings, using 
percentage values as an example. I think loading percentages would 
be awesome, but then there are other values we might want to load 
as well; where do you draw the line? I'm waiting to see what R3 holds 
with custom datatypes and such.
Gregg:
25-Sep-2006
And didn't you suggest that values throwing errors could be coerced 
to string! or another type? e.g. add an /any refinement to load, 
and any value in the string that can't be loaded would become a string 
(or maybe you could say you want them to be tags for easy identification).
Oldes:
25-Sep-2006
I think load/next can be used to handle invalid datatypes now:
>> b: {1 2 3 'x' ,}
== "1 2 3 'x' ,"
>> while [v: load/next b not empty? second v][probe v b: v/2]
[1 " 2 3 'x' ,"]
[2 " 3 'x' ,"]
[3 " 'x' ,"]
['x' " ,"]
** Syntax Error: Invalid word -- ,
** Near: (line 1) ,

Just add some handler to convert the invalid datatype to something else that is loadable, and then parse it as a block
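For instance, a crude preloader for the sample above (assuming the comma and the quoted 'x' are the only offenders):

b: {1 2 3 'x' ,}
replace/all b "," { "," }   ; make the stray comma loadable as a string
replace/all b "'" {"}       ; turn 'x' into the string "x"
probe load b
;== [1 2 3 "x" ","]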
Oldes:
25-Sep-2006
But such a preloader will slow things down :(
Oldes:
26-Sep-2006
(it should be a question - is there such an example?)
Rebolek:
26-Sep-2006
Words should be case insensitive, but is it always the case? I've found this today accidentally:

>> a: [small Small]
== [small Small]
>> find/case a to word! "small"
== [small Small]
>> find/case a to word! "Small"
== [Small]
Gabriele:
26-Sep-2006
well... case insensitivity for words is done via automatic aliasing 
of words that differ in case only. (i know this because we found 
a bug related to this :)
Anton:
27-Sep-2006
Here's an idea to toss into the mix:

I am thinking of a new notation for strings using underscore (eg. _"hello"_ ) in a parse block, which allows you to specify whether they are delimited by whitespace or not. This would allow you to enable/disable the necessity for delimiters per-string. eg:

parse input [
	_"house"_   ; a complete word surrounded both sides by whitespace
	_"hous"     ; this would match "house", "housing", "housed" or even "housopoly" etc.. but left side must be whitespace
	"ad"_       ; this would match "ad", "fad", "glad" and right side must be whitespace
]

But this would need string datatype to change.

On the other hand, I could just set underscore _ to a charset of 
whitespace, then use that with parse/all eg:

	_: charset " ^-^/"

parse/all input [
	[ _ "house" _ ]
]


though that wouldn't be as comfortable. Maybe I can create parse 
rules from a simpler dialect which understands the underscore _.
Just an idea...
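For what it's worth, one sketch of how such a simpler dialect could be compiled into plain parse rules (compile-pattern is a hypothetical name):

ws: charset " ^-^/"
compile-pattern: func [spec [block!] /local rule] [
	rule: copy []
	foreach item spec [
		either item = '_ [append rule [some ws]] [append rule item]
	]
	rule
]
parse/all "a  house " compile-pattern ["a" _ "house" _]
;== true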
MikeL:
27-Sep-2006
Anton, Andrew had defined whitespace patterns in his patterns.r script, which seem usable; then you can use [ws* "house" ws*] or other combinations as needed, without the underscore. Andrew's solutions for this and a lot of other things have given me some good mileage over the past few years. WS*: [some WS] and WS?: [any WS]. It makes for clean, clear parse scripts once you adopt it.
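For reference, a self-contained version of that convention (the charset is my assumption; Andrew's patterns.r has its own definitions):

ws:  charset " ^-^/"
ws*: [some ws]   ; one or more whitespace characters
ws?: [any ws]    ; optional whitespace
parse/all "  house  boat" [ws? "house" ws* "boat"]
;== true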
Anton:
27-Sep-2006
Oh yes, I've seen Andrew's patterns.r. I was just musing how to make 
it more concise without even using a short word like WS.  Actually 
the use case which sparked this idea was more of a "regex-level" 
pattern matcher, just a simple pattern matcher where the user writes 
the pattern to match filenames and to match strings appearing in 
file contents.
Anton:
27-Sep-2006
Gregg, + * ? could be a good idea. I'll throw that into my mix-bowl.
Gregg:
28-Sep-2006
I also have a naming convention I've been playing with for a while, 
where parse rule words have an "=" at the end (e.g. date=) and parse 
variables--values set during the parse process--have it at the beginning 
(e.g. =date). The idea is that it's sort of a cross between BNF syntax 
for production rules and set-word/get-word syntax; the goal being 
to easily distinguish parse-related words. By using the same word 
for a rule and an associated variable, with the equal sign at the 
head or tail, respectively, it also makes it easier to keep track 
of what gets set where, when you have a lot of rules.
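A tiny illustration of the convention (the date rule is hypothetical, not from Gregg's code):

digit: charset "0123456789"
date=: [copy =date [4 digit "-" 2 digit "-" 2 digit]]   ; the rule word ends in =
parse/all "2006-09-28" date=
print =date   ; the variable set during the parse begins with =
; prints 2006-09-28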
Graham:
29-Sep-2006
This was, I thought, a simple task .. to parse a CSV file....
Graham:
29-Sep-2006
this seems to be a difficult line as there is an embedded quote viz 
"123 "c" Avenue"
Graham:
29-Sep-2006
this is Gabriele's published parser 


CSV-parser: make object! [
	line-rule: [field any [separator field]]
	field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
	string: [copy f-val any str-char]
	quoted-string: [{"} copy f-val any qstr-char {"} (replace/all f-val {""} {"})]
	str-char: none
	qstr-char: [{""} | separator | str-char]
	fields: []
	f-val: none
	separator: #";"
	set 'parse-csv-line func [
		"Parses a CSV line (returns a block of strings)"
		line [string!]
		/with sep [char!] "The separator between fields"
	] [
		clear fields
		separator: any [sep #";"]
		str-char: complement charset join {"} separator
		parse/all line line-rule
		copy fields
	]
]
Graham:
29-Sep-2006
this might fix Gabriele's parser ..

CSV-parser: make object! [
	line-rule: [field any [separator field]]
	field: [[quoted-string | string] (insert tail fields any [f-val copy ""])]
	string: [copy f-val any str-char]
	quoted-string: [{"} copy f-val any qstr-char {"} (if found? f-val [replace/all f-val {""} {"}])]
	str-char: none
	qstr-char: [{""} | separator | str-char]
	fields: []
	f-val: none
	separator: #";"
	set 'parse-csv-line func [
		"Parses a CSV line (returns a block of strings)"
		line [string!]
		/with sep [char!] "The separator between fields"
	] [
		clear fields
		separator: any [sep #";"]
		str-char: complement charset join {"} separator
		parse/all line line-rule
		copy fields
	]
]
Izkata:
3-Oct-2006
That's a ~very~ good example, Oldes... it should be put in the docs 
somewhere (if it isn't already.)  I didn't understand how get-words 
and set-words worked in parse, either, before..
Anton:
4-Oct-2006
string: "<good tag><bad tag><other tag><good tag>"
entity: "<ENTITY>"
parse/all string [
	any [
		to "<" start: skip
		to ">" end: skip
		(if not find copy/part start end "good tag" [
			change/part start entity 1
			; fix up END (for when your entity is other than a 1-character long string)
			end: skip end (length? entity) - 1
			change/part end entity 1
			; fix up END again
			end: skip end (length? entity) - 1
		])
		:end skip
	]
	to end
]
string
;== {<good tag><ENTITY>bad tag<ENTITY><ENTITY>other tag<ENTITY><good tag>}
Anton:
4-Oct-2006
Such unmatched tags cause a headache for any parser.
Anton:
4-Oct-2006
Ok, give this a burl.
Anton:
4-Oct-2006
string: "<good tag><bad tag> 3 > 5 <other tag><good tag with something inside>"

string: " > >> < <<good tag><bad tag> 3 > 5 <other tag><good tag etc> >> > "

; (1) search for end tags >, they are erroneous so replace them
; (2) search for start tags <, if there is more than one, replace all except the last one
; (3) search for end tag >, check tag body and replace if necessary

entity: "&entity;"
ntag: complement charset "<>" ; non tag
parse/all result: copy string [
	any [
		; (1)
		any [
			any ntag start: ">" end: (
				change/part start entity 1 end: skip start length? entity  ;print [1 index? start]
			)
			:end
		]

		; (2)
		(start: none stop?: none)
		any [
			any ntag start: "<" end:   ;(print [2 mold start])
			any ntag "<" (  ;print "found a second start tag"
				change/part start entity 1 end: skip start length? entity  ;(print [2.1 mold copy/part start end])
				start: none
			) :end
		]
		(if none? start [stop?: 'break]) stop?

		; ok, we found at least one start tag
		;(print ["OK we found at least one start tag" mold start])
		:start skip

		; (3)
		any ntag end: ">"   ;(print [3 mold copy/part start end])
		(if not find copy/part start end "good tag" [
			;print ["found a bad tag" mold copy/part start end]
			change/part start entity 1
			; fix up END (for when your entity is other than a 1-character long string)
			end: skip end (length? entity) - 1
			change/part end entity 1
			; fix up END again
			end: skip end (length? entity) - 1
		])
		:end skip
	]
	to end
]
result
Anton:
4-Oct-2006
Holy ---- ! where did two and a half hours go ?
Anton:
4-Oct-2006
oh no.. maybe I only spent one and a half hours on it, but still...!
Oldes:
5-Oct-2006
And Rebolek, you can use this code of mine to remove unwanted tags (it's already here - posted a few days before - but with a little bug; this version should be OK, as I'm using it)

remove-tags: func[html /except allowed-tags /local new x tag name 
tagchars][
	if not string? html [return html]
	new: make string! length? html
	tagchars: charset [#"a" - #"z" #"A" - #"Z"]
	parse/all html [
		any [
			copy x to {<} copy tag thru {>}  (
				if not none? x [insert tail new x]
				if all [
					except
					parse/all tag ["<" opt #"/" copy name some tagchars to end]
					find allowed-tags name
				][	insert tail new tag ]
			)
		]
		copy x to end (if not none? x [insert tail new x])
	]
	new
]
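A quick usage sketch of this corrected version:

probe remove-tags "a <b>bold</b> move"
;== "a bold move"
probe remove-tags/except "<p>a <b>bold</b> move</p>" ["b"]
;== "a <b>bold</b> move"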
Oldes:
5-Oct-2006
With such a converter we should theoretically be able to easily parse 
any language
Oldes:
5-Oct-2006
...There are actually lots of programs that can be given (E)BNF grammars 
as input and automatically produce code for parsers for the given 
grammar. In fact, this is the most common way to produce a compiler: 
by using a so-called compiler-compiler that takes a grammar as input 
and produces parser code in some programming language....
Anton:
5-Oct-2006
Well, I just spent two days making a matching algorithm for searching 
file contents, and I was considering making a "compile-rules" function 
(possibly similar to Gabriele or someone else's). Looks like I don't 
have to make that for now, but my mind is in this place at the moment. 
I long for the day when I don't have to use filesystems at all (which 
obviates the need for file search programs) - hopefully we can stick 
all our info in a database soon. Probably an associative database.
Anton:
5-Oct-2006
While on this topic - Was it Gregg or Sunanda who made a mini dialect 
for a file contents matcher ? That's the algorithm I just made, and 
I'm now interested to review other implementations. While developing 
I also came to an apparent cross-roads, a choice between a simple, 
"digital", logical algorithm or a more "fuzzy" algorithm with a ranking 
system like Google. This reminded me of a discussion a while back 
where this point was made.
Gregg:
5-Oct-2006
WRT BNF, it should be possible. I think Brett Handley did it, or 
the reverse, at one point; might be on codeconscious.com, not sure. 
I've also done something similar, for ABNF. It was built for a client, 
so I'd have to ask if it could be released. ABNF is what is used 
in a lot of RFCs, so it could be used on a lot of things for standards 
interop.
Robert:
9-Oct-2006
The main problem I see is that a "normal" BNF parser checks all rules in parallel and uses the first match, whereas PARSE takes a sequential approach, using the first rule that matches. So the rule for using PARSE is: always put the maximum-width matching rule at the beginning.

For example you want to parse for:
.
..
...


You need to put the ... first. Otherwise the rule will match a single . first and fire three times.
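A runnable illustration of that ordering rule:

parse/all "..." [some ["..." (print "ellipsis") | ".." (print 2) | "." (print 1)]]
; prints: ellipsis
parse/all "..." [some ["." (print 1) | ".." (print 2) | "..." (print "ellipsis")]]
; prints 1 three times - the single . fires on each dot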
BrianH:
10-Oct-2006
Actually Robert, "normal" BNF parsers usually have similar restrictions 
to the parse dialect, only more so. Shift-reduce parsers like yacc 
need the maximum width rule first; recursive-descent parsers need 
to be refactored extensively (in a way that is too complicated to 
go into now). The parse dialect is recursive-descent with backtracking, 
which in theory is less restricted than either LR (shift-reduce) 
or LL (recursive-descent). I tend to do LL refactoring on my parse 
rules just because that makes them faster, but it's nice that it 
is not always required, that I can do LR-style rules if I need to.
BrianH:
10-Oct-2006
Perhaps you are thinking of lexers that convert a source syntax with 
restrictions similar to those of regular expressions into a state 
machine. Those could be thought to operate in parallel (not really, 
but close enough), but the languages they accept are quite restricted 
compared to full parsers, let alone the parse dialect.
BrianH:
10-Oct-2006
Sorry, I came to the parse dialect from a history of using and making 
parser generators. It's annoying that the behavior of parse and the 
tricks you can use to optimize your parse rules have all of these 
arcane CS terms referring to them. At least the parse dialect is 
a lot more flexible than most of those parser generators, and easier 
to write, use and debug too.
james_nak:
10-Oct-2006
I have an easy one for you gurus. Let's say I want to parse a file 
and get all the "www..." out of it. The thing is that they end in 
either a space or a linefeed. How do I do a (written in pseudo parse 
to give you an idea) "to "www" copy tag to 'either a linefeed or 
a space'"? I've tried charsets, vars, blocks but the best I can do 
is one or the other. Note, finding the "www" is the easy part, it's 
ending the string that is giving me fits. Thanks in advance.
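One way to do it (a sketch, not from the thread; text and urls are placeholder names):

delim: charset " ^/"
non-delim: complement delim
urls: copy []
parse/all text [
	any [to "www" copy url some non-delim (append urls url)]
	to end
]
; urls now holds each "www..." run, terminated by a space, a linefeed, or the end of input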
Maxim:
27-Oct-2006
I am almost sure this question has been asked many times before... it's my turn :-)


is there a way for a parse rule to detect situations in which it should fail, because we have a succeeding rule which we know will match?
Maxim:
27-Oct-2006
I have rules to parse ABC explicitly, and a fallback which can parse anything.
Maxim:
27-Oct-2006
note... the example is simple; consider each character a different matching condition.
Maxim:
27-Oct-2006
also, in reality, each letter in the above over-simplification is 
a word... not just one char (and there is overlap) so I can't just 
match charsets.
Maxim:
28-Oct-2006
the break seems to be what I am looking for; I'll test something out and if it's not conclusive I will come back with a better example :-)  thanks guys.
Graham:
25-Nov-2006
Posted on reboltalk ...

>> parse/case "AAABBBaaaBBBAAAaaa" "A"
== ["" "" "" "BBBaaaBBB" "" "" "aaa"]

how come there are only two "" after the BBBaaaBBB ?
Henrik:
25-Nov-2006
>> parse/case "AAABBBaaaAAA" "A"
== ["" "" "" "BBBaaa" "" ""]
>> parse/case "BAAABBBaaaAAA" "A"
== ["B" "" "" "BBBaaa" "" ""]
>> parse/case "BA" "A"
== ["B"]

hmmm...
Ladislav:
25-Nov-2006
it's OK, because every A means one closing #"^"" (each A closes one string). The first A was used to close the "...a" string
Ingo:
26-Nov-2006
This may make it easier for some, just exchange the "A"s for "," 
and mentally read it like you would read a csv file:

>> parse/case ",,,BBBaaaBBB,,,aaa" ","
== ["" "" "" "BBBaaaBBB" "" "" "aaa"]
Anton:
26-Nov-2006
It's like cutting a piece of wood. You only cut twice but you end 
up with three pieces.
Maxim:
26-Nov-2006
huh? not sure I get what you mean... how can the above be desired? it mangles the symmetry of the data when tokenizing - for example it strips the trailing / of a dir...
Maxim:
27-Nov-2006
the function's doc string doesn't even mention it! it's a special mode ... in the dictionary it says:

There is also a simple parse mode that does not require rules, but takes a string of characters to use for splitting up the input string.

so it's not very explicit.
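The simple mode is easier to see at the console (examples mine, not from the dictionary):

>> parse "The fox jumped" none   ; none: split on whitespace
== ["The" "fox" "jumped"]
>> parse "1;2;3" ";"             ; or split on the characters you supply
== ["1" "2" "3"]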
Anton:
27-Nov-2006
So the problem might be that we don't know how it's supposed to work. 
Maybe the implementor wasn't too clear how it should work either. 
From memory there was an "inconsistent case" which actually had a 
use - for something like splitting command-line args. But anyway, 
a clearer definition would be good.
Anton:
27-Nov-2006
Better to have a simple and consistent core and enable particular 
modes for specific uses with refinements.
Pekr:
5-Dec-2006
Just asking, because today I read a bit about ODF and OpenXML (two document formats for office apps). There is probably open space for small apps parsing some info from inside the documents etc. (meta-data programming) ... just curious ... or would it be better to wait for full-spec XML / markup-language libs that do the given job, and link to those libraries?
BrianH:
5-Dec-2006
Such a thing has been on my todo list for a while, but I've been 
a little busy lately with non-REBOL projects :(
Maxim:
8-Dec-2006
geomol's xml2rebxml handles XML pretty well.  one might want to change 
the parse rules a little to adapt the output, but it actually loads 
all the xml tags, empty tags and attributes.  it even handles utf-8, 
CDATA chunks, and converts some of the & chars.
BrianH:
11-Dec-2006
You really have to trust your source when using JSON to a browser 
though. Standard usage is to load with eval - only safe to use on 
https sites because of script injection.
Maxim:
11-Dec-2006
is there a way to make block parsing case sensitive?

this doesn't seem to work:
parse/case [A a] [some ['A (print "upper") | 'a (print "lower")]]
Gabriele:
11-Dec-2006
>> strict-equal? 'A 'a
== true
Gabriele:
11-Dec-2006
>> alias 'a "aa"
== aa
>> strict-equal? 'A 'a
== false
Maxim:
11-Dec-2006
hehe... I would not want the bug to get too comfortable, lest it becomes a feature ;-)
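Without relying on the aliasing quirk, one workaround sketch is to match any word and compare its molded spelling case-sensitively:

parse [A a] [
	some [set w word! (print either strict-equal? "A" mold w ["upper"] ["lower"])]
]
; prints upper, then lower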
Joe:
24-Dec-2006
I ran the above on Core 2.6 and it loops forever. This was a bug fixed in 2.3, but it looks like the bug still exists
Joe:
24-Dec-2006
sorry, not a bug. I was inspired by the example in the changes page, and it is missing the thru "^/" after the to "^/"