World: r3wp

[Parse] Discussion of PARSE dialect

Pekr
18-Oct-2009
[4544]
ah, got a reply on Chat from Carl about complementing:

"Re #5718: Pekr, that's a good question, and I think the answer must 
be YES. We need to be able to complement bitmaps in a nice way. 
Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would 
take a lot of memory.

This change should be listed on the project sheet, and if not, I'll 
add it there."
Chris
22-Oct-2009
[4545x3]
Is there any advantage in breaking up charsets that represent a large 
varied range of the 16-bit character space? For example, XML names 
are defined as below (excluding > 2 ** 16), but are most commonly 
limited to the ascii-friendly subset:

	w1: charset [
		#"A" - #"Z" #"_" #"a" - #"z"
		#"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(02FF)"
		#"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)"
		#"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)"
		#"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
		#"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
	]
	w+: charset [
		#"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)"
		#"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)"
		#"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)"
		#"^(203F)" - #"^(2040)" #"^(2070)" - #"^(218F)"
		#"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
		#"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
	]
	word: [w1 any w+]
(sorry if that looks messy)
Both w1 and w+ appear to be very large values.  Would it be smart 
to perhaps do:

	[[aw1 | w1] any [aw+ | w+]]

Where 'aw1 and 'aw+ are limited to ascii values?
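
A rough illustration of the [aw1 | w1] idea, written in Python rather than REBOL: a small ASCII-only bitset is consulted first, and the full 16-bit bitset only when that misses. The ranges are copied from the w1 charset above. The helper names make_bitset, in_bitset and is_name_start are hypothetical, and the one-bit-per-codepoint layout is an assumption, not a description of how REBOL stores bitsets. Whether the extra alternative actually helps PARSE would have to be measured.

```python
# Inclusive codepoint ranges copied from the w1 charset in the post above.
W1_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
]


def make_bitset(ranges, size):
    """Build a bytearray holding one bit per codepoint below `size`."""
    bits = bytearray((size + 7) // 8)
    for lo, hi in ranges:
        # Ranges that start at or beyond `size` produce an empty loop.
        for cp in range(lo, min(hi, size - 1) + 1):
            bits[cp >> 3] |= 1 << (cp & 7)
    return bits


def in_bitset(bits, cp):
    """True if codepoint `cp` is inside the bitset and its bit is set."""
    return cp < len(bits) * 8 and bool(bits[cp >> 3] & (1 << (cp & 7)))


AW1 = make_bitset(W1_RANGES, 128)      # hypothetical ASCII-only subset
W1 = make_bitset(W1_RANGES, 0x10000)   # full 16-bit set


def is_name_start(ch):
    """Mirror [aw1 | w1]: try the small set first, then the large one."""
    cp = ord(ch)
    return in_bitset(AW1, cp) or in_bitset(W1, cp)
```

For ASCII input the second lookup is never reached; for anything else both sets are consulted, so the split only pays off if most names really are ASCII.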
Steeve
22-Oct-2009
[4548x4]
Use R3 (and its optimized complemented bitsets)
Anyway, a bitset with a length of 2 ** 16 is not so huge in memory 
(only 16kb)
64 Kb, sorry
So W1 + W+ = 128Kb

Is this a problem?
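
The arithmetic behind these figures can be checked with a short Python sketch. It assumes a flat bitset with exactly one bit per 16-bit codepoint and no per-value overhead, which is an assumption and not something confirmed in this thread. Under that assumption the 64 and 128 figures come out in kilobits; in bytes they are 8192 and 16384.

```python
BITS_PER_SET = 2 ** 16  # one bit per 16-bit codepoint (assumed layout)

bytes_per_set = BITS_PER_SET // 8        # bits -> bytes
kilobits_per_set = BITS_PER_SET // 1024  # bits -> kilobits

print(bytes_per_set, kilobits_per_set)          # 8192 64
print(2 * bytes_per_set, 2 * kilobits_per_set)  # 16384 128 (w1 plus w+)
```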
Chris
22-Oct-2009
[4552]
That's what I'm asking.  Complemented bitsets wouldn't make a difference 
here though as the excluded range is of similar scope, right?
Steeve
22-Oct-2009
[4553x2]
It seems
If the size is a problem, you can build a function to test each range.
But it will be slow.
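
A minimal Python sketch of the "function to test each range" alternative, using the ranges from the w+ charset posted earlier. It stores only the range endpoints and scans them linearly, which illustrates why it is expected to be slower than a single bitset lookup. The name in_w_plus is hypothetical.

```python
# Inclusive codepoint ranges copied from the w+ charset in the earlier post.
W_PLUS_RANGES = [
    (0x2D, 0x2D), (0x2E, 0x2E), (0x30, 0x39), (0x41, 0x5A),
    (0x5F, 0x5F), (0x61, 0x7A), (0xB7, 0xB7), (0xC0, 0xD6),
    (0xD8, 0xF6), (0xF8, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x203F, 0x2040), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
]


def in_w_plus(ch):
    """Test each range in turn; up to 18 comparisons per character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in W_PLUS_RANGES)
```

The table is a few dozen integers instead of a bit per codepoint, at the cost of a loop for every character tested.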
Chris
22-Oct-2009
[4555x3]
Not size, efficiency.
Allowing 'into to look inside strings can break current usage of 
'into, requiring [and any-block! into ...]
An example: a nested d: [k v] structure where 'k is a word and 'v 
is 'd or any other type:

	data: [k [k "s"]]

In R2, you can validate with: d: [word! [into d | skip]]

Now you have to specify: d: [word! [and any-block! into d | skip]] 
Otherwise you get an error if 'v is a string!
Sunanda
25-Oct-2009
[4558]
I guess parse can do this too?

   http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python
Will
25-Oct-2009
[4559]
is R2/Forward available for download? thx
Geomol
25-Oct-2009
[4560x2]
Sunanda, one way:

>> out: clear []
>> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] mark2: (append out copy/part mark1 mark2) skip mark1:]]
>> out
== ["this-is" "a-string"]
Another:

>> out: parse "this-is-a-string" "-"
>> forall out [change/part out rejoin [out/1 "-" out/2] 2]
>> out
== ["this-is" "a-string"]
Steeve
25-Oct-2009
[4562]
R3 one liner ;-)

>> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]]
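
For comparison with the linked Python question, here is a small sketch of the same split in Python. It follows the second approach above: split on every separator, then rejoin the pieces in groups of n. The function name split_every_nth is hypothetical.

```python
def split_every_nth(text, sep, n):
    """Split `text` at every n-th occurrence of `sep`."""
    parts = text.split(sep)
    # Rejoin consecutive groups of n pieces with the separator.
    return [sep.join(parts[i:i + n]) for i in range(0, len(parts), n)]


print(split_every_nth("this-is-a-string", "-", 2))  # ['this-is', 'a-string']
```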
Graham
26-Oct-2009
[4563]
Rebol doesn't have lines :)
BrianH
26-Oct-2009
[4564x2]
Chris, there can be an advantage in R3 to breaking up a bitset into 
more than one bitset on occasion, mostly memory savings. However, 
it might not work as well as you might like since offset and/or sparse 
bitsets aren't supported. Bitsets that involve high codepoints will 
take a lot of RAM no matter what you do.
Will, R2/Forward is already available for download in DevBase (R3 
chat). It is a little outdated though, since I had to take a break 
to rewrite R3's module system. I'll catch up when I get the chance. 
The percentage of R3 that I can emulate has gone down drastically 
since the last update, since R3 has made a lot of changes to basic 
datatype behavior since then. We'll see what we can do.
Steeve
26-Oct-2009
[4566x2]
Something funny.

I spent an hour debugging a parsing rule, 
only to finally understand this:  
Never name a rule LIMIT. 
The LIMIT keyword is apparently reserved for future use in parse 
(in R3).
Pekr
26-Oct-2009
[4568x2]
:-)
I thought it is not implemented yet, hence no reservation?
Steeve
26-Oct-2009
[4570]
If you just try to use it, your parsing may crash. So it does 
nothing, but it's there.
Pekr
26-Oct-2009
[4571x2]
Hmm, you are right ... But we might need a better error message, no?

>> test: ["123"] parse "123" [test]
== true

>> limit: ["123"] parse "123" [limit]
** Script error: PARSE - invalid rule or usage of rule: end!
** Where: parse
** Near: parse "123" [limit]
posted to Chat/R3/Parse group ...
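
To illustrate the kind of clearer message being asked for, here is a hedged Python sketch of a name check that could run before a rule is defined. The keyword set is illustrative and incomplete: it only contains LIMIT plus PARSE keywords that appear elsewhere in this thread. The function check_rule_name is hypothetical and is not part of REBOL.

```python
# Illustrative, incomplete list: LIMIT plus keywords seen in this thread.
RESERVED = frozenset(
    {"limit", "any", "some", "to", "thru", "skip", "end", "into", "and"}
)


def check_rule_name(name):
    """Return `name` unchanged, or raise with an explicit explanation."""
    if name.lower() in RESERVED:
        raise ValueError(
            f"rule name {name!r} collides with a PARSE keyword; "
            "choose a different name"
        )
    return name
```

With this check, naming a rule "test" passes, while "limit" fails with a message that names the actual cause.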
BrianH
26-Oct-2009
[4573x2]
Keywords that are *planned* to be added should definitely be reserved.
Otherwise adding them would be difficult.
Steeve
26-Oct-2009
[4575]
But it should return a proper error message, as Pekr noticed.
BrianH
26-Oct-2009
[4576]
Agreed :)
Robert
8-Nov-2009
[4577x2]
I have used www.antlr.org stuff several years ago with C/C++ target. 
It's a very cool parser generator toolkit. Just took a look again. 
It has emitters for different languages. Maybe one of the parse gurus 
here can take a look if we can do a REBOL emitter.
IMO that would be really nice.
JoshF
17-Nov-2009
[4579x4]
Hi! I'm trying to use REBOL's parse to make a simple calculator dialect. 
However, I'm having trouble with escaping entities (I think)...  
Here's my first try (that worked):
>> parse [3 + 2] [some [integer! (print "number") | ['+ | '- ] (print "op")]]
number
op
number
== true
>> parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]]
** Syntax Error: Invalid word-lit -- '
** Near: (line 1) parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]]
The second one failed when I tried to extend the dialect with multiply 
(*) and divide (/). After further experimentation, it seems that 
you can't escape the "/". Google has not been helpful here... Does 
anybody have any ideas? I could parse for just a word! instead of 
the +, -, etc., but I wanted parse to do the work of deciding what 
was a valid operation or not. Sorry for the multiple messages, I'm 
still trying to figure this client out... Thanks for any advice!
Ladislav
17-Nov-2009
[4583]
JoshF: Rebol load does not parse the '/, but you can do:

as-lit-word: func ['word [any-word!]] [to lit-word! word]
lit-div: as-lit-word /

parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | lit-div] (print "op")]]
JoshF
17-Nov-2009
[4584x2]
Ha! Black magic! That works like a champ, Ladislav, thanks very much!  
I had tried 
>> tdiv: to-word "/"
== /

>> parse [3 / 2] [some [integer! (print "number") | ['+ | '- | '* | tdiv ] (print "op")]]
But had gotten the same error. What makes yours work?
Both tdiv and lit-div type? to a word!...
Ladislav
17-Nov-2009
[4586x2]
My example works, since the LIT-DIV variable refers to a lit-word, 
while your tdiv refers to a word
check as follows:

type? :lit-div
type? :tdiv
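
A simplified Python model of that distinction, offered only as an illustration. It assumes that when a rule refers to a variable, a lit-word value means "match this word literally", while a plain word value is not accepted as a literal match. How PARSE handles the plain-word case internally is not stated in this thread, so raising an error here is an assumption. The classes Word and LitWord and the function match_variable are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    name: str  # stand-in for a word! value


@dataclass(frozen=True)
class LitWord:
    name: str  # stand-in for a lit-word! value


def match_variable(var_name, value, env):
    """Match `value` against the rule stored in the variable `var_name`."""
    rule = env[var_name]
    if isinstance(rule, LitWord):
        # A lit-word rule matches the word with the same spelling.
        return value == Word(rule.name)
    # Assumption: a plain word is not treated as a literal to match.
    raise ValueError(f"{var_name} holds {rule!r}, not a lit-word")


env = {"lit-div": LitWord("/"), "tdiv": Word("/")}
print(match_variable("lit-div", Word("/"), env))  # True
```

In this model, match_variable("tdiv", Word("/"), env) raises, because tdiv holds a word and not a lit-word, even though both print as /.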
Henrik
17-Nov-2009
[4588x2]
If LOAD won't eat a block, PARSE won't either, so you can test your 
block with LOAD. Some words can't be typed in directly, hence Ladislav's 
solution.
And also hence the expression "a block is or isn't loadable"
JoshF
17-Nov-2009
[4590]
OK... Mechanically, I see what you're saying, but what's the difference 
between a lit-word and a word? The spirit eludes me...
Ladislav
17-Nov-2009
[4591]
just a different datatype
JoshF
17-Nov-2009
[4592x2]
I thought there were only word!s, and everything else was a more 
concrete type. I guess what I am asking is: what is the purpose of 
lit-words?
Or are they just used for the special case of dealing with a / in 
load? ;  - )