World: r4wp

[#Red] Red language group

DocKimbel
7-Dec-2012
[4628x3]
The lexer is choking on get-word used in path...let me see that...
Actually, it's blocking on s/(++ step), such syntax should be supported 
by the lexer, so there's a bug there.
Steeve: I have fixed the lexer bug, so it should at least load correctly 
now. But paren! in path are not yet compiled, so you'll get a "feature 
not implemented" at compilation.


Also, passing a function as argument is not yet correctly handled. 
Also, I'm unsure if s/:step: will be compiled correctly, as we don't 
yet have many tests for path accesses.
Kaj
7-Dec-2012
[4631]
All examples compile without warnings now
DocKimbel
7-Dec-2012
[4632x2]
Thanks!
(for testing)
Gregg
7-Dec-2012
[4634]
For hex notation in REBOL, I've used (albeit dynamically) a simple 
HEX function with issues. 

  hex #20000001


I'm OK with the suffix approach, but if a prefix approach works I 
like that the prefix clues you in to what you're reading, rather 
than reading the number and then seeing the suffix. The question 
is what sigil to use, if lexical space becomes very tight, as in 
REBOL. Do you have any plans for &?

  &HFFFF000F
  &O77770007  ; though I don't think we need octal
  &B11110001
Maxim
7-Dec-2012
[4635]
the following are currently invalid REBOL notations (the first three 
load in R2 but get scrambled)

I prefer the first three, since they are pretty obvious without any 
knowledge of the language.

16#FFFF000F
8#7124554764
2#0110110101

H#FFFF000F
O#7124554764
B#0110110101
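
To make the two proposed families concrete, here is a minimal Python sketch of how a `radix#digits` literal could be read. It is not REBOL or Red lexer code; the helper name `parse_radix_literal` and the `LETTER_BASES` table are hypothetical, and the sketch assumes only bases 2, 8 and 16 are wanted.

```python
# Hypothetical helper for illustration only; not part of REBOL or Red.
LETTER_BASES = {"H": 16, "O": 8, "B": 2}   # letter prefixes from the second group
ALLOWED_BASES = {2, 8, 16}                 # assumption: no other radix is needed

def parse_radix_literal(text):
    """Parse literals such as 16#FFFF000F or H#FFFF000F into an integer."""
    prefix, sep, digits = text.partition("#")
    if not sep or not prefix or not digits:
        raise ValueError("expected <radix>#<digits>: %r" % text)
    base = LETTER_BASES.get(prefix.upper())
    if base is None:
        base = int(prefix)                 # numeric radix such as 16, 8 or 2
    if base not in ALLOWED_BASES:
        raise ValueError("unsupported radix: %r" % prefix)
    return int(digits, base)               # digits are case-insensitive here
```

Both spellings of the same value parse to the same integer, for example `parse_radix_literal("16#FFFF000F")` and `parse_radix_literal("H#FFFF000F")`.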
Gregg
7-Dec-2012
[4636]
I like having the numbers in binary! values, but not as much for 
this. My brain says "this is a binary in base 16 notation", but for 
hex or binary literals, I want to think of the words 'hex and 'binary, 
rather than "this is a base-16 number, which means it's in hex format". 
I think I looked for alternate notations a long time ago. Have to 
see if I can find my notes.
DocKimbel
7-Dec-2012
[4637]
I have found an issue with word! value casing in Red. The Red/System 
code generated for:
	print 'a = 'A
is:
          stack/mark-native ~print
          stack/mark-native ~strict-equal?
          word/push ~a
          word/push ~A
          natives/strict-equal?*
          stack/unwind
          natives/print*
          stack/unwind


The problem is that Red/System is case-insensitive, so ~a and ~A 
are the same variable. So, no way to make it work like that. I see 
two options for solving it:

1) Make Red/System case-sensitive.

2) Deep encode each Red generated symbol to distinguish lower and 
uppercases.


Solution 2) works, but it makes the symbol decoration operation very 
costly (each symbol letter is prefixed with one sigil for lowercase 
and another one for uppercase). The example above becomes:

          stack/mark-native ~_p_r_i_n_t
          stack/mark-native ~_s_t_r_i_c_t_-_e_q_u_a_l_?
          word/push ~_a
          word/push ~-A
          natives/strict-equal?*
          stack/unwind
          natives/print*
          stack/unwind


So, it is not nice, it doubles every Red symbol size that is handled 
by Red/System and slows down Red compilation by 25%.

So, my questions are:
a) Does anyone see another cheaper solution to this problem?

b) In case of option 1), do you have anything against making Red/System 
identifiers case-sensitive?
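
The deep encoding of option 2) can be sketched in a few lines of Python. This is a model of the decoration rule as it can be read off the example above, not the Red compiler's code: the function name `decorate` is hypothetical, and the rule that non-letters such as `-` and `?` also get the `_` sigil is inferred from `~_s_t_r_i_c_t_-_e_q_u_a_l_?`.

```python
def decorate(symbol):
    """Return a decorated name that stays distinct under case-insensitive lookup.

    Assumed rule: '-' goes before an uppercase letter, '_' before anything else.
    """
    parts = []
    for ch in symbol:
        sigil = "-" if ch.isupper() else "_"
        parts.append(sigil + ch)
    return "~" + "".join(parts)
```

Since every character gains a sigil, the decorated name is twice as long as the original (plus the leading `~`), which is the size cost mentioned above. `decorate("a")` and `decorate("A")` still differ after lowercasing, because the sigils differ.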
Kaj
7-Dec-2012
[4638]
Hm, I like that Red/System is case-insensitive like REBOL, so I would 
consider it a sacrifice to have to let go of that
DocKimbel
7-Dec-2012
[4639x3]
Hmm, actually, another option should be possible, generating a unique 
new symbol for same words that have different casing. I will test 
it tomorrow. Anyway, if you have ideas/remarks about this, let me 
know.
Anyway, I don't think we use different casing for identifiers in 
Red/System. Even in REBOL, I don't remember ever using same words 
with different casing in the same app.
I would like to fix this issue and make word comparison operators 
work for the new release, so I'll postpone the release until tomorrow.
Gregg
7-Dec-2012
[4642x3]
Do you know how REBOL handles it? I prefer case-insensitive in general, 
but doubling the size of identifiers seems bad, even if hidden from 
us for the most part.
Case-sensitivity could trip up a lot of REBOLers. I know this is 
Red/System, but still. You may also find that people treat it as 
a feature and start giving things names that differ only in case, 
as happens in C.
What are the biggest downsides to having Red/System remain case-insensitive? 
That is, what does case sensitivity buy us?
Kaj
7-Dec-2012
[4645x4]
In REBOL, 'a and 'A are aliases of the same symbol. Red/System converts 
them to their integer identifier, right? I'd say you need different 
identifiers for aliases somehow to implement the REBOL semantics 
of distinguishing equal? and strict-equal?
That is, identifiers need two levels: the first level for identifying 
the symbol, and the second level for distinguishing aliases
The most space efficient encoding I can come up with would be something 
like ~a-1 for 'a and ~A-2 for 'A. That would be cheap to evaluate 
for strict-equal? but expensive for equal?
A faster encoding would be to reserve a part of the integer identifier 
for the alias number, for example one byte. That would reduce the 
number of different symbols to 2^24 and the maximum number of aliases 
for one symbol to 256. That would only allow a word up to 8 characters 
to have all its aliases, but it would be cheap to evaluate for both 
strict-equal? and equal?
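
The packed-identifier idea can be modeled in Python as follows. It is only a sketch under stated assumptions: a 32-bit identifier with the low byte holding the alias number, and hypothetical names `pack`, `equal`, `strict_equal` and `case_variants`. Nothing here is taken from Red's source.

```python
ALIAS_BITS = 8                       # one byte reserved for the alias number
SYMBOL_BITS = 24                     # remaining bits of an assumed 32-bit identifier
ALIAS_MASK = (1 << ALIAS_BITS) - 1

def pack(symbol_index, alias_number):
    """Combine a symbol index and an alias number into one integer identifier."""
    if not 0 <= symbol_index < (1 << SYMBOL_BITS):
        raise ValueError("symbol index out of range")
    if not 0 <= alias_number <= ALIAS_MASK:
        raise ValueError("alias number out of range")
    return (symbol_index << ALIAS_BITS) | alias_number

def equal(a, b):
    """Case-insensitive comparison: ignore the alias byte."""
    return (a >> ALIAS_BITS) == (b >> ALIAS_BITS)

def strict_equal(a, b):
    """Case-sensitive comparison: the whole identifier must match."""
    return a == b

def case_variants(letter_count):
    """Number of distinct casings of a word made of letter_count letters."""
    return 2 ** letter_count
```

Both comparisons are a shift or a plain integer compare, which is why this layout is cheap for `equal?` and `strict-equal?` alike. The 8-character limit follows from `case_variants(8) == 256`, the number of values one byte can hold.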
DocKimbel
8-Dec-2012
[4649x5]
In REBOL, 'a and 'A are aliases of the same symbol. Red/System converts 
them to their integer identifier, right?


Symbols have two representations in the Red compiler: one is at runtime 
(like in REBOL), the other is at compile-time, in the form of Red/System 
variables. In a very early version of the compiler, I was using integers 
(indexes in the symbol table) instead of variables, but quickly realized 
that it was obfuscating the generated Red/System code a lot, making 
it difficult to debug. Also, the integer approach had an additional 
runtime cost, as it required an array access in order to retrieve 
the symbol value.


Currently, the Red/System ~<name> variables directly point to a word! 
value version, instead of a symbol! for simplicity and efficiency.
I have implemented a compile-time aliasing system for same words 
but different casing. It works fine so far and is cheap compared 
to other options (it requires a conversion table (symbol->alias) 
to be maintained during the compilation).
Aliases are already implemented in the symbol! type. Basically a 
word! relies on a symbol ID, which is an entry in the symbol table. 
Each entry in this table is a symbol! value that references the 
internal Red string! value and a possible alias ID (which is just 
another symbol ID).


Now, I just need to add alias handling in the equal? and strict-equal? 
natives when applied on words to make it work correctly.
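
A small Python model may help picture the symbol table with alias IDs described above. It is a sketch, not Red's runtime: the class `SymbolTable` and its methods are hypothetical, IDs start at 0 for convenience, and it assumes an alias always points at the first spelling that was registered.

```python
class SymbolTable:
    """Toy symbol table: each entry holds a spelling and an optional alias ID."""

    def __init__(self):
        self.strings = []    # symbol ID -> exact spelling
        self.alias = []      # symbol ID -> alias ID (another symbol ID) or None
        self._exact = {}     # exact spelling -> symbol ID
        self._folded = {}    # lowercased spelling -> first symbol ID registered

    def intern(self, spelling):
        """Return the symbol ID for a spelling, creating an entry if needed."""
        if spelling in self._exact:
            return self._exact[spelling]
        sym_id = len(self.strings)
        folded = spelling.lower()
        self.strings.append(spelling)
        self.alias.append(self._folded.get(folded))   # None for the first casing
        self._folded.setdefault(folded, sym_id)
        self._exact[spelling] = sym_id
        return sym_id

    def canonical(self, sym_id):
        """Follow the alias ID, if any, to the first-registered casing."""
        alias_id = self.alias[sym_id]
        return sym_id if alias_id is None else alias_id

    def equal(self, a, b):
        """Like equal? on words: same symbol once aliases are resolved."""
        return self.canonical(a) == self.canonical(b)

    def strict_equal(self, a, b):
        """Like strict-equal? on words: identical symbol ID, casing included."""
        return a == b
```

With `t = SymbolTable()`, interning `"a"` and then `"A"` yields two different IDs, so the original spelling of each word is preserved, while `t.equal` treats them as the same word and `t.strict_equal` does not.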
What are the biggest downsides to having Red/System remain case-insensitive? 
That is, what does case sensitivity buy us?


Good question. I think it doesn't buy us anything, nor does it take 
away any useful feature. Actually, I think that as long as you are 
consistent in the way you name your identifiers (variables, functions, 
contexts,...), you are case-neutral. So, having Red/System case-sensitive 
wouldn't change anything for me and I guess it would be the same 
for others.


Anyway, I prefer to keep it case-insensitive for now, for the sake 
of consistency with Red, unless I really need to change it.
Ok, now equality comparison operators work on all word datatypes.
Gregg
8-Dec-2012
[4654]
Thanks Doc. This is good information to put in a doc somewhere, even 
if just as a reminder to formally doc it later.
BrianH
8-Dec-2012
[4655x2]
Why would = translate to strict-equal? - shouldn't that be == instead?
This is one area where copying R3 as it is now would be a bad idea 
though. See http://issue.cc/r3/1834 for details.
DocKimbel
8-Dec-2012
[4657x2]
Brian: wrt '=, it's a typo, it should be ==.
I haven't implemented EQUIV? yet, I'll look at it once we have 
complete IEEE-754 support (we are missing INFs and NaN handling 
in Red/System).
Marco
8-Dec-2012
[4659]
About hex notation etc (I like case insensitiveness for numbers):
0&a1B
0%10110
or
0b10110
0ha1B
DocKimbel
8-Dec-2012
[4660x3]
0%... prefix will clash with percent! datatype literal form.
The last two (0b... and 0h...) do not read easily IMHO, especially if 
lowercase is allowed.
Anyway, having a prefix rather than a suffix is a possible option.
Steeve
9-Dec-2012
[4663]
How does one know whether a REBOL function is supported or not?

I tried a simple FOR loop, but no result and the compiler is quiet.
DocKimbel
9-Dec-2012
[4664x4]
How does one know whether a REBOL function is supported or not?

 Currently, only by looking in the source code. The compiler is lack 
 a lot of checks, so you need to get your Red code right for now.
lacking
The source code should be easily parse-able, so the list of functions, 
native, actions, ops could be extracted and pretty-printed as a web 
page. IIRC, someone tried to make such script but I didn't see any 
result yet.
New features added today worth mentioning:


- comparison operators (=, ==, <>, <, <=, >=, >) support extended 
to all datatypes.

- FIND action added (supports block! only for now, /match not implemented, 
/only always on)
Gregg
9-Dec-2012
[4668]
Excellent news Doc.
Kaj
9-Dec-2012
[4669]
Jerry wanted to publish ongoing feature stats
Arnold
9-Dec-2012
[4670]
Yes, I wanted to give it a try for the doc scripts. But parse is not 
my expertise, and at the moment I am short on time as I can work 
extra hours at work. So everybody step in please and publish your 
baby-doc-scripts so we can all contribute little bits.
Kaj
9-Dec-2012
[4671]
Working to fix that COBOL code for 21-12-2012 to prevent the end 
of the world, eh? ;-)
Endo
10-Dec-2012
[4672]
About the case-sensitivity,

What about converting all the words to lowercase at compile time? 
Would it lead to some Unicode problems? What if a word is in Chinese, 
are there lower/upper cases in Chinese?
GrahamC
10-Dec-2012
[4673]
No case sensitivity in Chinese as there is no case
Kaj
10-Dec-2012
[4674]
The issue is to keep them separate, instead of merging them into 
lowercase; but Doc has fixed it so far
BrianH
10-Dec-2012
[4675x2]
For compiled code does it really matter? I thought it would only 
matter for words-as-data, and that compilation of case-insensitive 
code would make most words go away. For words-as-data, having some 
duplicate data when appropriate should be OK.
Are you going to have case-sensitive objects, or just case-preserving?
DocKimbel
10-Dec-2012
[4677]
What about converting all the words to lowercase at compile time?


Word values are not "compilable", they are data (words used as variables 
can be "compiled" to some extent). Converting all words into lowercase 
during compilation (including JIT-compilation for words constructed 
at runtime) would make you lose the ability to distinguish lower/upper-cased 
letters, leading to big issues and pitfalls in the language. For 
example: (form 'A) = "a" (because 'A would get converted to 'a). Not 
an option.