World: r4wp

[#Red] Red language group

DocKimbel
3-Jan-2013
[5042]
Donations are currently vital for Red continuation as it is my only 
revenue stream. Without all the donations I have received, Red progress 
would slow down a lot. So, it is probably even more important than 
contributing code. ;-)
Jerry
3-Jan-2013
[5043]
Doc, will issue! in Red be an any-string! (like in R2) or an any-word! 
(like in R3)?
DocKimbel
3-Jan-2013
[5044]
Good question: I am just about to add issue! support to Red today. 
I think the change in R3 to treat it as a word is a good one, as 
the main issue! usage is for keywords. As in R3, it will allow digits 
as first character.
Kaj
3-Jan-2013
[5045]
I'm doubtful about it, as it is also commonly used to denote bug 
numbers and such, which is now expensive in R3
Andreas
3-Jan-2013
[5046]
Hmm, why is that expensive now?
Kaj
3-Jan-2013
[5047x2]
Each issue number adds a word to the word registry, that isn't garbage 
collected like strings are
I would like to be proven wrong
Andreas
3-Jan-2013
[5049x2]
No that's true. But I wouldn't consider that particularly expensive 
(esp not with R3 effectively abandoning the word limit).
For compiled Red, I think this would not matter at all (as strings 
are interned as well, afaik).
BrianH
3-Jan-2013
[5051x2]
As I mentioned in Rebol School (and elsewhere earlier), issues can 
be made to behave like strings to a certain extent even if they're 
words. To do that in compiled code you'd need to keep their spellings 
around though, unless you resolve all of those function calls statically 
(which you would be able to because issues would be immutable).
Having issues be immutable and unique could lead to lower memory 
usage, Kaj. Sure, you wouldn't be able to garbage collect them, but 
additional copies of the same issue wouldn't add any additional memory. 
Plus, you can't necessarily GC strings either - only when you don't 
need to keep references to them anymore. It may depend on the app 
whether it's more efficient to have issues be strings or words.
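
To make the trade-off discussed above concrete, here is a minimal Python sketch of an interning table. It is not Red's or R3's actual runtime; the `SymbolTable` class and its `intern` method are hypothetical names chosen for illustration. It shows both sides: repeated copies of one issue cost a single entry, while every distinct issue adds an entry that is never released.

```python
class SymbolTable:
    """Toy symbol registry: each distinct spelling is stored once and never freed."""

    def __init__(self):
        self._ids = {}        # spelling -> symbol ID
        self._spellings = []  # symbol ID -> spelling

    def intern(self, spelling):
        """Return the ID for a spelling, registering it on first sight."""
        sym_id = self._ids.get(spelling)
        if sym_id is None:
            sym_id = len(self._spellings)
            self._ids[spelling] = sym_id
            self._spellings.append(spelling)
        return sym_id

    def __len__(self):
        return len(self._spellings)


table = SymbolTable()

# A thousand copies of the same issue share one entry.
same = [table.intern("#if") for _ in range(1000)]
size_after_repeats = len(table)

# A thousand distinct issues (e.g. bug numbers) each add a permanent entry.
for n in range(1000):
    table.intern("#%d" % n)
size_after_distinct = len(table)
```

Which side dominates depends on the application: a fixed set of keywords stays small, while a server that sees an open-ended stream of IDs keeps growing.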
Kaj
3-Jan-2013
[5053x2]
Symbols are structs in the Red runtime. If you have an app server 
running that handles issue!s, it will accumulate memory over time 
that you can't collect. It will be indistinguishable from a memory 
leak
You could even use it for a DoS attack
BrianH
3-Jan-2013
[5055]
Same with other word types, of course.
Kaj
3-Jan-2013
[5056x4]
True, but that's not a good reason to increase the problem
If you think about what you'd have to do to secure a server from 
memory overload, it would be reasonable to limit acceptable words 
to a certain dictionary, but it wouldn't be reasonable to limit issue! 
to a small range
All in all, I feel that the nature of issue! is not the same as the 
nature of word!
As programmers, we usually see forms such as #if and think it's a 
REBOL issue!, but that's not how it is used in common English
BrianH
3-Jan-2013
[5060]
They're not used at all in common English. You see them in Twitter-speak 
though.
Arnold
3-Jan-2013
[5061]
Very sharp ;)
Andreas
3-Jan-2013
[5062x2]
I think Kaj rather meant the name "issue", not the syntactical form 
(#...).
(And I think we also had a discussion about renaming the R3 type to 
e.g. keyword!.)
Kaj
3-Jan-2013
[5064x2]
Surely in English people write #1, #2 and such?
Twitter certainly didn't invent them :-)
Maxim
3-Jan-2013
[5066x2]
Literally, it reads as 'number'; in music it reads as 'sharp'. Any 
other use isn't from proper English afaik.
(in music)
Kaj
3-Jan-2013
[5068]
Yes, in Dutch, we write nr. like no. in other languages
BrianH
3-Jan-2013
[5069]
In business correspondence it can mean number, in Twitter speak it's 
a hashtag, in music it means someone wrote a sharp with the wrong 
character. In English, it's a symbol that means pound (the weight, 
not the currency), but it's not common anymore.
Andreas
3-Jan-2013
[5070]
In American English, that is :)
BrianH
3-Jan-2013
[5071]
I think it only precedes a word when it means number, or is in Twitter-speak.
Kaj
3-Jan-2013
[5072]
It doesn't really mean pound; English keyboards have a pound sign 
(the currency, which is the weight of silver) where # is on American 
keyboards
BrianH
3-Jan-2013
[5073]
It means pound on American keyboards. Maybe they don't use the character 
for that in England. We just use lb here now.
Sunanda
3-Jan-2013
[5074]
UK keyboards also have the "#" character. And it's unshifted so 
it's more convenient than some other chars, such as "@" or "&" -- 
they are shifted on UK layouts
# is called hash over here.
PeterWood
3-Jan-2013
[5075]
Kaj: "Surely in English people write #1, #2 and such?" Certainly 
not. An English person would never write that. An American would.
DocKimbel
3-Jan-2013
[5076]
Kaj: I share your security concerns about an appserver, but I don't 
think that other word datatypes can really be more secure. As long 
as you can force the LOADing of arbitrary input strings (without 
even evaluating the code), you could use it to make the symbol table 
blow up the memory.
Kaj
3-Jan-2013
[5077x3]
Peter, OK, but that's where issue! comes from
Doc, my point is that one would be more likely to screen for limited 
word use than limited issue! use
Would it be possible to have a recycle feature for the symbols registry?
DocKimbel
3-Jan-2013
[5080x7]
Hardly, the symbol table's purpose is to provide a mapping between 
an integer value (the symbol ID) and a string representation. To 
allow the removal of a symbol, we would need to: 

1) be sure that the symbol is not used anywhere anymore before removing 
it (this would require an equivalent of a full GC collection pass).

2) maintain a list of freed "slots" in the symbol table for re-use.

3) be able to trigger the symbols-GC at relevant points in time.


Even with that, it would still be hard to counter a LOAD-based attack 
on the symbol table.
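
The three requirements listed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not how the Red runtime is built: `RecyclableSymbolTable`, `intern` and `sweep` are hypothetical names, and the set of live symbol IDs is assumed to come from a full mark pass that the caller has already run.

```python
class RecyclableSymbolTable:
    """Toy symbol table whose unused entries can be freed and re-used."""

    def __init__(self):
        self._ids = {}        # spelling -> slot (symbol ID)
        self._spellings = []  # slot -> spelling, or None if the slot is free
        self._free = []       # requirement 2: list of freed slots

    def intern(self, spelling):
        """Return the slot for a spelling, re-using a freed slot if one exists."""
        if spelling in self._ids:
            return self._ids[spelling]
        if self._free:
            slot = self._free.pop()
            self._spellings[slot] = spelling
        else:
            slot = len(self._spellings)
            self._spellings.append(spelling)
        self._ids[spelling] = slot
        return slot

    def sweep(self, live_ids):
        """Free every symbol whose ID is not in live_ids.

        Requirement 1: live_ids must be the result of a full mark pass,
        otherwise a symbol still in use could be freed.
        Requirement 3: the caller decides when to invoke this.
        """
        freed = 0
        for slot, spelling in enumerate(self._spellings):
            if spelling is not None and slot not in live_ids:
                del self._ids[spelling]
                self._spellings[slot] = None
                self._free.append(slot)
                freed += 1
        return freed


table = RecyclableSymbolTable()
kept = table.intern("print")      # still referenced by the program
dropped = table.intern("#1234")   # no longer referenced anywhere

freed = table.sweep(live_ids={kept})
reused = table.intern("#5678")    # lands in the slot freed above
```

Note that the sweep only helps after the fact: a single large LOAD can still fill the table before any collection is triggered.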
"screen for limited word use"

That would need to happen at the LOAD level... not very clean from 
a design POV.
(but doable)
"GC collection pass" => "GC mark pass"
Actually the best defense against such attacks is to never use LOAD 
on untrusted sources.
In the case where potentially harmful input needs to be LOADed, the 
input string needs to be validated before LOADing it with some good 
heuristics. I don't see any other way.
Kaj: you should also note that refinements already exhibit exactly 
the same behavior as issue-as-word! You can use digits-only refinements.
BrianH
3-Jan-2013
[5087]
As a basic screen, you can check the length of what you're loading. 
It can't blow out your memory much beyond twice the length of the 
source (once to read it, once for the results).
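
A length screen of the kind described above could look like the following Python sketch. The limit `MAX_SOURCE_BYTES` and both function names are hypothetical, and the factor of two is simply the rough estimate from the message above, not a measured bound.

```python
MAX_SOURCE_BYTES = 64 * 1024  # hypothetical limit; pick one that fits the app


def screen_source(source, limit=MAX_SOURCE_BYTES):
    """Return True if the source text is small enough to hand to a loader."""
    return len(source.encode("utf-8")) <= limit


def estimated_peak_bytes(source):
    """Rough estimate: one copy for the text read, one for the loaded result."""
    return 2 * len(source.encode("utf-8"))


small_ok = screen_source("#1 #2 #3")
huge_ok = screen_source("x" * (MAX_SOURCE_BYTES + 1))
```

This bounds the damage of a single request; it does not stop many small requests from slowly growing a symbol table that is never collected.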
Gregg
3-Jan-2013
[5088x2]
I use issues for IDs, phone numbers, pseudo-GUIDs, and serial numbers. 
I use INCLUDE as well, and the other PREBOL bits that use them as 
keywords.


Could I use a string for those things? Sure. But I like having a 
datatype with more meaning.
Handy when parsing as well.
DocKimbel
4-Jan-2013
[5090x2]
Issue! datatype added: https://github.com/dockimbel/Red/commit/177b65e67dfc23b1fe7475686a65af49fee7e939
I think issue-as-string could still be useful, so I was wondering 
if supporting both would be a good idea. It could be achieved by adding 
a keyword! datatype, we could then have two syntaxes:

    #<keyword>			;-- for issue-as-word (keyword! datatype!)
    ##<issue>			;-- for issue-as-string (issue! datatype!)

What do you think?