World: r3wp

[!REBOL3]

Geomol
19-May-2011
[8736]
I found a bad effect of not binding words in blocks at all before 
the block is evaluated. Functions like LOOP take 2 args, count and 
block. If the block content is not bound before it's evaluated, a 
count var used in the block finds the count arg local to LOOP.

So I guess the REBOL early bind of words is better.
BrianH
19-May-2011
[8737]
LOOP's count arg is not bound to the block of code, so it is not 
local. This is why LOOP is the fastest loop. REPEAT is the version 
of LOOP with a bound local arg.
Geomol
19-May-2011
[8738]
The downside of REBOL's binding is all the unset words we start 
out with. R2 has 1865 unset words registered in system/words. They 
can be seen with this code:


foreach w first system/words [if unset? get/any to lit-word! w [prin [w ""]]]
Maxim
20-May-2011
[8739x2]
but doesn't this list disappear with the new multi-level booting of 
R3?  since those words are now not sent to the user context but stay 
in their lower levels?
but it's not a big deal... unset words are created just by loading 
any word in a block.  it just means, I've seen this symbol before 
and it now has its reserved number in the word list/hash table. any 
future reference will reuse the same word id (number).
Geomol
20-May-2011
[8741x5]
Yeah, they will be reused. But the way REBOL does it, if you have 
an application that does a lot of block parsing, for example, with 
new words coming in all the time, then that global context will just 
grow and grow. In reality, it will probably come to an end, as there 
is a finite number of words in a human language, if that's what is 
being parsed.


If words were not bound by just being loaded, but only when evaluated 
(or compiled, if that's the case), then parsing blocks would not 
produce any unset! words in the global context. But a consequence 
of this is that blocks of code being sent to a function (like LOOP) 
will be able to see the words local to that function, unless the 
block is first bound outside the function, like

	count: 1
	loop 10 bind [print count] 'count
, which will then print the number 1 ten times.
I just realized, the global context in R2 and the user context in 
R3 (system/contexts/user) won't grow by this:

to block! "some_random_word"

but it will grow by this:

[some_random_word]
The consequence:

>> count: 1
>> blk: [print]
>> append blk to block! "count"
== [print count]
>> do blk
** Script Error: count word has no context

I wonder, why TO BLOCK! works like this.
Maybe that's the way to have blocks of lots of words, which are just 
data and not bound to anything.
So to not exhaust the global context (if that's a problem), we should 
parse like this:


parse to block! "a few words and then some more" to block! "'a 'few 'words to end"
Rebolek
20-May-2011
[8746]
AFAIK, to block! doesn't do binding, you have to LOAD the block.
Kaj
20-May-2011
[8747x2]
Yes
John, you seem to want to break down the basic principles REBOL is 
built on
Geomol
20-May-2011
[8749]
I'm trying to figure out the basic principles to understand better 
what REBOL is. What's good design and what isn't. And what the consequences 
of different designs are.
Maxim
20-May-2011
[8750]
john, you are in error when you say: "exhaust the global context". 
the number of words in the global context is irrelevant to exhausting 
the number of usable words in rebol.

the reason is that binding is not what reserves words in the master word 
table.  it's anything that creates a new word value, bound or not.

here is an example, using your own to block! example:

>> repeat i 100000 [to-block join "random-word" i]
>> probe length? first system/words
== 2616


pump up the number to 500000 (in 2.7.8) and it crashes.   IIRC this 
was as low as 32k in older versions ! 


with each increment of 100000 you will see the rebol process gobble 
up a few MBs more as it assigns new word-ids to those words.
Kaj
20-May-2011
[8751x2]
Yes, REBOL is symbolic, so there is an internal table of numeric 
IDs for every word it ever encountered in the session
This is an indirection, but different from binding. To abolish that 
table would mean to keep strings everywhere internally instead of 
simple numbers. If you want a language to work that way, you should 
use shell scripting. It's very slow
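
The kind of table Kaj describes can be sketched roughly as follows. This is Python rather than REBOL, and `SymbolTable`, `intern` and `spelling` are made-up names for illustration, not REBOL internals. The sketch only shows that each distinct spelling gets one numeric ID, regardless of whether the word is bound to anything, and that new spellings make the table grow. It ignores details such as REBOL words being case-insensitive.

```python
class SymbolTable:
    """Hypothetical symbol table: one numeric ID per distinct word spelling."""

    def __init__(self):
        self._ids = {}        # spelling -> numeric ID
        self._spellings = []  # numeric ID -> spelling

    def intern(self, spelling):
        """Return the ID for a spelling, allocating a new one on first sight."""
        word_id = self._ids.get(spelling)
        if word_id is None:
            word_id = len(self._spellings)
            self._ids[spelling] = word_id
            self._spellings.append(spelling)
        return word_id

    def spelling(self, word_id):
        """Map a numeric ID back to its spelling."""
        return self._spellings[word_id]

    def __len__(self):
        return len(self._spellings)


table = SymbolTable()
first = table.intern("count")
again = table.intern("count")          # same spelling: same ID, no growth
for i in range(1000):
    table.intern("random-word%d" % i)  # every new spelling grows the table
```

Comparing two words then comes down to comparing two integers, which is the speed argument made above. Nothing in this sketch ever removes an entry, which matches the growth Maxim measured.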
Maxim
20-May-2011
[8753]
yep
Geomol
20-May-2011
[8754]
I think, there is a third alternative. When we deal with strings, 
a data structure is made, and we just have a pointer to that.

var: "a string"
var: none


When no one is using the string anymore, its memory can be completely 
cleaned up. If I do the same with a word:

var: 'word
var: none


I don't see why memory can't be cleaned up just the same. Is it a design 
flaw, the way it is?
onetom
20-May-2011
[8755x2]
which suggest using some kind of reference counter for the words, 
but what would decrement such a reference counter?
*suggests
Geomol
20-May-2011
[8757]
REBOL uses garbage collection, but in the case of counters, the same 
thing that would decrement a string counter. If a word stops pointing to 
it, decrement. If it's in a block, and the block is destroyed, decrement 
the block content.
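
The counting scheme Geomol outlines could look something like the sketch below. It is Python rather than REBOL, and `CountedSymbols`, `acquire`, `release` and `release_block` are hypothetical names. Nothing here reflects how REBOL actually manages symbols; it only shows that a spelling can be dropped from the table once the last reference to it goes away.

```python
class CountedSymbols:
    """Hypothetical symbol table that frees a spelling once nothing refers to it."""

    def __init__(self):
        self._counts = {}  # spelling -> number of live references

    def acquire(self, spelling):
        """Record one more reference to a spelling."""
        self._counts[spelling] = self._counts.get(spelling, 0) + 1

    def release(self, spelling):
        """Drop one reference; remove the entry when the count reaches zero."""
        remaining = self._counts[spelling] - 1
        if remaining == 0:
            del self._counts[spelling]
        else:
            self._counts[spelling] = remaining

    def release_block(self, block):
        """Destroying a block releases every word it contained."""
        for spelling in block:
            self.release(spelling)

    def __contains__(self, spelling):
        return spelling in self._counts

    def __len__(self):
        return len(self._counts)


symbols = CountedSymbols()
block = ["print", "count", "count"]
for word in block:
    symbols.acquire(word)
symbols.acquire("count")               # e.g. a variable holding 'count

symbols.release_block(block)           # the block is destroyed
still_there = "count" in symbols       # the variable still refers to it
symbols.release("count")               # the variable is set to none
gone = "count" not in symbols
```

The cost is bookkeeping on every copy and every destruction of a word value, which is the overhead onetom's question points at.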
BrianH
20-May-2011
[8758]
R3 doesn't have anything like R2's system/user. For all we know symbols 
could be garbage collected. In 32-bit R3 though, afaik you will not 
reach the total number of possible words until you have already hit 
the limits of the memory address space first. Does someone have a 
computer with enough RAM to test this? I only have 4 GB.
Andreas
20-May-2011
[8759]
If you write the test, I can certainly run it :)
BrianH
20-May-2011
[8760x2]
for x 0 to-integer #7fffffffffffffff 1 [to-word ajoin ["a" x]]


Then watch the memory usage of the process, and tell us what happens 
and which error is triggered when it fails.
>> to-integer #7fffffffffffffff
== 9223372036854775807

That will be many more words than could be held in memory in a 32bit 
address space.
Geomol
20-May-2011
[8762]
"R3 doesn't have anything like R2's system/user."


I don't think anybody mentioned R2's system/user. Do you mean system/words?
BrianH
20-May-2011
[8763x2]
Sorry, yes, that's what I meant.
Though technically all R3 objects are more like system/words than 
they are like R2's objects.
Geomol
20-May-2011
[8765]
R3 has system/contexts/user, which seems to work like R2's system/words. 
Try

? system/contexts/user


before and after making some random words, e.g. in a block like [some 
random words]
BrianH
20-May-2011
[8766x2]
No, it really doesn't. All loaded words are added to system/words. 
Only words that are referenced directly in user scripts, or added 
explicitly with INTERN, are added to system/contexts/user. I had 
to add a bit of really careful code to make sure that system/contexts/user 
doesn't get words added to it until absolutely necessary, because 
it is an isolated context from the runtime library system/contexts/lib.
Symbols in R3 are stored in an internal symbols table, a btree(?) 
or some other unknown data structure that you can't reference externally.
Geomol
20-May-2011
[8768]
Is the internal symbol table there to save memory? Maybe like Lua's 
internal structure to hold strings?
BrianH
20-May-2011
[8769x3]
The internal symbol table is there to make symbols work at all. In 
R2, system/words was the symbol table. However, it does save memory 
relative to strings because there are no duplicates, and because 
the symbol data for the words is stored in UTF8 instead of 16-bit 
characters.
They aren't added to any context until you add them explicitly. The 
R3 interpreter would not presume to know what a word would mean to 
you until you tell it what it means, by binding the word to a context 
or by just using the word as data.
bbl
Andreas
20-May-2011
[8772x3]
>> repeat i to-integer #7fffffffffffffff [to word! ajoin ["a" i]]
** Internal error: not enough memory
** Where: to repeat
** Near: to word! ajoin ["a" i]
at ~1700MB resident
and it took 2h50m cpu time to get there :)
Geomol
20-May-2011
[8775]
:) Cool test!
BrianH
20-May-2011
[8776]
This code might be a better test:

repeat i to-integer #7fffffffffffffff [if zero? i // 1'000'000 [recycle] to-hex i]

It should have less memory usage overall and if words are recycled 
then it won't run out. I'll run it now.
Geomol
20-May-2011
[8777]
Where are the words coming into the picture?
BrianH
20-May-2011
[8778]
TO-HEX generates an issue!, which is a word type in R3. Yes, you 
can even bind them.
Geomol
20-May-2011
[8779]
Spooky! :)
BrianH
20-May-2011
[8780]
I figure that not creating the temporary string, and running recycle 
every once in a while, might make the memory problems go away. So 
far the R3 process is staying at exactly the same memory, less than 
a gig. I also tossed an assignment in there so I can know which number 
it fails on.
Andreas
20-May-2011
[8781x4]
the jump from ~900M to ~1.2G took ages
then another aeon for 1.2G to 1.4G
and hours for 1.4G to 1.7G and fail
(jfyi)
BrianH
20-May-2011
[8785]
The time on mine won't be comparable because it's only running on 
one of 4 cores, and the others will be mildly occupied.