r3wp [groups: 83 posts: 189283]

World: r3wp

[!REBOL3 Proposals] For discussion of feature proposals

Maxim
28-Jan-2011
[907]
does JOIN still reduce like it did in R2?
BrianH
28-Jan-2011
[908]
Yup, so it's not a direct correspondence.
Maxim
28-Jan-2011
[909]
I always wondered why it reduced... I find that very annoying... 
many times I'd use it and it ends up mucking up my data, so I 
almost never use it.
BrianH
28-Jan-2011
[910]
I mostly use REJOIN and AJOIN instead of JOIN, or maybe APPEND COPY.
Maxim
28-Jan-2011
[911]
me too.
Ladislav
28-Jan-2011
[912]
I am pretty sure that:


1) the set operations in Rebol are in fact GC-safe, in the standard 
meaning of the term

2) it is always necessary to use auxiliary data, if the wish is to 
do the set operation efficiently

3) nobody pretending to need a modifying version really needs an 
inefficient variant, which does not use any auxiliary data
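As a rough illustration of points 2 and 3 (a Python sketch, not REBOL's native code; both function names are hypothetical), an order-preserving deduplication either pays for auxiliary data such as a hash set and runs in roughly linear time, or avoids auxiliary containers and rescans the already-kept prefix for every element, which is quadratic:

```python
def unique_with_aux(items):
    """Order-preserving dedup using an auxiliary hash set.
    Expected O(n) time, O(n) extra memory for the set and the result."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedup_in_place_no_aux(items):
    """Order-preserving in-place dedup with no auxiliary container.
    Every element is compared against the kept prefix: O(n^2) time."""
    write = 0  # items[0:write] holds the distinct values kept so far
    for read in range(len(items)):
        candidate = items[read]
        # Linear scan of the kept prefix by index (no slice copy is made).
        if not any(items[k] == candidate for k in range(write)):
            items[write] = candidate
            write += 1
    del items[write:]  # truncate the leftover tail
    return items
```

For `[3, 1, 3, 2, 1]` both return `[3, 1, 2]`; the second one modifies and returns the list it was given.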
Maxim
28-Jan-2011
[913]
why do you say we "pretend"?
BrianH
28-Jan-2011
[914]
I think that people who need a modifying version really need it, 
but the rest of us need the non-modifying default :)
Ladislav
28-Jan-2011
[915x2]
really need it

 - does that mean they need what I specified - an inefficient variant 
 not using any auxiliary data?
I suppose that is nonsense
BrianH
28-Jan-2011
[917]
Nope, it just means they need the first argument modified to contain 
the result instead of what it originally contained.
Maxim
28-Jan-2011
[918]
no one is saying to use anything less efficient than the copy returning 
version.
BrianH
28-Jan-2011
[919]
The only difference between the DEDUPLICATE code in the ticket and 
a native version is that the auxiliary data could be deleted immediately 
after use instead of at the next GC run.
Maxim
28-Jan-2011
[920x4]
and the data is managed directly by C, not by the interpreter, which 
is faster for sure.
which also means that some of that can possibly be optimised by the 
compiler... something that cannot happen within rebol.
it's also going to use less RAM, even if it does use some auxiliary 
data... since that auxiliary data is not wrapped within REBOL interpreter 
wrappers.
or at least, parts of it won't.
BrianH
28-Jan-2011
[924]
Not much less RAM. The "interpreter wrapper" is pretty much constant, 
no matter the size of the data. Remember, the data you are doing 
set operations on is REBOL data already.
Maxim
28-Jan-2011
[925x2]
yes, but the extra data used to build it as a mezz, including the 
stack frames and stuff is prevented.   


I know I'm being picky here.  but we're doing a detailed analysis.. 
 :-)
but in the end, it's the usability which everyone wants, even if it's 
only slightly more effective.
Ladislav
28-Jan-2011
[927]
The only difference between the DEDUPLICATE code in the ticket and 
a native version is that the auxiliary data could be deleted immediately 
after use instead of at the next GC run.
 - that would be inefficient as well
BrianH
28-Jan-2011
[928]
INSERT, CLEAR and UNIQUE are already native, so the actual time-consuming 
portions are already optimized. The only overhead you would be reducing 
by making DEDUPLICATE native is constant per function call, and freeing 
the memory immediately just takes a little pressure off the GC at 
collection time. You don't get as much benefit as adding /into to 
REDUCE and COMPOSE gave, but it might be worth adding as a /no-copy 
option, or just as useful to add as a library function.
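The composition BrianH describes can be sketched as a Python analogy. This is not the code from the ticket: `unique` and `deduplicate` are hypothetical stand-ins, with `list(dict.fromkeys(...))` playing UNIQUE, `list.clear()` playing CLEAR and `list.extend()` playing INSERT. The unique copy has to be computed before the series is cleared:

```python
def unique(series):
    # Stand-in for UNIQUE: a new, order-preserving, duplicate-free list.
    # dict.fromkeys keeps first-occurrence order (Python 3.7+); items must be hashable.
    return list(dict.fromkeys(series))


def deduplicate(series):
    # Hypothetical modifying variant built from the three steps above.
    result = unique(series)  # auxiliary copy: allocated just as with UNIQUE
    series.clear()           # analogous to CLEAR
    series.extend(result)    # analogous to INSERT into the emptied series
    return series            # the caller's own object, now deduplicated
```

The intermediate copy exists either way; what changes is that the caller gets back the same series it passed in.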
Maxim
28-Jan-2011
[929x2]
right now the GC is very cumbersome. it waits until it has 3-5MB 
before working. and it can take a noticeable amount of time to run 
when there is a lot of RAM. I've had it freeze for a second in some 
apps.

everything we can do to prevent memory being scanned by the GC is 
a good thing.
by 3-5MB, I mean that it will usually accumulate ~ 3-5 MB of new 
data before running.
BrianH
28-Jan-2011
[931]
Mark and sweep only scans the referenced data, not the unreferenced 
data, but adding a lot of unreferenced data makes the GC run more 
often.
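Both halves of that statement can be seen in a toy mark-and-sweep heap written in Python. It is an illustration only, not REBOL's collector; the class names and the allocation-count threshold (a stand-in for Maxim's "~3-5 MB of new data") are assumptions. The mark phase traverses only what is reachable from the roots, while every allocation of throwaway data brings the next collection closer:

```python
class Obj:
    """A heap object with outgoing references (toy model)."""
    def __init__(self, name):
        self.name = name
        self.refs = []        # objects this one points to
        self.marked = False


class ToyHeap:
    """Toy mark-and-sweep heap; an illustration only, not REBOL's collector."""

    def __init__(self, threshold):
        self.threshold = threshold  # allocations allowed between collections
        self.objects = []           # every object not yet swept
        self.roots = []             # what the running program can reach directly
        self.since_last_gc = 0
        self.collections = 0
        self.last_marked = 0        # objects visited by the most recent mark phase

    def alloc(self, name):
        # Collect *before* allocating, so a new object is never swept
        # before the caller has had a chance to reference it.
        if self.since_last_gc >= self.threshold:
            self.collect()
        obj = Obj(name)
        self.objects.append(obj)
        self.since_last_gc += 1
        return obj

    def collect(self):
        # Mark: traverse only what is reachable from the roots.
        stack = list(self.roots)
        marked = 0
        while stack:
            obj = stack.pop()
            if obj.marked:
                continue
            obj.marked = True
            marked += 1
            stack.extend(obj.refs)
        # Sweep: keep marked objects, drop everything else, reset marks.
        survivors = [o for o in self.objects if o.marked]
        for o in survivors:
            o.marked = False
        self.objects = survivors
        self.last_marked = marked
        self.since_last_gc = 0
        self.collections += 1


heap = ToyHeap(threshold=10)
a = heap.alloc("a"); heap.roots.append(a)   # long-lived, rooted data
b = heap.alloc("b"); a.refs.append(b)
c = heap.alloc("c"); b.refs.append(c)
for i in range(100):
    heap.alloc(f"tmp{i}")                   # short-lived garbage
print(heap.collections, heap.last_marked)   # 10 3
```

Doubling the throwaway allocations to 200 doubles the number of collections to 20, while each mark phase still visits only the three reachable objects.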
Maxim
28-Jan-2011
[932]
yep.
Ladislav
28-Jan-2011
[933x2]
right now the GC is very cumbersome. it waits until it has 3-5MB 
before working. and it can take a noticeable amount of time to run 
when there is a lot of RAM. I've had it freeze for a second in some 
apps.

 - what exactly does the GC have in common with the "Deduplicate issue"?
I demonstrated above that, in fact, it has nothing in common with it.
BrianH
28-Jan-2011
[935]
But that doesn't mean that deallocating immediately will be any 
more efficient; likely it won't.
Ladislav
28-Jan-2011
[936]
This is all just pretending: if what is needed is a kind of incremental/generational/whichever 
other GC variant, then no "Deduplicate" can help with that
BrianH
28-Jan-2011
[937x2]
We don't need DEDUPLICATE to help with the GC. He was suggesting 
that having it be native would help reduce the pressure on the GC 
when used for other reasons instead of a mezzanine version. I don't 
think it will by much.
He needs DEDUPLICATE for his own code. The GC also needs work, but 
that is another issue :)
Maxim
28-Jan-2011
[939]
if we implement deduplicate as a mezz, we are juggling data which 
invariably tampers with the GC. doing this natively helps to prevent the 
GC from working too hard.


the problem is not how long/fast the allocation/deallocation is... 
it's the fact that cramming data for the GC to manage will make the 
GC run longer/more often.
BrianH
28-Jan-2011
[940]
But you would need to deallocate every time the function is called, 
instead of just when the GC is called. This is often slower.
Ladislav
28-Jan-2011
[941]
Did you read Carl's note to the subject?
BrianH
28-Jan-2011
[942]
The subject of the set function implementation, or the GC implementation 
and how it compares to direct deallocation? If the latter then no.
Ladislav
28-Jan-2011
[943]
I meant the note more to Max, and it was about the set function
BrianH
28-Jan-2011
[944]
Ah, cool :)
Ladislav
28-Jan-2011
[945]
The GC is not a slow approach to garbage collection. The main 
problem is that it is "unpredictable" and may produce delays 
during which other processing stops (but that does not mean that immediate 
collection would be faster).
Maxim
28-Jan-2011
[946]
yes I did.
Ladislav
28-Jan-2011
[947]
the "stop the world" approach is disturbing for user interface, which 
might need a different type of GC...
BrianH
28-Jan-2011
[948]
Also, multitasking could require a different kind of GC. Any thoughts 
on this, Ladislav?
Maxim
28-Jan-2011
[949x2]
just adding a generational system to the GC would help a lot.  I've 
read that some systems also use reference counting and mark and sweep 
together to provide better performance on data which is highly subject 
to volatility.
though I guess it requires a bit more structured code than rebol 
to properly predict what is expected to be volatile.
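The combination Maxim mentions can be illustrated with a toy Python sketch (all names hypothetical; CPython itself pairs reference counting with a separate cycle-detecting collector). Reference counting reclaims short-lived data the moment its last reference disappears, but it cannot reclaim a cycle, which is why a tracing pass is still needed as a backup:

```python
class RcObj:
    """Toy reference-counted object (illustration only)."""
    def __init__(self, name, freed_log):
        self.name = name
        self.count = 0          # number of references to this object
        self.refs = []          # references this object holds
        self.freed_log = freed_log


def inc(obj):
    obj.count += 1


def dec(obj):
    # Free as soon as the last reference disappears, then release
    # every reference the freed object was holding.
    obj.count -= 1
    if obj.count == 0:
        obj.freed_log.append(obj.name)
        children, obj.refs = obj.refs, []
        for child in children:
            dec(child)


def link(src, dst):
    src.refs.append(dst)
    inc(dst)


freed = []

# Acyclic, short-lived data: reclaimed immediately.
t = RcObj("t", freed); inc(t)      # the program holds t
u = RcObj("u", freed); link(t, u)  # t holds u
dec(t)                             # program drops t; t, then u, are freed
print(freed)                       # ['t', 'u']

# A cycle: x and y keep each other alive after the program lets go.
x = RcObj("x", freed); inc(x)
y = RcObj("y", freed)
link(x, y); link(y, x)
dec(x)
print(freed, x.count, y.count)     # ['t', 'u'] 1 1
```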
Ashley
28-Jan-2011
[951]
re DEDUPLICATE ... it's not just GUI code that would benefit, DB 
code often needs to apply this on the result set. "buffer: unique 
buffer" is a common idiom, which can be problematic with very large 
datasets.
BrianH
28-Jan-2011
[952]
You still need an intermediate dupe even for DEDUPLICATE. This just 
makes it so the argument is modified, in case there are multiple 
references to it.
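The difference between the `buffer: unique buffer` idiom and a modifying DEDUPLICATE can be shown with a small Python analogy (hypothetical function names; `list(dict.fromkeys(...))` stands in for UNIQUE). In both cases a full deduplicated copy exists alongside the original at the same moment; the modifying version only differs in that every existing reference sees the result:

```python
def rebind_with_unique():
    buffer = ["a", "b", "a", "c", "b"]
    other_ref = buffer                    # a second reference to the same list
    buffer = list(dict.fromkeys(buffer))  # like "buffer: unique buffer": new list, name rebound
    return buffer, other_ref              # other_ref still sees the duplicates


def modify_in_place():
    buffer = ["a", "b", "a", "c", "b"]
    other_ref = buffer
    deduped = list(dict.fromkeys(buffer))  # the intermediate copy is still allocated
    buffer[:] = deduped                    # overwrite the original list's contents
    return buffer, other_ref               # both names refer to the deduplicated list
```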
Ladislav
29-Jan-2011
[953]
it's not just GUI code that would benefit, DB code often needs to 
apply this on the result set. "buffer: unique buffer" is a common 
idiom, which can be problematic with very large datasets

 - that is exactly where I was trying to explain it was just a superstition: 
buffer: unique buffer is as memory hungry as you can get, and any Deduplicate 
is just pretending that this does not happen when, in fact, it does.
Ashley
29-Jan-2011
[954]
Does DEDUPLICATE *have to* create a new series? How inefficient would 
it be to SORT the series then REMOVE duplicates starting from the 
TAIL? Inefficient as a mezz, but as native?
Ladislav
29-Jan-2011
[955]
Deduplicate does not have to use auxiliary data, if the goal is to 
use an inefficient algorithm. But, in that case, there's no need 
to have it as a native.
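Ashley's sort-then-remove idea can be sketched in Python (hypothetical function name; an analogy, not a proposed native). Compacting in one pass with a write index avoids shifting the rest of the series on every individual removal. The cost is dominated by the O(n log n) sort, the result comes back sorted rather than in the original order, and Python's `list.sort()` may still allocate temporary memory internally:

```python
def sort_dedup_in_place(items):
    """Sort, then drop adjacent duplicates in place.
    No hash set or result list is built, but the original order is lost."""
    items.sort()
    if not items:
        return items
    write = 1  # items[0:write] holds the distinct values kept so far
    for read in range(1, len(items)):
        if items[read] != items[write - 1]:
            items[write] = items[read]
            write += 1
    del items[write:]  # one truncation instead of many separate removals
    return items
```

For `[3, 1, 3, 2, 1]` this leaves `[1, 2, 3]` in the caller's list.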
Maxim
29-Jan-2011
[956]
even if done without aux data, it will still be MUCH faster to do 
in native.