World: r3wp

[Core] Discuss core issues

Jerry
19-Oct-2006
[5756]
About the out-of-memory error, the story is ...


I am trying to compare two complete Windows Registry exports, both 
of which are huge. I exported them to files (in little-endian 16-bit 
Unicode), each 300+ MB. To save space and to make them easier for 
REBOL to handle, I re-encoded these files as UTF-8; they are now 
150+ MB each. I then try to load the two UTF-8 files into memory:

>> lines1: read/lines %/c/reg1.reg

== ["Windows Registry Editor Version 5.00" "" "[HKEY_LOCAL_MACHINE]" 
"" ...
>> lines3: read/lines %/c/reg2.reg
== ** Script Error: Not enough memory
** Where: halt-view
** Near: halt 
>> rebol/version
== 1.3.2.3.1
Maxim
19-Oct-2006
[5757x4]
have you tried not using /lines?  it's pretty easy to chop them up 
after...
not saying /lines has an issue, but I have loaded 700MB ascii files 
on a 1GB RAM computer... 150 is peanuts.  but I never use the /lines 
argument.
you should use the stats to see if something strange is going on. 
 It will give you the amount of RAM REBOL is currently using.
>> stats
== 4432555
Allen
19-Oct-2006
[5761]
There have been some issues with read/lines .  What REBOL version 
are you using? http://www.rebol.net/cgi-bin/rambo.r?sort=1&limit=1&cmd=Search&id=&pattern=read%2Flines
Maxim
19-Oct-2006
[5762]
just above he quoted:

 >> rebol/version
== 1.3.2.3.1
Allen
19-Oct-2006
[5763]
[Allen mumbles something about his glasses]
Maxim
19-Oct-2006
[5764]
lol  :-)
Jerry
19-Oct-2006
[5765x2]
Without the /lines refinement, it takes less memory, and the situation 
is much better. When it reaches 1.28 GB of memory (as reported by STATS), 
however, the out-of-memory error still happens. Does the 1.28 GB boundary 
have anything to do with my 1 GB of physical memory?
Is there a way that I can make REBOL recycle the memory? RECYCLE 
seems not to work. Thanks for your help.
Maxim
19-Oct-2006
[5767x2]
strange... are you processing the files in any way? appending.  I 
don't see how loading two 150MB files jumps stats over 1.2GB
there is a possibility that windows does not allow any application 
to allocate more than 1.2GB.  I remember that a 3d application (Maya) 
seemed to crash or freeze when it reached that amount of RAM IIRC.
Maxim
20-Oct-2006
[5769]
recycle will only work if you free all references to data.  you can 
do this by setting any unused global and local vars to none.
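
A minimal sketch of that advice, assuming REBOL 2 and reusing the file path from Jerry's post. The word DATA is only illustrative, and whether STATS actually drops depends on no other reference to the series being left alive (for example the console's last result, or a function that was handed the data):

```rebol
REBOL []

data: read/binary %/c/reg1.reg     ; load the whole file as one binary series
print ["after load:   " stats]     ; STATS returns the memory REBOL is using

data: none                         ; drop the only reference to the series
recycle                            ; ask the garbage collector to run now
print ["after recycle:" stats]
```

Run as a script rather than typed line by line, this also avoids the console holding on to each result.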
Jerry
20-Oct-2006
[5770]
Thank you Maxim. I actually loaded more than two 150MB files into 
the system. I did set unused global variables to none, then called 
RECYCLE, but it still did not work. Thanks, anyway.
Maxim
20-Oct-2006
[5771x3]
do you pass the data to functions?
cause functions retain the pointers even if they use local values.
a little shortcoming of the current function implementation. (which 
I hope will be addressed in R3 )
Anton
20-Oct-2006
[5774x2]
Jerry, if you're not aware, just a word of caution about line by 
line entry in the console; the console will attempt to mold the result. 
That means rebol will use at least the same amount of memory again 
just to create the molded string.
I would avoid molding by putting length? on the front, and I'd also 
avoid line conversions done in non-binary mode:

	length? lines3: read/binary %/c/reg2.reg
Maxim
20-Oct-2006
[5776x2]
is it normal that join attempts to evaluate its second argument?
ex:
>> join [1 2 3] [bogus-word]
** Script Error: bogus-word has no value
** Where: repend
** Near: bogus-word

where append does not give the error:

>> append copy [1 2 3] [bogus-word]
==  [1 2 3 bogus-word]
the help only talks about concatenation, no details about reducing 
the second argument. :-/
Graham
20-Oct-2006
[5778]
try repend .. that will give you the error you seek!
Maxim
20-Oct-2006
[5779]
exactly, I would have expected the error with rejoin. not join.
Graham
20-Oct-2006
[5780]
except join normally converts the second argument to the datatype 
of the first
Maxim
20-Oct-2006
[5781x2]
rejoin converts all internal values to the value of its first item 
in the block... similar...
but if both arguments are blocks... it should not complain.
Graham
20-Oct-2006
[5783]
rambo it
Maxim
20-Oct-2006
[5784x3]
even if I do a to-block on [bogus-word] I get no errors.
I will   :-)
sourcing join I see it uses repend instead of append.  any gurus 
care to comment on whether they think this should be changed?
Anton
20-Oct-2006
[5787]
I don't think the function should be changed now. The doc string 
could be more descriptive, but it's pretty easy to read that short 
code.
Maxim
20-Oct-2006
[5788x2]
I did a RAMBO on it... I understand the effects on current code, 
maybe it should be revised for R3?
and yes, in the next R2 version, the doc string should be more explicit.
Gabriele
20-Oct-2006
[5790x2]
Max, the reason behind repend there is that join "a" something and 
join "a" [something something-else] should produce similar results.
if you don't want the reduce, just use append copy "a" [something].
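
The difference described above can be sketched as follows, assuming REBOL 2, where sourcing JOIN shows it copying its first argument and then calling REPEND. The values are illustrative only:

```rebol
REBOL []

; JOIN copies its first argument, then REPENDs the second onto the copy,
; so a block given as the second argument is reduced first.
probe join "a" 1                         ; "a1"
probe join "a" [1 + 1 "b"]               ; "a2b"
probe join [1 2 3] [1 + 1]               ; [1 2 3 2]

; APPEND COPY does not reduce: the block contents are added as they are,
; which is why an unset word inside the block causes no error.
probe append copy [1 2 3] [1 + 1]        ; [1 2 3 1 + 1]
probe append copy [1 2 3] [bogus-word]   ; [1 2 3 bogus-word]
```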
Maxim
20-Oct-2006
[5792]
yeah I know, but join is just much cleaner in the code... and 
now I realise that its use is quite limited, since most uses of blocks 
with words involve unbound words.
Gabriele
20-Oct-2006
[5793]
hmm, well, it really depends. no one has complained so far :-)
Rebolek
20-Oct-2006
[5794]
Maxim, 'join is much cleaner in code? That's a matter of opinion, 
I use 'rejoin almost everywhere and 'join only rarely :)
Maxim
20-Oct-2006
[5795x4]
I use rejoin a lot too.
but not having to wrap everything in a block when you really just 
want to append a value to a copy is easy to read.
so I guess I'll just use rejoin for blocks and join for strings :-)
thanks Gabriele
Jerry
20-Oct-2006
[5799x2]
The following code: 

unicode-to-ascii: func [ from to /local fs ts sz] [
    fs: open/binary/direct/read from
    ts: open/binary/direct/write to
    sz: size? from
    fs/1 fs/1 ; discard the first two bytes, FFFE
    for i 3 sz 2 [
        append ts to-char fs/1 
        fs: skip fs 1 ; SKIP is the problem
    ]
    close fs
    close ts
]
unicode-to-ascii %/c/Unicode.txt %/c/Ascii.txt

In REBOL/View 1.2.7.3.1 12-Sep-2006 Core 2.6.0
** CRASH (Should not happen) - Expand series overflow

In REBOL/View 1.3.2.3.1 5-Dec-2005 Core 2.6.3
** Script Error: Not enough memory
** Where: do-body
** Near: fs: skip fs 1
Anton, thanks for the tip on avoiding molding.
Rebolek
20-Oct-2006
[5801]
Jerry: For conversion from/to UTF/UCS... you can use Oldes' unicode 
tools, they handle it very well (unfortunately you have to look around 
AltMe for a link, because Oldes does not upload to rebol.org and 
has his files all around the web - shame on you, Oldes! ;)
Jerry
20-Oct-2006
[5802]
Thank you, Rebolek.
Gregg
20-Oct-2006
[5803]
Depending on how you're going to do the actual diff, there are other 
ways you could work around this. You could try using the /seek refinement 
to read just parts of the files, you could split the files into chunks, 
or you could split them by top-level key (HKLM, HKCU, etc.); assuming 
you can read one full file into memory in order to do that.
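
A rough sketch of the chunked approach, under stated assumptions: REBOL 2, a direct binary port opened the same way as in Jerry's code, and COPY/PART returning none (or an empty series) once the port is exhausted. PROCESS-IN-CHUNKS and its handler are hypothetical names, not part of REBOL:

```rebol
REBOL []

; Hypothetical helper: walk a file in fixed-size binary chunks so that only
; one chunk needs to be held in memory at a time.
process-in-chunks: func [
    file [file!]
    chunk-size [integer!]
    handler [function!]   ; called once per chunk with the binary data
    /local port data
][
    port: open/binary/direct/read file
    ; stop on either none or an empty result, whichever the port returns at EOF
    while [all [data: copy/part port chunk-size  not empty? data]] [
        handler data
    ]
    close port
]

; Usage: count the bytes of one export, 1 MB at a time.
total: 0
process-in-chunks %/c/reg1.reg 1048576 func [chunk] [
    total: total + length? chunk
]
print total
```

A real diff would still need to handle lines or registry keys that straddle a chunk boundary; the sketch only shows how to keep memory use bounded.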
Jerry
20-Oct-2006
[5804]
To Gregg,

I tried what you said. But there was a weird situation for the Windows 
Registry in my computer.

If I export these 5 HKEY_??? into 5 separate files, the sum of 
their sizes is 568 MB.

If I export all of them into 1 file, the file size is 316 MB, which 
is much smaller than 568 MB. I don't know why.

So the 5-file version of Registry-Diff in REBOL might use more memory 
if the GC doesn't work well.
Gregg
20-Oct-2006
[5805]
Hmmm. How are you diffing the data?