Large Files and Binary Read
[1/8] from: james::mustard::co::nz at: 20-Oct-2002 22:56
OK, well never mind with the fixes.. I just figured out what /direct was
for.. doh.. I never expected that it would be buffering the file.
/me feels silly
James.
[2/8] from: james:mustard at: 20-Oct-2002 22:49
Hi folks,
I've recently been working with large amounts of GIS data and am looking
for a way to read arbitrary windows into the data matrix from REBOL.
I tried the following (which dies on Win2k and Linux with an out-of-memory
error).
;-----------------------------------------------------
rebol []
window-data: copy []
offset-column: 30000
data-length: 1500
for y 10000 10499 1 [
    window-data: append window-data read/binary/skip/part
        %/mnt/hda2/eastern_hemisphere.raw
        ((y * 64800) + offset-column)
        data-length
]
print length? window-data
halt
;-----------------------------------------------------
This was with the June 2001 release of View (1.2.3 or something like that).
I tried the read parameters in different orders and got the same error. The
annoying thing is that I can write something to skip/scan the datastream
in C/C++ quite easily without reading the ENTIRE file into memory. Did
I use the wrong REBOL command, or can REBOL not handle 1.5GB+ files
without a corresponding amount of RAM?
I am currently doing this on 512MB DDR with an Athlon XP 2000+ chip and a
60GB HDD. I will soon be moving to data windows on 4GB+ files, so it is
rather important I get this figured out :P
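For scale, the window asked for above is tiny next to the file. A rough sketch of the arithmetic, using the row width (64800 bytes), column offset, and row range from the loop above; the words row-bytes, first-row, and last-row are introduced here only for illustration:

```rebol
rebol []
row-bytes: 64800        ; bytes per row, from the loop above
offset-column: 30000
data-length: 1500
first-row: 10000        ; illustrative names for the loop bounds
last-row: 10499

; byte offset of the first and last window row
print (first-row * row-bytes) + offset-column    ; 648030000
print (last-row * row-bytes) + offset-column     ; 680365200

; total bytes the 500-row window actually needs
print (last-row - first-row + 1) * data-length   ; 750000
```

So the loop only needs about 750 KB in total, even though each offset sits more than 600 MB into the file.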
thanks,
James.
[3/8] from: gscottjones:mchsi at: 20-Oct-2002 8:27
From: "James Marsden"
> OK, well never mind with the fixes.. I just figured out
> what /direct was for.. doh.. I never expected that it
> would be buffering the file.
>
> /me feels silly
Hi, James,
Don't feel too silly because next you will likely run into the /direct/skip
bug when used on local files. In fact, I thought I would already have seen
a complaint. Is it working?
--Scott Jones
[4/8] from: james:mustard at: 21-Oct-2002 9:42
Scott wrote:
> Don't feel too silly because next you will likely run into the /direct/skip
> bug when used on local files. In fact, I thought I would already have seen
> a complaint. Is it working?
Hi Scott,
Yeah now I found that bug.. grrrr.. it lets me get the first block of data
then repeats endlessly.
Anyone suggest a fix?
Here's my code:
;----------------------------------------------------------------
rebol []
window: layout/size [
    backcolor black
    at 150x50 window-data: image 500x500 edge [color: white size: 1x1] effect [none]
] 800x600
d: #{}
offset-column: 0
offset-row: 0
data-length: 500
data-rows: 500
map-width: 21600 * 3
for y 0 (data-rows - 1) 1 [
    bp: y * map-width + offset-column + 1
    ; this is the read that returns the same first block on every pass
    e: read/binary/direct/part/skip %./gameX/eastern_hemisphere3.raw (data-length * 3) bp
    append d e
]
window-data/image: do join "make image! [" [data-length "x" data-rows " " d "]"]
view window
;---------------------------------------------------------------------------
[5/8] from: gscottjones:mchsi at: 20-Oct-2002 17:37
From: "James Marsden"
> Yeah now I found that bug.. grrrr.. it lets me
> get the first block of data then repeats endlessly.
>
> Anyone suggest a fix?
...
Hi, James,
I hoped you wouldn't be back, which would be good news, but I suspected that
you would be back.
:(
There is no direct working substitute that I am aware of for a true seek
(skip in REBOLese). When needing to skip through data while using
/direct/binary in combination on a local file, the only thing that I am
aware of is to open the file, then "waste" parts of the file as a way to
simulate skipping. Given that it is in direct mode, the memory is not being
eaten up by an ever-expanding buffer. However, you are, in essence, cycling
through *all* the data, which may be substantial at the file sizes to which you
have referred.
Also, there is a good chance that the first block of info you got that you
thought was correct was probably in fact an incorrect block (it was probably
the beginning of the file, even though you used read/direct/binary/skip).
So that we "know" what we are dealing with, I created a very small file with
repeating data by column. Here is a ten-row matrix with one hex digit in each
column:
blk: copy []
loop 10 [repeat n 16 [append blk skip to-hex n - 1 7]]
write %//windows/desktop/test.txt rejoin blk
Now, when you practice with your actual algorithm, you'll be able to see
that you in fact have the correct columns. Now for one of many, many
variations to show how to pseudo skip through your data:
rows: 10
cols: 16
data-length: 4
start-col: 3
data-slice: copy ""
data: open/direct/binary %//windows/desktop/test.txt
repeat r rows [
    ;skip to proper column
    copy/part data start-col - 1
    ;collect some data
    append data-slice to-string copy/part data data-length
    ;skip to end of row
    copy/part data cols - start-col - data-length + 1
]
close data
probe data-slice
The most pertinent part is the "throw-away" copy/part statements. The rest
was just my arbitrary controls to cycle by rows (hey, it was a quick and
dirty hack! :-).
For huge row counts but with nominal column counts, I suspect you will
actually want to read in a buffered row of data at a time, and then parse
the proper column stuff out. This would help to reduce disk access while
protecting memory.
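A minimal sketch of that row-at-a-time variant, assuming the same test file and the same rows, cols, data-length, and start-col values as the example above (the word row is introduced here for illustration). Each pass pulls one whole row through the direct port, then slices the wanted columns out of that row in memory:

```rebol
rows: 10
cols: 16
data-length: 4
start-col: 3
data-slice: copy ""
data: open/direct/binary %//windows/desktop/test.txt
repeat r rows [
    ; read one complete row; the port advances by cols bytes
    row: copy/part data cols
    ; slice the wanted columns out of the in-memory row
    append data-slice to-string copy/part skip row start-col - 1 data-length
]
close data
probe data-slice
```

This still walks the whole file up to the last row of interest, but it touches the port once per row instead of three times.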
If the column count and row counts are huge, then I suspect grabbing a
sector of disk data at a time would be more efficient, but more work
controlling the column access algorithm.
Hope this makes some sense. Out of time. Good luck.
--Scott Jones
[6/8] from: james:mustard at: 21-Oct-2002 18:08
Thanks for the tips Scott.
I am now seriously considering buying /Pro so that I can do things the
"easy" way and just cheat by calling a C program to mine the data necessary
and then just load the result into View as an image. It would be a little painful
to do data files with 129600x21600 resolution currently in REBOL ;-)
I could have done a gui in C but Rebol is just soooo much simpler when it
comes to testing and fiddling :-)
James.
[7/8] from: gchiu:compkarori at: 21-Oct-2002 21:07
>I am now seriously considering buying /Pro so that i can
>do things the
>"easy" way and just cheat by calling a C program to mine
>
>I could have done a gui in C but Rebol is just soooo much
>simpler when it
>comes to testing and fiddling :-)
>
Of course, all of us using View in a commercial
environment have purchased it as per the license
statements on RT's website :)
And, Cindy tells me that they will soon be able to accept
Paypal payments which will make it sooo much easier.
Look out also for an SDK for Rebol which will include the
ability to encap products, but not sell them, and this
will require the ownership of a purchased version of View
or Command.
--
Graham Chiu
[8/8] from: james:mustard at: 21-Oct-2002 22:07
;-)
It seems strange that the read/direct/skip bug hasn't been addressed by RT
(I noticed it's been around for quite a while - at least a year!) considering
the frequency with which random file access is used today - especially in
the music / video / data scenes.
Graham wrote:
> Of course, all of us using View in a commercial
> environment have purchased it as per the license
> statements on RT's website :)
This is only in a home hobby sense - the data is from NASA's global observatory
and gets quite large.
Our workplace did look at using REBOL/ENCAP but the encap licensing terms
were considered over-the-top and the technology was viewed as still beta
quality. I personally like rebol but most of my colleagues view it as
something akin to a cross between BASIC and PHP - opting instead for
VB/C/VC++.
> And, Cindy tells me that they will soon be able to accept
> Paypal payments which will make it sooo much easier.
Hmm.. sounds good, although I'd probably pay by credit card.
> Look out also for an SDK for Rebol which will include the
> ability to encap products, but not sell them, and this
> will require the ownership of a purchased version of View
> or Command.
Again, sounds good - although with /Pro and access to the shell/libraries
it's pretty easy to do most anything.
James.