
Large Files and Binary Read

 [1/8] from: james::mustard::co::nz at: 20-Oct-2002 22:56


OK, well never mind with the fixes.. I just figured out what /direct was for.. doh.. I never expected that it would be buffering the file.

/me feels silly

James.

 [2/8] from: james:mustard at: 20-Oct-2002 22:49


Hi folks,

I've recently been working with large amounts of GIS data and am looking for a way to read arbitrary windows into the data matrix from rebol. I tried the following (which dies in Win2k and linux with out of memory).

;-----------------------------------------------------
rebol []

window-data: copy []
offset-column: 30000
data-length: 1500

for y 10000 10499 1 [
    window-data: append window-data read/binary/skip/part %/mnt/hda2/eastern_hemisphere.raw ((y * 64800 ) + offset-column) data-length
]
print length? window-data
halt
;-----------------------------------------------------

This is with the Jun 2001 version of View 1.2.3 or sommit like that. I tried the read parameters in different orders and got the same error.

The annoying thing is that I can write something to skip/scan the datastream in C/C++ quite easily without reading the ENTIRE file into memory. Did I use the wrong rebol command, or can rebol not handle 1.5GB+ files without a corresponding amount of ram? I am currently doing this on 512MB DDR with an Athlon XP 2000+ chip and a 60GB HDD. I will soon be moving to data-windows on 4GB+ files so it is rather important I get this figured out :P

thanks,
James.
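
[Editor's note] For comparison, the seek-based approach James describes writing in C/C++ can be sketched in Python. This is only an illustration under stated assumptions, not code from the thread: the function names `read_window` and `make_demo_file` are hypothetical, the file is assumed to be row-major with a fixed number of bytes per row, and a tiny generated file stands in for the real `.raw` data.

```python
import os
import tempfile


def read_window(path, first_row, row_count, row_bytes, column_offset, length):
    """Collect `length` bytes from each of `row_count` rows by seeking.

    Only the requested slices are read; the file is never loaded whole.
    """
    window = bytearray()
    with open(path, "rb") as f:
        for y in range(first_row, first_row + row_count):
            # Same offset arithmetic as the listing above: (y * row width) + column.
            f.seek(y * row_bytes + column_offset)
            chunk = f.read(length)
            if len(chunk) != length:
                raise EOFError("window extends past the end of the file")
            window += chunk
    return bytes(window)


def make_demo_file(rows, row_bytes):
    """Write a small row-major test file (hypothetical stand-in for the raw map)."""
    fd, path = tempfile.mkstemp(suffix=".raw")
    with os.fdopen(fd, "wb") as f:
        for y in range(rows):
            f.write(bytes((y + x) % 256 for x in range(row_bytes)))
    return path


if __name__ == "__main__":
    demo = make_demo_file(rows=8, row_bytes=32)
    try:
        # Three rows starting at row 2, four bytes per row starting at column 10.
        print(read_window(demo, 2, 3, 32, 10, 4).hex())
    finally:
        os.remove(demo)
```

With the numbers from the listing above, the equivalent call would be `read_window(path, 10000, 500, 64800, 30000, 1500)`. Memory use is bounded by the window size rather than the file size, which is the behaviour James was hoping to get from `read/skip/part`.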

 [3/8] from: gscottjones:mchsi at: 20-Oct-2002 8:27


From: "James Marsden"
> OK, well never mind with the fixes.. I just figured out
> what /direct was for.. doh.. I never expected that it
> would be buffering the file.
>
> /me feels silly
Hi, James,

Don't feel too silly because next you will likely run into the /direct/skip bug when used on local files. In fact, I thought I would already have seen a complaint. Is it working?

--Scott Jones

 [4/8] from: james:mustard at: 21-Oct-2002 9:42


Scott wrote:
> Don't feel too silly because next you will likely run into the /direct/skip
> bug when used on local files. In fact, I thought I would already have seen
> a complaint. Is it working?
Hi Scott,

Yeah now I found that bug.. grrrr.. it lets me get the first block of data then repeats endlessly.

Anyone suggest a fix? Here's my code:

;----------------------------------------------------------------
rebol []

window: layout/size [
    backcolor black
    at 150x50 window-data: image 500x500 edge [color: white size: 1x1] effect [none]
] 800x600

d: #{}
offset-column: 0
offset-row: 0
data-length: 500
data-rows: 500
map-width: 21600 * 3

for y 0 (data-rows - 1) 1 [
    bp: y * map-width + offset-column + 1
    e: read/binary/direct/part/skip %./gameX/eastern_hemisphere3.raw (data-length * 3) bp
    append d e
]
window-data/image: do join "make image! [" [data-length "x" data-rows " " d ] ]
view window
;---------------------------------------------------------------------------

 [5/8] from: gscottjones:mchsi at: 20-Oct-2002 17:37


From: "James Marsden"
> Yeah now I found that bug.. grrrr.. it lets me
> get the first block of data then repeats endlessly.
>
> Anyone suggest a fix?
...

Hi, James,

I hoped you wouldn't be back, which would be good news, but I suspected that you would be back. :(

There is no direct working substitute that I am aware of for a true seek (skip in REBOLese). When needing to skip through data while using /direct/binary in combination on a local file, the only thing that I am aware of is to open the file, then "waste" parts of the file as a way to simulate skipping. Given that it is in direct mode, the memory is not being eaten up by an ever expanding buffer. However, you are, in essence, cycling through *all* the data, which may be substantial at the file sizes to which you have referred. Also, there is a good chance that the first block of info you got that you thought was correct was in fact an incorrect block (it was probably the beginning of the file, even though you used read/direct/binary/skip).

So that we "know" what we are dealing with, I created a very small file with repeating data by column. Here is a ten row matrix with a hex digit in each column:

blk: copy []
loop 10 [repeat n 16 [append blk skip to-hex n - 1 7]]
write %//windows/desktop/test.txt rejoin blk

Now, when you practice with your actual algorithm, you'll be able to see that you in fact have the correct columns.

Now for one of many, many variations to show how to pseudo skip through your data:

rows: 10
cols: 16
data-length: 4
start-col: 3
data-slice: copy ""
data: open/direct/binary %//windows/desktop/test.txt
repeat r rows [
    ;skip to proper column
    copy/part data start-col - 1
    ;collect some data
    append data-slice to-string copy/part data data-length
    ;skip to end of row
    copy/part data cols - start-col - data-length + 1
]
close data
probe data-slice

The most pertinent part is the "throw-away" copy/part statements. The rest was just my arbitrary controls to cycle by rows (hey, it was a quick and dirty hack! :-).

For huge row counts but with nominal column counts, I suspect you will actually want to read in a buffered row of data at a time, and then parse the proper column stuff out. This would help to reduce disk access while protecting memory. If the column count and row counts are huge, then I suspect grabbing a sector of disk data at a time would be more efficient, but more work controlling the column access algorithm.

Hope this makes some sense. Out of time. Good luck.

--Scott Jones
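
[Editor's note] The two strategies Scott describes, discarding bytes to simulate a skip and reading one whole row at a time, can be sketched in Python for readers who want to experiment outside REBOL. This is a rough illustration, not Scott's code: the helper names are hypothetical, and the test file is assumed to contain ten rows of the sixteen characters `0123456789ABCDEF`, which is what his generator is expected to produce.

```python
import os
import tempfile

ROWS, COLS = 10, 16  # dimensions of the small test matrix


def write_test_matrix(path):
    """Write ROWS rows of the sixteen hex digits, with no line breaks."""
    with open(path, "wb") as f:
        f.write(b"0123456789ABCDEF" * ROWS)


def read_exact(stream, count):
    """Read exactly `count` bytes, looping because raw reads may return fewer."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("unexpected end of file")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def slice_by_discarding(path, start_col, data_length):
    """Simulate skipping on an unbuffered stream by reading and throwing away."""
    out = bytearray()
    with open(path, "rb", buffering=0) as stream:
        for _ in range(ROWS):
            read_exact(stream, start_col - 1)           # discard up to the column
            out += read_exact(stream, data_length)      # keep the wanted bytes
            read_exact(stream, COLS - start_col - data_length + 1)  # discard rest of row
    return out.decode("ascii")


def slice_by_rows(path, start_col, data_length):
    """Read one full row at a time and cut the wanted columns out in memory."""
    out = bytearray()
    with open(path, "rb") as stream:
        for _ in range(ROWS):
            row = read_exact(stream, COLS)
            out += row[start_col - 1:start_col - 1 + data_length]
    return out.decode("ascii")


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "test.txt")
    write_test_matrix(demo)
    print(slice_by_discarding(demo, start_col=3, data_length=4))
    print(slice_by_rows(demo, start_col=3, data_length=4))
```

Both functions touch every byte of the file, so neither replaces a real seek on multi-gigabyte data; the row-at-a-time version simply trades three small reads per row for one larger one, which is the reduction in disk access Scott mentions.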

 [6/8] from: james:mustard at: 21-Oct-2002 18:08


Thanks for the tips, Scott.

I am now seriously considering buying /Pro so that I can do things the easy way and just cheat by calling a C program to mine the necessary data and then load the result into View as an image. It would be a little painful to do data files with 129600x21600 resolution in rebol at the moment ;-)

I could have done a GUI in C but Rebol is just soooo much simpler when it comes to testing and fiddling :-)

James.

 [7/8] from: gchiu:compkarori at: 21-Oct-2002 21:07


>I am now seriously considering buying /Pro so that i can do things the
>"easy" way and just cheat by calling a C program to mine
>
>I could have done a gui in C but Rebol is just soooo much simpler when it
>comes to testing and fiddling :-)
Of course, all of us using View in a commercial environment have purchased it as per the license statements on RT's website :)

And, Cindy tells me that they will soon be able to accept Paypal payments, which will make it sooo much easier.

Look out also for an SDK for Rebol which will include the ability to encap products, but not sell them, and this will require the ownership of a purchased version of View or Command.

--
Graham Chiu

 [8/8] from: james:mustard at: 21-Oct-2002 22:07


;-)

It seems strange that the read /direct/skip bug hasn't been addressed by RT (I noticed it's been about for quite a while, at least a year!) considering the frequency with which random file access is used today, especially in the music / video / data scenes.

Graham wrote:
> Of course, all of us using View in a commercial
> environment have purchased it as per the license
> statements on RT's website :)
This is only in a home hobby sense. The data is from NASA's global observatory and gets quite large. Our workplace did look at using REBOL/ENCAP but the encap licensing terms were considered over-the-top and the technology was viewed as still beta quality. I personally like rebol but most of my colleagues view it as something akin to a cross between BASIC and PHP, opting instead for VB/C/VC++.
> And, Cindy tells me that they will soon be able to accept
> Paypal payments which will make it sooo much more easier.
Hmm.. sounds good, although I'd probably pay by credit card.
> Look out also for an SDK for Rebol which will include the
> ability to encap products, but not sell them, and this
> will require the ownership of a purchased version of View
> or Command.
Again, sounds good, although with /Pro and access to the shell/libraries it's pretty easy to do most anything.

James.