Mailing List Archive: 49091 messages

Deleting lines from a big file

 [1/5] from: vincenke::ohsu::edu at: 16-Jul-2001 15:23


Greetings,

I'm looking for a way to delete problematic lines from very large data files (too large for buffered access).

    row-to-delete: 25
    condition: "00000027"

    big-file: open/direct/lines %VeryBigDataFile.DAT
    skip big-file (row-to-delete - 1)
    line-to-check: first big-file   ; confirm that this is the correct line to delete
    if (find line-to-check condition) <> none [remove line]   <------------- Is there some way to do this???
                                              ^^^^^^^

(I'm still creeping along at the beginning of the learning curve!)

 [2/5] from: holger:rebol at: 17-Jul-2001 9:51


On Mon, Jul 16, 2001 at 03:23:40PM -0700, Ken Vincent wrote:
> Greetings,
> I'm looking for a way to delete problematic lines from very large data files (too large for buffered access).
<<quoted lines omitted: 4>>
> line-to-check: first big-file ; confirm that this is the correct line to delete
> if (find line-to-check condition) <> none [remove line] <------------- Is there some way to do this???

No, because operating systems do not allow this. File data is typically stored consecutively, so you cannot delete something in the middle of a file. The best solution is to copy the file to another file, line by line, in /direct mode, skipping the line you want to delete. Then delete the old file and rename the new one.

--
Holger Kruse
[holger--rebol--com]
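Holger's copy-and-rename approach can be sketched as follows. This is Python rather than REBOL, purely as an illustration of the technique; `delete_line` and its parameters are hypothetical names mirroring Ken's `row-to-delete` and `condition`, and the sketch assumes `\n`-terminated lines and that the temporary copy can live in the same directory as the original.

```python
import os
import shutil
import tempfile


def delete_line(path, row_to_delete, condition):
    """Copy `path` line by line into a temporary file, skipping 1-based line
    `row_to_delete` only if it contains the bytes `condition`, then swap the
    copy into place. Returns True if a line was skipped."""
    # Create the temp file next to the original so the final rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    removed = False
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            for row, line in enumerate(src, start=1):
                if row == row_to_delete and condition in line:
                    removed = True  # confirmed: this is the line to drop
                    continue
                dst.write(line)
        # mkstemp creates the file with restrictive permissions; keep the
        # original file's permission bits instead.
        shutil.copymode(path, tmp_path)
        # "Delete the old file and rename the new one" in a single call.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return removed
```

Ken's case would be `delete_line("VeryBigDataFile.DAT", 25, b"00000027")`. Until `os.replace` runs, the original file is untouched, so an interrupted copy leaves the data intact.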

 [3/5] from: max:ordigraphe at: 17-Jul-2001 11:57


Hi,

Theoretically, unless your hard disk can access UNIVERSE:/VOID and put a few of its sectors in a black hole, I don't think so... ;-)

One way to do this is to go to the end of the line you wish to remove and copy the data that follows, byte by byte, to the start of the region you wish to remove (remember to use buffers of at least 1k-20k, though, or it will be REALLY slow). But I can foresee the /lines refinement being a hindrance to doing this.

Just my two cents,
-Max
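Max's in-place alternative (shift everything after the unwanted line toward the start of the file through a fixed-size buffer) might look like this in Python. The function name and the `buffer_size` parameter are hypothetical, and the 16 KiB default is just a value inside his 1k-20k range. One step his description leaves implicit is that the file must be truncated afterwards; otherwise the last bytes of the file remain duplicated at the end.

```python
def remove_line_in_place(path, row_to_delete, buffer_size=16 * 1024):
    """Remove 1-based line `row_to_delete` by shifting the rest of the file
    over it, then truncating. Returns False if the line does not exist."""
    if row_to_delete < 1:
        raise ValueError("row_to_delete is 1-based")
    with open(path, "r+b") as f:
        # Skip the lines before the target.
        for _ in range(row_to_delete - 1):
            if not f.readline():
                return False  # file is shorter than row_to_delete
        write_pos = f.tell()  # first byte of the line to remove
        if not f.readline():
            return False
        read_pos = f.tell()  # first byte after that line
        # Copy the tail forward one buffer at a time; write_pos always
        # trails read_pos, so unread data is never overwritten.
        while True:
            f.seek(read_pos)
            chunk = f.read(buffer_size)
            if not chunk:
                break
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)
        # Cut off the leftover copy of the old tail.
        f.truncate(write_pos)
    return True
```

Unlike copying to a new file, this needs no extra disk space, but if the process is interrupted mid-shift the file is left partly shifted.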

 [4/5] from: jelinem1:nationwide at: 17-Jul-2001 13:15


I thought it might be interesting to describe another solution that I've seen implemented in practice.

We had many flat data files of variable-length records, some very large, others not. The files were sorted nightly; IIRC sorting aided performance but wasn't required. They were treated as relational data, and the records were subject to the usual CRUD operations.

Because of the performance hit of re-copying a file every time a record was updated or deleted, this is what was done: the line to delete was located in the file and overwritten with blanks. (Records added were simply appended to the end of the file; updates were a combination of delete and add.) These blank lines would float to the top of the file during the nightly sort and would be removed at that time. I didn't write the system, so I don't know the mechanics of how the lines were removed once they had "floated to the top". Another file copy?

The point was that the performance hit of copying the file occurred only once every 24 hours, at night during off-hours, while the data was still "deleted" immediately.

- Michael Jelinek
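A minimal Python sketch of the blank-now, compact-later scheme Michael describes. Both function names and the `.compact` temp-file suffix are hypothetical, records are assumed to be `\n`-terminated byte lines, and since he didn't know how the blank lines were finally removed, the compaction step here is simply an assumed filtering copy (the nightly sort is left out).

```python
import os


def blank_line(path, row_to_blank):
    """Overwrite 1-based line `row_to_blank` with spaces in place. The line
    keeps its length and line ending, so no other record moves."""
    with open(path, "r+b") as f:
        for _ in range(row_to_blank - 1):
            if not f.readline():
                return False  # no such line
        start = f.tell()
        line = f.readline()
        if not line:
            return False
        body = line.rstrip(b"\r\n")  # leave the line terminator alone
        f.seek(start)
        f.write(b" " * len(body))
    return True


def compact(path):
    """Off-hours pass: copy every line that is not entirely blank, then
    swap the copy into place."""
    tmp_path = path + ".compact"  # hypothetical temp-file name
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        for line in src:
            if line.strip():  # all-blank (or empty) lines are dropped
                dst.write(line)
    os.replace(tmp_path, path)
```

Blanking touches only the bytes of one record, so it is cheap enough to do immediately; the full copy happens only in `compact`. Note that this compaction also drops any lines that were already empty. In a byte-wise sort, a line of spaces sorts ahead of lines starting with digits or letters, which fits the blanks "floating to the top".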

 [5/5] from: max:ordigraphe at: 18-Jul-2001 12:03


Ah... the legendary Holger Kruse is now at RT. Cool :-)

> The best
> solution is to copy the file to another file, line by line,
> in /direct mode,
> skipping the line you want to delete. Then delete the old
> file and rename the
> new one.

Yep, much better than my idea (and it makes the buffering easier to handle). You could also copy several lines at a time to speed up the copy.

-Max
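Max's refinement, copying several lines at a time, can be sketched in Python by reading and writing fixed-size batches of lines. The function name, the `.tmp` suffix, and the `batch_lines` parameter are hypothetical. Python's file objects already buffer I/O internally, so here batching mainly reduces per-line call overhead.

```python
import os
from itertools import islice


def delete_line_batched(path, row_to_delete, condition, batch_lines=1000):
    """Copy `path` in batches of `batch_lines` lines, dropping 1-based line
    `row_to_delete` if it contains `condition`. Returns True if dropped."""
    tmp_path = path + ".tmp"  # hypothetical temp-file name
    rows_done = 0  # lines consumed before the current batch
    removed = False
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        while True:
            batch = list(islice(src, batch_lines))
            if not batch:
                break
            n = len(batch)
            idx = row_to_delete - rows_done - 1  # target's index in this batch
            if 0 <= idx < n and condition in batch[idx]:
                del batch[idx]
                removed = True
            dst.writelines(batch)
            rows_done += n
    os.replace(tmp_path, path)
    return removed
```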

Notes
  • Quoted lines have been omitted from some messages.