Deleting lines from a big file
[1/5] from: vincenke::ohsu::edu at: 16-Jul-2001 15:23
Greetings,
I'm looking for a way to delete problematic lines from very large data files (too large
for buffered access).
row-to-delete: 25
condition: "00000027"
big-file: open/direct/lines %VeryBigDataFile.DAT
skip big-file (row-to-delete - 1)
line-to-check: first big-file ; confirm that this is the correct line to delete
if (find line-to-check condition) <> none [remove line] ; <-- Is there some way to do this???
(I'm still creeping along at the beginning of the learning curve!)
[2/5] from: holger:rebol at: 17-Jul-2001 9:51
On Mon, Jul 16, 2001 at 03:23:40PM -0700, Ken Vincent wrote:
> Greetings,
> I'm looking for a way to delete problematic lines from very large data files (too large
> for buffered access).
<<quoted lines omitted: 4>>
> line-to-check: first big-file ; confirm that this is the correct line to delete
> if (find line-to-check condition) <> none [remove line] <------------- Is there
> some way to do this???
No, because operating systems do not allow this. File data is typically stored
consecutively, so you cannot delete something in the middle of a file. The best
solution is to copy the file to another file, line by line, in /direct mode,
skipping the line you want to delete. Then delete the old file and rename the
new one.
--
Holger Kruse
[holger--rebol--com]
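The copy-and-rename approach described above can be sketched roughly as follows. This is a minimal sketch in Python rather than REBOL, not code from the thread: the function name `delete_line` and its parameters are made up for illustration, and it assumes lines are numbered from 1 and that `condition` is given as bytes.

```python
import os
import tempfile


def delete_line(path, row_to_delete, condition):
    """Copy `path` line by line into a temporary file, skipping line
    `row_to_delete` (1-based) if it contains `condition` (bytes), then
    replace the original with the copy. Returns True if a line was dropped.

    Hypothetical helper for illustration only."""
    removed = False
    # Create the temp file next to the original so the final rename
    # stays on the same filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Iterating the file object reads one line at a time, so the
            # whole file is never held in memory.
            for row, line in enumerate(src, start=1):
                if row == row_to_delete and condition in line:
                    removed = True
                    continue  # skip the offending line
                dst.write(line)
        # Swap the new file in place of the old one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return removed
```

Something like `delete_line("VeryBigDataFile.DAT", 25, b"00000027")` would mirror the original question. Note that the replacement file gets the temporary file's permissions, which may differ from the original's.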
[3/5] from: max:ordigraphe at: 17-Jul-2001 11:57
Hi,
theoretically, unless your hard disk can access UNIVERSE:/VOID and put a
few of its sectors in a black hole, I don't think so... ;-)
one way to do this is to go to the end of the line you wish to remove,
and copy the data after it, byte for byte, to the start of the region you
wish to remove (remember to use at least 1k-20k buffers though, or it will
be REALLY slow). But I can foresee the /lines refinement being a
hindrance to doing this.
just my two cents
-Max
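The in-place idea above (shift everything after the line back over it, then cut off the tail) might look something like this. It is a sketch in Python rather than REBOL, under assumptions of my own: the name `remove_line_in_place` and the 16 KB default buffer are illustrative, lines are numbered from 1, and `condition` is bytes.

```python
def remove_line_in_place(path, row_to_delete, condition, bufsize=16 * 1024):
    """Remove line `row_to_delete` (1-based) from `path` without a second
    file, by copying the data that follows it back over it in
    `bufsize`-byte chunks and then truncating the file.
    Returns True if the line was removed.

    Hypothetical helper for illustration only."""
    with open(path, "r+b") as f:
        # Walk to the start of the target line.
        for _ in range(row_to_delete - 1):
            if not f.readline():
                return False  # file has fewer lines than expected
        write_pos = f.tell()      # where the doomed line begins
        line = f.readline()
        if not line or condition not in line:
            return False          # not the line we expected; leave file alone
        read_pos = f.tell()       # first byte after the doomed line

        # Shift the remainder of the file back, one buffer at a time.
        while True:
            f.seek(read_pos)
            chunk = f.read(bufsize)
            if not chunk:
                break
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)

        # Drop the now-duplicated tail.
        f.truncate(write_pos)
    return True
```

The catch is that a crash in the middle of the shift leaves the file half-rewritten, which the copy-to-a-new-file approach avoids.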
[4/5] from: jelinem1:nationwide at: 17-Jul-2001 13:15
I thought it might be interesting to give another solution that I've seen
implemented in practice.
We had many flat data files, some very large, others not, of records with
variable length. The files were sorted nightly; IIRC sorting aided
performance, but wasn't required. They were treated as relational data,
and the records were subject to the usual CRUD operations. Due to the
performance hit of having to re-copy a file every time a record is updated
or deleted, this is what was done: the line-to-delete was located in the
file and overwritten with blanks. (Records added were simply appended onto
the end of the file; updates were a combination of delete/add.) These
blank lines would float to the top of the file during the nightly sort,
and would be removed at that time.
I didn't write the system so I don't know the mechanics of how the lines
were removed once they had "floated to the top". Another file copy?
The point was that the performance hit of copying the file would occur
only once/24hrs, at night: during off-hours, while still "deleting" the
data immediately.
- Michael Jelinek
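A rough sketch of the overwrite-with-blanks scheme, in Python rather than REBOL. The names `blank_out_line` and `compact` are made up for illustration, and the compaction step is only a guess (the message above says the actual removal mechanics were unknown); here it is simply a file copy that drops blank lines, so any line that was already blank is dropped too.

```python
def blank_out_line(path, row_to_delete, condition):
    """'Delete' line `row_to_delete` (1-based) by overwriting its content
    with spaces in place. The file length and the line terminator are
    unchanged, so nothing after the line has to move.
    Returns True if the line was blanked.

    Hypothetical helper for illustration only."""
    with open(path, "r+b") as f:
        for _ in range(row_to_delete - 1):
            if not f.readline():
                return False  # file has fewer lines than expected
        start = f.tell()
        line = f.readline()
        if not line or condition not in line:
            return False
        body = line.rstrip(b"\r\n")   # keep the newline itself intact
        f.seek(start)
        f.write(b" " * len(body))
    return True


def compact(src_path, dst_path):
    """Assumed nightly clean-up: copy `src_path` to `dst_path`, dropping
    every line that contains only whitespace."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if line.strip():
                dst.write(line)
```

The immediate "delete" touches only the bytes of one line, so it is cheap even on a very large file; the expensive copy is deferred to whenever `compact` is run.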
[5/5] from: max:ordigraphe at: 18-Jul-2001 12:03
Ah... the legendary Holger Kruse is now at RT. cool :-)
> The best
> solution is to copy the file to another file, line by line,
> in /direct mode,
> skipping the line you want to delete. Then delete the old
> file and rename the
> new one.
Yep, much better than my idea (and the buffering is easier to handle).
You could also copy several lines at a time to speed up the copy.
-Max
Notes
- Quoted lines have been omitted from some messages.