unbuffered file reads (large files)
[1/3] from: tim-johnsons::web::com at: 23-Oct-2007 10:46
Hello:
I'm processing some large text files - 100,000 lines or more. It would seem to
me that using 'open with the 'direct refinement would be the answer, but I'm seeing
1) buffering
2) problems terminating the read loop.
What follows is a test rebol file and a little text file to test. I've made this
to run as CGI, so that the port dump is a little more readable. I have further
comments at the end.

;; rebol file - can run from command line or as CGI
#!/usr/bin/rebol -cs
REBOL[]
print "Content-Type: text/html^/"
print <pre>
print "Read file with cache"
inf: open/lines %test.txt
while[not tail? inf][
    print first inf
    inf: next inf
]
close inf ;; works fine, but is buffered
print "Read file without cache"
inf: open/direct/lines %test.txt ;; help open says 'direct should be unbuffered
;; tail test fails immediately
;while[not tail? inf][
;    print first inf
;    inf: next inf
;    ]
;; use truth test of inf
while[inf][
    ?? inf ;; look at the 'state members, especially 'inBuffer
    print first inf
    inf: next inf
]
close inf

;; here is the input text file
line one
line two
line three
line four
line five

;; comments follow:
1) The tail test fails in direct mode.
2) The truth test for 'inf is not helpful either.
3) It looks to me like direct mode *is* buffered after the first read.
4) The termination test could be something like
   if all[string? inf/state/inBuffer empty? inf/state/inBuffer][break]
5) But we still have buffered input, right?
What do you all think?
Thanks
[2/3] from: sqlab:gmx at: 23-Oct-2007 21:29
If you open bigger files, you will see that the whole file is not read ahead.
The termination can look like this:
inf: open/direct/lines %file
while [line: pick inf 1] [probe line]
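
The same loop can be wrapped in a small function. This is only a minimal sketch, assuming REBOL 2; the helper name count-lines is hypothetical and not something from this thread. It relies on the behaviour used in the loop above, that pick on a direct port returns none once the input is exhausted, and on the fact that only none and false count as false in REBOL, so an empty line (an empty string) does not end the loop early.

```rebol
REBOL []

;; count-lines is a hypothetical helper name used for illustration only.
;; It reads a file one line at a time through a direct port and
;; returns the number of lines seen.
count-lines: func [
    file [file!] "File to read line by line"
    /local port line n
][
    n: 0
    port: open/direct/lines file
    ;; pick returns none at the end of input, which stops the loop.
    ;; An empty string is still a true value, so blank lines are counted.
    while [line: pick port 1] [
        n: n + 1
    ]
    close port
    n
]

;; usage, with the test file from the first message:
;; print count-lines %test.txt
```

Replacing the body of the while loop with the real per-line work keeps only the port's current read-ahead in memory rather than the whole file.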
[3/3] from: tim-johnsons::web::com at: 23-Oct-2007 11:55
On Tuesday 23 October 2007, Anton Reisacher wrote:
> If you open bigger files, you will see that the whole file is not read
> ahead.
Aha!
> The termination can look like this
>
> inf: open/direct/lines %file
> while [line: pick inf 1] [probe line]
Understood.
Thanks, Anton.
That's a big help.
Tim