Reading directories faster.
[1/7] from: reboler::programmer::net at: 8-Oct-2002 14:50
I am using the code below to read and sort the current directory contents.
It is _terribly_ slow on large directories (> 1000 files/directories)
either none? df: attempt [read %./][
    "Error reading Directory"
][
    foreach file df [insert tail either (dir? file)[dirs][files] file]
]
Any ideas on how to speed it up?
--
__________________________________________________________
Sign-up for your own FREE Personalized E-mail at Mail.com
http://www.mail.com/?sr=signup
Free price comparison tool gives you the best prices and cash back!
http://www.bestbuyfinder.com/download.htm
[2/7] from: petr:krenzelok:trz:cz at: 8-Oct-2002 22:21
alan parman wrote:
>I am using the code below to read and sort the current directory contents.
>It is _terribly_ slow on large directories (> 1000 files/directories)
<<quoted lines omitted: 3>>
> foreach file df [insert tail either (dir? file)[dirs][files] file]
>]
I am not sure you can. Some time ago I asked RT to add load/dirs and
load/files refinements or something like that, but people here argued
that REBOL is fast enough. That is not true on larger drives, and
especially not on network drives, where it is quite unusable. But that
was some two years ago.
Maybe you could look at the new native called 'remove-each. For a
description go here:
http://www.reboltech.com/downloads/changes.html#sect4.4.
It has just the kind of example you are looking for. But if you want a
list of files and a list of directories, you will have to copy the
original block and run remove-each twice: once for files, once for dirs.
Let us know if the result is faster now ...
-pekr-
[3/7] from: greggirwin:mindspring at: 8-Oct-2002 14:27
Hi Alan,
<<
I am using the code below to read and sort the current directory contents.
It is _terribly_ slow on large directories (> 1000 files/directories)
either none? df: attempt [read %./][
"Error reading Directory"
][
foreach file df [insert tail either (dir? file)[dirs][files] file]
]
>>
DIR? does a bit of work to find out if a file has the directory flag set for
it (look at the source for it). Since REBOL is very consistent about how it
forms directory names, you can cheat a bit and avoid the extra disk hits for
every file by doing something like this:
either none? df: attempt [read %.][
    "Error reading Directory"
][
    foreach file df [
        insert tail either (#"/" = last file)[dirs][files] file
    ]
]
--Gregg
[4/7] from: brett:codeconscious at: 9-Oct-2002 9:31
Hi Alan,
> either none? df: attempt [read %./][
> "Error reading Directory"
> ][
> foreach file df [insert tail either (dir? file)[dirs][files] file]
> ]
>
> Any ideas on how to speed it up?
The dir? function is based on the info? function. info? queries the target
system to get the attributes of the target.
But the results of read %./ already contains enough information about
whether an item is a file or a directory - just look for a slash #"/" at the
end of the name. So by using dir? you end up making an unnecessary call to
the file system for every file.
So try changing the code to:
either none? df: attempt [read %./][
    "Error reading Directory"
][
    foreach file df [insert tail either #"/" = last file [dirs][files] file]
]
On my system your code took 9 to 10 seconds. After the change it took
approximately 0.05 seconds.
Also, don't forget to preallocate the dirs and the files block! (or list!, or
whatever you are using) to a "reasonable size".
For example,
files: make block! 2000
This avoids a lot of the time taken to automatically expand the block as it
is being filled.
Regards,
Brett.
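A minimal sketch of how the two tests could be timed side by side, with both result blocks preallocated as suggested above. It assumes a REBOL/Core build that provides attempt and now/time/precise; time-it is a hypothetical helper name, and 2000 is only the example size from the message, not a recommendation.

```rebol
REBOL [Title: "Timing sketch: dir? versus trailing-slash test"]

; time-it is a hypothetical helper: it runs a block and returns the elapsed time.
; Assumes now/time/precise is available in this REBOL build.
time-it: func [code [block!] /local start] [
    start: now/time/precise
    do code
    now/time/precise - start
]

; Fall back to an empty block if the directory cannot be read.
entries: any [attempt [read %./] copy []]

; Preallocate both result blocks so they do not have to expand while filling.
dirs: make block! 2000
files: make block! 2000

; Version 1: dir? queries the file system once per entry.
slow: time-it [
    foreach file entries [
        insert tail either dir? file [dirs] [files] file
    ]
]

; Empty the blocks before the second run.
clear dirs
clear files

; Version 2: only inspects the last character of each name.
fast: time-it [
    foreach file entries [
        insert tail either #"/" = last file [dirs] [files] file
    ]
]

print ["dir? version:          " slow]
print ["trailing-slash version:" fast]
```

The actual numbers depend on the drive and the directory size, so they are best compared on a directory with well over 1000 entries.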
[5/7] from: atruter:hih:au at: 9-Oct-2002 9:32
> I am using the code below to read and sort the current directory contents.
If you only need files or dirs the following may help:
read-dir: func [path /files /dirs] [
    either dirs [
        sort remove-each dir read path [#"/" <> last dir]
    ][
        sort remove-each dir read path [#"/" = last dir]
    ]
]
Regards,
Ashley
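A small usage sketch for a function of this shape. The variation below does not rely on the value remove-each returns, on the assumption that this result may differ between REBOL versions; the names entries, subdirs and plain-files are made up for illustration. As in the function above, /files is accepted but behaves the same as the default.

```rebol
REBOL [Title: "read-dir usage sketch"]

; Variation of read-dir that filters the block in place and then sorts it,
; so it does not depend on what remove-each itself returns.
read-dir: func [path [file!] /files /dirs /local entries] [
    entries: read path
    either dirs [
        ; keep only names ending in a slash (directories)
        remove-each entry entries [#"/" <> last entry]
    ][
        ; drop names ending in a slash, keeping plain files
        remove-each entry entries [#"/" = last entry]
    ]
    sort entries
]

subdirs: read-dir/dirs %./      ; directories only, sorted
plain-files: read-dir %./       ; files only, sorted

print [length? subdirs "directories," length? plain-files "files"]
```

Each call reads the directory again, so getting both lists this way costs two reads of the same path.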
[6/7] from: atruter:hih:au at: 9-Oct-2002 10:03
> Any ideas on how to speed it up?
Alternatively, how about the following from left field? (not tested ;) )
sort/compare read %. func [a b] [
    either (last a) = (last b) [
        a < b
    ][
        (last a) < (last b)
    ]
]
Regards,
Ashley
[7/7] from: reboler:programmer at: 10-Oct-2002 12:41
Re: Reading directories faster
Thanks all!
It _was_ the 'dir? portion that was slowing things down.
'dir? uses 'info?, which uses 'query, so, as mentioned, there was a disk read
on every file.
I am currently using ...
either none? df: attempt [read %./][
    tell directory "Error reading Directory"
][
    forall df [insert either #"/" = last df/1 [dirs][files] df/1]
    ; either dir? df/1 ; old method
]
This is supremely superior to the old 'dir? method!
I do preallocate the dirs and files blocks, but before I go to a new directory
I 'clear them rather than recreating them ("clear dirs" instead of
"dirs: make block! 16"). They still have to grow dynamically, but only when I
go to a larger directory. This way they never grow larger than needed for the
largest directory I visit, so no memory allocation is wasted.
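A minimal sketch of that reuse pattern, assuming the blocks are created once and then emptied with clear before each new directory. list-dir is a hypothetical name used only for illustration, and the sketch appends with insert tail so the entries keep the order returned by read.

```rebol
REBOL [Title: "Reusing dirs and files blocks with clear"]

; Created once; 16 is the starting size quoted in the message above.
dirs: make block! 16
files: make block! 16

; list-dir is a hypothetical helper: it refills dirs and files for one path
; and returns none if the directory could not be read.
list-dir: func [path [file!] /local entries] [
    ; Empty the existing blocks in place instead of making new ones.
    clear dirs
    clear files
    if none? entries: attempt [read path] [return none]
    foreach entry entries [
        insert tail either #"/" = last entry [dirs] [files] entry
    ]
    entries
]

list-dir %./
print [length? dirs "directories," length? files "files"]
```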
Ashley, your "left field" sort works fine. It puts the directories at the head
of the output, but I need to have them separated from the files. This is also
a neat way to group files by extension!
probe sort/compare read %. func [a b] [
    either (last a) = (last b) [
        a < b
    ][
        (last a) < (last b)
    ]
]
Notes
- Quoted lines have been omitted from some messages.