From Newsgroup: comp.sys.acorn.programmer
A general afterthought about the efficiency (speed wise) of searching
a directory tree.
On 26/05/2020 13:46, Bob Latham wrote:
Can someone tell me what is the best (speed wise) method of testing
for a specific file but importantly the name in lower case.
I have a recursive program running which scans my music library. I
want it to specifically test each album for the existence of a file 'folder/jpg' but to fail anything with a different case like
'Folder/jpg'.
OS_File 17 does not appear to be case sensitive.
The only way I can see is to read the contents of the directory using
OS_GBPB 9 and wildcards and then test the characters for lower case.
I'm thinking that may be a little slow when doing thousands and I'm
also struggling to make it work anyway. on a short test run it fails
7 out of 10 albums and all albums had folder.jpg in them.
(NOTE: it has been a long time since I studied the internals of
ADFS. Specific efficiency details of SWI calls such as OS_FILE and
OS_GBPB will have significant effect on the real runtime of any
such program. Read documentation and experiment to find the best
solution)
== In short, the thing I want to impress on all programmers is this:
To make any algorithm involving disk I/O fast, the focus needs to be
on:
- Making as few reads as possible
- Reading as much data in one operation as possible
Also:
- Don't spend much effort optimising the processing of the data by
the CPU, as the disk I/O will dominate the time the algorithm takes
to complete.
This example case of searching through a directory tree involves
reading several (or a lot) of directories and processing the
information with a program.
By far the most time-consuming part of this is the physical reading
of the information from a disk.
Reading one block of data requires:
1) moving the disk head to the correct track
2) waiting for the disk to rotate to the sector that contains the block
3) reading the magnetic information from the disk and transferring it
to memory.
Of these, steps 1 and 2 take up the most time, in the order of
milliseconds.
By comparison, you can do tons of CPU processing in a few milliseconds.
Note that reading several blocks in a row on the same track
returns more data, but only requires one head move (step 1) and one
wait (step 2).
Also note that continuing to read from the next track only needs
a very short (and thus quick) head move, while the wait time can be
practically eliminated by organising the disk in such a way that the
next block to read on this next track shows up just as the head has
settled in its new position.
So in the case of traversing a directory structure, it would be much
more efficient to read an entire directory on one go and then
process the data in memory (e.g. searching for a file that matches
a certain name or pattern), than it would be to ask for the first
directory entry, process it, then ask for the second entry, process
it, etcetera.
My advice for this particular program is to find the best combination
of SWI calls to get a good I/O performance.
In a more general sense it is a lot more efficient to read one big file
with all the data in it rather than have that data spread over lots
of small files. (For example: the game Kerbal Space Program used to
have every detail of the game in a separate file, taking up tens of
thousands of files.
It took several minutes to load. In recent versions many of
those files have been combined into a smaller number of bigger files,
and now the program loads in under a minute.)
And finally: developers of filing systems have worked for decades to
optimise the finding, reading, writing, extending and deletion of files,
using every trick in the book and inventing new ones, because disk I/O
is one of the major bottlenecks in the speed at which programs run.
--
Erik G.
From address is fake
See
http://erikgrnh.home.xs4all.nl/
--- Synchronet 3.21d-Linux NewsLink 1.2