Forum: Too Lazy BBS

Re: case sensitive file test

From Steve Fryatt@news@stevefryatt.org.uk to comp.sys.acorn.programmer on Tue May 26 18:21:19 2020

From Newsgroup: comp.sys.acorn.programmer

On 26 May, Bob Latham wrote in message
<5876c928b4bob@sick-of-spam.invalid>:

I presume this is a means to test a directory listing to make sure an
entry is lower case?

No, it's just a generic "is this string lower case" test. The two SWIs
return pointers to tables of bit flags (so 32 bytes of 8 bits each, for all
256 characters in a RISC OS character set). In alpha_table%, a bit is set
if the character is alphabetic; in lower_table%, it's set if the character
is considered lower case.

You still need OS_GBPB to find the names to test.
--
Steve Fryatt - Leeds, England

http://www.stevefryatt.org.uk/
--- Synchronet 3.21d-Linux NewsLink 1.2

From Bob Latham@bob@sick-of-spam.invalid to comp.sys.acorn.programmer on Tue May 26 19:40:03 2020

From Newsgroup: comp.sys.acorn.programmer

In article <5876cabc53UCEbin@tiscali.co.uk>,
John Williams (News) <UCEbin@tiscali.co.uk> wrote:

In article <5876c7ef57bob@sick-of-spam.invalid>,
Bob Latham <bob@sick-of-spam.invalid> wrote:

I might have expected a flag on the entry to OS_File 17 to say
fixed case but it appears not.

Is not, and has the filer not always been, famously case agnostic?

I can't say it has ever been high on my thoughts so not that famous.

And as a consequence, isn't your expectation above a bit
unreasonable?

"Unreasonable"

Of course yes, how nice of you to point it out.

Bob.

John

--
Bob Latham
Stourbridge, West Midlands

From Bob Latham@bob@sick-of-spam.invalid to comp.sys.acorn.programmer on Tue May 26 19:40:53 2020

From Newsgroup: comp.sys.acorn.programmer

In article <4982cb7658.DaveMeUK@BeagleBoard-xM>,
David Higton <dave@davehigton.me.uk> wrote:

In message <5876b6c3c3bob@sick-of-spam.invalid>
Bob Latham <bob@sick-of-spam.invalid> wrote:

But if anyone has a good way to test for a lowercase file name I'd
love to hear it.

RISC OS filing systems are case insensitive. The only way you can
do what you want is to iterate through the filenames, and do
whatever test you want on each filename returned.

Thank you David.

Cheers,

Bob.
--
Bob Latham
Stourbridge, West Midlands

From Bob Latham@bob@sick-of-spam.invalid to comp.sys.acorn.programmer on Tue May 26 19:46:08 2020

From Newsgroup: comp.sys.acorn.programmer

In article <mpro.qay87d069uaym02mn.news@stevefryatt.org.uk>,
Steve Fryatt <news@stevefryatt.org.uk> wrote:

On 26 May, Bob Latham wrote in message
<5876c928b4bob@sick-of-spam.invalid>:

I presume this is a means to test a directory listing to make
sure an entry is lower case?

No, it's just a generic "is this string lower case" test. The two
SWIs return pointers to tables of bit flags (so 32 bytes of 8 bits
each, for all 256 characters in a RISC OS character set). In
alpha_table%, a bit is set if the character is alphabetic; in
lower_table%, it's set if the character is considered lower case.

You still need OS_GBPB to find the names to test.

Understood, thank you.

Cheers,

Bob.
--
Bob Latham
Stourbridge, West Midlands

From Steve Drain@steve@kappa.me.uk to comp.sys.acorn.programmer on Wed May 27 13:14:37 2020

From Newsgroup: comp.sys.acorn.programmer

On 26/05/2020 17:12, Steve Fryatt wrote:

DEF FNis_lower(string$)
LOCAL loop%, char%, byte%, bit%, alpha_table%, lower_table%, alpha%, lower%

SYS "Territory_CharacterPropertyTable", -1, 2 TO lower_table%
SYS "Territory_CharacterPropertyTable", -1, 3 TO alpha_table%

FOR loop% = 1 TO LEN(string$)
char% = ASC(MID$(string$, loop%, 1))

byte% = char% DIV 8
bit% = char% MOD 8

alpha% = ((alpha_table%?byte%) AND (1 << bit%)) <> 0
lower% = ((lower_table%?byte%) AND (1 << bit%)) <> 0

IF alpha% AND (NOT lower%) THEN =FALSE
NEXT loop%

=TRUE

Perhaps:

DEF FNis_lower(string$)
LOCAL buff%,upper%,char%
buff%=&8200:REM use input buffer or other block
$buff%=string$
SYS "Territory_UpperCaseTable",-1 TO upper%
FOR char%=buff% TO buff%+LENstring$-1
IF ?char%=upper%??char% THEN =FALSE:REM note ??
NEXT char%
=TRUE

Or, if you want to disentangle it, try:

DEF FNis_lower(string$)
LOCAL upper%,char%
SYS "Territory_UpperCaseTable",-1 TO upper%
FOR char%=&8100 TO &8100+LENstring$-1
IF ?char%=upper%??char% THEN =FALSE:REM note ??
NEXT char%
=TRUE

;-)

From jgh@jgh@mdfs.net to comp.sys.acorn.programmer on Wed May 27 16:25:40 2020

From Newsgroup: comp.sys.acorn.programmer

Or, if you want to disentangle it, try:

DEF FNis_lower($&8100)
LOCAL upper%,char%
SYS "Territory_UpperCaseTable",-1 TO upper%
char%=&8100-1
REPEAT
char%=char%+1
UNTIL ?char%=upper%??char% OR ?char%=13
=?char%=13

From Steve Drain@steve@kappa.me.uk to comp.sys.acorn.programmer on Thu May 28 14:16:37 2020

From Newsgroup: comp.sys.acorn.programmer

On 28/05/2020 00:25, jgh@mdfs.net wrote:

Or, if you want to disentangle it, try:

DEF FNis_lower($&8100)
LOCAL upper%,char%
SYS "Territory_UpperCaseTable",-1 TO upper%
char%=&8100-1
REPEAT
char%=char%+1
UNTIL ?char%=upper%??char% OR ?char%=13
=?char%=13

There are many ways to skin this cat and speed is hardly important these
days, but I think an early exit from the loop on first failure is
worthwhile. It certainly would be with a long string.

BTW my trick of using the string accumulator (&8100) works because
evaluating LENstring$ puts the string in there. It is only safe until the
next string keyword and I would never actually use it.

From Erik G@noreply123@xs4all.nl to comp.sys.acorn.programmer on Mon Jun 1 03:19:56 2020

From Newsgroup: comp.sys.acorn.programmer

A general afterthought about the efficiency (speed wise) of searching
a directory tree.

On 26/05/2020 13:46, Bob Latham wrote:

Can someone tell me what is the best (speed wise) method of testing
for a specific file but importantly the name in lower case.

I have a recursive program running which scans my music library. I
want it to specifically test each album for the existence of a file
'folder/jpg' but to fail anything with a different case like
'Folder/jpg'.

OS_File 17 does not appear to be case sensitive.

The only way I can see is to read the contents of the directory using
OS_GBPB 9 and wildcards and then test the characters for lower case.

I'm thinking that may be a little slow when doing thousands and I'm
also struggling to make it work anyway. On a short test run it fails
7 out of 10 albums and all albums had folder.jpg in them.

(NOTE: it has been a long time since I studied the internals of
ADFS. Specific efficiency details of SWI calls such as OS_File and
OS_GBPB will have significant effect on the real runtime of any
such program. Read documentation and experiment to find the best
solution)

== In short, the thing I want to impress on all programmers is this:

To make any algorithm involving disk I/O fast, the focus needs to be
on:
- Making as few reads as possible
- Reading as much data in one operation as possible

Also:
- Don't spend much effort optimising the processing of the data by
the CPU, as the disk I/O will dominate the time the algorithm takes
to complete.

This example case of searching through a directory tree involves
reading several (or a lot of) directories and processing the
information with a program.
By far the most time-consuming part of this is the physical reading
of the information from a disk.
Reading one block of data requires:
1) moving the disk head to the correct track
2) waiting for the disk to rotate to the sector that contains the block
3) reading the magnetic information from the disk and transferring it
to memory.

Of these, steps 1 and 2 take up the most time, in the order of
milliseconds.

By comparison, you can do tons of CPU processing in a few milliseconds.

Note that reading several blocks in a row on the same track
returns more data, but only requires one head move (step 1) and one
wait (step 2).
Also note that continuing to read from the next track only needs
a very short (and thus quick) head move, while the wait time can be
practically eliminated by organising the disk in such a way that the
next block to read on this next track shows up just as the head has
settled in its new position.

So in the case of traversing a directory structure, it would be much
more efficient to read an entire directory in one go and then
process the data in memory (e.g. searching for a file that matches
a certain name or pattern), than it would be to ask for the first
directory entry, process it, then ask for the second entry, process
it, etcetera.

My advice for this particular program is to find the best combination
of SWI calls to get a good I/O performance.
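
As an illustration of the "read a lot per call" idea, here is a rough sketch of an exact-case test built on OS_GBPB 9 with a multi-entry buffer. The function name FNhas_exact, the global buffer gbpb_buf%, the 4096-byte size and the request for up to 256 names per call are all assumptions made for this example, not anything from the thread. It relies on the filing system returning names in the case they are stored in, which may not hold for every filing system.

```bbcbasic
REM Sketch only. Allocate the buffer once, before the scan starts:
REM   DIM gbpb_buf% 4095

REM TRUE if dir$ holds an object whose name equals leaf$, case included.
DEF FNhas_exact(dir$, leaf$)
LOCAL next%, read%, ptr%, name$, found%
next% = 0
found% = FALSE
REPEAT
  REM leaf$ as the match string is case insensitive, so it only
  REM narrows the names returned; the exact test is done below.
  SYS "OS_GBPB", 9, dir$, gbpb_buf%, 256, next%, 4096, leaf$ TO ,,,read%, next%
  ptr% = gbpb_buf%
  WHILE read% > 0 AND NOT found%
    REM Each returned name is zero-terminated.
    name$ = ""
    WHILE ?ptr% <> 0
      name$ += CHR$(?ptr%)
      ptr% += 1
    ENDWHILE
    ptr% += 1
    IF name$ = leaf$ THEN found% = TRUE
    read% -= 1
  ENDWHILE
UNTIL found% OR next% = -1
=found%
```

It would be called once per album directory, for example IF FNhas_exact(album$, "folder/jpg") THEN ..., where album$ is a hypothetical variable holding the directory path. The BASIC string comparison is case sensitive, so 'Folder/jpg' is rejected.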

In a more general sense it is a lot more efficient to read one big file
with all the data in it rather than have that data spread over lots
of small files. (For example: the game Kerbal Space Program used to
have every detail of the game in a separate file, taking up tens of
thousands of files.
It took several minutes to load. In recent versions many of
those files have been combined into a smaller number of bigger files,
and now the program loads in under a minute.)

And finally: developers of filing systems have worked for decades to
optimise the finding, reading, writing, extending and deletion of files,
using every trick in the book and inventing new ones, because disk I/O
is one of the major bottlenecks in the speed at which programs run.
--
Erik G.
From address is fake
See http://erikgrnh.home.xs4all.nl/

From Bob Latham@bob@sick-of-spam.invalid to comp.sys.acorn.programmer on Mon Jun 1 15:53:53 2020

From Newsgroup: comp.sys.acorn.programmer

In article <5ed457bc$0$1436$e4fe514c@newszilla.xs4all.nl>,
Erik G <noreply123@xs4all.nl> wrote:

And finally: developers of filing systems have worked for decades
to optimise the finding, reading, writing, extending and deletion
of files, using every trick in the book and inventing new ones,
because disk I/O is one of the major bottlenecks in the speed at
which programs run.

Thank you for an interesting read.

In my case I'm checking for various things in a music library stored
on a Synology DS214+. My program, written in assembler, uses LanMan98
to access the NAS, which was quite a bit faster than Moonfish.

The program examines every album and checks for images, file types,
and tags. On flac albums (all 3390 of them), for every track on every
album the file is opened and the tagging checked and then the file is
closed again.

The program then gives a report on any non-conformity to various
parameters set.

It varies slightly from run to run but it takes about 14 minutes and
20 seconds to complete. I'm impressed with the speed.

Thanks again.

Bob.
--
Bob Latham
Stourbridge, West Midlands

From druck@news@druck.org.uk to comp.sys.acorn.programmer on Mon Jun 1 20:01:57 2020

From Newsgroup: comp.sys.acorn.programmer

On 28/05/2020 14:16, Steve Drain wrote:

There are many ways to skin this cat and speed is hardly important these
days,

It can be if you hit a directory on a file server with many thousand
entries - it certainly lets you know who OS_GBPB's one entry at a time,
and who uses a decent sized buffer!

---druck

From druck@news@druck.org.uk to comp.sys.acorn.programmer on Mon Jun 1 20:57:46 2020

From Newsgroup: comp.sys.acorn.programmer

On 01/06/2020 02:19, Erik G wrote:

And finally: developers of filing systems have worked for decades to
optimise the finding, reading, writing, extending and deletion of files,
using every trick in the book and inventing new ones, because disk I/O
is one of the major bottlenecks in the speed at which programs run.

Unfortunately, except on RISC OS, where no use is made of free memory to
cache filing system operations, as just about every other common OS does.

The closest RISC OS comes is some fixed size buffering in ADFS, which
often resulted in the Risc PC's slow motherboard IDE interface
outperforming much better 3rd party IDE hardware using IDEFS variants
with no caching.

---druck


From jgh@jgh@mdfs.net to comp.sys.acorn.programmer on Thu Jun 4 09:23:40 2020

From Newsgroup: comp.sys.acorn.programmer

Similarly, if there's some I/O information that won't change over the
run of a program, read it once into a variable, then access the variable.
For example:
size%=EXT#inputfile then use size% instead of EXT#
If your program is never going to change screen mode:
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

etc.

From Martin@News03@avisoft.f9.co.uk to comp.sys.acorn.programmer on Thu Jun 4 17:51:27 2020

From Newsgroup: comp.sys.acorn.programmer

On 04 Jun in article
<a248d019-7c38-439a-8d5f-62d6d817a285@googlegroups.com>,
<jgh@mdfs.net> wrote:

Similarly, if there's some I/O information that won't change over
the run of a program, read it once into a variable, then access the
variable.

For example:
size%=EXT#inputfile then use size% instead of EXT#

Excellent advice, in general ... but this example ...

If your program is never going to change screen mode:
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

is a bad one, because if it is a Wimp program the mode is usually
changed outside your program, so ModeChange messages have to be
watched for and the relevant variables read again.
--
Martin Avison
Note that unfortunately this email address will become invalid
without notice if (when) any spam is received.

From druck@news@druck.org.uk to comp.sys.acorn.programmer on Thu Jun 4 20:49:53 2020

From Newsgroup: comp.sys.acorn.programmer

On 04/06/2020 17:23, jgh@mdfs.net wrote:

Similarly, if there's some I/O information that won't change over the
run of a program, read it once into a variable, then access the variable.
For example:
size%=EXT#inputfile then use size% instead of EXT#

Sorry, that's bad advice, a program should always assume filing system
data may be altered by other processes.

1) Obviously if it's a Wimp application, other tasks are running
2) If the single-tasking program can be run in a taskwindow or graphic
taskwindow, other tasks are running
3) If the file is on a remote filing system, other machines may alter it
4) If the file is on a local filing system which is shared, other
machines may alter it.

So only if you are outside the desktop, and storage is on a local non
shared disc, can you be sure it won't be altered by anything else.

If your program is never going to change screen mode:
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

Only if it's running outside the desktop. Inside the desktop the mode can
change, so you need to ensure you handle the mode change message and
re-read any mode related parameters you are using.
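
A minimal sketch of that pattern, assuming a BASIC Wimp task whose poll loop passes user messages (Wimp_Poll reasons 17 and 18) to a handler. The procedure names are made up for this example; xsz% and ysz% are the cached globals from the post being discussed.

```bbcbasic
REM Sketch only: cache the screen size in OS units in xsz% and ysz%.
REM Call once at start-up and again after every mode change.
DEF PROCread_mode_vars
LOCAL xlimit%, ylimit%, xeig%, yeig%
SYS "OS_ReadModeVariable", -1, 11 TO ,,xlimit% : REM XWindLimit
SYS "OS_ReadModeVariable", -1, 12 TO ,,ylimit% : REM YWindLimit
SYS "OS_ReadModeVariable", -1, 4 TO ,,xeig%    : REM XEigFactor
SYS "OS_ReadModeVariable", -1, 5 TO ,,yeig%    : REM YEigFactor
xsz% = (xlimit% + 1) << xeig%
ysz% = (ylimit% + 1) << yeig%
ENDPROC

REM Call with the poll block when Wimp_Poll returns reason 17 or 18.
DEF PROChandle_message(block%)
CASE block%!16 OF
  WHEN &400C1 : PROCread_mode_vars : REM Message_ModeChange
ENDCASE
ENDPROC
```

The cached values are then safe to use between polls, since the mode cannot change while the task holds control.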

---druck


From jgh@jgh@mdfs.net to comp.sys.acorn.programmer on Thu Jun 4 16:18:24 2020

From Newsgroup: comp.sys.acorn.programmer

On Thursday, 4 June 2020 20:49:57 UTC+1, druck wrote:

For example: size%=EXT#inputfile then use size% instead of EXT#

Sorry, that's bad advice, a program should always assume filing system
data may be altered by other processes.

If it's open for input, other processes *can't* alter it.
Read By Many, Write By One.

If your program is never going to change screen mode:
SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

Only if it's running outside the desktop. Inside the desktop the mode can
change, so you need to ensure you handle the mode change message and
re-read any mode related parameters you are using.

Which is why I wrote 'your program is never going to change screen
mode'. Maybe it should have been 'where the screen mode is never
going to be changed during the execution of the program'. Such as
a command line tool or a single-tasking application.

From druck@news@druck.org.uk to comp.sys.acorn.programmer on Fri Jun 5 11:27:33 2020

From Newsgroup: comp.sys.acorn.programmer

On 05/06/2020 00:18, jgh@mdfs.net wrote:

On Thursday, 4 June 2020 20:49:57 UTC+1, druck wrote:

For example: size%=EXT#inputfile then use size% instead of EXT#

Sorry, that's bad advice, a program should always assume filing system
data may be altered by other processes.

If it's open for input, other processes *can't* alter it.
Read By Many, Write By One.

It's down to the implementation of the filing system whether that is
true. Local filing systems will tend to lock on write, remote ones will
tend not to. It's a bit of a minefield!

---druck