• Re: case sensitive file test

    From Steve Fryatt@news@stevefryatt.org.uk to comp.sys.acorn.programmer on Tue May 26 18:21:19 2020
    From Newsgroup: comp.sys.acorn.programmer

    On 26 May, Bob Latham wrote in message
    <5876c928b4bob@sick-of-spam.invalid>:

    I presume this is a means to test a directory listing to make sure an
    entry is lower case?

    No, it's just a generic "is this string lower case" test. The two SWIs
    return pointers to tables of bit flags (so 32 bytes of 8 bits each, for all
    256 characters in a RISC OS character set). In alpha_table%, a bit is set if
    the character is alphabetic; in lower_table%, it's set if the character is
    considered lower case.

    You still need OS_GBPB to find the names to test.
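
    The bit-flag lookup described above can be modelled roughly as follows.
    This is a minimal sketch in Python rather than BBC BASIC. The tables are
    built locally for ASCII letters only (an assumption for illustration); on
    RISC OS they would come from Territory_CharacterPropertyTable. The names
    build_table, flag and is_lower are hypothetical.

```python
def build_table(predicate):
    """Pack one flag per character code (0-255) into 32 bytes, 8 flags per byte."""
    table = bytearray(32)
    for code in range(256):
        if predicate(code):
            table[code // 8] |= 1 << (code % 8)
    return bytes(table)


def flag(table, code):
    """Return True if the bit for this character code is set in the table."""
    return (table[code // 8] >> (code % 8)) & 1 == 1


# Assumption: only ASCII letters count; a real territory table covers more.
ALPHA_TABLE = build_table(lambda c: 65 <= c <= 90 or 97 <= c <= 122)
LOWER_TABLE = build_table(lambda c: 97 <= c <= 122)


def is_lower(text):
    """True unless the text contains an alphabetic character that is not lower case."""
    for ch in text:
        code = ord(ch)
        if code > 255:
            raise ValueError("character outside the 256-entry table")
        if flag(ALPHA_TABLE, code) and not flag(LOWER_TABLE, code):
            return False  # stop at the first failing character
    return True
```

    Non-letters such as "/" are ignored by this test because their alphabetic
    bit is clear, so only letters can cause a failure.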
    --
    Steve Fryatt - Leeds, England

    http://www.stevefryatt.org.uk/
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Bob Latham@bob@sick-of-spam.invalid to comp.sys.acorn.programmer on Tue May 26 19:40:03 2020
    From Newsgroup: comp.sys.acorn.programmer

    In article <5876cabc53UCEbin@tiscali.co.uk>,
    John Williams (News) <UCEbin@tiscali.co.uk> wrote:
    In article <5876c7ef57bob@sick-of-spam.invalid>,
    Bob Latham <bob@sick-of-spam.invalid> wrote:

    I might have expected a flag on the entry to OS_file 17 to say
    fixed case but it appears not.

    Is not, and has the filer not always been, famously case agnostic?

    I can't say it has ever been high on my thoughts so not that famous.

    And as a consequence, isn't your expectation above a bit
    unreasonable?

    "Unreasonable"

    Of course yes, how nice of you to point it out.

    Bob.


    John
    --
    Bob Latham
    Stourbridge, West Midlands
  • From Bob Latham@bob@sick-of-spam.invalid to comp.sys.acorn.programmer on Tue May 26 19:40:53 2020
    From Newsgroup: comp.sys.acorn.programmer

    In article <4982cb7658.DaveMeUK@BeagleBoard-xM>,
    David Higton <dave@davehigton.me.uk> wrote:
    In message <5876b6c3c3bob@sick-of-spam.invalid>
    Bob Latham <bob@sick-of-spam.invalid> wrote:

    But if anyone has a good way to test for a lowercase file name I'd
    love to hear it.

    RISC OS filing systems are case insensitive. The only way you can
    do what you want is to iterate through the filenames, and do
    whatever test you want on each filename returned.


    Thank you David.

    Cheers,

    Bob.
    --
    Bob Latham
    Stourbridge, West Midlands
  • From Bob Latham@bob@sick-of-spam.invalid to comp.sys.acorn.programmer on Tue May 26 19:46:08 2020
    From Newsgroup: comp.sys.acorn.programmer

    In article <mpro.qay87d069uaym02mn.news@stevefryatt.org.uk>,
    Steve Fryatt <news@stevefryatt.org.uk> wrote:
    On 26 May, Bob Latham wrote in message
    <5876c928b4bob@sick-of-spam.invalid>:

    I presume this is a means to test a directory listing to make
    sure an entry is lower case?

    No, it's just a generic "is this string lower case" test. The two
    SWIs return pointers to tables of bit flags (so 32 bytes of 8 bits
    each, for all 256 characters in a RISC OS character set). In
    alpha_table%, a bit is set if the character is alphabetic; in
    lower_table%, it's set if the character is considered lower case.

    You still need OS_GBPB to find the names to test.

    Understood, thank you.

    Cheers,

    Bob.
    --
    Bob Latham
    Stourbridge, West Midlands
  • From Steve Drain@steve@kappa.me.uk to comp.sys.acorn.programmer on Wed May 27 13:14:37 2020
    From Newsgroup: comp.sys.acorn.programmer

    On 26/05/2020 17:12, Steve Fryatt wrote:

    DEF FNis_lower(string$)
    LOCAL loop%, char%, byte%, bit%, alpha_table%, lower_table%, alpha%, lower%

    SYS "Territory_CharacterPropertyTable", -1, 2 TO lower_table%
    SYS "Territory_CharacterPropertyTable", -1, 3 TO alpha_table%

    FOR loop% = 1 TO LEN(string$)
    char% = ASC(MID$(string$, loop%, 1))

    byte% = char% DIV 8
    bit% = char% MOD 8

    alpha% = ((alpha_table%?byte%) AND (1 << bit%)) <> 0
    lower% = ((lower_table%?byte%) AND (1 << bit%)) <> 0

    IF alpha% AND (NOT lower%) THEN =FALSE
    NEXT loop%

    =TRUE

    Perhaps:

    DEF FNis_lower(string$)
    LOCAL buff%,lower%,char%
    buff%=&8200:REM use input buffer or other block
    $buff%=string$
    SYS "Territory_LowerCaseTable",-1 TO lower%
    FOR char%=buff% TO buff%+LENstring$-1
    IF ?char%<>lower%??char% THEN =FALSE:REM note ??; non-letters map to themselves
    NEXT char%
    =TRUE

    Or, if you want to disentangle it, try:

    DEF FNis_lower(string$)
    LOCAL lower%,char%
    SYS "Territory_LowerCaseTable",-1 TO lower%
    FOR char%=&8100 TO &8100+LENstring$-1
    IF ?char%<>lower%??char% THEN =FALSE:REM note ??; non-letters map to themselves
    NEXT char%
    =TRUE

    ;-)
  • From jgh@jgh@mdfs.net to comp.sys.acorn.programmer on Wed May 27 16:25:40 2020
    From Newsgroup: comp.sys.acorn.programmer

    Or, if you want to disentangle it, try:

    DEF FNis_lower($&8100)
    LOCAL lower%,char%
    SYS "Territory_LowerCaseTable",-1 TO lower%
    char%=&8100-1
    REPEAT
    char%=char%+1
    UNTIL ?char%<>lower%??char% OR ?char%=13
    =?char%=13
  • From Steve Drain@steve@kappa.me.uk to comp.sys.acorn.programmer on Thu May 28 14:16:37 2020
    From Newsgroup: comp.sys.acorn.programmer

    On 28/05/2020 00:25, jgh@mdfs.net wrote:
    Or, if you want to disentangle it, try:

    DEF FNis_lower($&8100)
    LOCAL lower%,char%
    SYS "Territory_LowerCaseTable",-1 TO lower%
    char%=&8100-1
    REPEAT
    char%=char%+1
    UNTIL ?char%<>lower%??char% OR ?char%=13
    =?char%=13


    There are many ways to skin this cat and speed is hardly important these
    days, but I think an early exit from the loop on first failure is
    worthwhile. It certainly would be with a long string.

    BTW my trick of using the string accumulator (&8100) works because the
    LENstring$ call puts the string in there. It is only safe until the
    next string keyword and I would never actually use it.
  • From Erik G@noreply123@xs4all.nl to comp.sys.acorn.programmer on Mon Jun 1 03:19:56 2020
    From Newsgroup: comp.sys.acorn.programmer

    A general afterthought about the efficiency (speed wise) of searching
    a directory tree.

    On 26/05/2020 13:46, Bob Latham wrote:
    Can someone tell me what is the best (speed wise) method of testing
    for a specific file but importantly the name in lower case.

    I have a recursive program running which scans my music library. I
    want it to specifically test each album for the existence of a file
    'folder/jpg' but to fail anything with a different case like
    'Folder/jpg'.

    OS_File 17 does not appear to be case sensitive.

    The only way I can see is to read the contents of the directory using
    OS_GBPB 9 and wildcards and then test the characters for lower case.

    I'm thinking that may be a little slow when doing thousands and I'm
    also struggling to make it work anyway. On a short test run it fails
    7 out of 10 albums and all albums had folder.jpg in them.

    (NOTE: it has been a long time since I studied the internals of
    ADFS. Specific efficiency details of SWI calls such as OS_FILE and
    OS_GBPB will have significant effect on the real runtime of any
    such program. Read documentation and experiment to find the best
    solution)

    == In short, the thing I want to impress on all programmers is this:

    To make any algorithm involving disk I/O fast, the focus needs to be
    on:
    - Making as few reads as possible
    - Reading as much data in one operation as possible

    Also:
    - Don't spend much effort optimising the processing of the data by
    the CPU, as the disk I/O will dominate the time the algorithm takes
    to complete.

    This example case of searching through a directory tree involves
    reading several (or a lot of) directories and processing the
    information with a program.
    By far the most time-consuming part of this is the physical reading
    of the information from a disk.
    Reading one block of data requires:
    1) moving the disk head to the correct track
    2) waiting for the disk to rotate to the sector that contains the block
    3) reading the magnetic information from the disk and transferring it
    to memory.

    Of these, steps 1 and 2 take up the most time, in the order of
    milliseconds.

    By comparison, you can do tons of CPU processing in a few milliseconds.

    Note that reading several blocks in a row on the same track
    returns more data, but only requires one head move (step 1) and one
    wait (step 2).
    Also note that continuing to read from the next track only needs
    a very short (and thus quick) head move, while the wait time can be
    practically eliminated by organising the disk in such a way that the
    next block to read on this next track shows up just as the head has
    settled in its new position.

    So in the case of traversing a directory structure, it would be much
    more efficient to read an entire directory in one go and then
    process the data in memory (e.g. searching for a file that matches
    a certain name or pattern), than it would be to ask for the first
    directory entry, process it, then ask for the second entry, process
    it, etcetera.

    My advice for this particular program is to find the best combination
    of SWI calls to get a good I/O performance.
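
    The "list the whole directory, then test in memory" approach can be
    sketched as below. This is an illustration in Python, not RISC OS code:
    os.listdir stands in for a buffered OS_GBPB read, and has_exact_name is a
    hypothetical helper. The comparison is done on the names as stored, so a
    case-insensitive filing system cannot hide a mismatch.

```python
import os


def has_exact_name(directory, wanted):
    """Return True only if an entry's stored name equals `wanted`, case included."""
    names = os.listdir(directory)  # a single listing call for the whole directory
    return wanted in names         # plain string comparison, so case matters
```

    For example, has_exact_name(album_dir, "folder.jpg") would be False for an
    album that only contains "Folder.jpg", where album_dir is whatever
    directory path the scan is currently visiting.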

    In a more general sense it is a lot more efficient to read one big file
    with all the data in it rather than have that data spread over lots
    of small files. (For example: the game Kerbal Space Program used to
    have every detail of the game in a separate file, taking up tens of
    thousands of files.
    It took several minutes to load. In recent versions many of
    those files have been combined into a smaller number of bigger files,
    and now the program loads in under a minute.)

    And finally: developers of filing systems have worked for decades to
    optimise the finding, reading, writing, extending and deletion of files,
    using every trick in the book and inventing new ones, because disk I/O
    is one of the major bottlenecks in the speed at which programs run.
    --
    Erik G.
    From address is fake
    See http://erikgrnh.home.xs4all.nl/
  • From Bob Latham@bob@sick-of-spam.invalid to comp.sys.acorn.programmer on Mon Jun 1 15:53:53 2020
    From Newsgroup: comp.sys.acorn.programmer

    In article <5ed457bc$0$1436$e4fe514c@newszilla.xs4all.nl>,
    Erik G <noreply123@xs4all.nl> wrote:

    And finally: developers of filing systems have worked for decades
    to optimise the finding, reading, writing, extending and deletion
    of files, using every trick in the book and inventing new ones,
    because disk I/O is one of the major bottlenecks in the speed at
    which programs run.

    Thank you for an interesting read.

    In my case I'm checking for various things in a music library stored
    on a Synology DS214+. My program, written in assembler, uses Lanman98
    to access the NAS, which was quite a bit faster than moonfish.

    The program examines every album and checks for images, file types,
    and tags. On flac albums (all 3390 of them), for every track on every
    album the file is opened and the tagging checked and then the file is
    closed again.

    The program then gives a report on any non-conformity to various
    parameters set.

    It varies slightly from run to run but it takes about 14 minutes and
    20 seconds to complete. I'm impressed with the speed.

    Thanks again.

    Bob.
    --
    Bob Latham
    Stourbridge, West Midlands
  • From druck@news@druck.org.uk to comp.sys.acorn.programmer on Mon Jun 1 20:01:57 2020
    From Newsgroup: comp.sys.acorn.programmer

    On 28/05/2020 14:16, Steve Drain wrote:
    There are many ways to skin this cat and speed is hardly important these days,

    It can be if you hit a directory on a file server with many thousand
    entries - it certainly lets you know who OS_GBPB's one entry at a time,
    and who uses a decent sized buffer!

    ---druck
  • From druck@news@druck.org.uk to comp.sys.acorn.programmer on Mon Jun 1 20:57:46 2020
    From Newsgroup: comp.sys.acorn.programmer

    On 01/06/2020 02:19, Erik G wrote:
    And finally: developers of filing systems have worked for decades to
    optimise the finding, reading, writing, extending and deletion of files,
    using every trick in the book and inventing new ones, because disk I/O
    is one of the major bottlenecks in the speed at which programs run.

    Except, unfortunately, on RISC OS, where no use is made of free memory to
    cache filing system operations, as just about every other common OS does.

    The closest RISC OS comes is some fixed size buffering in ADFS, which
    often resulted in the Risc PC's slow motherboard IDE interface
    outperforming much better 3rd party IDE hardware using IDEFS variants
    with no caching.

    ---druck

  • From jgh@jgh@mdfs.net to comp.sys.acorn.programmer on Thu Jun 4 09:23:40 2020
    From Newsgroup: comp.sys.acorn.programmer

    Similarly, if there's some I/O information that won't change over the
    run of a program, read it once into a variable, then access the variable.
    For example:
    size%=EXT#inputfile then use size% instead of EXT#
    If your program is never going to change screen mode:
    SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

    etc.
  • From Martin@News03@avisoft.f9.co.uk to comp.sys.acorn.programmer on Thu Jun 4 17:51:27 2020
    From Newsgroup: comp.sys.acorn.programmer

    On 04 Jun in article
    <a248d019-7c38-439a-8d5f-62d6d817a285@googlegroups.com>,
    <jgh@mdfs.net> wrote:
    Similarly, if there's some I/O information that won't change over
    the run of a program, read it once into a variable, then access the
    variable.

    For example:
    size%=EXT#inputfile then use size% instead of EXT#

    Excellent advice, in general ... but this example ...

    If your program is never going to change screen mode:
    SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

    is a bad one, because if it is a Wimp program the mode is usually
    changed outside your program, so ModeChange messages have to be
    watched for and the relevant variables read again.
    --
    Martin Avison
    Note that unfortunately this email address will become invalid
    without notice if (when) any spam is received.
  • From druck@news@druck.org.uk to comp.sys.acorn.programmer on Thu Jun 4 20:49:53 2020
    From Newsgroup: comp.sys.acorn.programmer

    On 04/06/2020 17:23, jgh@mdfs.net wrote:
    Similarly, if there's some I/O information that won't change over the
    run of a program, read it once into a variable, then access the variable.
    For example:
    size%=EXT#inputfile then use size% instead of EXT#

    Sorry, that's bad advice, a program should always assume filing system
    data may be altered by other processes.

    1) Obviously if it's a Wimp application, other tasks are running
    2) If the single tasking program can be run in a taskwindow or graphic
    taskwindow, other tasks are running
    3) If the file is on a remote filing system, other machines may alter it
    4) If the file is on a local filing system which is shared, other
    machines may alter it.

    So only if you are outside the desktop, and storage is on a local
    non-shared disc, can you be sure it won't be altered by anything else.
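
    The point can be illustrated with a small sketch, in Python rather than
    BBC BASIC. Here current_size is a hypothetical helper that asks the
    system for the size each time it is needed, instead of relying on a value
    saved earlier, which may be stale if something else extends the file.

```python
import os


def current_size(handle):
    """Ask the OS for the file's size now, rather than trusting a saved value."""
    return os.fstat(handle.fileno()).st_size
```

    Whether the extra query is worth its cost depends on how likely the file
    is to change and how expensive the call is on the filing system in use.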

    If your program is never going to change screen mode:
    SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

    Only if it's running outside the desktop. Inside the desktop the mode can
    change, so you need to ensure you handle the mode change message and
    re-read any mode related parameters you are using.
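
    One way to combine caching with the mode change message is sketched
    below, in Python for illustration only. ModeInfo, read_mode and
    on_mode_change are hypothetical names: read_mode stands for whatever call
    returns the screen dimensions, and on_mode_change is what the program's
    message handler would call when it sees a mode change.

```python
class ModeInfo:
    """Cache mode-dependent values and drop them when the mode changes."""

    def __init__(self, read_mode):
        self._read_mode = read_mode  # hypothetical callable returning (width, height)
        self._cached = None

    def size(self):
        """Return the cached dimensions, reading them only when needed."""
        if self._cached is None:
            self._cached = self._read_mode()
        return self._cached

    def on_mode_change(self):
        """Forget the cached values so the next size() call re-reads them."""
        self._cached = None
```

    The values are read once per mode rather than once per use, yet they are
    never used after they have gone stale.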

    ---druck

  • From jgh@jgh@mdfs.net to comp.sys.acorn.programmer on Thu Jun 4 16:18:24 2020
    From Newsgroup: comp.sys.acorn.programmer

    On Thursday, 4 June 2020 20:49:57 UTC+1, druck wrote:
    For example: size%=EXT#inputfile then use size% instead of EXT#

    Sorry, that's bad advice, a program should always assume filing system
    data may be altered by other processes.

    If it's open for input, other processes *can't* alter it.
    Read By Many, Write By One.

    If your program is never going to change screen mode:
    SYS whatever TO xsz%,ysz%,etc then use xsz% and ysz%

    Only if it's running outside the desktop. Inside the desktop the mode can
    change, so you need to ensure you handle the mode change message and
    re-read any mode related parameters you are using.

    Which is why I wrote 'your program is never going to change screen
    mode'. Maybe it should have been 'where the screen mode is never
    going to be changed during the execution of the program'. Such as
    a command line tool or a single-tasking application.
  • From druck@news@druck.org.uk to comp.sys.acorn.programmer on Fri Jun 5 11:27:33 2020
    From Newsgroup: comp.sys.acorn.programmer

    On 05/06/2020 00:18, jgh@mdfs.net wrote:
    On Thursday, 4 June 2020 20:49:57 UTC+1, druck wrote:
    For example: size%=EXT#inputfile then use size% instead of EXT#

    Sorry, that's bad advice, a program should always assume filing system
    data may be altered by other processes.

    If it's open for input, other processes *can't* alter it.
    Read By Many, Write By One.

    It's down to the implementation of the filing system whether that is
    true. Local filing systems will tend to lock on write, remote ones will
    tend not to. It's a bit of a minefield!

    ---druck