• Re: RMS File statistics and Hein's RMS_STATS program, all zeroes at tim

    From Richard Jordan@21:1/5 to Richard Jordan on Mon Nov 18 15:08:40 2024
    On 11/18/24 2:45 PM, Richard Jordan wrote:
    Still working on determining the cause of the sporadic severe slowdown
    issues with a particular batch job and have run into another issue.

    We're trying to get RMS stats from one of the files.  We have a monitor
    batch job that snapshots info about the problem batch job; that monitor
    batch job uses Hein's RMS_STATS program to dump them twice, once just
    before the start of the problem batch job, and once right after it
    completes.  I don't want to risk using the option to zero the counters
    on the production system and file so we need before and after.


    So the issue is, the output of RMS_STATS in the batch job looks correct
    but all the counters are always zero (0).

    I can do

         RMS_STATS -c -o=a DKA2:[DIR1.DIR2]FILE.DAT

    interactively and I'll get the expected output, with counters climbing
    on subsequent runs.

    If I temporarily disable stats I'll get the warning from RMS_STATS as
    expected; it is looking at the same correct file either way.

    The problem batch job is run under a normal user account.  The monitor
    batch that does the system analyzer snapshots (to watch for 'busy' file
    channels) and tries to use RMS_STATS is currently running as SYSTEM, but
    we've tested it under our priv'd maintenance account also.  No
    difference in behavior.

    And I can run the interactive command under our maintenance account OR
    as SYSTEM and in both cases get real and incrementing counters back, not
    just zeroes.

    Any thoughts?

    Forgot to add the last test. I trimmed the monitor batch procedure
    that ran RMS_STATS down to just the symbol definitions and those two
    runs, only added a 5 second wait in between them, and it's returning
    real numbers instead of zeroes. So something is different in the
    1:55AM - 3:30AM timeframe when the full monitor job runs. Tested the
    same accounts (SYSTEM and our maintenance account) on two separate
    batch queues, all returns are good.
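
    For reference, a rough sketch of what that trimmed-down test amounts to.
    The image location in the symbol definition is an assumption (nothing
    here says where RMS_STATS.EXE is installed); the RMS_STATS command line
    is the one shown above.

    ```dcl
    $! Sketch of the trimmed test: define the symbol, take two snapshots
    $! five seconds apart.  The image location below is hypothetical.
    $ RMS_STATS :== $SYS$SYSDEVICE:[TOOLS]RMS_STATS.EXE
    $ RMS_STATS -c -o=a DKA2:[DIR1.DIR2]FILE.DAT
    $ WAIT 00:00:05
    $ RMS_STATS -c -o=a DKA2:[DIR1.DIR2]FILE.DAT
    $ EXIT
    ```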

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From abrsvc@21:1/5 to All on Tue Nov 19 11:42:21 2024
    File statistics are gathered only if enabled. Please be sure that you
    have enabled them by using the set file/statistics command for each of
    the files of interest.
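
    A minimal sketch of that, using the file specification from the original
    post.  The MONITOR item shown is just one possible selection, and MONITOR
    as written is meant to be run interactively (end it with Ctrl/Z).

    ```dcl
    $! Enable RMS statistics gathering on the file
    $ SET FILE/STATISTICS DKA2:[DIR1.DIR2]FILE.DAT
    $! Watch the counters interactively while the file is in use
    $ MONITOR RMS/FILE=DKA2:[DIR1.DIR2]FILE.DAT/ITEM=OPERATIONS
    $! Turn gathering off again when it is no longer needed
    $ SET FILE/NOSTATISTICS DKA2:[DIR1.DIR2]FILE.DAT
    ```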

    Dan

  • From Richard Jordan@21:1/5 to abrsvc on Tue Nov 19 10:24:48 2024
    On 11/19/24 5:42 AM, abrsvc wrote:
    File statistics are gathered only if enabled. Please be sure that you
    have enabled them by using the set file/statistics command for each of
    the files of interest.

    Dan

    They are enabled on the one single data file as noted. If they weren't
    then RMS_STATS would return an error. As it is it returns zero values
    in every field when run in the early morning batch jobs, but returns
    real data when RMS_STATS is run interactively or in batch (during the
    day). It doesn't seem to make sense.

    The 1:55AM batch job originally enabled stats so we would not have them
    running all day, then snapshotted the stats to get starting numbers
    (except all it ever shows are zeros). The actual monitor batch would
    turn stats off on completion (and after trying to take a snapshot) so
    they were not enabled all day (I wasn't sure of overhead imposed).

    Now they are enabled full time, so we still need a 'start' and 'end'
    snapshot for stats during only the problem batch run time.

    Same thing happened this morning. Both the 1:55AM batch log and the
    problem job log have output with zero for all values.

    I define the symbol RMS_STATS on my interactive process and run it
    (copying the lines from the batch jobs so they're identical) and I get
    real values back. If I turn stats off on the file I get the expected
    error back about RMS statistics NOT enabled on the file. If I wait a
    while and turn them back on, I get values showing the incremental usage,
    so turning off stats doesn't seem to actually stop them from being
    retained and updated.

    And I ran my abbreviated batch test again (the stripped down monitor
    batch) on the same queue it runs on, and it is getting valid numbers.

  • From abrsvc@21:1/5 to All on Tue Nov 19 16:57:09 2024
    I believe that the /share option will make a difference to the
    /statistics behavior. That might explain the differences here.

    Dan

  • From Volker Halle@21:1/5 to All on Tue Nov 19 17:55:26 2024
    NOTE this from good old ITRC forum back in 2006:

    https://community.hpe.com/t5/operating-system-openvms/monitor-rms-problem/td-p/3707104

    ...
    Because rms stats are kept in memory and will reset to 0 if, for a
    moment, no process has the file open.

    Volker.

  • From Richard Jordan@21:1/5 to abrsvc on Tue Nov 19 17:15:10 2024
    On 11/19/24 10:57 AM, abrsvc wrote:
    I believe that the /share option will make a difference to the
    /statistics behavior.  That might explain the differences here.

    Dan

    The docs say the /share option allows you to enable or disable stats
    while the file is open. That file is always open by many users during
    business hours and periodically at night for batch operations. I
    default to using the /share option as a precaution.

    It is highly likely that when the problem batch job finishes, the file
    we're looking at is not open by anyone. The job calls multiple images,
    and only on completion of the last image is the RMS_STATS program called.

    So if the word from Volker about stats being cleared when the file is
    not opened by anyone is the case, that would explain things. The file
    is quiesced around 11PM; after that it is not opened until the problem
    batch starts at 2AM.


    So maybe we can create another job that opens the file with minimal
    access at 1:59AM and closes it when the monitor batch has completed its
    snapshots. Won't be able to test that tonight but tomorrow we'll see.

    Thanks for the input!

  • From abrsvc@21:1/5 to All on Tue Nov 19 23:47:16 2024
    Volker is right and I didn't think of it at the time. There are sites
    that have a simple program that does nothing but open files and
    hibernate, to keep both the statistics and the global buffers active,
    since the global buffers are reset as well.

    Dan

  • From Arne Vajhøj@21:1/5 to abrsvc on Tue Nov 19 19:44:41 2024
    On 11/19/2024 6:47 PM, abrsvc wrote:
    Volker is right and I didn't think of it at the time. There are sites
    that have a simple program that does nothing but open files and
    hibernate, to keep both the statistics and the global buffers active,
    since the global buffers are reset as well.

    Any special requirements needed or is:

    $ open/read/share f 'p1'
    $ wait 'p2'
    $ close f
    $ exit

    and:

    $ @keepopen foobar.dat 03:00:00

    good enough?

    Arne
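
    A slightly more defensive variant of the same idea, as a sketch only.
    It assumes the holder must not block writers (hence /SHARE=WRITE) and
    adds an error branch; the label and symbol names are made up for
    illustration.

    ```dcl
    $! KEEPOPEN.COM - hold a file open so the in-memory RMS statistics
    $! are not reset while nothing else has the file open.
    $! P1 = file specification, P2 = how long to hold it open (hh:mm:ss)
    $ OPEN/READ/SHARE=WRITE/ERROR=NO_OPEN F 'P1'
    $ WAIT 'P2'
    $ CLOSE F
    $ EXIT
    $NO_OPEN:
    $ SAVED_STATUS = $STATUS
    $ WRITE SYS$OUTPUT "Could not open ''P1'"
    $ EXIT SAVED_STATUS
    ```

    To cover the overnight window discussed in this thread it could be
    submitted along these lines (the start time and two-hour hold are
    assumptions, not tested values):

    $ SUBMIT/AFTER="TOMORROW+01:59" -
        /PARAMETERS=("DKA2:[DIR1.DIR2]FILE.DAT","02:00:00") KEEPOPEN.COM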

  • From Volker Halle@21:1/5 to All on Wed Nov 20 08:46:24 2024
    Richard,

    If you are already using T4 on the production system, here is an
    article that explains how to add RMS file information to T4:

    https://h41379.www4.hpe.com/openvms/journal/v11/t4_and_rms_collector.html

    Volker.
