• The magic number 384

    From Charles Anthony@charles.unix.pro@gmail.com to alt.os.multics on Tue Apr 14 17:08:50 2020
    From Newsgroup: alt.os.multics

    Adventures in DPS8/M emulator debugging... The magic number 384.
    A user reported that a program they had written would sometimes crash with an instruction fault and sometimes not. Working on the assumption that the underlying issue was an uninitialized stack variable, they rigged up a framework that initialized the stack space that would be used for variables with a user-specified value. The value (a command line parameter) was specified as a decimal integer that fits in nine bits, and it was repeated four times per word. (In octal, 0 would become 000000000000, 7 would become 007007007007, and 511 would become 777777777777.) Testing with 0 did not fault; testing with 511 (setting the words to all ones) produced a fault, thus confirming the uninitialized variable theory. Seeking further clues, different values were tried to see if a pattern emerged. Values less than 384 did not fault; values greater than or equal to 384 did.
    Why 384? What is it about that value that is special?
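    As a rough illustration of the fill pattern described above (this is not the user's actual framework; the function names are made up for the example), a nine-bit value repeated four times into a 36-bit word could be sketched like this:

```python
def fill_word(value: int) -> int:
    """Repeat a nine-bit value four times to form one 36-bit word."""
    if not 0 <= value <= 0o777:
        raise ValueError("value must fit in nine bits (0..511)")
    word = 0
    for _ in range(4):
        # Shift the accumulated bits left by nine and append the value again.
        word = (word << 9) | value
    return word


def as_octal(word: int) -> str:
    """Format a 36-bit word as twelve octal digits."""
    return f"{word:012o}"


for v in (0, 7, 511):
    print(v, as_octal(fill_word(v)))
```

    The three printed words match the octal examples given in the post.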
    The faulting instruction was LPRP (Load Pointer Register Packed). The instruction reads a word from memory and extracts three fields from it that are copied into the pointer register: a segment number, a word number and a bit number. The "bit number" field is six bits long, so it can hold values from 0 to 63; however, only values from 0 to 35 are valid (as the word length is 36 bits). The LPRP instruction does not test for values greater than 35; it does a simpler test: if the two high bits are set (110000), then the value (being at least 48) is obviously too large and the instruction throws a "command fault".
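    A minimal sketch of that check, assuming the six-bit "bit number" field sits in the six high-order bits of the 36-bit operand (the post only says the two high bits of the word matter). The names are hypothetical and this is not the emulator's code:

```python
WORD_BITS = 36


def bit_number_field(word: int) -> int:
    """Return the six high-order bits of a 36-bit word (assumed field position)."""
    return (word >> (WORD_BITS - 6)) & 0o77


def lprp_would_fault(word: int) -> bool:
    """Model the simplified test: fault only when both high bits of the field are set."""
    return (bit_number_field(word) & 0b110000) == 0b110000


# A bit number of 40 is invalid (greater than 35) but slips past the simple test.
print(lprp_would_fault(40 << 30))   # False
# A bit number of 48 has both high bits set and is rejected.
print(lprp_would_fault(48 << 30))   # True
```

    Under this model, values from 36 to 47 are out of range yet do not fault, which is the sense in which the test is "simpler" than a full range check.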
    383 in binary is 101111111 so the stack would be filled with 101111111101111111101111111101111111.
    384 in binary is 110000000 so the stack would be filled with 110000000110000000110000000110000000.
    Note that 384 is the smallest value that results in the two high bits of the word being set, so all smaller values were accepted by LPRP without error, while 384 and above caused the fault.
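    The threshold can be double-checked by sweeping all nine-bit values. This is only a sketch with hypothetical helper names; it models the fault as "two high bits of the 36-bit word set" and nothing more:

```python
def fill_word(value: int) -> int:
    """Repeat a nine-bit value four times to form one 36-bit word."""
    return sum(value << shift for shift in (27, 18, 9, 0))


def two_high_bits_set(word: int) -> bool:
    """True when bits 35 and 34 of a 36-bit word are both 1."""
    return (word >> 34) == 0b11


# Collect every fill value whose word would trip the modeled fault.
faulting = [v for v in range(512) if two_high_bits_set(fill_word(v))]

print(min(faulting))                # 384
print(f"{fill_word(383):036b}")     # 101111111101111111101111111101111111
print(f"{fill_word(384):036b}")     # 110000000110000000110000000110000000
```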
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Peter Flass@peter_flass@yahoo.com to alt.os.multics on Wed Apr 15 06:49:14 2020
    From Newsgroup: alt.os.multics

    I wonder how long it took to discover this
    --
    Pete
  • From Charles Anthony@charles.unix.pro@gmail.com to alt.os.multics on Wed Apr 15 08:38:15 2020
    From Newsgroup: alt.os.multics

    On Wednesday, April 15, 2020 at 6:49:15 AM UTC-7, Peter Flass wrote:

    I wonder how long it took to discover this

    I don't know how much time the user spent creating the test program that isolated the error, but from the size and complexity of it, I would hazard at least several days.
    When I received a copy of that program, I was able to use the tools that I had built into the emulator to track instructions and data at a sub-instruction level. It took maybe an hour of analysis to understand the data flow from the command line parameter to the high bits of the LPRP operand.
    In retrospect, the process took longer because I was assuming that there was an error in the emulator, and I spent too much time looking carefully at code that was correct. If I had started by reviewing the documentation for the LPRP instruction, I would have immediately seen that the fault would only be caused by the two high bits being set; I would then have looked at the source of the user's code and seen how the stack memory was initialized. Presumably I would have worked it out without having to run any tests. There have been so many bugs in the code that I have a conditioned response.
    -- Charles
  • From Peter Flass@peter_flass@yahoo.com to alt.os.multics on Wed Apr 15 09:14:53 2020
    From Newsgroup: alt.os.multics

    I do that with my stuff, too. First assumption is that there's a bug in my code. Probably best to start there, anyway.
    --
    Pete