• Uncontained battery failure involving Sydney Light Rail

    From Sylvia Else <sylvia@email.invalid> to aus.rail on Thu Dec 3 12:46:35 2020
    From Newsgroup: aus.rail

    https://www.atsb.gov.au/media/5778951/ro-2020-005_final.pdf

    "Data corruption likely occurred during the uploading of the auxiliary converter software configuration file (version B09) on some light rail vehicles, including for LRV 053. The fault rendered the battery
    temperature compensation ineffective, resulting in overcharging of the
    battery system."

    Because there's no need to apply a checksum or CRC validation to a configuration file controlling a safety critical system.
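Below is a minimal Python sketch of the integrity check being described. The file layout and the setting names are hypothetical and are not the vehicle's actual B09 format. The layout assumed here is a JSON body followed by a 4-byte big-endian CRC-32 trailer. The point is only that a loader can refuse to apply a file whose checksum does not match, rather than silently running on corrupted values.

```python
import json
import zlib


def make_config_blob(settings: dict) -> bytes:
    """Serialise the settings and append a 4-byte big-endian CRC-32 of the body.

    Hypothetical file layout, for illustration only.
    """
    body = json.dumps(settings, sort_keys=True).encode("utf-8")
    # In Python 3, zlib.crc32 always returns an unsigned 32-bit value
    return body + zlib.crc32(body).to_bytes(4, "big")


def load_config_blob(blob: bytes) -> dict:
    """Return the settings only if the stored CRC matches the body.

    Raises ValueError otherwise, so the caller can keep its last
    known-good configuration instead of applying a corrupted one.
    """
    if len(blob) < 4:
        raise ValueError("file too short to contain a CRC")
    body, stored = blob[:-4], int.from_bytes(blob[-4:], "big")
    computed = zlib.crc32(body)
    if computed != stored:
        raise ValueError(f"CRC mismatch: stored {stored:08x}, computed {computed:08x}")
    return json.loads(body.decode("utf-8"))
```

A CRC-32 catches any single flipped bit, whether it lands in the body or in the trailer. It only guards against accidental corruption, though. It does not prevent deliberate tampering, and it says nothing about whether the values themselves are sensible.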

    Sylvia.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Matthew Geier <matthew@sleeper.apana.org.au> to aus.rail on Thu Dec 3 15:03:38 2020

    On Thursday, 3 December 2020 at 12:46:35 pm UTC+11, Sylvia Else wrote:
    > Because there's no need to apply a checksum or CRC validation to a configuration file controlling a safety critical system.
    Such a safety cut-off should be hard-wired, not run by software. A battery that is too hot should have opened a contactor and cut the supply, or the design could even have used resettable thermal fuses. Too much faith is placed in software systems these days (even to the point of using software to 'correct' 'deficiencies' in the hardware implementation).
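The sketch below makes the fail-safe logic concrete. It is a behavioural model in Python of a latching over-temperature trip, wired in series with the controller's charge command. It is purely illustrative. The class name, function names and 60 °C threshold are all hypothetical. In the hard-wired arrangement argued for above, the trip would be a thermostat or comparator driving the contactor, with no software in the path.

```python
class OverTempTrip:
    """Behavioural model of a latching over-temperature trip (hypothetical).

    In hardware this would be a thermostat or comparator wired to the
    contactor coil; it is written as code here only to show the logic.
    """

    def __init__(self, trip_c: float) -> None:
        self.trip_c = trip_c
        self.tripped = False

    def sense(self, battery_temp_c: float) -> None:
        # Once the threshold is reached the trip latches; cooling alone does not clear it
        if battery_temp_c >= self.trip_c:
            self.tripped = True

    def manual_reset(self, battery_temp_c: float) -> None:
        # A reset is only honoured once the battery is back below the threshold
        if battery_temp_c < self.trip_c:
            self.tripped = False


def contactor_closed(controller_requests_charge: bool, trip: OverTempTrip) -> bool:
    # Series logic: the controller can open the contactor, but it can never hold it
    # closed against an active trip, whatever its (possibly corrupted) configuration says
    return controller_requests_charge and not trip.tripped
```

Suppose a controller keeps requesting charge. Once `sense(65.0)` has been called on a trip set at 60 °C, `contactor_closed(True, trip)` returns `False`. It keeps returning `False` after the battery cools, until someone calls `manual_reset` at a safe temperature.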
    Even if the software had CRC checks and so on, that's not 'bulletproof'. Software can and does fail. Microprocessors crash. Failure of a software system should not result in catastrophic failure; hardware interlocks should catch it before it gets that far.
    A software-driven 3-phase drive I'm familiar with has its switching hardware designed so that it's impossible for the software to turn on a self-destructing combination of switching transistors. The hardware simply will not do it, even if commanded to by the microprocessor control unit.
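The property described for that drive can be expressed as simple combinational logic. The Python below is a hypothetical behavioural model, not the design of the drive mentioned. Each gate signal is ANDed with the inverse of its partner's command, so the high-side and low-side switches in one leg can never both be driven. An exhaustive loop checks every command word a faulty controller could emit.

```python
from itertools import product


def interlock(cmd_high: bool, cmd_low: bool) -> tuple[bool, bool]:
    """Cross-coupled interlock for one inverter leg (illustrative model).

    Each gate is enabled only if the other switch in the same leg is not
    commanded, so a request for both yields neither, never a short across
    the DC supply.
    """
    return (cmd_high and not cmd_low, cmd_low and not cmd_high)


def drive_three_phase(commands: list[tuple[bool, bool]]) -> list[tuple[bool, bool]]:
    """Apply the interlock to three (high, low) command pairs, one per phase leg."""
    return [interlock(high, low) for high, low in commands]


# Exhaustive check over all 2**6 = 64 command words, including every
# "self-destructing" combination the microprocessor might request.
for bits in product((False, True), repeat=6):
    legs = drive_three_phase(list(zip(bits[0::2], bits[1::2])))
    assert not any(high and low for high, low in legs)
```

Real drives typically also insert a short dead time between one switch turning off and the other turning on, because transistors do not switch off instantaneously. The static logic above does not model that.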
  • From Sylvia Else <sylvia@email.invalid> to aus.rail on Fri Dec 4 12:45:21 2020

    On 04-Dec-20 10:03 am, Matthew Geier wrote:
    > On Thursday, 3 December 2020 at 12:46:35 pm UTC+11, Sylvia Else wrote:
    >
    >> Because there's no need to apply a checksum or CRC validation to a
    >> configuration file controlling a safety critical system.
    >
    > Such a safety cut-off should be hard-wired, not run by software. A battery that is too hot should have opened a contactor and cut the supply, or the design could even have used resettable thermal fuses. Too much faith is placed in software systems these days (even to the point of using software to 'correct' 'deficiencies' in the hardware implementation).
    >
    > Even if the software had CRC checks and so on, that's not 'bulletproof'. Software can and does fail. Microprocessors crash. Failure of a software system should not result in catastrophic failure; hardware interlocks should catch it before it gets that far.
    > A software-driven 3-phase drive I'm familiar with has its switching hardware designed so that it's impossible for the software to turn on a self-destructing combination of switching transistors. The hardware simply will not do it, even if commanded to by the microprocessor control unit.


    I agree that such safety critical stuff should be implemented in
    hardware. But the apparent failure even to use a CRC check emphasises
    the fact that the designers lacked a clue.

    Sylvia.
  • From Matthew Geier <matthew@sleeper.apana.org.au> to aus.rail on Wed Dec 16 15:14:20 2020

    On Friday, 4 December 2020 at 12:45:23 pm UTC+11, Sylvia Else wrote:

    > I agree that such safety critical stuff should be implemented in
    > hardware. But the apparent failure even to use a CRC check emphasises
    > the fact that the designers lacked a clue.

    They did add a CRC check in a later release, but that release had not been rolled out to all vehicles at the time of the incident.
    In the server-type systems I look after, pretty well NONE of the applications do anything more than a syntax check of their configuration files. The configuration values could be complete gibberish yet still pass the syntax check; the application will load them and then not do what it's supposed to.
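A file can parse cleanly and still contain nonsensical values. A semantic check after parsing closes that gap. This is a minimal Python sketch that assumes a JSON configuration with two hypothetical battery-charging settings and made-up limits. The `json` parser plays the role of the syntax check. The range checks then reject values that are well-formed but nonsensical.

```python
import json

# Hypothetical limits: key -> (min, max). Names and numbers are illustrative only.
LIMITS = {
    "temp_comp_mv_per_c": (-10.0, -1.0),  # in this made-up schema, 0 would mean "no compensation"
    "float_voltage_v": (20.0, 30.0),
}


def load_and_validate(text: str) -> dict:
    """Parse a JSON config, then check that every value is present and in range."""
    cfg = json.loads(text)  # syntax check only: gibberish numbers still pass this step
    if not isinstance(cfg, dict):
        raise ValueError("top level must be a JSON object")

    missing = LIMITS.keys() - cfg.keys()
    unknown = cfg.keys() - LIMITS.keys()
    if missing or unknown:
        raise ValueError(f"missing keys {sorted(missing)}, unknown keys {sorted(unknown)}")

    for key, (lo, hi) in LIMITS.items():
        value = cfg[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key}: expected a number, got {value!r}")
        # NaN fails this comparison too, so it is rejected as well
        if not lo <= value <= hi:
            raise ValueError(f"{key}={value!r} is outside [{lo}, {hi}]")
    return cfg
```

For example, `{"temp_comp_mv_per_c": -3.0, "float_voltage_v": 2720}` is perfectly valid JSON and would pass a syntax-only check. `load_and_validate` rejects it because 2720 is outside the allowed range.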
    These vehicles now run commodity operating systems. (I know the Sydney CAF Urbos 3 trams run Ubuntu Linux 11.04 because I watched one boot! I expect Alstom are now running some Linux variant too.) The developers work on standard desktop systems, with standard operating system installations and standard libraries.
    Once upon a time this was all done with exotic, safety-proven embedded operating systems that forced a particular design methodology. That has all gone out the window, replaced by 'cheaper and faster' commodity operating systems, development tools and methodologies.
    I can't say I agree with this approach. I can see that using a standard Linux distribution will make long-term maintenance MUCH easier, but it doesn't enforce a safety-critical mindset.
    I once had to maintain a computer lab where we had to support the embedded operating system QNX as well as Windows and Linux. QNX was a right royal pain to get working and to use. QNX has since been dropped, and students now only get exposed to hardware programming under Windows and Linux.