• Re: 80286 protected mode

    From David Brown@21:1/5 to All on Fri Oct 11 14:10:13 2024
    On 10/10/2024 23:30, MitchAlsup1 wrote:
    On Thu, 10 Oct 2024 19:21:20 +0000, David Brown wrote:

    On 10/10/2024 20:38, MitchAlsup1 wrote:
    This is more a symptom of bad ISA design/evolution than of libc
    writers needing superpowers.

    No, it is not.  It has absolutely /nothing/ to do with the ISA.

    For example, if the ISA contains an MM instruction which is the
    embodiment of memmove() then absolutely no heroics are needed
    or desired in the libc call.


    The existence of a dedicated assembly instruction does not let you write
    an efficient memmove() in standard C.

          {
               memmove( p, q, size );
          }


    What is that circular reference supposed to do? The whole discussion
    has been about the /fact/ that you cannot implement the "memmove"
    function in a C standard library using fully portable standard C code.

    Do you think you can just write this :

    void * memmove(void * s1, const void * s2, size_t n)
    {
        return memmove(s1, s2, n);
    }

    in your library's source?


    You can implement "memcpy" in portable standard C, using a loop and
    array or pointer syntax (somewhat like your loop below, but with the
    correct type for the index). But you cannot do so for memmove() because
    you cannot identify the direction you need to run your loop in an
    efficient and fully portable manner.
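
    For concreteness, a minimal sketch of such a fully portable memcpy
    (the name my_memcpy is mine, to avoid colliding with the library's
    own):

        #include <stddef.h>

        void *my_memcpy(void *restrict s1, const void *restrict s2, size_t n)
        {
            unsigned char *d = s1;
            const unsigned char *s = s2;
            for (size_t i = 0; i < n; i++)
                d[i] = s[i];   /* regions may not overlap, so direction is irrelevant */
            return s1;
        }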

    It does not matter what the target is - the target is totally irrelevant
    for /portable/ standard C code. If the target made a difference, it
    would not be portable!

    I can't understand why this is causing you difficulty.

    Perhaps you simply didn't understand what you wrote a few posts back,
    when you claimed that the reason people writing portable standard C code
    cannot write an efficient memmove() implementation is "a symptom of bad
    ISA design".


    Where the compiler produces the MM instruction itself. Looks damn
    close to standard C to me !!
    OR
          for( int i = 0; i < size; i++ )
               p[i] = q[i];

    Which gets compiled to memcpy()--also looks to be standard C.
    OR

          p_struct = q_struct;

    gets compiled to::

          memmove( &p_struct, &q_struct, sizeof( q_struct ) );

    also looks to be std C.


    Those are standard C, yes. And a good compiler will optimise such code.
    And if the target has some kind of scalable vector support or other
    dedicated instructions for moving or copying memory, it can do a better
    job of optimising the code.

    That has /nothing/ to do with the point under discussion.


    I think you are simply confused about what you are talking about here.
    Either you don't know what is meant by writing portable standard C, or
    you don't know what is meant by implementing a C standard library, or
    you haven't actually been reading the posts you replied to. You seem
    determined to make the point that /your/ ISA has useful and efficient
    instructions and features for memory copy functionality, while the x86
    ISA does not, and that means /your/ ISA is good design and the x86 ISA
    is bad design.

    Now, I will fully agree with you that the x86 is not a good design. The
    modern x86 processor devices are proof that you /can/ polish a turd.
    And I fully agree with you that instructions for arbitrary length
    vector operations of various sorts (of which memory copying is the
    simplest operation) have many advantages over SIMD using fixed-size vector
    registers. (ARM and RISC-V also agree with you there.)

    But that is all irrelevant to the discussion.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Brian G. Lucas on Fri Oct 11 13:37:03 2024
    On 10/10/2024 23:19, Brian G. Lucas wrote:
    On 10/10/24 2:21 PM, David Brown wrote:
    [ SNIP]

    The existence of a dedicated assembly instruction does not let you
    write an efficient memmove() in standard C.  That's why I said there
    was no connection between the two concepts.

    If the compiler generates the memmove instruction, then one doesn't
    have to write memmove() in C - it is never called/used.


    The common case is that a good compiler will generate inline code for
    some cases - typically known (at compile-time) small sizes - and call a
    generic library function when the size is not known or is over a certain
    size. Then there are some targets where it will always call the library
    code, and some where it will always generate inline code.
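
    For example, a small fixed-size copy like this (a common idiom for
    type punning) is typically inlined by GCC or Clang at -O1 and above
    as a single load, with no library call - read_u64 is just an
    illustrative name:

        #include <stdint.h>
        #include <string.h>

        uint64_t read_u64(const unsigned char *p)
        {
            uint64_t v;
            memcpy(&v, p, sizeof v);   /* known size: compiles to one 8-byte load */
            return v;
        }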

    Even if the compiler /can/ generate inline code, there can be
    circumstances when it will not do so - such as if you have not enabled optimisation, or are optimising for size, or using a weaker compiler, or calling the function indirectly.

    For some targets, it can be helpful to write memmove() in assembly or
    using inline assembly, rather than in non-portable C (which is the
    common case).

    Thus, it IS a symptom of ISA evolution that one has to rewrite
    memmove() every time wider SIMD registers are available.

    It is not that simple.

    There can often be trade-offs between the speed of memmove() and
    memcpy() on large transfers, and the overhead in setting things up
    that is proportionally more costly for small transfers.  Often that
    can be eliminated when the compiler optimises the functions inline -
    when the compiler knows the size of the move/copy, it can optimise
    directly.

    The use of wider register sizes can help to some extent, but not once
    you have reached the width of the internal buses or cache bandwidth.

    In general, there will be many aspects of a C compiler's code
    generator, its run-time support library, and C standard libraries that
    can work better if they are optimised for each new generation of
    processor. Sometimes you just need to re-compile the library with a
    newer compiler and appropriate flags, other times you need to modify
    the library source code.  None of this is specific to memmove().

    But it is true that you get an easier and more future-proof memmove()
    and memcpy() if you have an ISA that supports scalable vector
    processing of some kind, such as ARM and RISC-V have, rather than
    explicitly sized SIMD registers.


    Not applicable.


    I don't understand what you mean by that. /What/ is not applicable to
    /what/ ?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Fri Oct 11 15:13:17 2024
    On Fri, 11 Oct 2024 13:37:03 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 10/10/2024 23:19, Brian G. Lucas wrote:

    Not applicable.


    I don't understand what you mean by that. /What/ is not applicable
    to /what/ ?


    Brian probably meant to say that it is not applicable to his my66k
    LLVM back end.

    But I am pretty sure that what you suggest is applicable - just a
    bad idea for a memcpy/memmove routine that targets Arm+SVE.
    Dynamic dispatch based on concrete core features/identification, i.e.
    exactly the same mechanism that is used on "non-scalable"
    architectures, would provide better performance. And memcpy/memmove
    is certainly sufficiently important to justify an additional
    development effort.
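
    A minimal sketch of that kind of dispatch, assuming Linux on
    AArch64 - the two variant functions and their tuning are
    hypothetical, only getauxval() and the HWCAP_SVE bit are real:

        #include <stddef.h>
        #include <sys/auxv.h>           /* getauxval(), Linux-specific */

        #ifndef HWCAP_SVE
        #define HWCAP_SVE (1UL << 22)   /* AArch64 hwcap bit for SVE */
        #endif

        void *memcpy_generic(void *, const void *, size_t);  /* plain loop */
        void *memcpy_sve(void *, const void *, size_t);      /* SVE-tuned */

        static void *memcpy_resolve(void *, const void *, size_t);

        /* Starts out pointing at the resolver; after the first call it
           points directly at the variant chosen for this core. */
        static void *(*memcpy_impl)(void *, const void *, size_t) = memcpy_resolve;

        static void *memcpy_resolve(void *d, const void *s, size_t n)
        {
            memcpy_impl = (getauxval(AT_HWCAP) & HWCAP_SVE) ? memcpy_sve
                                                            : memcpy_generic;
            return memcpy_impl(d, s, n);
        }

        void *my_memcpy(void *d, const void *s, size_t n)
        {
            return memcpy_impl(d, s, n);
        }

    Glibc does the equivalent with its IFUNC mechanism, resolving the
    variant once at load time rather than on the first call.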

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Fri Oct 11 16:54:13 2024
    On 11/10/2024 14:13, Michael S wrote:
    On Fri, 11 Oct 2024 13:37:03 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 10/10/2024 23:19, Brian G. Lucas wrote:

    Not applicable.


    I don't understand what you mean by that. /What/ is not applicable
    to /what/ ?


    Brian probably meant to say that it is not applicable to his my66k
    LLVM back end.

    But I am pretty sure that what you suggest is applicable - just a
    bad idea for a memcpy/memmove routine that targets Arm+SVE.
    Dynamic dispatch based on concrete core features/identification, i.e.
    exactly the same mechanism that is used on "non-scalable"
    architectures, would provide better performance. And memcpy/memmove
    is certainly sufficiently important to justify an additional
    development effort.


    That explanation helps a little, but only a little. I wasn't suggesting anything - or if I was, it was several posts ago and the context has
    long since been snipped. Can you be more explicit about what you think
    I was suggesting, and why it might not be a good idea for targeting a
    "my66k" ISA? (That is not a processor I have heard of, so you'll have
    to give a brief summary of any particular features that are relevant here.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to Stephen Fuld on Fri Oct 11 08:15:29 2024
    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:

    On 10/9/2024 1:20 PM, David Brown wrote:

    There are lots of parts of the standard C library that cannot be
    written completely in portable standard C. (How would you write
    a function that handles files? You need non-portable OS calls.)
    That's why these things are in the standard library in the first
    place.

    I agree with everything you say up until the last sentence. There
    are several languages, mostly older ones like Fortran and COBOL,
    where the file handling/I/O are defined portably within the
    language proper, not in a separate library. It just moves the
    non-portable stuff from the library writer (as in C) to the
    compiler writer (as in Fortran, COBOL, etc.)

    What I think you mean is that I/O and file handling are defined as
    part of the language rather than being written in the language.
    Assuming that's true, what you're saying is not at odds with what
    David said. I/O and so forth cannot be written in unaugmented
    standard C without changing the language. Given the language as
    it is, these things must be put in the standard library, because
    they cannot be provided in the existing language.

    There is an advantage to the C approach of separating out some
    facilities and supplying them only in the standard library. In
    particular, it makes for a very clean distinction between two
    kinds of implementation, what the C standard calls a freestanding implementation (which excludes most of the library) and a hosted
    implementation (which includes the whole library). This facility
    is what allows C to run easily on very small processors, because
    there is no overhead for non-essential language features. That is
    not to say such things couldn't be arranged for Fortran or COBOL,
    but it would be harder, because those languages are not designed
    to be separable.
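
    The distinction is even visible from within the language: since C99
    an implementation reports which kind it is through the predefined
    macro __STDC_HOSTED__, and a handful of library-free headers remain
    available either way. A small illustration (the REPORT macro is
    just mine):

        #include <stddef.h>       /* required even of freestanding implementations */

        #if __STDC_HOSTED__
        #include <stdio.h>        /* the full library is only guaranteed when hosted */
        #define REPORT(msg) puts(msg)
        #else
        #define REPORT(msg) ((void)0)   /* no I/O to rely on when freestanding */
        #endif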

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to David Brown on Fri Oct 11 18:55:29 2024
    On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:


    Do you think you can just write this :

    void * memmove(void * s1, const void * s2, size_t n)
    {
        return memmove(s1, s2, n);
    }

    in your library's source?

          .global memmove
    memmove:
          MM     R2,R1,R3
          RET

    sure !

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Scott Lurndal on Fri Oct 11 15:21:47 2024
    Scott Lurndal wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> writes:
    In the VMS/WinNT way, each memory section is defined as either shared
    or private when created and cannot be changed. This allows
    optimizations in page table and page file handling.
    Interesting. Do you happen to have a pointer for further reading
    about it?

    *nix needs to maintain various data structures to support forking
    memory just in case it happens.
    I can't imagine what those datastructures would be (which might be just
    another way to say that I was brought up on POSIX and can't imagine the
    world differently).


    http://bitsavers.org/pdf/dec/vax/vms/training/EY-8264E-DP_VMS_Internals_and_Data_Structures_4.4_1988.pdf

    Yeah, that's a great book on how VMS works in detail.
    My copy is v1.0 from 1981.
    It describes the various data structures, some down to the bit level.
    Then chapter 15 Paging Dynamics walks through the details of how
    paging works.

    A book of comparable detail on Linux (but dated) would be:

    Understanding the Linux Virtual Memory Manager, Gorman, 2007
    https://www.kernel.org/doc/gorman/pdf/understand.pdf

    Of a similar nature on Windows but without the detail of the above two is:

    (this appears to be two volumes jammed together)
    Windows Internals 6th ed vol 1&2, 2012
    https://empyreal96.github.io/nt-info-depot/Windows-Internals-PDFs/Windows%20Internals%206e%20Part1%2B2.pdf

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to All on Sat Oct 12 00:02:32 2024
    On 11/10/2024 20:55, MitchAlsup1 wrote:
    On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:


    Do you think you can just write this :

    void * memmove(void * s1, const void  * s2, size_t n)
    {
        return memmove(s1, s2, n);
    }

    in your library's source?

          .global memmove
    memmove:
          MM     R2,R1,R3
          RET

    sure !

    You are either totally clueless, or you are trolling. And I know you
    are not clueless.

    This discussion has become pointless.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to David Brown on Fri Oct 11 23:32:20 2024
    On Fri, 11 Oct 2024 22:02:32 +0000, David Brown wrote:

    On 11/10/2024 20:55, MitchAlsup1 wrote:
    On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:


    Do you think you can just write this :

    void * memmove(void * s1, const void  * s2, size_t n)
    {
        return memmove(s1, s2, n);
    }

    in your library's source?

          .global memmove
    memmove:
          MM     R2,R1,R3
          RET

    sure !

    You are either totally clueless, or you are trolling. And I know you
    are not clueless.

    This discussion has become pointless.

    The point is that there are a few things that may be hard to do
    with {decode, pipeline, calculations, specifications...}; but
    because they are so universally needed, these, too, should
    "get into ISA".

    One good reason to put them in ISA is to preserve programmers'
    efforts over decades, so they don't have to re-write libc every
    time a new set of instructions comes out.

    Moving an arbitrary amount of memory from point a to point b
    happens to fall into that universal need. Setting an arbitrary
    amount of memory to a value also falls into that universal
    need.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Brett@21:1/5 to mitchalsup@aol.com on Sat Oct 12 05:06:05 2024
    MitchAlsup1 <mitchalsup@aol.com> wrote:
    On Fri, 11 Oct 2024 12:10:13 +0000, David Brown wrote:


    Do you think you can just write this :

    void * memmove(void * s1, const void * s2, size_t n)
    {
        return memmove(s1, s2, n);
    }

    in your library's source?

          .global memmove
    memmove:
          MM     R2,R1,R3
          RET

    sure !


    Can R3 be a const? That causes issues for restartability, but branch
    prediction is easier and the code is shorter.

    Though I guess forwarding a const is probably a thing today to improve
    branch prediction, which is normally HORRIBLE for short branch counts.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to Michael S on Fri Oct 18 06:00:54 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 14 Oct 2024 17:19:40 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    [...]

    My only point of contention is that the existence or lack of such
    instructions does not make any difference to whether or not you can
    write a good implementation of memcpy() or memmove() in portable
    standard C.

    You are moving the goalposts.

    No, he isn't.

    One does not need a "good implementation" in the sense you have in
    mind. All one needs is an implementation that the compiler's
    pattern-matching logic unmistakably recognizes as memmove/memcpy.
    That is very easily done in standard C. For memmove, I had shown how
    to do it in one of the posts below. For memcpy it's very obvious, so
    no need to show.

    You have misunderstood the meaning of "standard C", which means
    code that does not rely on any implementation-specific behavior.
    "All one needs is an implementation that ..." already invalidates
    the requirement that the code not rely on implementation-specific
    behavior.
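
    For the record, the kind of loop being talked about is simply:

        #include <stddef.h>

        void copy_fwd(unsigned char *dst, const unsigned char *src, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                dst[i] = src[i];
        }

    GCC at higher optimization levels does pattern-match such loops and
    emit a call to memcpy (which is why a libc's own memcpy is built
    with -fno-builtin or similar, to avoid the recursion) - but whether
    any given compiler does so is exactly the implementation-specific
    behavior at issue.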

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to All on Fri Oct 18 05:39:02 2024
    Terje Mathisen <terje.mathisen@tmsw.no> writes:

    [ISA support for copying possibly overlapping regions of memory]

    [Separately, what is possible to do in portable standard C]

    [...] I really don't think any of us really disagree, it is just
    that we have been discussing two (mostly) orthogonal issues.

    I would summarize the string of conversations as follows.

    It started with talking about what is or is not possible in
    "standard C", by which is meant C that does not rely on any implementation-specific behavior. (Topic A.)

    The discussion shifted after a comment about how to provide
    architectual support for copying one region of memory to
    another, where the areas of memory might overlap. (Topic B.)

    After the introduction of Topic B, most of the subsequent
    conversation either ignored Topic A or conflated the two
    topics.

    The key point is that Topic B has nothing to do with Topic A,
    and vice versa. It's like asking why it's colder in the
    mountains than it is in the summer: both parts have something
    to do with temperature, but in spite of that there is no
    meaningful relationship between them.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Michael S on Fri Oct 18 14:06:17 2024
    Michael S <already5chosen@yahoo.com> writes:
    On Mon, 14 Oct 2024 19:39:41 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    On Mon, 14 Oct 2024 15:04:28 +0000, David Brown wrote:

    On 13/10/2024 17:45, Anton Ertl wrote:

    I do think it would be convenient if there were a fully standard
    way to compare independent pointers (other than just for
    equality). Rarely needing something does not mean /never/ needing
    it.

    OK, take a segmented memory model with 16-bit pointers and a 24-bit
    virtual address space. How do you actually compare two segmented
    pointers ??

    Depends. On the Burroughs mainframe there could be eight
    active segments and the segment number was part of the pointer.

    Pointers were 32-bits (actually 8 BCD digits)

    S s OOOOOO

    Where 'S' was a sign digit (C or D), 's' was the
    segment number (0-7) and OOOOOO was the six digit
    offset within the segment (500kB/1000kD each).

    A particular task (process) could have up to
    one million "environments", each environment
    could have up to 100 "memory areas" (up to 1000kD)
    of which the first eight were loaded into the
    processor base/limit registers. Index registers
    were 8 digits and were loaded with a pointer as
    described above. Operands could optionally select
    one of the index registers and the operand address
    was treated as an offset to the index register;
    there were 7 index registers.

    Access to memory areas 8-99 used string instructions
    where the pointer was 16 BCD digits:

    EEEEEEMM SsOOOOOO

    Where EEEEEE was the environment number (0-999999);
    environments starting with D00000 were reserved for
    the MCP (Operating System). MM was the memory area
    number and the remaining eight digits described the
    data within the memory area. A subroutine call could
    call within a memory area or switch to a new environment.

    Memory area 1 was the code region for the segment,
    Memory area 0 held the stack and some global variables
    and was typically shared by all environments.
    Memory areas 2-7 were application dependent and could
    be configured to be shared between environments at
    link time.
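
    To make Mitch's problem concrete: on the 8086 in real mode a far
    pointer can at least be normalized to a linear address before
    comparing. A minimal sketch, with hypothetical helper names:

        #include <stdint.h>

        typedef struct {
            uint16_t seg;   /* real-mode segment */
            uint16_t off;
        } far_ptr;

        static uint32_t linear(far_ptr p)
        {
            return ((uint32_t)p.seg << 4) + p.off;   /* 8086: segment*16 + offset */
        }

        static int far_cmp(far_ptr a, far_ptr b)
        {
            uint32_t la = linear(a), lb = linear(b);
            return (la > lb) - (la < lb);             /* -1, 0 or +1 */
        }

    In 286 protected mode not even that is available: the selector
    indexes a descriptor table, so the ordering of pointers into
    different segments depends on base addresses the program cannot,
    in general, see.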

    What was the size of the physical address space?
    I would suppose more than 1,000,000 words?

    It varied based on the generation. In the
    1960s, a half megabyte (10^6 digits)
    was the limit.

    In the 1970s, the architecture supported
    10^8 digits, the largest B4800 systems
    were shipped with 2 million digits (1MB).
    In 1979, the B4900 was introduced supporting
    up to 10MB (20 MD), later increased to
    20MB/40MD.

    In the 1980s, the largest systems (V500)
    supported up to 10^9 digits. It
    was that generation of machine where the
    environment scheme was introduced.

    Binaries compiled in 1966 ran on all
    generations without recompilation.

    There was room in the segmentation structures
    for up to 10^18 digit physical addresses
    (where the segments were aligned on 10^3
    digit boundaries).

    Unisys discontinued that line of systems in 1992.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Scott Lurndal on Fri Oct 18 17:34:16 2024
    On Fri, 18 Oct 2024 14:06:17 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [ SNIP]

    In the 1980s, the largest systems (V500)
    supported up to 10^9 digits. It
    was that generation of machine where the
    environment scheme was introduced.

    Binaries compiled in 1966 ran on all
    generations without recompilation.

    There was room in the segmentation structures
    for up to 10^18 digit physical addresses
    (where the segments were aligned on 10^3
    digit boundaries).

    So, can it be said that at least some of B6500-compatible models
    suffered from the same problem as 80286 - the segment of maximal size
    didn't cover all linear (or physical) address space?
    Or their index register width was increased to accommodate 1e9 digits
    in a single segment?


    Unisys discontinued that line of systems in 1992.

    I thought it lasted longer. My impression was that there were still
    hardware implementations (alongside emulation on Xeons) sold up
    until 15 years ago.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Vir Campestris@21:1/5 to David Brown on Fri Oct 18 17:38:55 2024
    On 16/10/2024 08:21, David Brown wrote:

    I don't see an advantage in being able to implement them in standard C.
    I /do/ see an advantage in being able to do so well in non-standard, implementation-specific C.

    The reason why you might want your own special memmove, or your own
    special malloc, is that you are doing niche and specialised software.
    For example, you might be making real-time software and require specific
    time constraints on these functions.  In such cases, you are not
    interested in writing fully portable software - it will already contain
    many implementation-specific features or use compiler extensions.

    I have a vague feeling that once upon a time I wrote a malloc for an
    embedded system. Having only one process, it had access to the entire
    memory range, and didn't need to talk to the OS. Doing it entirely in
    C is quite feasible there.

    But memmove? On an 80286 it will be using rep movsw, rather than a
    software loop, to copy the memory contents to the new location.

    _That_ does require assembler, or compiler extensions, not standard C.

    Andy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Michael S on Fri Oct 18 16:19:08 2024
    Michael S <already5chosen@yahoo.com> writes:
    On Fri, 18 Oct 2024 14:06:17 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    [ SNIP]

    So, can it be said that at least some of B6500-compatible models

    No. The systems I described above are from the medium
    systems family (B2000/B3000/B4000). The B5000/B6000/B7000
    (large) family systems were a completely different stack based
    architecture with a 48-bit word size. The Small systems (B1000)
    supported task-specific dynamic microcode loading (different
    microcode for a cobol app vs. a fortran app).

    Medium systems evolved from the Electrodata Datatron and 220 (1954) through
    the Burroughs B300 to the Burroughs B3500 by 1965. The B5000
    was also developed at the old Electrodata plant in Pasadena
    (where I worked in the 80s) - eventually large systems moved
    out - the more capable large systems (B7XXX) were designed in Tredyffrin
    Pa, the less capable large systems (B5XXX) were designed in Mission Viejo, Ca.

    suffered from the same problem as 80286 - the segment of maximal size
    didn't cover all linear (or physical) address space?
    Or their index register width was increased to accommodate 1e9 digits
    in a single segment?


    Unisys discontinued that line of systems in 1992.

    I thought it lasted longer. My impression was that there were still
    hardware implementations (alongside emulation on Xeons) sold up
    until 15 years ago.

    Large systems still exist today in emulation[*], as do the
    former Univac (Sperry 2200) systems. The last medium system
    (V380) was retired by the City of Santa Ana in 2010 (almost two
    decades after Unisys cancelled the product line) and was moved
    to the Living Computer Museum.

    City of Santa Ana replaced the single 1980 vintage V380 with
    29 windows servers.

    After the merger of Burroughs and Sperry in '86 there were six
    different mainframe architectures - by 1990, all but
    two (2200 and large systems) had been terminated.

    [*] Clearpath Libra https://www.unisys.com/client-education/clearpath-forward-libra-servers/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Vir Campestris on Fri Oct 18 21:45:37 2024
    On 18/10/2024 18:38, Vir Campestris wrote:
    On 16/10/2024 08:21, David Brown wrote:

    I don't see an advantage in being able to implement them in standard
    C. I /do/ see an advantage in being able to do so well in
    non-standard, implementation-specific C.

    The reason why you might want your own special memmove, or your own
    special malloc, is that you are doing niche and specialised software.
    For example, you might be making real-time software and require
    specific time constraints on these functions.  In such cases, you are
    not interested in writing fully portable software - it will already
    contain many implementation-specific features or use compiler extensions.

    I have a vague feeling that once upon a time I wrote a malloc for an
    embedded system. Having only one process it had access to the entire
    memory range, and didn't need to talk to the OS. Entirely C is quite
    feasible there.


    Sure - but you are not writing portable standard C. You are relying on implementation details, or writing code that is only suitable for a
    particular implementation (or set of implementations). It is normal to
    write this kind of thing in C, but it is non-portable C. (Or at least,
    not fully portable C.)

    But memmove? On an 80286 it will be using rep movsw, rather than a
    software loop, to copy the memory contents to the new location.

    _That_ does require assembler, or compiler extensions, not standard C.


    It would normally be written in C, and the compiler will generate the
    "rep" assembly. The bit you can't write in fully portable standard C is
    the comparison of the pointers so you know which direction to do the
    copying.
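
    A sketch of that usual non-portable approach - my_memmove is a
    stand-in name, and the ordered comparison of d and s is the
    technically undefined part when the pointers refer to different
    objects:

        #include <stddef.h>

        void *my_memmove(void *s1, const void *s2, size_t n)
        {
            unsigned char *d = s1;
            const unsigned char *s = s2;

            if (d < s) {                    /* UB for pointers into different objects */
                for (size_t i = 0; i < n; i++)
                    d[i] = s[i];            /* forwards: safe when dst is below src */
            } else {
                while (n > 0) {
                    n--;
                    d[n] = s[n];            /* backwards: safe when dst is above src */
                }
            }
            return s1;
        }

    On a flat address space this works on essentially every real
    compiler, which is why it is the normal way to write memmove() in
    implementation-specific C.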

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Waldek Hebisch@21:1/5 to Anton Ertl on Sun Jan 5 21:49:20 2025
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:
    From my point of view the main drawbacks of the 286 are poor support
    for large arrays and problems for Lisp-like systems which have a lot
    of small data structures and traverse them via pointers.

    Yes. In the first case the segments are too small, in the latter case
    there are too few segments (if you have one segment per object).

    In the second case one can pack several objects into a single
    segment, so except for lost security properties this is not
    a big problem.

    If you go that way, you lose all the benefits of segments, and run
    into the "segments too small" problem. Which you then want to
    circumvent by using segment and offset in your addressing of the small
    data structures, which leads to:

    But there is a lot of loading segment registers
    and slow loading is a problem.

    ...
    Using 16-bit offsets for jumps inside procedure and
    segment-offset pair for calls is likely to lead to better
    or similar performance as purely 32-bit machine.

    With the 80286's segments and their slowness, that is very doubtful.
    The 8086 has branches with 8-bit offsets and branches and calls with
    16-bit offsets. The 386 in 32-bit mode has branches with 8-bit
    offsets and branches and calls with 32-bit offsets; if 16-bit offsets
    for branches would be useful enough for performance, they could
    instead have designed the longer branch length to be 16 bits, and
    maybe a prefix for 32-bit branch offsets.

    At that time Intel apparently wanted to avoid having too many
    instructions.

    Looking in my Pentium manual, the section on CALL has 20 lines for
    "call intersegment", "call gate" (with privilege variants) and "call
    to task" instructions, 10 of which probably already existed on the 286
    (compared to 2 lines for "call near" instructions that existed on the
    286), and the "Operation" section (the specification in pseudocode)
    consumes about 4 pages, followed by a 1.5 page "Description" section.

    9 of these 10 far call variants deal with protected-mode things, so
    Intel obviously had no qualms about adding instruction variants. If
    they instead had no protected mode, but some 32-bit support, including
    the near call with 32-bit offset that I suggest, that would have
    reduced the number of instruction variants.

    I wrote "instructions". Intel clearly used modes and variants,
    but a different call would lead to a new opcode.

    I used Xenix on a 286 in 1986 or 1987; my impression is that programs
    were limited to 64KB code and 64KB data size, exactly the PDP-11 model
    you denounce.

    Maybe. I have seen many cases where software essentially "wastes"
    good things offered by hardware.

    Which "good things offered by hardware" do you see "wasted" by this
    usage in Xenix?

    Medium model and shared segments. Plus an escape for programs needing
    bigger memory (traditional Unix programs of necessity fit in 64kB
    limits).

    To me this seems to be the only workable way to use
    the 286 protected mode. Ok, the medium model (near data, far code)
    may also have been somewhat workable, but looking at the cycle counts
    for the protected-mode far calls on the Pentium (and on the 286 they
    were probably even more costly), which start at 22 cycles for a "call
    gate, same privilege" (compared to 1 cycle on the Pentium for a
    direct call near), one would strongly prefer the small model.

    I have found an instruction list on the web which claims 26 + m
    cycles, where m is the "length of next instruction" (whatever that
    means), for a protected mode call using a segment. A real mode call
    using a segment is 13 + m cycles. A near call is 7 + m cycles.

    Intel clearly expected that segment-changing calls are infrequent.
    AFAICS this was better than the system conventions on IBM mainframes
    where the "standard" call normally called a memory allocation function
    to allocate a stack frame. I do not have data for the VAX handy, but
    VAX calls were quite complex, so probably also not fast.

    And modern data at least partially confirms Intel's beliefs. When
    AMD introduced 64-bit mode they also introduced a complex calling
    convention intended to optimize the speed of calls. Later there
    was a paper by Intel folks essentially claiming that this
    calling convention does not matter: C compilers inline small
    routines, so the cost of calls relative to other things is quite
    small. I think that what was inlined in 2010 would be called
    using near calls in 1982.

    Every successful software used direct access to hardware because of
    performance; the rest waned. Using BIOS calls was just too slow.
    Lotus 1-2-3 won out over VisiCalc and Multiplan by being faster from
    writing directly to video.

    For most early graphics cards direct screen access could be allowed
    just by allocating an appropriate segment. And most non-games
    could gain good performance with a better system interface.
    I think that the variety of tricks used in games and their
    popularity made protected-mode systems much less appealing
    to vendors. And that discouraged work on better interfaces
    for non-games.

    MicroSoft and IBM invested lots of work in a 286 protected-mode
    interface: OS/2 1.x. It was limited to the 286 at the insistence of
    IBM, even though work started in August 1985, when they already knew
    that the 386 was coming soon. OS/2 1.0 was released in April 1987,
    1.5 years after the 386.

    OS/2 1.x flopped, and by the time OS/2 was adjusted to the 386, it was
    too late, so the 286 killed OS/2; here we have a case of a software
    project being death-marched by tying itself to "good things offered by hardware" (except that Microsoft defected from the death march after a
    few years).

    Meanwhile, Microsoft introduced Windows/386 in September 1987 (in
    addition to the base (8086) variant of Windows 2.0, which was released
    in December 1987), which used 386 protected mode and virtual 8086 mode
    (which was missing in the "brain-damaged" (Bill Gates) 286). So
    Windows completely ignored 286 protected mode. Windows eventually
    became a big success.

    What I recall is a bit different. IIRC the first successful version
    of Windows, that is Windows 3.0, had 3 modes of operation: 8086
    compatible, 286 protected mode and 386 protected mode. Only later did
    Microsoft drop the requirement for 8086 compatibility. I think still
    later it dropped 286 support. Windows 95 was supposed to be 32-bit,
    but contained quite a lot of 16-bit code. IIRC the system interface
    to Windows 3.0 and 3.1 was 16-bit and only later did Microsoft
    release an extension allowing 32-bit system calls.

    I have no information about Windows internals except for some
    public statements by Microsoft and other people, but I think
    it reasonable to assume that Windows was actually a successful
    example of 8086/286/386 compatibility. That is, their 16-bit
    code could use real mode segmentation or protected mode
    segmentation, the latter both for 286 and 386. For the 32-bit
    version they added a translation layer to transform arguments
    between the 16-bit world and the 32-bit world. It is possible
    that this translation layer involved a lot of effort. IIUC
    DEC, when porting VMS to Alpha, essentially gave up using
    32-bit pointers as the main interface.

    Anyway, it seems that Windows was at least as tied to the 286
    as OS/2 when it became successful, and dropped 286 support
    later. And for a long time after dropping 286 support
    Windows massively used 16-bit segments.

    Also, Microsoft started NT OS/2 in November 1988 to target the 386
    while IBM was still working on 286 OS/2. Eventually Microsoft and IBM
    parted ways, NT OS/2 became Windows NT, which is the starting point of
    all remaining Windowses from Windows XP onwards.

    Xenix, apart from OS/2 the only other notable protected-mode OS for
    the 286, was ported to the 386 in 1987, after SCO secured "knowledge
    from Microsoft insiders that Microsoft was no longer developing
    Xenix", so SCO (or Microsoft) might have done it even earlier if the commercial situation had been less muddled; in any case, Xenix jumped
    the 286 ship ASAP.

    The verdict is: The only good use of the 286 is as a faster 8086;
    small memory model multi-tasking use is possible, but the 64KB
    segments are so limiting that everybody who understood software either decided to skip this twist (MicroSoft, except on their OS/2 death
    march), or jumped ship ASAP (SCO).

    As I mentioned above I do not believe your claim about Microsoft.
    There were DOS extenders which allowed use of 286 protected mode
    under DOS. They were used by several software vendors. Clearly,
    programming for flat 32-bit mode is easier and on the software
    market that matters more than other factors.

    I think that 286 protected mode is good for its intended use, that
    is protected multitasking systems having more than 64 kB but less
    than, say, 4 MB. Of course, if you have a lot of hardware resources,
    then a 32-bit system using paging may be easier to create. Also,
    speed is tricky: on the 486 (and possibly the 386) the hardware task
    switch was easy to use, but slower than a tuned purely software
    implementation. In other parts reloading of segment registers
    could slow things down quite a lot, so 16-bit protected mode
    required a lot of tuning to minimize the number of times
    segment registers were reloaded.

    I do not know if people used the 286 in this way, but a natural use
    of the 286 is as a debugger for 8086 programs. That is, use segment
    protection to catch stray accesses. Once the program works OK,
    deliver it as a real mode program on the 8086, gaining speed and a
    bigger market.

    AFAIK Linux started out using 32-bit mode but heavily depending on
    386 segmentation. Rather quickly the dependence on segments was
    limited and what remained was well isolated. But I think that
    Linux shows that _creating_ a simple multitasking system is
    easier using the hardware properties that come together with 286
    segmentation.

    Intel misjudged what is typical in programs. But they were not
    alone in this. I have a translation of Tanenbaum's book on computer
    architecture from 1976 (the original; the translation is from 1983).
    Tanenbaum is very positive about segmentation, descriptors and
    "high level machines". He gave simple examples where descriptors
    and a microprogrammed "high level machine" are supposed to give
    better performance than a more conventional machine.

    And as I already wrote, Intel misjudged the market for the 286. They
    could guess that a 286 system would be too expensive for the home
    market for a long time. They probably did not expect that the
    286 would find its way into PCs.

    More generally, vendors could release separate versions of
    programs for 8086 and 286 but few did so.

    Were there any who released software in both 8086 and protected-mode
    80286 variants? Microsoft/SCO with Xenix, anyone else?

    IIUC Microsoft Windows up to 3.0, and probably everybody who wanted
    to say "supported on Windows". That is, Windows 3.0 on a 286 almost
    surely used 286 protected mode and probably ran "Windows" programs
    in protected mode. But Windows also supported the 8086 and Microsoft
    guidelines insisted that a proper "Windows program" should run on
    the 8086.

    On DOS I do not remember names of specific programs. I remember
    Phar Lap, who provided a 286 DOS extender, and quite a few programs
    used it. Browsing through binaries on machines that I used I saw
    the name several times. Sometimes a program using a DOS extender
    would clearly say that it required a 286, but I vaguely remember
    cases with separate 286 binaries and 8086 binaries where startup
    code loaded the right binary. Probably there were also cases where
    the needed switching was hidden inside a single binary.

    And users having only binaries wanted to use their 8086 programs on
    new systems, which led to heroic efforts like the OS/2 DOS box and
    later Linux dosemu. But integration of 8086 programs with protected
    mode was solved too late for the 286 model to gain traction
    (and on the 286 a "DOS box" had to run in real mode, breaking
    normal system protection).

    Linux never ran on a 80286, and DOSemu uses the virtual 8086 mode,
    which does not require heroic efforts AFAIK.

    Well, beside virtual 8086 mode there is tricky code to get the
    right effect. A lot of late "DOS" programs depended on DOS
    extenders and a significant fraction of such programs run fine
    under dosemu. I do not know if Windows ever got its DOS box
    to the level of dosemu, but when I used dosemu I heard that
    various things did not work in the Windows DOS box.

    There was various segmented hardware around, first and foremost (for
    the designers of the 80286), the iAPX432. And as you write, all the
    good reasons that resulted in segments on the iAPX432 also persisted
    in the 80286. However, given the slowness of segmentation, only the
    tiny (all in one segment), small (one segment for code and one for
    data), and maybe medium memory models (one data segment) are
    competitive in protected mode compared to real mode.

    AFAICS that covered the vast majority of programs during the eighties.

    The "vast majority" is not enough; if a key application like Lotus
    1-2-3 or Wordperfect did not work on the DOS alternative, the DOS
    alternative was not used. And Lotus 1-2-3 and Wordperfect certainly
    did not limit themselves to 64KB of data.

    I do not know if they offered protected mode versions. But it
    is likely that they did once machines with more than 640 kB
    formed a reasonable fraction of the PC market.

    Turbo Pascal offered only the medium memory model

    According to Terje Mathisen, it also offered the large memory model.
    On its Wikipedia page, I find: "Besides allowing applications larger
    than 64 KB, Byte in 1988 reported ... for version 4.0". So apparently
    Turbo Pascal 4.0 introduced support for the large memory model in
    1988.

    I am not entirely sure, but probably I used 4.0. I certainly used
    5.0 and later versions. AFAIR all versions that I used limited
    "static" data to 64 kB; that, together with no such limit for code,
    I take as the definition of the "medium" model. I do not remember
    explicit model switches, which were common in PC C compilers. PC
    compilers allowed far/near qualifiers on pointers and I do not
    remember significant restrictions on this (but other folks reported
    that some combinations did not work): the data model set defaults,
    but the programmer could override them. So in Turbo Pascal one could
    use large pointers if desired (or maybe even by default), but
    static data was in a single 64 kB segment.

    Intel apparently assumed that programmers are willing to spend
    extra work to get good performance and IMO this was right
    as a general statement. Intel probably did not realize that
    programmers will be very reluctant to spent work on security
    features and in particular to spent work on making programs
    fast in 286 protected mode.

    80286 protected mode is never faster than real mode on the same CPU,
    so the way to make programs fast on the 286 is to stick with real
    mode; using the small memory model is an alternative, but as
    mentioned, the memory limits are too restrictive.

    Well, if a program needs more than 1 MB in total, workarounds on the
    286 may be more expensive than the cost of protected mode. More to
    the point, if one needs security features, then doing them
    in real mode via software is likely to take more time than a 286
    version. Intel clearly did not anticipate that a large fraction
    of 286s would be used in PCs and that PC vendors/developers
    would prefer the speed gain (modest when the protected mode version
    has enough tuning) to protection.

    Intel probably assumed that the 286 would cover most needs,

    As far as protected mode was concerned, they hardly could have been
    more wrong.

    especially
    given that most systems had much less memory than the 16 MB
    theoretically allowed by the 286.

    They provided 24 address pins, so they obviously assumed that there
    would be 80286 systems with >8MB. 64KB segments are already too
    limiting on systems with 1MB (which was supported by the 8086),
    probably even for anything beyond 128KB.

    IMO this is partially true: there
    is a class of programs which with some work fit into the medium
    model, but using a flat address space is easier. I think that
    on the 286 (that is, with a 16-bit bus) those programs (assuming
    enough tuning) run faster than a flat 32-bit version.

    Maybe in real mode. Certainly not in protected mode. Just run your
    tuned large-model protected-mode program against a 32-bit small-model
    program for the same task on a 386SX (which is reported as having a
    very similar speed to the 80286 on 16-bit programs).

    My instruction table shows _longer_ times for several instructions
    on the 386 compared to the 286. For example a real mode far call on
    the 286 takes 13 clocks + penalty, on the 386 17 clocks + the same
    penalty; a protected mode call on the 286 takes 26 clocks + penalty,
    on the 386 34 clocks + penalty. A near call on both is 7 clocks +
    penalty.

    Anyway, if a program consists of several procedures (or clusters
    of closely related procedures), each a few kilobytes, then
    it can easily happen that there are thousands of instructions
    between far calls, so the cost of far calls is going to be
    negligible (a protected mode far call costs 26 - 7 = 19 clocks
    more than a near call, spread over thousands of instructions). If
    the program manages to do its work in a single 64 kB data segment
    (not unreasonable for 1 MB of code), then it will be faster than a
    program using 32-bit addresses. More relevantly, in a multitasking
    situation with each task having its own data segment there
    will be reloading of segment registers on task switch,
    which is likely to be negligible. Again, each task will
    gain due to smaller pointers. With an OS present there will
    be segment reloading due to system calls and this may
    be more significant. However, this is mostly due to protection
    and not segmentation.

    And even if you
    find one case where the protected-mode program wins, nobody found it
    worth their time to do this nonsense.

    That is largely true. I wonder what will happen with the x32 mode
    on x86_64. AFAIK x32 mode showed measurable performance gains:
    20-30% smaller programs and similar speed gains. In principle
    it should be cheap to support, as it is "just another 32-bit
    target". But some (for me important) programs do not work
    in this mode and there are voices calling to drop it completely.

    And so OS/2 flopped despite
    being backed by IBM and, until 1990, Microsoft.

    But I think that Intel segmentation had some
    attractive features during eighties.

    You are one of a tiny minority. Even Intel finally saw the light, as
    did everybody else, and nowadays segments are just a bad memory.

    Well, 16-bit segments clearly are too limited when one has several
    megabytes of memory. And a consistently 32-bit segmented system
    increases memory use, which is a nontrivial cost. OTOH there is
    the question of how much customers are going to pay for security
    features. I think recent times show that security has significant
    costs. But lack of security may lead to big losses. So
    there is no easy choice.

    Now people talk more about capabilities. AFAICS capabilities
    offer more than segments, but are going to have a higher cost.
    So abstractly, for some systems segments may still look
    attractive. OTOH we now understand that the software ecosystem
    is much more varied than the prevalent view in the seventies
    assumed, so systems that fit segments well are a tiny part.

    And considering bad memories, do you remember PAE? That had
    a similar spirit to 8086 segmentation. I guess that due
    to the bad feeling for segments among programmers (and possibly
    more relevant compatibility troubles) Intel did not extend
    this to segments, but the spirit was still there.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to Hebisch on Sun Jan 5 23:01:29 2025
    On Sun, 5 Jan 2025 21:49:20 -0000 (UTC), antispam@fricas.org (Waldek
    Hebisch) wrote:

    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:

    Meanwhile, Microsoft introduced Windows/386 in September 1987 (in
    addition to the base (8086) variant of Windows 2.0, which was released
    in December 1987), which used 386 protected mode and virtual 8086 mode
    (which was missing in the "brain-damaged" (Bill Gates) 286). So
    Windows completely ignored 286 protected mode. Windows eventually
    became a big success.

    What I recall is a bit different. IIRC the first successful version
    of Windows, that is Windows 3.0, had 3 modes of operation: 8086
    compatible, 286 protected mode and 386 protected mode. Only later did
    Microsoft drop the requirement for 8086 compatibility.

    They didn't drop the 8086 so much as require a 386. Windows and "DOS
    box" required the CPU to have "virtual 8086" mode.

    I think still later it dropped 286 support.

    I know 286 protected mode support continued at least through NT. Not
    sure about 2K.


    Windows 95 was supposed to be 32-bit, but contained quite a lot
    of 16-bit code.

    The GUI was 32-bit, the kernel and drivers were 16-bit. Weird, but it
    made some hardware interfacing easier.

    IIRC the system interface to Windows 3.0 and 3.1 was 16-bit and only
    later did Microsoft release an extension allowing 32-bit system calls.

    Never programmed 3.0.

    3.1 and 3.11 (WfW) had a combination 16/32-bit kernel in which most
    device drivers were 16-bit, but the disk driver could be either 16 or
    32 bit. In WfW the network stack also was 32-bit and the NIC driver
    could be either.

    However the GUI in all 3.x versions was 16-bit 286 protected mode.

    You could run 32-bit "Win32s" programs (Win32s being a subset of
    Win32), but Win32s programs could not use graphics.


    I have no information about Windows internals except for some
    public statements by Microsoft and other people, but I think
    it reasonable to assume that Windows was actually a successful
    example of 8086/286/386 compatibility. That is, their 16-bit
    code could use real mode segmentation or protected mode
    segmentation, the latter both for 286 and 386. For the 32-bit
    version they added a translation layer to transform arguments
    between the 16-bit world and the 32-bit world. It is possible
    that this translation layer involved a lot of effort.

    For a number of years I worked on Windows based image processing
    systems that used OTS ISA-bus acceleration hardware. The drivers were
    16-bit DLLs, and /non-reentrant/. There was one "general" purpose
    board and several special purpose boards that could be combined with
    the general board in "stacks" that communicated via a private high
    speed bus. There could be multiple stacks of boards in the same
    system.

    [Our most complicated system had 7 boards in 2 stacks, one with 5
    boards and the other with 2. Our biggest system had 18 boards: 6
    stacks of 3 boards each. Ever see a 20 slot ISA backplane?]

    The non-reentrant driver made it difficult to simultaneously control
    separate stacks to do different tasks. We created a (reentrant)
    16 bit dispatching "thunk" DLL to translate calls for every
    function of every board that we might possibly want to use ...
    hundreds in all ... and then dynamically loaded multiple instances of
    the driver as required. PITA !!! Worked fine but very hard to debug, particularly when doing several different operations simultaneously.

    On 3.x we simulated threading in the shared 16-bit application space
    using multiple processes, messaging with hidden windows, and "far
    call" IPC using the main program as a kind of "shared library". Having
    real threads on 95 and later allowed actually consolidating everything
    into the same program and (at least initially) made everything easier.
    But then NT forced dealing with protected mode interrupts, while at
    the same time still using 16-bit drivers for everything else - and
    that became yet another PITA.

    We continued to use the image hardware until SIMD became fast enough
    to compete (circa GHz Pentium4 being available on SBC). Excepting
    NT3.x we had systems based on every Windows from 3.1 to NT4.


    Anyway, it seems that Windows was at least as tied to the 286
    as OS/2 when it became successful, and dropped 286 support
    later. And for a long time after dropping 286 support
    Windows massively used 16-bit segments.

    I don't know exactly when 286 protected mode was dropped. I do know
    that, at least through NT4, 16-bit DOS mode and GUI applications would
    run so long as they relied on system calls and didn't directly try to
    touch hardware.

    I occasionally needed to run 16-bit VC++ on my NT4 machine.


    IIUC Microsoft Windows up to 3.0, and probably everybody who wanted
    to say "supported on Windows". That is, Windows 3.0 on a 286 almost
    surely used 286 protected mode and probably ran "Windows" programs
    in protected mode. But Windows also supported the 8086 and Microsoft
    guidelines insisted that a proper "Windows program" should run on
    the 8086.

    Yes. I used - but never programmed - 3.0 on a V20 (8086 clone). It
    was painfully slow even with 1MB of RAM.


    ... Even Intel finally saw the light, as
    did everybody else, and nowadays segments are just a bad memory.

    Well, 16-bit segments clearly are too limited when one has several
    megabytes of memory. And a consistently 32-bit segmented system
    increases memory use, which is a nontrivial cost. OTOH there is
    the question of how much customers are going to pay for security
    features. I think recent times show that security has significant
    costs. But lack of security may lead to big losses. So
    there is no easy choice.

    Now people talk more about capabilities. AFAICS capabilities
    offer more than segments, but are going to have a higher cost.
    So abstractly, for some systems segments may still look
    attractive. OTOH we now understand that the software ecosystem
    is much more varied than the prevalent view in the seventies
    assumed, so systems that fit segments well are a tiny part.

    And considering bad memories, do you remember PAE? That had
    a similar spirit to 8086 segmentation. I guess that due
    to the bad feeling for segments among programmers (and possibly
    more relevant compatibility troubles) Intel did not extend
    this to segments, but the spirit was still there.

    The bad taste of segments is from exposure to Intel's half-assed
    implementation which exposed the segment selector as part of the
    address.

    Segments /should/ have been implemented similarly to the way paging is
    done: the program using flat 32-bit addresses and the MMU (SMU?)
    consulting some kind of segment "database" [using the term loosely].

    Intel had a chance to do it right with the 386, but instead they
    doubled down and expanded the existing poor implementation to support
    larger segments.

    I realize that transistor counts at the time might have made an
    on-chip SMU impossible, but ISTM the SMU would have been a very small
    component that (if necessary) could have been implemented on-die as a coprocessor.

    <grin>Maybe my de-deuces are wild ...</grin>
    but there they are nonetheless.

    YMMV.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)