• 80286 protected mode (was: Byte ordering)

    From Anton Ertl@21:1/5 to Waldek Hebisch on Sun Jan 5 11:10:28 2025
    antispam@fricas.org (Waldek Hebisch) writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:
    From my point of view the main drawbacks of the 286 are poor support
    for large arrays and problems for Lisp-like systems which have a lot
    of small data structures and traverse them via pointers.

    Yes. In the first case the segments are too small, in the latter case
    there are too few segments (if you have one segment per object).

    In the second case one can pack several objects into a single
    segment, so except for the lost security properties this is not
    a big problem.

    If you go that way, you lose all the benefits of segments, and run
    into the "segments too small" problem. Which you then want to
    circumvent by using segment and offset in your addressing of the small
    data structures, which leads to:

    But there is a lot of loading of segment registers,
    and slow loading is a problem.

    ...
    Using 16-bit offsets for jumps inside a procedure and a
    segment-offset pair for calls is likely to lead to better
    or similar performance compared to a purely 32-bit machine.

    With the 80286's segments and their slowness, that is very doubtful.
    The 8086 has branches with 8-bit offsets and branches and calls with
    16-bit offsets. The 386 in 32-bit mode has branches with 8-bit
    offsets and branches and calls with 32-bit offsets; if 16-bit offsets
    for branches would be useful enough for performance, they could
    instead have designed the longer branch length to be 16 bits, and
    maybe a prefix for 32-bit branch offsets.

    At that time Intel apparently wanted to avoid having too many
    instructions.

    Looking in my Pentium manual, the section on CALL has 20 lines for
    "call intersegment", "call gate" (with privilege variants) and "call
    to task" instructions, 10 of which probably already existed on the
    286 (compared to 2 lines for "call near" instructions that existed
    on the 286), and the "Operation" section (the specification in
    pseudocode) consumes about 4 pages, followed by a 1.5-page
    "Description" section.

    9 of these 10 far call variants deal with protected-mode things, so
    Intel obviously had no qualms about adding instruction variants. If
    they instead had no protected mode, but some 32-bit support, including
    the near call with 32-bit offset that I suggest, that would have
    reduced the number of instruction variants.

    I used Xenix on a 286 in 1986 or 1987; my impression is that programs
    were limited to 64KB code and 64KB data size, exactly the PDP-11 model
    you denounce.

    Maybe. I have seen many cases where software essentially "wastes"
    good things offered by hardware.

    Which "good things offered by hardware" do you see "wasted" by this
    usage in Xenix? To me this seems to be the only workable way to use
    the 286 protected mode. Ok, the medium model (near data, far code)
    may also have been somewhat workable, but looking at the cycle counts
    for the protected-mode far calls on the Pentium (and on the 286 they
    were probably even more costly), which start at 22 cycles for a "call
    gate, same privilege" (compared to 1 cycle on the Pentium for a
    direct call near), one would strongly prefer the small model.

    Every successful software used direct access to hardware because of
    performance; the rest waned. Using BIOS calls was just too slow.
    Lotus 1-2-3 won out over VisiCalc and Multiplan by being faster,
    thanks to writing directly to video memory.

    For most early graphics cards direct screen access could be allowed
    just by allocating an appropriate segment. And most non-games
    could gain good performance with a better system interface.
    I think that the variety of tricks used in games and their
    popularity made protected-mode systems much less appealing
    to vendors. And that discouraged work on better interfaces
    for non-games.

    MicroSoft and IBM invested lots of work in a 286 protected-mode
    interface: OS/2 1.x. It was limited to the 286 at the insistence of
    IBM, even though work started in August 1985, when they already knew
    that the 386 was coming soon. OS/2 1.0 was released in April 1987,
    1.5 years after the 386.

    OS/2 1.x flopped, and by the time OS/2 was adjusted to the 386, it was
    too late, so the 286 killed OS/2; here we have a case of a software
    project being death-marched by tying itself to "good things offered
    by hardware" (except that Microsoft defected from the death march
    after a few years).

    Meanwhile, Microsoft introduced Windows/386 in September 1987 (in
    addition to the base (8086) variant of Windows 2.0, which was released
    in December 1987), which used 386 protected mode and virtual 8086 mode
    (which was missing in the "brain-damaged" (Bill Gates) 286). So
    Windows completely ignored 286 protected mode. Windows eventually
    became a big success.

    Also, Microsoft started NT OS/2 in November 1988 to target the 386
    while IBM was still working on 286 OS/2. Eventually Microsoft and IBM
    parted ways, NT OS/2 became Windows NT, which is the starting point of
    all remaining Windowses from Windows XP onwards.

    Xenix, apart from OS/2 the only other notable protected-mode OS for
    the 286, was ported to the 386 in 1987, after SCO secured "knowledge
    from Microsoft insiders that Microsoft was no longer developing
    Xenix", so SCO (or Microsoft) might have done it even earlier if the
    commercial situation had been less muddled; in any case, Xenix jumped
    the 286 ship ASAP.

    The verdict is: The only good use of the 286 is as a faster 8086;
    small memory model multi-tasking use is possible, but the 64KB
    segments are so limiting that everybody who understood software either
    decided to skip this twist (MicroSoft, except on their OS/2 death
    march), or jumped ship ASAP (SCO).

    More generally, vendors could release separate versions of
    programs for 8086 and 286 but few did so.

    Were there any who released software in both 8086 and protected-mode
    80286 variants? Microsoft/SCO with Xenix, anyone else?

    And users having
    only binaries wanted to run 8086 programs on their new systems, which
    led to heroic efforts like the OS/2 DOS box and later Linux
    dosemu. But integration of 8086 programs with protected
    mode was solved too late for the 286 model to gain traction
    (and on the 286 the "DOS box" had to run in real mode, breaking
    normal system protection).

    Linux never ran on a 80286, and DOSemu uses the virtual 8086 mode,
    which does not require heroic efforts AFAIK.

    There was various segmented hardware around, first and foremost (for
    the designers of the 80286), the iAPX432. And as you write, all the
    good reasons that resulted in segments on the iAPX432 also persisted
    in the 80286. However, given the slowness of segmentation, only the
    tiny (all in one segment), small (one segment for code and one for
    data), and maybe medium memory models (one data segment) are
    competitive in protected mode compared to real mode.

    AFAICS that covered the vast majority of programs during the eighties.

    The "vast majority" is not enough; if a key application like Lotus
    1-2-3 or Wordperfect did not work on the DOS alternative, the DOS
    alternative was not used. And Lotus 1-2-3 and Wordperfect certainly
    did not limit themselves to 64KB of data.

    Turbo Pascal offered only the medium memory model

    According to Terje Mathisen, it also offered the large memory model.
    On its Wikipedia page, I find: "Besides allowing applications larger
    than 64 KB, Byte in 1988 reported ... for version 4.0". So apparently
    Turbo Pascal 4.0 introduced support for the large memory model in
    1988.

    Intel apparently assumed that programmers are willing to spend
    extra work to get good performance and IMO this was right
    as a general statement. Intel probably did not realize that
    programmers would be very reluctant to spend work on security
    features and in particular to spend work on making programs
    fast in 286 protected mode.

    80286 protected mode is never faster than real mode on the same CPU,
    so the way to make programs fast on the 286 is to stick with real
    mode; using the small memory model is an alternative, but as
    mentioned, the memory limits are too restrictive.

    Intel probably assumed that the 286 would cover most needs,

    As far as protected mode was concerned, they hardly could have been
    more wrong.

    especially
    given that most systems had much less memory than the 16 MB
    theoretically allowed by the 286.

    They provided 24 address pins, so they obviously assumed that there
    would be 80286 systems with >8MB. 64KB segments are already too
    limiting on systems with 1MB (which was supported by the 8086),
    probably even for anything beyond 128KB.

    IMO this is partially true: there
    is a class of programs which with some work fit into the medium
    model, but using a flat address space is easier. I think that
    on the 286 (that is, with a 16-bit bus) those programs (assuming
    enough tuning) run faster than a flat 32-bit version.

    Maybe in real mode. Certainly not in protected mode. Just run your
    tuned large-model protected-mode program against a 32-bit small-model
    program for the same task on a 386SX (which is reported as having a
    very similar speed to the 80286 on 16-bit programs). And even if you
    find one case where the protected-mode program wins, nobody found it
    worth their time to do this nonsense. And so OS/2 flopped despite
    being backed by IBM and, until 1990, Microsoft.

    But I think that Intel segmentation had some
    attractive features during the eighties.

    You are one of a tiny minority. Even Intel finally saw the light, as
    did everybody else, and nowadays segments are just a bad memory.

    Another thing is the 386. I think that the designers of the 286
    thought that the 386 would remove some limitations. And the 386
    allowed bigger segments, removing one major limitation. OTOH
    for a 32-bit processor with segmentation it would be natural
    to have 32-bit segment registers. It is not clear to
    me if 16-bit segment registers in the 386 were deemed necessary
    for backward compatibility or maybe in the 386 period the flat
    faction in Intel won and they kept segmentation mostly
    for compatibility.

    The latter (read the 386 oral history). The 386 designers knew that
    segments have no future, and they were right, so they kept them at a
    minimum.

    If they had gone for 32-bit segment registers (and 64-bit segment
    registers for AMD64), would segments have fared any better? I doubt
    it. Using segments would have stayed slow, and would have been
    ignored by nearly all programmers.

    These days we see segment-like things in security extensions of
    instruction sets, but slowness still plagues these extensions, and
    security researchers often find ways to subvert the promised security
    (and sometimes even more).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robert Swindells@21:1/5 to Anton Ertl on Sun Jan 5 18:30:41 2025
    On Sun, 05 Jan 2025 11:10:28 GMT, Anton Ertl wrote:

    Xenix, apart from OS/2 the only other notable protected-mode OS for the
    286, was ported to the 386 in 1987, after SCO secured "knowledge from
    Microsoft insiders that Microsoft was no longer developing Xenix", so
    SCO (or Microsoft) might have done it even earlier if the commercial
    situation had been less muddled; in any case, Xenix jumped the 286 ship
    ASAP.

    Microport Systems had UNIX System V for the 286.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to George Neuner on Mon Jan 6 08:24:43 2025
    George Neuner <gneuner2@comcast.net> writes:
    The bad taste of segments is from exposure to Intel's half-assed
    implementation which exposed the segment selector as part of the
    address.

    Segments /should/ have been implemented similar to the way paging is
    done: the program using flat 32-bit addresses and the MMU (SMU?)
    consulting some kind of segment "database" [using the term loosely].

    What benefits do you expect from segments? One benefit usually
    mentioned is security, in particular, protection against out-of-bounds
    accesses (e.g., buffer overflow exploits).

    If the program uses 32-bit (or nowadays 64-bit) addresses, and the
    segment number is just part of that, you don't get that protection:
    An out-of-bounds access could not be distinguished from a valid access
    to a different segment. There might be some addresses that are in no
    segment, and that would lead to a segment violation, but the same is
    true for paging-based "security" now; the important part is that there
    would be (and is) no guarantee that an out-of-bounds access is caught.

    The 286 segments catch out-of-segment accesses. The size granularity
    of the 386's 32-bit segments is coarse, but at least out-of-bounds
    accesses do not intrude into other segments.

    On the 286 and 386 segment numbers are stored in memory just like any
    other data, so an attacker may be able to change the segment number in
    addition to (or instead of) the offset, and thus gain access to
    sensitive data, so the security provided by 286/386 segments is
    limited. I have not looked closely into CHERI, but I dimly remember
    some claims that they protect against manipulation of the extra data
    (what would be the segment number in the 286) in the 128-bit address.

    Intel had a chance to do it right with the 386, but instead they
    doubled down and expanded the existing poor implementation to support
    larger segments.

    It looks to me that they took the right choices: Support 286 protected
    mode, add virtual 8086 mode, support a flat memory model like
    everybody else has done in modern computers (S/360, PDP-11); to
    combine these requirements, they added support for segments up to 4GB
    in size, so people wanting to use flat 32-bit addressing could just
    use the tiny memory model (CS=DS=SS) and forget about segments.

    I realize that transistor counts at the time might have made an
    on-chip SMU impossible, but ISTM the SMU would have been a very small
    component that (if necessary) could have been implemented on-die as a
    coprocessor.

    How would the addresses be divided into segment and offset in your
    model? What kind of addresses would you have used on the 286? What
    would the SMU have to do? Would a PC have used such an SMU if it was
    a separate chip?

    If they had made the 286 a kind of real-mode-only 386SX-like CPU, I
    think that PCs would have been designed without SMU. And one problem
    would have been that you probably would want 32 address bits to flow
    from the CPU to the SMU, but the 286 and 386SX only have 24 address
    pins, and additional pins are expensive.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Anton Ertl on Mon Jan 6 14:41:22 2025
    On Mon, 06 Jan 2025 08:24:43 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:


    How would the addresses be divided into segment and offset in your
    model? What would the SMU have to do?


    - anton

    Those are the sort of questions that I asked Nick Maclaren several
    times in the past, when he was still active on c.a. I never got an
    answer that I was able to understand.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Anton Ertl on Mon Jan 6 16:05:02 2025
    Anton Ertl wrote:
    George Neuner <gneuner2@comcast.net> writes:
    The bad taste of segments is from exposure to Intel's half-assed
    implementation which exposed the segment selector as part of the
    address.

    Segments /should/ have been implemented similar to the way paging is
    done: the program using flat 32-bit addresses and the MMU (SMU?)
    consulting some kind of segment "database" [using the term loosely].

    What benefits do you expect from segments? One benefit usually
    mentioned is security, in particular, protection against out-of-bounds accesses (e.g., buffer overflow exploits).

    The best idea I have seen to help detect out of bounds accesses, is to
    round all requested memory blocks up to the next 4K boundary and mark
    the next page as unavailable, then return a skewed pointer back, so that
    the end of the requested region coincides with the end of the (last)
    allocated page. This does require at least 8kB for every allocation, but
    I guess they can all share a single trapping segment?

    (This idea does not help locate negative buffer overruns (underruns?)
    but they seem to be much less common?)
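    A minimal sketch of that scheme (my illustration, not something from
    this thread; it assumes POSIX mmap/mprotect, and guarded_alloc is a
    made-up name):

        #include <stddef.h>
        #include <stdint.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /* Round the request up to whole pages, map one extra PROT_NONE page
           after it, and return a pointer skewed so that the requested region
           ends exactly at the guard page; a positive overrun then faults. */
        void *guarded_alloc(size_t size)
        {
            size_t page = (size_t)sysconf(_SC_PAGESIZE);
            size_t rounded = (size + page - 1) & ~(page - 1);
            uint8_t *base = mmap(NULL, rounded + page, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (base == MAP_FAILED)
                return NULL;
            if (mprotect(base + rounded, page, PROT_NONE) != 0) {
                munmap(base, rounded + page);
                return NULL;
            }
            return base + (rounded - size);   /* skewed pointer */
        }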

    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Terje Mathisen on Mon Jan 6 16:36:41 2025
    Terje Mathisen <terje.mathisen@tmsw.no> writes:
    The best idea I have seen to help detect out of bounds accesses, is to
    round all requested memory blocks up to the next 4K boundary and mark
    the next page as unavailable, then return a skewed pointer back, so that
    the end of the requested region coincides with the end of the (last)
    allocated page. This does require at least 8kB for every allocation, but
    I guess they can all share a single trapping segment?

    (This idea does not help locate negative buffer overruns (underruns?)
    but they seem to be much less common?)

    It also does not help for out-of-bounds accesses that are not just
    adjacent to an earlier in-bounds access. That may also be a less
    common vulnerability than adjacent positive-stride buffer overflows.
    But if we throw hardware on the problem, do we want to spend hardware
    on something that does not catch all out-of-bounds accesses?

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Mon Jan 6 18:58:16 2025
    According to George Neuner <gneuner2@comcast.net>:
    The bad taste of segments is from exposure to Intel's half-assed
    implementation which exposed the segment selector as part of the
    address.

    Segments /should/ have been implemented similar to the way paging is
    done: the program using flat 32-bit addresses and the MMU (SMU?)
    consulting some kind of segment "database" [using the term loosely].

    The whole point of a segmented architecture is that the segments are
    visible and meaningful. You put a thing (for some definition of thing)
    in a segment to control access to the thing. So if it's an array, all
    of the address calculations are relative to the segment and out of
    bounds references fail because they point to a non-existent part of
    the segment. Similarly if it's code, a jump outside the segment's
    boundaries fails.

    Multics and the Burroughs machines had (still have, I suppose, for
    emulated Burroughs) visible segments and programmers liked them just
    fine. The problems were that the segment sizes were too small as
    memories got bigger, and that they weren't byte addressed, which these
    days is practically mandatory. The 286 added additional flaws: there
    weren't enough segment registers and segment loads were very slow.

    What you're describing is multi-level page tables. Every virtual
    memory system has them. Sometimes the operating systems make the
    higher level tables visible to applications, sometimes they don't.
    For example, in IBM mainframes the second level page table entries,
    which they call segments, can be shared between applications.




    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Anton Ertl on Mon Jan 6 19:49:34 2025
    On Mon, 6 Jan 2025 16:36:41 +0000, Anton Ertl wrote:

    Terje Mathisen <terje.mathisen@tmsw.no> writes:
    The best idea I have seen to help detect out of bounds accesses, is to
    round all requested memory blocks up to the next 4K boundary and mark
    the next page as unavailable, then return a skewed pointer back, so that
    the end of the requested region coincides with the end of the (last)
    allocated page. This does require at least 8kB for every allocation, but
    I guess they can all share a single trapping segment?

    (This idea does not help locate negative buffer overruns (underruns?)
    but they seem to be much less common?)

    It also does not help for out-of-bounds accesses that are not just
    adjacent to an earlier in-bounds access. That may also be a less
    common vulnerability than adjacent positive-stride buffer overflows.
    But if we throw hardware on the problem, do we want to spend hardware
    on something that does not catch all out-of-bounds accesses?

    An IBM guy once told me::

    "If you are going to put it in HW, put it in in such a way that you
    never have to change the definition of what you put in."

    So, to answer the above question:: you want to check absolutely
    all boundaries on all multi-container data objects, including
    array bounds within a structure::

    struct { int a, b, c, d;
             double l[MAX], m[MAX], n[MAX][MAX]; } k;

    Any access to m[] is checked to be within the substructure
    of m[*], so you cannot touch l[] or n[][], or a,b,c, or d.

    Try doing that with segmentation bounds checking...or
    capabilities...
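    A concrete illustration (my example, assuming the usual layout with
    no padding between m and n): the following access stays inside k, so
    a check at object or segment granularity passes, yet it actually
    reads n[0][0]; per-member bounds checking as described above would
    trap it.

        #define MAX 8

        struct {
            int a, b, c, d;
            double l[MAX], m[MAX], n[MAX][MAX];
        } k;

        double read_past_m(void)
        {
            return k.m[MAX];   /* one element past the end of m */
        }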

    - anton

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to John Levine on Mon Jan 6 19:45:43 2025
    John Levine <johnl@taugh.com> writes:
    According to George Neuner <gneuner2@comcast.net>:
    The bad taste of segments is from exposure to Intel's half-assed
    implementation which exposed the segment selector as part of the
    address.

    Segments /should/ have been implemented similar to the way paging is
    done: the program using flat 32-bit addresses and the MMU (SMU?)
    consulting some kind of segment "database" [using the term loosely].

    The whole point of a segmented architecture is that the segments are
    visible and meaningful. You put a thing (for some definition of thing)
    in a segment to control access to the thing. So if it's an array, all
    of the address calculations are relative to the segment and out of
    bounds references fail because they point to a non-existent part of
    the segment. Similarly if it's code, a jump outside the segment's
    boundaries fails.

    Multics and the Burroughs machines had (still have, I suppose for emulated

    The original HP-3000 also had segments.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Terje Mathisen on Mon Jan 6 19:41:49 2025
    On Mon, 6 Jan 2025 15:05:02 +0000, Terje Mathisen wrote:

    Anton Ertl wrote:
    George Neuner <gneuner2@comcast.net> writes:
    The bad taste of segments is from exposure to Intel's half-assed
    implementation which exposed the segment selector as part of the
    address.

    Segments /should/ have been implemented similar to the way paging is
    done: the program using flat 32-bit addresses and the MMU (SMU?)
    consulting some kind of segment "database" [using the term loosely].

    What benefits do you expect from segments? One benefit usually
    mentioned is security, in particular, protection against out-of-bounds
    accesses (e.g., buffer overflow exploits).

    The best idea I have seen to help detect out of bounds accesses, is to
    round all requested memory blocks up to the next 4K boundary and mark
    the next page as unavailable, then return a skewed pointer back, so that
    the end of the requested region coincides with the end of the (last) allocated page.
    This does require at least 8kB for every allocation, but
    I guess they can all share a single trapping segment?

    You allocate no more actual memory, but you do consume an additional
    virtual address PTE on those pages marked no-access. If, later, you
    expand that allocated area, you can THEN allocate a page and update
    the PTE.

    (This idea does not help locate negative buffer overruns (underruns?)
    but they seem to be much less common?)

    Use an unallocated page prior to the buffer, too.
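    A sketch of that variant (again my illustration, assuming POSIX
    mmap/mprotect): reserve guard + data + guard as PROT_NONE and open up
    only the data pages, so both an overrun and a (page-grained) underrun
    hit a trapping page.

        #include <stddef.h>
        #include <stdint.h>
        #include <sys/mman.h>
        #include <unistd.h>

        void *guarded_alloc2(size_t size)
        {
            size_t page = (size_t)sysconf(_SC_PAGESIZE);
            size_t rounded = (size + page - 1) & ~(page - 1);
            uint8_t *base = mmap(NULL, page + rounded + page, PROT_NONE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (base == MAP_FAILED)
                return NULL;
            if (mprotect(base + page, rounded, PROT_READ | PROT_WRITE) != 0) {
                munmap(base, page + rounded + page);
                return NULL;
            }
            /* skewed as before: the requested region ends at the trailing
               guard; an underrun must cross the leading slack to fault */
            return base + page + (rounded - size);
        }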

    Terje

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to John Levine on Mon Jan 6 19:48:46 2025
    John Levine <johnl@taugh.com> writes:
    According to George Neuner <gneuner2@comcast.net>:
    The bad taste of segments is from exposure to Intel's half-assed
    implementation which exposed the segment selector as part of the
    address.

    Segments /should/ have been implemented similar to the way paging is
    done: the program using flat 32-bit addresses and the MMU (SMU?)
    consulting some kind of segment "database" [using the term loosely].

    The whole point of a segmented architecture is that the segments are
    visible and meaningful. You put a thing (for some definition of thing)
    in a segment to control access to the thing. So if it's an array, all
    of the address calculations are relative to the segment and out of
    bounds references fail because they point to a non-existent part of
    the segment. Similarly if it's code, a jump outside the segment's
    boundaries fails.

    Multics and the Burroughs machines had (still have, I suppose, for
    emulated Burroughs) visible segments and programmers liked them just
    fine. The problems were that the segment sizes were too small as
    memories got bigger, and that they weren't byte addressed, which these
    days is practically mandatory. The 286 added additional flaws: there
    weren't enough segment registers and segment loads were very slow.

    What you're describing is multi-level page tables. Every virtual
    memory system has them. Sometimes the operating systems make the
    higher level tables visible to applications, sometimes they don't.
    For example, in IBM mainframes the second level page table entries,
    which they call segments, can be shared between applications.

    There have been a number of attempts to use capabilities to describe
    individual data items (the aforementioned Burroughs systems are the
    canonical examples).

    There are investigations into adapting such schemes to modern
    microprocessors, one of which is CHERI which uses 128-bit
    pointers to encode various attributes, including the size
    of the object.

    https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Terje Mathisen on Mon Jan 6 22:02:30 2025
    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    The best idea I have seen to help detect out of bounds accesses, is to
    round all requested memory blocks up to the next 4K boundary and mark
    the next page as unavailable, then return a skewed pointer back, so that
    the end of the requested region coincides with the end of the (last) allocated page. This does require at least 8kB for every allocation, but
    I guess they can all share a single trapping segment?

    (This idea does not help locate negative buffer overruns (underruns?)
    but they seem to be much less common?)

    It is also problematic to allocate 8K (or more) for a small entity, or
    on the stack.

    Bounds checking should ideally impart minimum overhead so that it
    can be enabled in production code.

    Hmm... a beginning of an idea (for which I am ready to be shot
    down, this is comp.arch :-)

    This would work best for languages which explicitly pass
    array bounds or sizes (like Fortran's assumed size arrays,
    or, if I read this correctly, Rust's slices).

    Assume a class of load and store instructions containing

    - One source or destination register
    - One base register
    - One index register
    - One ubound register

    Memory access is to base + index, with one additional point:
    If index > ubound, then the instruction raises an exception.

    This works less well with C's pointers, for which you would have
    to pass some sort of fat pointer. Compilers would have to make
    sure that the address of the base object is passed.
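    A sketch of what a compiler might emit for such an instruction, in C
    terms (my illustration; checked_load is a made-up name, and the
    hardware exception is modelled as abort()):

        #include <stddef.h>
        #include <stdlib.h>

        /* base, index and ubound correspond to the three registers above */
        static inline double checked_load(const double *base, size_t index,
                                          size_t ubound)
        {
            if (index > ubound)
                abort();         /* the instruction would raise an exception */
            return base[index];
        }

        /* Fortran-style usage: the bound is passed along with the array. */
        double sum(const double *a, size_t n)
        {
            double s = 0.0;
            for (size_t i = 0; i < n; i++)
                s += checked_load(a, i, n - 1);  /* ubound = last valid index */
            return s;
        }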

    Comments?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Thomas Koenig on Mon Jan 6 22:57:11 2025
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Terje Mathisen <terje.mathisen@tmsw.no> schrieb:

    The best idea I have seen to help detect out of bounds accesses, is to
    round all requested memory blocks up to the next 4K boundary and mark
    the next page as unavailable, then return a skewed pointer back, so that
    the end of the requested region coincides with the end of the (last)
    allocated page. This does require at least 8kB for every allocation, but
    I guess they can all share a single trapping segment?

    (This idea does not help locate negative buffer overruns (underruns?)
    but they seem to be much less common?)

    It is also problematic to allocate 8K (or more) for a small entity, or
    on the stack.

    Bounds checking should ideally impart minimum overhead so that it
    can be enabled in production code.

    Hmm... a beginning of an idea (for which I am ready to be shot
    down, this is comp.arch :-)

    This would work best for languages which explicitly pass
    array bounds or sizes (like Fortran's assumed size arrays,
    or, if I read this correctly, Rust's slices).

    Assume a class of load and store instructions containing

    - One source or destination register
    - One base register
    - One index register
    - One ubound register

    See aforementioned CHERI.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Thomas Koenig on Mon Jan 6 23:41:41 2025
    On Mon, 6 Jan 2025 22:02:30 +0000, Thomas Koenig wrote:

    Hmm... a beginning of an idea (for which I am ready to be shot
    down, this is comp.arch :-)

    This would work best for languages which explicitly pass
    array bounds or sizes (like Fortran's assumed size arrays,
    or, if I read this correctly, Rust's slices).

    Assume a class of load and store instructions containing

    - One source or destination register
    - One base register
    - One index register
    - One ubound register

    Memory access is to base + index, with one additional point:
    If index > ubound, then the instruction raises an exception.

    Now, you are only checking the ubound and not the lbound; so,
    you only stumble over ½ the bound errors.

    Where you should START is with a data structure that defines
    the memory region::

    First Byte accessible     Possibly lbound
    Last Byte accessible      Possibly ubound
    other stuff as needed

    Then figure out how to efficiently perform the checks in ISA
    of choice (or add to ISA).

    This works less well with C's pointers, for which you would have
    to pass some sort of fat pointer. Compilers would have to make
    sure that the address of the base object is passed.

    I blame the programmers for not using FAT pointers (and then
    teaching the compilers how to get rid of most of the checks.)
    Nothing is preventing C programmers from using FAT pointers,
    and thereby avoiding all those buffer overruns.
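    A minimal sketch of such a fat pointer in plain C (my illustration;
    the names are made up):

        #include <stddef.h>
        #include <stdlib.h>

        typedef struct {
            char  *base;   /* first accessible byte */
            size_t len;    /* number of accessible bytes */
        } fatptr;

        /* every dereference goes through a check the compiler could often
           hoist out of loops or eliminate entirely */
        static inline char fat_load(fatptr p, size_t i)
        {
            if (i >= p.len)
                abort();      /* out of bounds: trap instead of overrunning */
            return p.base[i];
        }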

    Comments?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to John Levine on Mon Jan 6 17:28:11 2025
    John Levine <johnl@taugh.com> writes:
    What you're describing is multi-level page tables. Every virtual
    memory system has them. Sometimes the operating systems make the
    higher level tables visible to applications, sometimes they don't. For example, in IBM mainframes the second level page table entries, which
    they call segments, can be shared between applications.

    initial adding virtual memory to all IBM 370s was similar to 24bit
    360/67 but had options for 16 1mbyte segments or 256 64kbyte segments
    and either 4kbyte or 2kbyte pages. Initial mapping of 360 MVT to VS2/SVS
    was single 16mbyte address space ... very similar to running MVT in a
    CP/67 16mbyte virtual machine.

    The upgrade to VS2/MVS gave each region its own 16mbyte virtual address
    space. However, OS/360 MVT API heritage was pointer passing API ... so
    they mapped a common 8mbyte image of the "MVS" kernel into every 16mbyte virtual address space (leaving 8mbytes for application code), kernel API
    call code could still directly access user code API parameters
    (basically same code from MVT days).

    However, MVT subsystems were also moved into their separate 16mbyte
    virtual address space ... making it harder to access application API
    calling parameters. So they defined a common segment area (CSA), 1mbyte
    segment mapped into every 16mbyte virtual address space, application
    code would get space in the CSA for API parameter information when
    calling subsystems.

    Problem was that the requirement for subsystem API parameter (CSA)
    space was proportional to the number of concurrent applications plus
    the number of subsystems and quickly exceeded 1mbyte ... and it
    morphed into a multi-megabyte common system area. By the end of the
    70s, CSAs were
    running 5-6mbytes (leaving 2-3mbytes for programs) and threatening to
    become 8mbytes (leaving zero mbytes for programs)... part of the mad
    rush to XA/370 and 31-bit virtual addressing (as well as access
    registers, and multiple concurrent virtual address spaces ... "Program
    Call" instruction had a table of MVS/XA address space pointers for
    subsystems, the PC instruction would move the caller's address space
    pointer to secondary and load the subsystem address space pointer into
    primary ... the program return instruction reversed the process and
    moved the secondary pointer back to primary).

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)