• Re: Fun with a Vax, Cost of handling misaligned access

    From John Levine@21:1/5 to All on Sun Feb 2 20:35:30 2025
    According to Thomas Koenig <tkoenig@netcologne.de>:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    problem these days, but the 48 pages or so potentially needed by VAX
    complicated the OS.

    48 pages? What instruction would need that?

    I think it was actually 50.

    The MOVTC and MOVTUC instructions had six operands, five
    of which were multibyte, and one of which was one byte.
    Each of those multibyte operands could cross a page
    boundary, so that's 11 pages.

    But all of the operands could use indirect addressing, each of which
    could cross a page boundary, so that's 12 more pages.

    The instruction itself could cross a page boundary, two more pages,
    for a total of 25.

    The user mode page tables on a Vax were in kernel virtual memory,
    so by carefully pessimized memory allocation, each of those 25
    pages could need a separate page table page, for another
    25, total of 50.
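    The tally above can be checked with a few lines of Python. The constant names are illustrative; the counts are exactly the ones given in the post, assuming every multibyte operand and every operand pointer may straddle a page boundary.

```python
# Worst-case page count for one MOVTC/MOVTUC, following the tally above.
# Assumption: every multibyte operand and every operand pointer may straddle.

MULTIBYTE_OPERANDS = 5
BYTE_OPERANDS = 1
OPERANDS = MULTIBYTE_OPERANDS + BYTE_OPERANDS

operand_pages = MULTIBYTE_OPERANDS * 2 + BYTE_OPERANDS   # 11
indirect_pages = OPERANDS * 2                            # 12: each pointer may straddle
instruction_pages = 2                                    # the instruction itself
data_pages = operand_pages + indirect_pages + instruction_pages   # 25

# User page tables live in kernel virtual memory, so in the pessimal
# case each of those pages needs its own page-table page.
total_pages = data_pages * 2                             # 50

print(operand_pages, indirect_pages, data_pages, total_pages)
```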

    I am not sure how far along the Vax's design was when they noticed this.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to John Levine on Sun Feb 2 16:39:58 2025
    John Levine wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    problem these days, but the 48 pages or so potentially needed by VAX
    complicated the OS.
    48 pages? What instruction would need that?

    I think it was actually 50.

    I believe it's 54 PTEs that must be marked Present.

    The MOVTC and MOVTUC instructions had six operands, five
    of which were multibyte, and one of which was one byte.
    Each of those multibyte operands could cross a page
    boundary, so that's 11 pages.

    But all of the operands could use indirect addressing, each of which
    could cross a page boundary, so that's 12 more pages.

    Yes, each memory operand could use deferred indirect (the register contains the address of the address of the operand), and the addresses could be misaligned and straddle two pages, so that is 5 virtual addresses per memory operand.

    The instruction itself could cross a page boundary, two more pages,
    for a total of 25.

    5 operands gives 25 virtual addresses, +2 for the instruction straddle = 27.

    Then we look at how virtual addresses are translated.

    The user mode page tables on a Vax were in kernel virtual memory,
    so by carefully pessimized memory allocation, each of those 25
    pages could need a separate page table page, for another
    25, total of 50.

    Yes, because the page table base register for user process P0 space
    (the first lowest 1GB) was a *virtual* address in process P1 space
    (the second 1GB), and P1 space PTE virtual address was also a *virtual*
    address in system S0 space (the third 1GB).

    (The net result is for VAX to effect a reverse page table walk similar
    to Intel's caching the interior PTE nodes on its top down walk,
    then checking them in reverse bottom-up order on a TLB miss.)
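    The two-stage lookup behind the "2 PTEs per user address" count can be sketched as a toy model. This sketch assumes the arrangement in Levine's wording, with user page tables in kernel virtual memory: P0BR holds an S0 (system) virtual address, and SBR holds the physical address of the system page table. The class and helper names, the addresses, and the PFNs are hypothetical, and the TLB, length registers, and protection checks are ignored.

```python
# Toy model of a VAX P0 address translation, assuming P0BR holds an S0
# (system) virtual address and SBR holds the physical address of the
# system page table. TLB, length registers, and protection are ignored.

PAGE_SHIFT = 9                   # VAX pages are 512 bytes
PAGE_MASK = (1 << PAGE_SHIFT) - 1
VALID = 1 << 31                  # PTE valid bit
PFN_MASK = (1 << 21) - 1         # page frame number, PTE bits 20:0


class PageFault(Exception):
    pass


class VaxMMU:
    def __init__(self, sbr, p0br, phys):
        self.sbr = sbr           # physical address of the system page table
        self.p0br = p0br         # S0 virtual address of the P0 page table
        self.phys = phys         # physical address -> longword (PTEs only here)
        self.pte_reads = 0

    def _read_pte(self, pa):
        self.pte_reads += 1
        pte = self.phys.get(pa, 0)
        if not pte & VALID:
            raise PageFault(hex(pa))
        return pte

    def translate_s0(self, sva):
        # S0 PTEs are fetched with a physical address, so this is one read.
        vpn = (sva & 0x3FFFFFFF) >> PAGE_SHIFT
        pte = self._read_pte(self.sbr + 4 * vpn)
        return ((pte & PFN_MASK) << PAGE_SHIFT) | (sva & PAGE_MASK)

    def translate_p0(self, va):
        if va >> 30 != 0:
            raise ValueError("not a P0 address")
        vpn = va >> PAGE_SHIFT
        pte_sva = self.p0br + 4 * vpn          # the P0 PTE lives in S0 space,
        pte_pa = self.translate_s0(pte_sva)    # so first translate its address,
        pte = self._read_pte(pte_pa)           # then fetch the P0 PTE itself
        return ((pte & PFN_MASK) << PAGE_SHIFT) | (va & PAGE_MASK)


# One S0 PTE maps the page-table page; one P0 PTE maps the user page.
phys = {0x1200: VALID | 0x50, 0xA024: VALID | 0x77}
mmu = VaxMMU(sbr=0x1000, p0br=0x80010000, phys=phys)
pa = mmu.translate_p0(0x1234)
print(hex(pa), mmu.pte_reads)    # 0xee34 2
```

    If either PTE is not valid the lookup faults, which is why both must be Present before the instruction can proceed.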

    So each user virtual address required 2 PTEs to be Present, giving 54 pages
    and potentially 27*3 = 81 memory accesses.

    Note that 32-bit x86 with a 3-level page table requires 3 PTEs to be
    marked Present for each virtual address, which may also straddle.
    So potentially 2 VAs for the instruction plus 2 VAs for the operand
    gives 12 PTEs Present and 16 memory accesses.
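    Both figures follow from one rule: with no TLB hits, each virtual address needs its PTEs to be Present and costs one memory read per PTE plus the access itself. A small helper (the name worst_case is illustrative) reproduces the VAX and x86 numbers from the post:

```python
def worst_case(vas, ptes_per_va):
    """Return (PTEs that must be Present, memory accesses) for one pass,
    assuming no TLB hits: each virtual address costs one read per PTE
    plus the access itself."""
    ptes = vas * ptes_per_va
    accesses = vas * (ptes_per_va + 1)
    return ptes, accesses


# VAX, first estimate: 5 operands x 5 VAs + 2 VAs for the instruction.
vax = worst_case(5 * 5 + 2, ptes_per_va=2)
# 32-bit x86 with 3-level tables: 2 VAs for the instruction + 2 for the operand.
x86 = worst_case(2 + 2, ptes_per_va=3)
print(vax, x86)   # (54, 81) (12, 16)
```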

    It's VAX's 5 memory operands, plus addressing modes that allow a misaligned
    address of a misaligned operand, that cause the scaling.

    I am not sure how far along the Vax's design was when they noticed this.

    It just seems to affect the minimum working set size.
    I had earlier thought they all must be loaded into the TLB at once, but
    later realized that is not so: the table walker should just roll over
    the TLB contents if the number exceeds the TLB size. That is also why
    the 780 could use a 2-way associative TLB: a conflict eviction of a PTE
    that is needed again in the same instruction just gets reloaded.
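    The point about rolling over TLB contents can be illustrated with a toy 2-way set-associative TLB using LRU replacement. The class name, the set count, and the fake walk function are assumptions for illustration, not a model of the 780's actual translation buffer. Three pages that map to the same set still translate correctly; the conflicting entry is simply evicted and walked again.

```python
from collections import OrderedDict


class TwoWayTLB:
    """Toy 2-way set-associative TLB with LRU replacement in each set."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def lookup(self, vpn, walk):
        ways = self.sets[vpn % self.num_sets]
        if vpn in ways:
            ways.move_to_end(vpn)       # hit: mark as most recently used
            return ways[vpn]
        self.misses += 1
        pte = walk(vpn)                 # miss: the table walker reloads it
        if len(ways) == 2:
            ways.popitem(last=False)    # evict the least recently used way
        ways[vpn] = pte
        return pte


def fake_walk(vpn):
    # Hypothetical stand-in for a page table walk: PFN = VPN + 100.
    return vpn + 100


tlb = TwoWayTLB(num_sets=4)
# Pages 0, 4 and 8 all map to set 0; page 0 is needed again at the end.
results = [tlb.lookup(vpn, fake_walk) for vpn in [0, 4, 8, 0]]
print(results, tlb.misses)   # [100, 104, 108, 100] 4
```

    The cost is extra table walks, not failure, provided every PTE the walk needs is Present; that is why the constraint lands on the working set rather than on the TLB size.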

  • From EricP@21:1/5 to EricP on Sun Feb 2 17:19:12 2025
    EricP wrote:
    John Levine wrote:
    According to Thomas Koenig <tkoenig@netcologne.de>:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    problem these days, but the 48 pages or so potentially needed by VAX
    complicated the OS.
    48 pages? What instruction would need that?

    I think it was actually 50.

    I believe it's 54 PTE's that must be marked Present.

    Oops, no, I mucked it up. Try again. Maybe 42?

    The MOVTC and MOVTUC instructions had six operands, five
    of which were multibyte, and one of which was one byte.
    Each of those multibyte operands could cross a page
    boundary, so that's 11 pages.

    MOVTC had 6 operands: one in a register, one giving the address of the byte table,
    and 4 multibyte operands which could straddle.

    But all of the operands could use indirect addressing, each of which
    could cross a page boundary, so that's 12 more pages.

    Yes, each memory operand could use deferred indirect (register contains the address of address of operand) and the addresses could be misaligned and straddle two pages, so that is 5 virtual addresses per memory operand.

    That should be 4 virtual addresses per multibyte operand
    and 3 addresses for the byte table.

    The instruction itself could cross a page boundary, two more pages,
    for a total of 25.

    5 operands gives 25 virtual addresses, +2 for the instruction straddle =
    27.

    That should be
    (4 operands * 4 VA) + (1 operand * 3 VA) + 2 VA for instruction = 21.

    Then we look at how virtual addresses are translated.

    The user mode page tables on a Vax were in kernel virtual memory,
    so by carefully pessimized memory allocation, each of those 25
    pages could need a separate page table page, for another
    25, total of 50.

    Yes, because the page table base register for user process P0 space
    (the first lowest 1GB) was a *virtual* address in process P1 space
    (the second 1GB), and P1 space PTE virtual address was also a *virtual* address in system S0 space (the third 1GB).

    (The net result is for VAX to effect a reverse page table walk similar
    to Intel's caching the interior PTE nodes on its top down walk,
    then checking them in reverse bottom-up order on a TLB miss.)

    So each user virtual address required 2 PTE's be Present, giving 54 pages
    and potentially 27*3=81 memory accesses.

    So 21*2 = 42 pages Present and 63 memory references for 1 iteration.
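    The revised count, written out per operand using EricP's classification (four 4-address multibyte operands and a 3-address byte-table operand; the next reply classifies the operands differently). Variable names are illustrative.

```python
# EricP's revised MOVTC tally, using his operand classification.
operand_vas = [4, 4, 4, 4, 3]   # four multibyte operands, then the byte-table operand
INSTRUCTION_VAS = 2             # the instruction may straddle a page
PTES_PER_VA = 2                 # the user PTE plus the PTE mapping its page-table page

vas = sum(operand_vas) + INSTRUCTION_VAS        # 21
ptes_present = vas * PTES_PER_VA                # 42
memory_refs = vas * (PTES_PER_VA + 1)           # 63: two PTE reads + the access
print(vas, ptes_present, memory_refs)
```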

  • From John Levine@21:1/5 to All on Mon Feb 3 02:51:12 2025
    According to EricP <ThatWouldBeTelling@thevillage.com>:
    The MOVTC and MOVTUC instructions had six operands, five
    of which were multibyte, and one of which was one byte.
    Each of those multibyte operands could cross a page
    boundary, so that's 11 pages.

    MOVTC had 6 operands, one in a register, one address of byte table,
    and 4 multibyte operands which could straddle.

    According to my VAX Architecture Handbook, the operands were:

    opcode srclen.rw, srcaddr.ab, fill.rb, tbladdr.ab, dstlen.rw, dstaddr.ab

    The fill.rb was a single byte fill character which could be in memory, the rest all multibyte.

    But all of the operands could use indirect addressing, each of which
    could cross a page boundary, so that's 12 more pages.

    Yes, each memory operand could use deferred indirect (register contains the address of address of operand) and the addresses could be misaligned and
    straddle two pages, so that is 5 virtual addresses per memory operand.

    That should be 4 virtual addresses per multibyte operand
    and 3 addresses for the byte table.

    Pretty close. The deferred modes only did one level of indirection.
    There were a bunch of flavors, @(R)+, @B(R), @W(R), @L(R), but they were
    all single indirect, so it's four for most operands and three for the
    fill byte.

    The instruction itself could cross a page boundary, two more pages,
    for a total of 25.

    5 operands gives 25 virtual addresses, +2 for the instruction straddle =

    I get 4 addresses for each of the five long operands, plus 3 for the fill byte, plus two for the instruction: a total of 25.

    Double that for the page tables and it's 50.
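    Levine's count can be derived directly from the operand list in the handbook quote. This sketch assumes, as he states, that the deferred modes go through exactly one longword pointer (which may straddle) and that fill.rb is the only single-byte operand; the helper name and the string encoding of operands are illustrative.

```python
# MOVTC operands as quoted from the handbook (name.<access><data type>,
# e.g. rw = read word, ab = address of byte).
OPERANDS = ["srclen.rw", "srcaddr.ab", "fill.rb", "tbladdr.ab", "dstlen.rw", "dstaddr.ab"]
SINGLE_BYTE = {"fill.rb"}       # per the post, the only one-byte operand
INSTRUCTION_VAS = 2             # the instruction itself may straddle


def worst_case_vas(operand):
    # Single-level deferred mode: the longword pointer may straddle (2 VAs),
    # then the operand may straddle (2 VAs) unless it is a single byte (1 VA).
    pointer_vas = 2
    data_vas = 1 if operand in SINGLE_BYTE else 2
    return pointer_vas + data_vas


vas = sum(worst_case_vas(op) for op in OPERANDS) + INSTRUCTION_VAS   # 25
pages_with_tables = vas * 2     # each page may need its own page-table page
print(vas, pages_with_tables)   # 25 50
```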

    Yes, because the page table base register for user process P0 space
    (the first lowest 1GB) was a *virtual* address in process P1 space
    (the second 1GB), and P1 space PTE virtual address was also a *virtual*
    address in system S0 space (the third 1GB).

    (The net result is for VAX to effect a reverse page table walk similar
    to Intel's caching the interior PTE nodes on its top down walk,
    then checking them in reverse bottom-up order on a TLB miss.)

    Right, it also had to look at the PTEs to find the P0 or P1 table pages.

    R's,
    John
