• Re: Inverted Page Tables / Background TLB flush

    From Robert Finch@robfi680@gmail.com to comp.arch on Tue Feb 3 09:45:04 2026
    From Newsgroup: comp.arch

    Background TLB flushing via HW state machine? I am not sure about the
    merits of this approach. Because the TLB is implemented using BRAM, the
    valid bits are not individually accessible, and the design is too low
    cost to support a separate valid-bit array. Therefore, to flush the
    entire TLB there is a background process *1 that reads each TLB entry,
    clears its valid bit, then writes the entry back to the TLB.

    *1 a couple of states in the page walking state machine.

    The automatic flushing for the entire TLB in HW allows translations to
    continue to take place while the background HW process runs. If the
    entire TLB needs to be flushed, a master TLB count is incremented. TLB
    entries are considered valid only if the entry count matches the master
    TLB count. The background process invalidates TLB entries where the
    entry count does not match the master count.

    As long as the background process can cycle through (on average) all the
    TLB entries before the TLB is flushed again there should be no issues
    with stale translations. The master TLB count is a six-bit counter. The
    flush cycle rate can be controlled. Setting the rate to zero disables
    automatic flushes. Setting it to all ones flushes at the maximum rate,
    effectively disabling translations while the flush takes place.
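
    The counting scheme can be modelled in software. This is a behavioural
    sketch under my reading of the description above -- the class and field
    names are invented for illustration, not taken from the actual design:

```python
# Behavioural model of the generation-counter flush scheme described
# above. The master count is a six-bit counter, so it wraps modulo 64.

MASTER_MOD = 1 << 6   # six-bit master TLB count

class Entry:
    def __init__(self, vpn, ppn, count):
        self.vpn, self.ppn = vpn, ppn
        self.valid = True
        self.count = count                 # generation stamp at fill time

class TLB:
    def __init__(self, size=16):
        self.entries = [None] * size
        self.master_count = 0              # incremented on a full-flush request
        self.sweep_pos = 0                 # background process position

    def flush_all(self):
        # A full flush is just a counter bump; stale entries are
        # rejected on lookup and invalidated lazily by the sweep.
        self.master_count = (self.master_count + 1) % MASTER_MOD

    def fill(self, idx, vpn, ppn):
        self.entries[idx] = Entry(vpn, ppn, self.master_count)

    def lookup(self, vpn):
        for e in self.entries:
            if (e is not None and e.valid and e.vpn == vpn
                    and e.count == self.master_count):
                return e.ppn
        return None                        # miss -> page walk refills

    def sweep_step(self):
        # One background step: read an entry, clear its valid bit if
        # its stamp is stale, write it back (the *1 process above).
        e = self.entries[self.sweep_pos]
        if e is not None and e.count != self.master_count:
            e.valid = False
        self.sweep_pos = (self.sweep_pos + 1) % len(self.entries)
```

    In this reading a full flush is O(1) for the requester; the costs are
    the stamp compare on lookup and the requirement that the sweep lap the
    table before the six-bit counter wraps (64 flushes).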

    It may be an option to not automatically flush global translations.

    Read up on the TLB in the Linux docs. It seems the TLB may be entirely
    flushed or flushed page by page. Thinking about doing the page-by-page
    flush as a background HW process: flush all the entries matching an ASID.
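
    The per-ASID variant would be the same background walker with a
    different match condition; a minimal sketch (the dict-based entry
    layout is hypothetical):

```python
# Per-ASID variant of the background walker: invalidate only entries
# whose ASID matches. The dict-based entry layout is hypothetical.

def sweep_asid(entries, target_asid):
    """One full background pass flushing all entries for one ASID
    (in hardware, one entry per visit of the state machine)."""
    for e in entries:
        if e["valid"] and e["asid"] == target_asid:
            e["valid"] = False
```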


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Feb 4 21:14:15 2026


    Robert Finch <robfi680@gmail.com> posted:

    > Background TLB flushing via HW state machine?

    Because My 66000 MMU is defined as coherent, there is no flushing
    of the TLB. A write that damages TLB state will invalidate the
    entry all by itself.

  • From Robert Finch@robfi680@gmail.com to comp.arch on Wed Feb 4 19:53:56 2026

    On 2026-02-04 4:14 p.m., MitchAlsup wrote:

    > Robert Finch <robfi680@gmail.com> posted:
    >
    >> Background TLB flushing via HW state machine?
    >
    > Because My 66000 MMU is defined as coherent, there is no flushing
    > of the TLB. A write that damages TLB state will invalidate the
    > entry all by itself.

    So, there is no need to invalidate the entire TLB? The Linux docs
    refer to flushing the TLB, I think as a synonym for invalidating it.
    Is there a difference? When one says 'flush' I think of writing
    entries back out to memory--for instance, the accessed and modified
    flag bits.

    So does invlpg (x86) flush the entries or just mark them invalid? Same
    for invall.

    Thinking a bit to myself: to keep the TLB coherent, the flags would need
    to be written to memory on the first setting. The TLB entry would also
    be written to memory on a load that clears the accessed and modified
    flags. Also, if the page table were modified, the corresponding TLB
    entry would also need to be synced.

    Blew a few LUTs caching address translations in the MMU. The cache
    entries would also need to be kept coherent.

  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 5 01:31:32 2026


    Robert Finch <robfi680@gmail.com> posted:

    > On 2026-02-04 4:14 p.m., MitchAlsup wrote:
    >
    >> Robert Finch <robfi680@gmail.com> posted:
    >>
    >>> Background TLB flushing via HW state machine?
    >>
    >> Because My 66000 MMU is defined as coherent, there is no flushing
    >> of the TLB. A write that damages TLB state will invalidate the
    >> entry all by itself.

    > So, there is no need to invalidate the entire TLB?

    My 66000 software of any privilege level needs to expend absolutely
    0.00E-49 instructions per year on TLB maintenance.

    > The Linux docs
    > refer to flushing the TLB I think as a synonym for invalidating the TLB.
    > Is there a difference? When one says 'flush' I think of writing out
    > entries back to memory. For instance the accessed and modified flag bits.

    I do not know of any TLB that has a <semi-transient> TLB state equivalent
    to MODIFIED (like a normal DCache); so when used and modified TLB entries
    are migrated back to DRAM immediately, there is no need to flush--there
    may be a need to invalidate from the TLB while maintaining a PTE in the
    tables themselves.

    > So does invlpg (x86) flush the entries or just mark them invalid? Same
    > for invall.

    Invalidate. {Assuming Intel has not started using MODIFIED state in
    TLB entries.}

    > Thinking a bit to myself, to keep the TLB coherent the flags would need
    > to be written to memory on the first setting.

    On first, any, and all settings.

    > The TLB entry would also
    > be written to memory on a load that clears the accessed and modified
    > flags.

    How does a Load, all by itself, cause a PTE to be written to memory ??

    > Also if the page table were modified the corresponding TLB entry
    > would also need to be synced.

    Memory updated on use and modify.

  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Feb 5 02:02:13 2026

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    > Robert Finch <robfi680@gmail.com> posted:

    <snip>

    >> The Linux docs
    >> refer to flushing the TLB I think as a synonym for invalidating the TLB.
    >> Is there a difference? When one says 'flush' I think of writing out
    >> entries back to memory. For instance the accessed and modified flag bits.
    >
    > I do not know of any TLB that has a <semi-transient> TLB state equivalent
    > to MODIFIED (like a normal DCache); so when used and modified TLB entries
    > are migrated back to DRAM immediately, there is no need to flush--there
    > may be a need to invalidate from the TLB while maintaining a PTE in the
    > tables themselves.


    How do you handle hardware access and/or dirty flag updates? If you
    store that in the TLB entry, it will need to be written back to
    the page table in DRAM at some point, either at the time of transition
    or when the TLB entry is evicted.


  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 5 02:14:12 2026


    scott@slp53.sl.home (Scott Lurndal) posted:

    > How do you handle hardware access and/or dirty flag updates? If you
    > store that in the TLB entry, it will need to be written back to
    > the page table in DRAM at some point; either at the time of transition
    > or when the TLB entry is evicted.

    A message* is sent to DRAM when used or modified gets set--maybe even
    a few cycles before TLB u/m gets over-written.

    (*) The message uses a special interconnect OpCode and carries the
    modified PTE(s), so other TLBs may play along without watching
    everything on the 'bus'.
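
    The write-through A/M scheme can be sketched as a toy model of
    "memory updated on use and modify" -- the function, entry layout, and
    flag positions below are invented for illustration:

```python
# Toy model of write-through accessed/modified (A/M) flag handling:
# whenever an access would set A or M in a TLB entry, the updated PTE
# is pushed to memory at once and a snoopable message is emitted, so
# the TLB never holds dirty PTE state. Flag positions are invented.

A_BIT, M_BIT = 0x20, 0x40

def touch(pte_mem, tlb_entry, idx, is_store, messages):
    """One access through a TLB entry backed by pte_mem[idx]."""
    new_flags = tlb_entry["flags"] | A_BIT | (M_BIT if is_store else 0)
    if new_flags != tlb_entry["flags"]:
        tlb_entry["flags"] = new_flags
        pte_mem[idx] = new_flags                  # memory updated on use/modify
        messages.append(("pte_update", idx, new_flags))  # other TLBs can snoop
```

    Once A (and M, for stores) are set, further accesses generate no
    traffic, which is why eviction needs no writeback in this model.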


  • From Robert Finch@robfi680@gmail.com to comp.arch on Mon Feb 9 00:36:38 2026

    There used to be two separate bits in my MMU project's PTE to encode the
    type and a shortcut-page indicator. They were combined into a single
    two-bit field: 0=PTE, 1=PTP, and 2=shortcut. That leaves an extra unused
    code. I am wondering what to use the code for. The only thing I can
    think of ATM is a meta-data record indicator.

    I have also been considering PTEs that are not a power of two in size,
    for instance 48 or possibly 56 bits. The PTEs would be stored in the
    last part of a page, with a page header and other information at the
    beginning of the page. For an 8kB page, the last (or middle) 6kB would
    be PTEs. IDK what the header would be used for, other than perhaps some
    sort of error management. That is, if the first part or the last part of
    the page were overwritten, it could generate an error. Same as a header
    on a block of memory.
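
    The 48-bit case works out neatly. A sketch of the indexing arithmetic,
    assuming a 2kB header followed by 6kB of PTEs (one reading of the
    layout above; the names are invented):

```python
# Offset arithmetic for 6-byte (48-bit) PTEs packed into an 8 kB page
# whose first 2 kB is a header -- an assumed layout, per the post.

PAGE_SIZE     = 8 * 1024
HEADER_SIZE   = 2 * 1024
PTE_SIZE      = 6                                        # 48 bits
PTES_PER_PAGE = (PAGE_SIZE - HEADER_SIZE) // PTE_SIZE    # 6144 / 6 = 1024

def pte_offset(index):
    """Byte offset of PTE `index` within its page-table page."""
    assert 0 <= index < PTES_PER_PAGE
    return HEADER_SIZE + index * PTE_SIZE
```

    Conveniently, 6kB / 6 bytes = 1024 PTEs per page -- still a power of
    two, so the PTE index can remain a simple bit-field of the virtual
    address even though the PTE itself is not a power-of-two size.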
