• Re: Inverted Page Tables / Background TLB flush

    From Robert Finch@robfi680@gmail.com to comp.arch on Tue Feb 3 09:45:04 2026
    From Newsgroup: comp.arch

    Background TLB flushing via HW state machine? I am not sure about the
    merits of this approach. Because the TLB is implemented using BRAM, the
    valid bits are not individually accessible, and the design is too low
    cost to support a separate valid-bit array. Therefore, to flush the
    entire TLB there is a background process *1 that reads each TLB entry,
    clears its valid bit, then writes the entry back to the TLB.

    *1 a couple of states in the page walking state machine.

    The automatic flushing for the entire TLB in HW allows translations to
    continue to take place while the background HW process runs. If the
    entire TLB needs to be flushed, a master TLB count is incremented. TLB
    entries are considered valid only if the entry count matches the master
    TLB count. The background process invalidates TLB entries where the
    entry count does not match the master count.

    As long as the background process can cycle through (on average) all the
    TLB entries before the TLB is flushed again there should be no issues
    with stale translations. The master TLB count is a six-bit counter. The
    flush cycle rate can be controlled. Setting the rate to zero disables
    automatic flushes. Setting it to all ones flushes at the maximum rate,
    effectively disabling translations while the flush takes place.
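
    The counting scheme can be modelled in software. This is a behavioural
    sketch under my reading of the description above -- the class and field
    names are invented for illustration, not taken from the actual design:

```python
# Behavioural model of the generation-counter flush scheme described
# above. The master count is a six-bit counter, so it wraps modulo 64.

MASTER_MOD = 1 << 6   # six-bit master TLB count

class Entry:
    def __init__(self, vpn, ppn, count):
        self.vpn, self.ppn = vpn, ppn
        self.valid = True
        self.count = count                 # generation stamp at fill time

class TLB:
    def __init__(self, size=16):
        self.entries = [None] * size
        self.master_count = 0              # incremented on a full-flush request
        self.sweep_pos = 0                 # background process position

    def flush_all(self):
        # A full flush is just a counter bump; stale entries are
        # rejected on lookup and invalidated lazily by the sweep.
        self.master_count = (self.master_count + 1) % MASTER_MOD

    def fill(self, idx, vpn, ppn):
        self.entries[idx] = Entry(vpn, ppn, self.master_count)

    def lookup(self, vpn):
        for e in self.entries:
            if (e is not None and e.valid and e.vpn == vpn
                    and e.count == self.master_count):
                return e.ppn
        return None                        # miss -> page walk refills

    def sweep_step(self):
        # One background step: read an entry, clear its valid bit if
        # its stamp is stale, write it back (the *1 process above).
        e = self.entries[self.sweep_pos]
        if e is not None and e.count != self.master_count:
            e.valid = False
        self.sweep_pos = (self.sweep_pos + 1) % len(self.entries)
```

    In this reading a full flush is O(1) for the requester; the costs are
    the stamp compare on lookup and the requirement that the sweep lap the
    table before the six-bit counter wraps (64 flushes).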

    It may be an option to not automatically flush global translations.

    Read up on the TLB in the Linux docs. It seems the TLB may be entirely
    flushed or flushed page by page. Thinking about doing the page-by-page
    flush as a background HW process: flush all the entries matching an ASID.
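
    The per-ASID variant would be the same background walker with a
    different match condition; a minimal sketch (the dict-based entry
    layout is hypothetical):

```python
# Per-ASID variant of the background walker: invalidate only entries
# whose ASID matches. The dict-based entry layout is hypothetical.

def sweep_asid(entries, target_asid):
    """One full background pass flushing all entries for one ASID
    (in hardware, one entry per visit of the state machine)."""
    for e in entries:
        if e["valid"] and e["asid"] == target_asid:
            e["valid"] = False
```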


    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Wed Feb 4 21:14:15 2026


    Robert Finch <robfi680@gmail.com> posted:

    > Background TLB flushing via HW state machine?

    Because My 66000 MMU is defined as coherent, there is no flushing
    of the TLB. A write that damages TLB state will invalidate the
    entry all by itself.

  • From Robert Finch@robfi680@gmail.com to comp.arch on Wed Feb 4 19:53:56 2026

    On 2026-02-04 4:14 p.m., MitchAlsup wrote:

    > Robert Finch <robfi680@gmail.com> posted:
    >
    >> Background TLB flushing via HW state machine?
    >
    > Because My 66000 MMU is defined as coherent, there is no flushing
    > of the TLB. A write that damages TLB state will invalidate the
    > entry all by itself.

    So, there is no need to invalidate the entire TLB? The Linux docs
    refer to flushing the TLB, I think as a synonym for invalidating it.
    Is there a difference? When one says 'flush' I think of writing
    entries back out to memory--for instance, the accessed and modified
    flag bits.

    So does invlpg (x86) flush the entries or just mark them invalid? Same
    for invall.

    Thinking a bit to myself: to keep the TLB coherent, the flags would need
    to be written to memory on the first setting. The TLB entry would also
    be written to memory on a load that clears the accessed and modified
    flags. Also, if the page table were modified, the corresponding TLB
    entry would also need to be synced.

    Blew a few LUTs caching address translations in the MMU. The cache
    entries would also need to be kept coherent.

  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 5 01:31:32 2026


    Robert Finch <robfi680@gmail.com> posted:

    > On 2026-02-04 4:14 p.m., MitchAlsup wrote:
    >
    >> Robert Finch <robfi680@gmail.com> posted:
    >>
    >>> Background TLB flushing via HW state machine?
    >>
    >> Because My 66000 MMU is defined as coherent, there is no flushing
    >> of the TLB. A write that damages TLB state will invalidate the
    >> entry all by itself.

    > So, there is no need to invalidate the entire TLB?

    My 66000 software of any privilege level needs to expend absolutely
    0.00E-49 instructions per year on TLB maintenance.

    > The Linux docs
    > refer to flushing the TLB I think as a synonym for invalidating the TLB.
    > Is there a difference? When one says 'flush' I think of writing out
    > entries back to memory. For instance the accessed and modified flag bits.

    I do not know of any TLB that has a <semi-transient> TLB state equivalent
    to MODIFIED (like a normal DCache); so when used and modified TLB entries
    are migrated back to DRAM immediately, there is no need to flush--there
    may be a need to invalidate from the TLB while maintaining a PTE in the
    tables themselves.

    > So does invlpg (x86) flush the entries or just mark them invalid? Same
    > for invall.

    Invalidate. {Assuming Intel has not started using MODIFIED state in
    TLB entries.}

    > Thinking a bit to myself, to keep the TLB coherent the flags would need
    > to be written to memory on the first setting.

    On first, any, and all settings.

    > The TLB entry would also
    > be written to memory on a load that clears the accessed and modified
    > flags.

    How does a Load, all by itself, cause a PTE to be written to memory ??

    > Also if the page table were modified the corresponding TLB entry
    > would also need to be synced.

    Memory updated on use and modify.

  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Thu Feb 5 02:02:13 2026

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:
    > Robert Finch <robfi680@gmail.com> posted:

    <snip>

    >> The Linux docs
    >> refer to flushing the TLB I think as a synonym for invalidating the TLB.
    >> Is there a difference? When one says 'flush' I think of writing out
    >> entries back to memory. For instance the accessed and modified flag bits.
    >
    > I do not know of any TLB that has a <semi-transient> TLB state equivalent
    > to MODIFIED (like a normal DCache); so when used and modified TLB entries
    > are migrated back to DRAM immediately, there is no need to flush--there
    > may be a need to invalidate from the TLB while maintaining a PTE in the
    > tables themselves.


    How do you handle hardware access and/or dirty flag updates? If you
    store that in the TLB entry, it will need to be written back to
    the page table in DRAM at some point, either at the time of transition
    or when the TLB entry is evicted.


  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Feb 5 02:14:12 2026


    scott@slp53.sl.home (Scott Lurndal) posted:

    > How do you handle hardware access and/or dirty flag updates? If you
    > store that in the TLB entry, it will need to be written back to
    > the page table in DRAM at some point; either at the time of transition
    > or when the TLB entry is evicted.

    A message* is sent to DRAM when used or modified gets set--maybe even
    a few cycles before TLB u/m gets over-written.

    (*) The message uses a special interconnect OpCode and carries the
    modified PTE(s), so other TLBs may play along without watching
    everything on the 'bus'.
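
    The write-through A/M scheme can be sketched as a toy model of
    "memory updated on use and modify" -- the function, entry layout, and
    flag positions below are invented for illustration:

```python
# Toy model of write-through accessed/modified (A/M) flag handling:
# whenever an access would set A or M in a TLB entry, the updated PTE
# is pushed to memory at once and a snoopable message is emitted, so
# the TLB never holds dirty PTE state. Flag positions are invented.

A_BIT, M_BIT = 0x20, 0x40

def touch(pte_mem, tlb_entry, idx, is_store, messages):
    """One access through a TLB entry backed by pte_mem[idx]."""
    new_flags = tlb_entry["flags"] | A_BIT | (M_BIT if is_store else 0)
    if new_flags != tlb_entry["flags"]:
        tlb_entry["flags"] = new_flags
        pte_mem[idx] = new_flags                  # memory updated on use/modify
        messages.append(("pte_update", idx, new_flags))  # other TLBs can snoop
```

    Once A (and M, for stores) are set, further accesses generate no
    traffic, which is why eviction needs no writeback in this model.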


  • From Robert Finch@robfi680@gmail.com to comp.arch on Mon Feb 9 00:36:38 2026

    There used to be two separate bits in my MMU project's PTE to encode the
    type and a shortcut-page indicator. They were combined into a single
    two-bit field: 0=PTE, 1=PTP, and 2=shortcut. That leaves an extra unused
    code. I am wondering what to use the code for. The only thing I can
    think of ATM is a meta-data record indicator.

    I have also been considering PTEs that are not a power of two in size,
    for instance 48 or possibly 56 bits. The PTEs would be stored in the
    last part of a page, with a page header and other information at the
    beginning of the page. For an 8kB page, the last (or middle) 6kB would
    be PTEs. IDK what the header would be used for, other than perhaps some
    sort of error management. That is, if the first part or the last part of
    the page were overwritten, it could generate an error. Same as a header
    on a block of memory.
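
    The 48-bit case works out neatly. A sketch of the indexing arithmetic,
    assuming a 2kB header followed by 6kB of PTEs (one reading of the
    layout above; the names are invented):

```python
# Offset arithmetic for 6-byte (48-bit) PTEs packed into an 8 kB page
# whose first 2 kB is a header -- an assumed layout, per the post.

PAGE_SIZE     = 8 * 1024
HEADER_SIZE   = 2 * 1024
PTE_SIZE      = 6                                        # 48 bits
PTES_PER_PAGE = (PAGE_SIZE - HEADER_SIZE) // PTE_SIZE    # 6144 / 6 = 1024

def pte_offset(index):
    """Byte offset of PTE `index` within its page-table page."""
    assert 0 <= index < PTES_PER_PAGE
    return HEADER_SIZE + index * PTE_SIZE
```

    Conveniently, 6kB / 6 bytes = 1024 PTEs per page -- still a power of
    two, so the PTE index can remain a simple bit-field of the virtual
    address even though the PTE itself is not a power-of-two size.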
