The use of overlays was a hack to get around small address spaces.
But, it disciplined developers to partition their code into
self-consistent chunks -- you couldn't flip back and forth
between overlaid images (like you could with bank switching).
You had to ensure any identifiers/labels you needed "now"
were accessible "from here".
In a VMM environment, you don't care -- you just let the
page get faulted in while you wait (assuming it's not presently
in place).
My apps tend to run "forever". Yet, lots of resources they
use are only transitory; no need for them to persist beyond
the point where they were used.
[Think of initialization code; once done, you're not going
to revisit it until the application is restarted/reloaded]
I'd like to be able to shed resources that are no longer
needed. But, have some assurances that they truly *aren't*
needed (referenced) going forward. This lets the system free
up those resources for use by other applications (and runtime
diagnostics, once you know "no one" is using a resource).
[I can do this automatically or let the developer do it "on demand"
by lumping resources in specific sections with judicious use
of linker scripts, etc. Applications developed with this
in mind will tend to be more resilient because they will tend
to be allowed to continue execution when the system is running
in overload; applications that hold onto unneeded resources
will look "more wasteful" and be selected for removal.]
But, there is nothing in the traditional build process that
ensures references from "some point" in the code don't refer
back to some *other* point (that you thought you were done with).
The overlay build process had to ensure THIS overlay didn't refer to
anything in THAT overlay. (Bankswitching code had to rely on "trampoline"
logic in some shared/persistent area to get from one bank to another,
but, there was nothing prohibiting such references!)
Does anyone still develop with overlays ...
... (where the overlay has to
replace the existing overlay at run time)? Besides personal
familiarity with the codebase, how do you select code portions
to place in each overlay?
Note, this is different than paged VMM where individual pages are
swapped in and out on demand (I have no backing store so once "out",
coming back *in* is costly).
I'd like to be able to shed resources that are no longer
needed. But, have some assurances that they truly *aren't*
needed (referenced) going forward. This lets the system free
up those resources for use by other applications (and runtime
diagnostics, once you know "no one" is using a resource).
[I can do this automatically or let the developer do it "on demand"
by lumping resources in specific sections with judicious use
of linker scripts, etc. Applications developed with this
in mind will tend to be more resilient because they will tend
to be allowed to continue execution when the system is running
in overload; applications that hold onto unneeded resources
will look "more wasteful" and be selected for removal.]
But, there is nothing in the traditional build process that
ensures references from "some point" in the code don't refer
back to some *other* point (that you thought you were done with).
Granted most existing compilers do not understand the concept of
"overlay" per se, but they do certainly understand both definition and
execution scope [in many languages these are identical, but in some
they can be different].
The problems with overlays almost all are related to closures and
[escape] continuations - in C terms: stored function pointers and
exceptions (or longjmp).
The overlay build process had to ensure THIS overlay didn't refer to
anything in THAT overlay. (Bankswitching code had to rely on "trampoline"
logic in some shared/persistent area to get from one bank to another,
but, there was nothing prohibiting such references!)
Just scope handling.
The issue with (classic) C was it had only global and local scopes.
Supporting overlays required the addition of non-global-non-local
scopes [at the load module level at least], and a bit of runtime magic
behind the scenes to load/unload them.
Does anyone still develop with overlays ...
I haven't seen an overlay compiler since 8-bit. Of course, 8-bit
cpus/mpus still are used in the embedded world.
More broadly: explicitly loaded DLLs [ using dlopen()/dlsym(),
LoadLibrary()/GetProcAddress(), etc. ] could be considered modern
moral equivalents of overlays.
Though they may not share memory (or other resources) and their
coming/going is not handled automagically by the compiler runtime,
they are similar [in the geometric meaning] in that: they might not be
loaded/mapped when needed, they may be nested, and storing references
to anything defined or created by them can be fraught with danger.
Granted, few developers ever muck with explicit DLL management ...
most will just let the linker/runtime/loader handle it.
... (where the overlay has to
replace the existing overlay at run time)? Besides personal
familiarity with the codebase, how do you select code portions
to place in each overlay?
Deliberately ignoring namespaces and "static", the problem with C is
that functions all are defined globally, so any function potentially
can be called from anywhere.
Think instead how you would code it in Pascal ... or any language
having nested functions but without using closures. Functions that
support specific computations (as opposed to generally useful) should
be defined / in scope only where they need to be callable.
Disjoint definition scopes then are exactly analogous to overlays, and
the functions defined within them can be packaged appropriately.
Note that grafting this onto a language lacking the notion of nested
functions will not be easy.
Note, this is different than paged VMM where individual pages are
swapped in and out on demand (I have no backing store so once "out",
coming back *in* is costly).
George
But, there is nothing in the traditional build process that
ensures references from "some point" in the code don't refer
back to some *other* point (that you thought you were done with).
Does anyone still develop with overlays (where the overlay has to
replace the existing overlay at run time)? Besides personal
familiarity with the codebase, how do you select code portions
to place in each overlay?
According to Don Y <blockedofcourse@foo.invalid>:
But, there is nothing in the traditional build process that
ensures references from "some point" in the code don't refer
back to some *other* point (that you thought you were done with).
Depends on your tools.
Does anyone still develop with overlays (where the overlay has to
replace the existing overlay at run time)? Besides personal
familiarity with the codebase, how do you select code portions
to place in each overlay?
Still develop, no, but back in the 1980s I was one of the developers of
Javelin, a modelling package that ran on MS-DOS. We wrote it in Wizard C,
which later became Turbo C, in medium model so all of the static data was
in one segment that was always resident, with the code in each module in
its own segment. We used a third party linker that provided an overlay
scheme similar to the OS/360 one I described in "Linkers and Loaders."
The linker let us (well, me) assign code modules to overlays and
complained if it saw calls that weren't strictly up or down the overlay
tree. The digital origami involved a lot of trial and error and guessing
to see what in fact caused overlay swapping, largely affected in our case
by the way the users used the program. There were about six views of the
model, of which you could put any two on the screen at once, so I had to
try and guess which views were likely to be used together so I could put
them in the same overlay.
On 2026-01-14, John Levine <johnl@taugh.com> wrote:
According to Don Y <blockedofcourse@foo.invalid>:
But, there is nothing in the traditional build process that
ensures references from "some point" in the code don't refer
back to some *other* point (that you thought you were done with).
Depends on your tools.
Indeed, it's pretty simple to do something like that with gcc and
binutils. You can link each "chunk" separately so that it exposes
only a few (or even no) global symbols.
Then you link those chunks together to build an executable, you know
there's a very limited (or empty) set of "connections" between
them. However, that set of possible connections is static and doesn't
change over time the way it does when you swap overlays in/out.
I do this regularly to eliminate the possibility of accidental name
collisions and so that I know exactly what functions can be called by
"others" outside a chunk/module.
That's not quite the same as overlays, because all the chunks/modules
are resident in memory all the time, and there might be multiple
threads active in various different chunks at any point in time --
even though they can't access each other's functions or variables.
Hi George,
Hoping all is well (Mom?)
But, there is nothing in the traditional build process that
ensures references from "some point" in the code don't refer
back to some *other* point (that you thought you were done with).
Granted most existing compilers do not understand the concept of
"overlay" per se, but they do certainly understand both definition and
execution scope [in many languages these are identical, but in some
they can be different].
The problems with overlays almost all are related to closures and
[escape] continuations - in C terms: stored function pointers and
exceptions (or longjmp).
I am hoping to assume "modules" can't be split (so the compiler
always knows where local targets reside) and confine the "problem"
to the linkage editor -- assuming the targeted module can "need
help" being accessed.
I further assume that "overlay" is a misnomer; that the address space
is large enough that all targets have unique addresses and just need
assistance being accessed. (or, prohibitions against FURTHER access
if they have been "discarded"/marked "extinct")
The overlay build process had to ensure THIS overlay didn't refer to
anything in THAT overlay. (Bankswitching code had to rely on "trampoline"
logic in some shared/persistent area to get from one bank to another,
but, there was nothing prohibiting such references!)
Just scope handling.
If they can coexist in a shared address space, that's an issue.
But, if you're just trying to ensure nothing "here" ever refers
to something "there" (marked extinct), I don't think it is as
much of an issue.
The problem becomes one of discipline -- "inherently" knowing how to
organize your modules/references so you can be "sure" when you execute
one specific line of code (that marks a section as "extinct") that you
will NEVER reference anything that it containED (past tense because it
no longer exists).
Think in terms of FORTRAN's "COMMON" and "CHAIN"; you want to be able
to put any shared data created in "section 1" into COMMON and then
fall into the code in the next "section", having the benefit of
all that "common" data.
Knowing that the code from the first section is forever gone.
[Ideally, I would like to be able to section off more varied instances
of resources instead of just chopping a program into discrete CONSECUTIVE
sections.]
:
More broadly: explicity loaded DLLs [ using dlopen()/dlsym(),
LoadLibrary()/GetProcAddress(), etc. ] could be considered modern
moral equivalents of overlays.
Yes. Though you can REload a DLL if you decide you need it
later. The CHAIN/COMMON distinction was that prior sections
are gone -- until you restart the program.
... (where the overlay has to
replace the existing overlay at run time)? Besides personal
familiarity with the codebase, how do you select code portions
to place in each overlay?
Deliberately ignoring namespaces and "static", the problem with C is
that functions all are defined globally, so any function potentially
can be called from anywhere.
Think instead how you would code it in Pascal ... or any language
having nested functions but without using closures. Functions that
support specific computations (as opposed to generally useful) should
be defined / in scope only where they need to be callable.
Disjoint definition scopes then are exactly analogous to overlays, and
the functions defined within them can be packaged appropriately.
But, that assumes those objects are NEVER accessed -- despite the
obvious need to access them /at least once/ to gain entry to that domain.
Note that grafting this onto a language lacking the notion of nested
functions will not be easy.
I can handle some cases easily:
main() {
    Initialize();
    Extinctify(&Initialize);
    DoWork();
}
and place "Initialize" in its own section, commanding the linker to locate
it "conveniently" (likely on a page-frame boundary so as to maximize the
amount of usable space in that page-frame).
But, aside from such obvious choices, I think it is hard to mentally
subdivide a piece of code for such a partitioning. And, then, remembering
to "extinctify" portions that you no longer need. You'd have to be
keenly aware of the cost of each such "portion" of the algorithm so
you could identify things that could/should be excised.
Past expiration date.
But, there is nothing in the traditional build process that
ensures references from "some point" in the code don't refer
back to some *other* point (that you thought you were done with).
With discipline it (miserably) can be done.
See below.
The overlay build process had to ensure THIS overlay didn't refer to
anything in THAT overlay. (Bankswitching code had to rely on "trampoline"
logic in some shared/persistent area to get from one bank to another,
but, there was nothing prohibiting such references!)
Just scope handling.
If they can coexist in a shared address space, that's an issue.
But, if you're just trying to ensure nothing "here" ever refers
to something "there" (marked extinct), I don't think it is as
much of an issue.
The problem becomes one of discipline -- "inherently" knowing how to
organize your modules/references so you can be "sure" when you execute
one specific line of code (that marks a section as "extinct") that you
will NEVER reference anything that it containED (past tense because it
no longer exists).
Think in terms of FORTRAN's "COMMON" and "CHAIN"; you want to be able
to put any shared data created in "section 1" into COMMON and then
fall into the code in the next "section", having the benefit of
all that "common" data.
Knowing that the code from the first section is forever gone.
Data produced by X and consumed later by Y is not a problem unless X
and Y disagree on the data's format.
Problems are only possible when there can exist stored references to
things which no longer exist.
In Fortran that was not possible: a COMMON block could not include
pointers [even in the Fortrans that had pointers], and so CHAIN with
COMMON could not fail in that way.
Similarly, creating a data structure stored by the main program with
overlay X and then swapping in overlay Y to process the data will not
be a problem so long as the data structure contains no references to
anything inside X.
[Ideally, I would like to be able to section off more varied instances
of resources instead of just chopping a program into discrete CONSECUTIVE
sections.]
:
More broadly: explicity loaded DLLs [ using dlopen()/dlsym(),
LoadLibrary()/GetProcAddress(), etc. ] could be considered modern
moral equivalents of overlays.
Yes. Though you can REload a DLL if you decide you need it
later. The CHAIN/COMMON distinction was that prior sections
are gone -- until you restart the program.
You're confused.
With explicit DLL management, the programmer has to deliberately write
code to load a DLL and map its exported API to function pointers. If
the DLL then is unloaded, the mapped pointers become garbage: trying
to call the functions will _NOT_ reload the DLL - rather it will,
almost certainly, crash the program.
If the program was written in a CHAIN/COMMON fashion, using and
discarding a progression of DLLs, then there will be NO programmer
supplied code to "go back" and reload one of them.
Note that grafting this onto a language lacking the notion of nested
functions will not be easy.
I can handle some cases easily:
main() {
    Initialize();
    Extinctify(&Initialize);
    DoWork();
}
and place "Initialize" in its own section, commanding the linker to locate
it "conveniently" (likely on a page-frame boundary so as to maximize the
amount of usable space in that page-frame).
But, aside from such obvious choices, I think it is hard to mentally
subdivide a piece of code for such a partitioning. And, then, remembering
to "extinctify" portions that you no longer need. You'd have to be
keenly aware of the cost of each such "portion" of the algorithm so
you could identify things that could/should be excised.
It's hard because you work in languages that make it hard(er).
The most straightforward way to arrange the code in C would be to
separate the disjoint "overlay" scopes by source file / compilation
unit.
pseudo'ing the example above in C:
----------------
void F1(void);
void F2(void);

int main(void)
{
    F1();
    F2();
}
----------------
static void G1(void) { /* ... */ }
static void H1(void) { /* ... */ }

void F1(void)
{
    /* ... may call G1(), H1() ... */
}
----------------
static void G2(void) { /* ... */ }
static void H2(void) { /* ... */ }

void F2(void)
{
    /* ... may call G2(), H2() ... */
}
----------------
where the horizontal lines denote source file / compilation unit
boundaries.
This way the compiler will catch most problems, and the code will be
in modules that potentially could be loaded/unloaded independently
(assuming you have a way to do that).
But, as you said, it takes some discipline.
The best way is to just use DLLs if you can. Export only the entry
points and everything else will be hidden.
But, there is nothing in the traditional build process that
ensures references from "some point" in the code don't refer
back to some *other* point (that you thought you were done with).
With discipline it (miserably) can be done.
See below.
In addition to knowing what MIGHT be accessible (exported), you
also have to track down every reference to each of those and
identify when, in time, they occur (or might occur) relative to
your explicitly declaring them to be "no longer needed".
THAT is the tough part -- a link map tells you "what talks to
what" but says nothing about when, in time, those interactions occur.
If FOO calls BAR and BAR calls BAZ, then anything that calls
BAZ, BAR *or* FOO, at a point in time AFTER you have declared
BAZ to no longer be needed will SIGSEGV.
You have to keep track of these interdependencies when making
decisions declaring particular "identifiers" (code or data)
to be no longer needed.
The "Initialize()" example I provided is easy to conceive of
occurring exactly once in a "program's" execution. So, as long as
it is only invoked once AND THERE ARE NO PATHS BACK TO Main() -- in
my example -- there is no way it can be reaccessed/referenced after
the "extinctify" invocation that follows it.
Can you say that about string(3C) functions -- can you decide
when those library functions are no longer needed, thereby
freeing up the resources that they require? Or, some function
that you wrote to handle a particular error condition?
Think in terms of FORTRAN's "COMMON" and "CHAIN"; you want to be able
to put any shared data created in "section 1" into COMMON and then
fall into the code in the next "section", having the benefit of
all that "common" data.
Knowing that the code from the first section is forever gone.
Data produced by X and consumed later by Y is not a problem unless X
and Y disagree on the data's format.
The format isn't the issue. Rather, that X /no longer exists/.
References to the address(es) that it occupied aren't mapped to
anything, anymore.
You must be sure Y (and every other X consumer) have made use
of X before you unmap its resources. THAT is what is hard to track
because tools don't lay out the temporal relationships of various
identifiers.
Problems are only possible when there can exist stored references to
things which no longer exist.
The references need not be explicitly "stored" but, rather, can be
part of an instruction stream generated by a compiler.
In Fortran that was not possible: a
COMMON block could not include pointers [even in the Fortrans that had
pointers], and so CHAIN with COMMON could not fail in that way.
I mention COMMON and CHAIN because they implement the mechanisms that
must be present for one part of a "program" to do work and pass those
results to a followup part (via COMMON) while execution is passed
to that followup part (via CHAIN).
You (typ) need data and code bridges to connect separate parts of a
program together, esp if you are deliberately trying to cut a program
into smaller pieces to reduce its "current" footprint (for any value
of "current").
Similarly, creating a data structure stored by the main program with
overlay X and then swapping in overlay Y to process the data will not
be a problem so long as the data structure contains no references to
anything inside X.
That doesn't address having the data structure overlaid (or, in
my case, unmapped) because *IT* was considered "no longer needed".
At some point in the "earlier" execution of the "program", that data
structure had meaning. But, once the developer declares that data to
no longer be needed, it disappears. Any code that later references it
(in error!) crashes. No, it doesn't get the wrong values for the data
or interpret the values incorrectly. The memory reference simply FAILS.
With explicit DLL management, the programmer has to deliberately write
code to load a DLL and map its exported API to function pointers. If
the DLL then is unloaded, the mapped pointers become garbage: trying
to call the functions will _NOT_ reload the DLL - rather it will,
almost certainly, crash the program.
That is EXACTLY the situation I am describing. The developer explicitly
decided the DLL was NO LONGER NEEDED. *He* unloaded it. If he wasn't
disciplined enough to know that he wasn't yet done with it, then his
program crashes.
However, he can choose to unload the DLL (to reduce his resource
usage) and then, at some later time, explicitly REload it as if
for the first time and make continued use of it.
With CHAIN/COMMON, the past is past. The code that preceded is no
longer available (unless you rerun the program).
... my deliberate marking of portions of that memory as "no longer
needed" (mapped) discards them and their contents, irretrievably.
The only way to recreate the data and code is to restart the
program.
You have some codebase. Tools will tell you where a particular
identifier is referenced. You can use that to build a list
of other objects that implicitly reference that identifier.
*What* is going to tell you WHEN any of those objects are
invoked, relative to a particular statement that is executed
at a specific point in time:
unmap the memory used by identifier X
------
void X4(void)
{
    X7();
}
------
void X5(void)
{
    F2();
}
------
void X6(void)
{
    if (blah)
        X4();
}
------
void X7(void)
{
    F1();
}
------
void X8(void)
{
    X4();
}
[This isn't really far-fetched with libraries that can be interrelated
or interact in other ways]
At some point in time, in some module, I decide to unmap the resources
that F1 consumes. Maybe that happens conditionally.
But, at some later point in time, X8() is invoked. Or X4. Or X7. Or
X6 with blah being true.
You have to track the dependency hierarchy for all of these "objects"
and ensure that none of them are invoked *after* you have unmapped F1.
Because you can't recreate F1, its resources or its actual "value"/meaning.
And, you can't exhaustively test to ensure that every possible path
through the application is exercised. You could easily have a latent
bug that only surfaces in some particular set of circumstances that
you didn't consider, test or encounter, before.
We aren't accustomed to having objects (code or data) disappear during
the course of developing or executing a piece of code. If "foo"
existed at some point in the program -- and you are executing in
the same context and scope, then why would you NOT expect it to still
be there?
You can do an RPC and then be surprised that the NEXT invocation
fails -- because the remote host disappeared. You code against
that possibility.
I just delete objects and whatever they contain/represent goes away
in that action. Whether it is a collection of data that I no longer
need, a group of functions, a subassembly, etc. The developer
has to decide how to group "things" for their utility and temporal
pertinence.
Does anyone still develop with overlays (where the overlay has to
replace the existing overlay at run time)? Besides personal
familiarity with the codebase, how do you select code portions
to place in each overlay?
On 1/13/2026 5:03 PM, Don Y wrote:
Does anyone still develop with overlays (where the overlay has to
replace the existing overlay at run time)? Besides personal
familiarity with the codebase, how do you select code portions
to place in each overlay?
This actually wasn't as difficult as I thought it would be.
It's (unfortunately!) more of a "mindset" issue than anything else.
We had an offsite, this past week that let me poke at their code and
demo my prototypes.