I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".
It's relatively easy to sort out the cost of constructing it -- especially
if that occurs prior to load time.
But I'm trying to come up with a model that reflects dynamic changes
to the structure "at runtime", so I can sort out a break-even point
for it vs. other algorithms.
Of course, it depends on the ACTUAL keys inserted/removed and the
current context. But so would every other algorithm!
Pointers? Suggestions?
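One way to start on such a break-even model is pure arithmetic -- a rough sketch, not measured numbers. The function names below are my own, and the costs are assumptions: unit-cost comparisons, a flat O(1) amortized rotation charge per AVL insert, and an unsorted list as the naive alternative.

```python
import math

def avl_cost(n_inserts, n_lookups, c_rotate=1.0):
    """Nominal AVL cost: each insert/lookup walks ~log2(n) nodes;
    each insert is charged an assumed O(1) amortized rotation cost."""
    cost = 0.0
    for n in range(1, n_inserts + 1):
        cost += math.log2(n + 1) + c_rotate   # insert: search path + rebalance
    cost += n_lookups * math.log2(n_inserts + 1)
    return cost

def unsorted_list_cost(n_inserts, n_lookups):
    """Naive alternative: append is O(1); each lookup scans ~n/2 entries."""
    return n_inserts * 1.0 + n_lookups * (n_inserts / 2.0)

def break_even_lookups(n_inserts):
    """Smallest lookup count at which the AVL model starts to win."""
    k = 0
    while avl_cost(n_inserts, k) >= unsorted_list_cost(n_inserts, k):
        k += 1
    return k
```

Under these assumed unit costs, a 1000-entry dictionary pays for its AVL construction after only a couple dozen lookups; the real answer of course depends on the actual relative costs of compare, allocate, and rotate.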
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary"....
Pointers? Suggestions?
Of course, it depends on the ACTUAL keys inserted/removed and the
current context. But, so would every other algorithm!
Pointers? Suggestions?
It's O[log n] no matter what, so you won't do better, so just use it
and be happy.
Ref: <https://en.wikipedia.org/wiki/AVL_tree>
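A caveat worth making concrete: the O(log n) guarantee is precisely what AVL rebalancing buys; a plain BST with no rotations degenerates on sorted input. A small illustration (toy `Node`/`bst_insert` names of my own, and the ~1.44*log2(n) AVL height bound used only as an approximation):

```python
import math

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    # Plain BST insert: no balance factors, no rotations.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root

def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

# Sorted insertion order degenerates the plain BST into a linked list:
root = None
for k in range(256):
    root = bst_insert(root, k)
print(height(root))                      # 256: one long right spine
# AVL rotations would instead bound the height near 1.44*log2(n):
print(math.ceil(1.44 * math.log2(256)))  # 12
```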
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a...
"dictionary".
Pointers? Suggestions?
Perhaps a book on tree data structures would compare/contrast different
types of trees, having already determined which is least compute- and/or
memory-intensive, rather than your having to find this out experimentally
for yourself.
Knuth's TAOCP might be a place to start.
On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
<buzz_mccool@yahoo.com> wrote:
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a...
"dictionary".
Pointers? Suggestions?
Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or
memory intensive rather than having to experimentally find this out for
yourself.
Knuth's TAOCP might be a place to start.
That's exactly right. <https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>
I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
much use of it.
On 4/16/2026 9:05 AM, joegwinn@comcast.net wrote:
On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
<buzz_mccool@yahoo.com> wrote:
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary"....
Pointers? Suggestions?
Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or
memory intensive rather than having to experimentally find this out for
yourself.
Knuth's TAOCP might be a place to start.
That's exactly right.
.<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>
I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
much use of it.
All computer science textbooks suffer from a lack of practicality.
Why isn't Raita used universally -- if it is so much more efficient
(especially wrt brute force)? Or, Red-Black trees? B-trees? H-trees?
AVL trees? Why so many solutions, if their performance characteristics
(in the GENERAL sense) can so easily be ranked?
As I said, "in practice, how they are actually *used* dramatically alters ACTUAL performance."
Pesky little details like:
- NUMA (let alone NORMA!)
- multiple threads concurrently accessing LARGE data structures
- cost of the "compare" operator relative to the structure itself
- failure resilience
etc.
Things that make a "school boy" approach perform terribly in a real-world application.
On Thu, 16 Apr 2026 09:39:57 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:
On 4/16/2026 9:05 AM, joegwinn@comcast.net wrote:
On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
<buzz_mccool@yahoo.com> wrote:
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary"....
Pointers? Suggestions?
Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or
memory intensive rather than having to experimentally find this out for
yourself.
Knuth's TAOCP might be a place to start.
That's exactly right.
.<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>
I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
much use of it.
All computer science text books suffer from a lack of practicality.
Why isn't Raita used universally -- if it so much more efficient?
(esp wrt brute force). Or, Red-Black trees? B trees? H trees?
AVL trees? Why so many solutions as their performance characteristics
(in the GENERAL sense) can easily be ranked.
As I said, "in practice, how they are actually *used* dramatically alters
ACTUAL performance."
Pesky little details like:
- NUMA (let alone NORMA!)
- multiple threads concurrently accessing LARGE data structures
- cost of the "compare" operator relative to the structure itself
- failure resilience
etc.
Things that make a "school boy" approach perform terribly in a real-world
application.
I take it you had not heard of Knuth. Not a school boy at all: a full
professor, known to be at the top of that field, who has personally
written many very useful applications.
TAOCP covers the foundations at a deep level, so all kinds and classes
of solutions are there, compared and contrasted.
Anyway, the various sections are now available online in pdf form, so
there is no reason why you cannot read the section about binary trees
and the like. This is often good for creating new approaches.
<https://www-cs-faculty.stanford.edu/~knuth/taocp.html#:~:text=These%20volumes%20are%20now%20available%20also%20in%20portable>
As for searching binary trees, here is the Wiki summary: <https://en.wikipedia.org/wiki/Optimal_binary_search_tree>
Joe
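The "optimal binary search tree" idea behind that link can be made concrete: given access probabilities for keys in sorted order, a dynamic program computes the minimum expected comparison count. What follows is a rough sketch of the textbook O(n^3) recurrence (no Knuth speedup, successful searches only, no dummy keys); the function name and simplifications are my own.

```python
def optimal_bst_cost(p):
    """Minimum expected comparisons for a BST over sorted keys,
    where p[i] is the probability of searching for key i.
    cost[i][j] = optimal expected cost of a subtree over keys i..j-1."""
    n = len(p)
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            w = sum(p[i:j])  # every key in the range costs one extra level
            cost[i][j] = w + min(cost[i][r] + cost[r + 1][j]
                                 for r in range(i, j))  # try each root r
    return cost[0][n]
```

For example, with a heavily skewed distribution like [0.8, 0.1, 0.1] the optimum puts the popular key at the root (expected cost 1.3) rather than balancing the tree (expected cost 1.9) -- the "access order matters" point made elsewhere in this thread.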
Things that make a "school boy" approach perform terribly in a real-world
application.
Knuth documented a huge amount a very long time ago.
How much more has really been done since then in terms
of algorithm development?
I don't have my own copy, but I know several others whose
bookshelves are bending under the weight of Knuth.
Back in the real world, I discovered the hard way that unless
it is a huge project it is better to keep things simple
and not worry too much about efficiency. CPU cycles are cheap
compared with the time spent designing a clever algorithm
and making sure that it really works reliably.
"How should its "list of contents" be constructed to make for the most
efficient use? *You*, the creator, likely know what that usage pattern
(and list of potential contents) will likely be. Yet, you (the creator)
aren't given any input into the actual implementation chosen for
a folder with 4 entries or 400,000. [. . .]
You want the structure to be optimized by the *access* order (weighted
by the types of accesses) [. . .]"
I used Wget or whatever, circa 2003, to download the comp.arch.fpga
archive that SIGDA used to host. It is not an mbox of many posts;
instead, each post is a single file named like 10001.txt, 10002.txt, etc.
They are in many subdirectories, so I decided to burn a version to a
CD but with them all in one directory.
This reorganization turned out not to be an improvement. On the
contrary, it slowed down accesses via e.g.
ls *
by many MINUTES. So I decided to burn another copy, preserving
SIGDA's subdirectories.
(I actually could not copy them all into the same directory (excluding
subdirectories) without renaming, because after I started I detected
that a few subdirectories had identically named but different posts.
SIGDA uniquely named most of them, so I copied most of them into a
single directory.)
On 4/15/2026 3:19 PM, joegwinn@comcast.net wrote:
Of course, it depends on the ACTUAL keys inserted/removed and the
current context. But, so would every other algorithm!
Pointers? Suggestions?
It's O[log n] no matter what, so you won't do better, so just use it
and be happy.
Ref: <https://en.wikipedia.org/wiki/AVL_tree>
You assume a (potentially) relatively high cost of rebalancing the tree
on each insertion or deletion. So, the win for search is offset by
that hit -- which only pays off if you then do other searches.
E.g., if you are adding/removing lots of entries in creating a tree, there
is a relatively high cost that may never pay off (as you may later only
opt to do one "lookup").
And, of course, the order in which you add entries has an optimal case
(as well as pathological).
I.e., naive structures can prove to be more efficient (time/space),
depending on their contents and manner of construction.
E.g., locating file descriptor 38 is likely more efficient than
locating file "38".
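The "constructing it prior to load time" case mentioned at the top of the thread is indeed the cheap one: from already-sorted keys you can build a perfectly balanced tree in a single recursive pass, with no rotations at all. A minimal sketch, using (left, key, right) tuples as a toy node representation of my own:

```python
def build_balanced(sorted_keys):
    """Build a perfectly height-balanced BST from sorted keys by
    recursive midpoint splitting -- no rebalancing work is ever done."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return (build_balanced(sorted_keys[:mid]),   # smaller keys on the left
            sorted_keys[mid],                    # median at the root
            build_balanced(sorted_keys[mid + 1:]))

def tree_height(t):
    return 0 if t is None else 1 + max(tree_height(t[0]), tree_height(t[2]))

tree = build_balanced(list(range(1023)))
print(tree_height(tree))  # 10: already as balanced as AVL could make it
```

(The list slices cost extra copying; passing index bounds instead would make the build O(n). Either way, all the rebalancing cost of incremental insertion is avoided.)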
On 04/15/2026 05:56 PM, Don Y wrote:
On 4/15/2026 3:19 PM, joegwinn@comcast.net wrote:
Of course, it depends on the ACTUAL keys inserted/removed and the
current context. But, so would every other algorithm!
Pointers? Suggestions?
It's O[log n] no matter what, so you won't do better, so just use it
and be happy.
Ref: <https://en.wikipedia.org/wiki/AVL_tree>
You assume a (potentially) relatively high cost of rebalancing the tree
on each insertion or deletion. So, the win for search is offset by
that hit -- which only pays off if you then do other searches.
E.g., if you are adding/removing lots of entries in creating a tree, there
is a relatively high cost that may never pay off (as you may later only
opt to do one "lookup").
And, of course, the order in which you add entries has an optimal case
(as well as pathological).
I.e., naive structures can prove to be more efficient (time/space),
depending on their contents and manner of construction.
E.g., locating file descriptor 38 is likely more efficient than
locating file "38".
If the tree is of a bounded size, you could put it into an array
instead of off the heap.
A usual problem is attributes of unbounded yet usually _small_ size.
For example, something like map-reduce expects the count of keys to be
roughly uniform and not large, while often there are "modes":
principals, majors, minors, and the long tail.
So, if the attribute usually has count 1 and sometimes count 2-3,
it might make sense to just make the attribute size 4, arranging that
if the first or last pointer is null, the other side is an offset to a
store on the heap -- for example, a balanced binary tree of those --
vis-a-vis each entry being on (off) the heap individually.
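The "bounded tree in an array" idea above can be sketched as an implicit layout with no child pointers at all: node i's children live at indices 2*i+1 and 2*i+2. The helper names are hypothetical, and the sketch assumes pre-sorted keys and a fixed capacity.

```python
def build_implicit(sorted_keys):
    """Place sorted keys into a flat array so that a root-to-leaf walk
    (indices 0 -> 2i+1 / 2i+2) is a binary search over a pointer-free tree."""
    out = [None] * len(sorted_keys)
    it = iter(sorted_keys)
    def fill(i):
        # In-order traversal of the implicit shape assigns keys in sorted
        # order, which makes the array a valid BST.
        if i < len(out):
            fill(2 * i + 1)   # left subtree gets the smaller keys
            out[i] = next(it)
            fill(2 * i + 2)   # right subtree gets the larger keys
    fill(0)
    return out

def contains(tree, key):
    i = 0
    while i < len(tree):
        if key == tree[i]:
            return True
        i = 2 * i + 1 if key < tree[i] else 2 * i + 2
    return False

t = build_implicit(list(range(0, 30, 2)))  # 15 even keys, 0..28
print(contains(t, 8), contains(t, 9))      # True False
```

Besides avoiding per-node heap allocations, this layout keeps a search's early probes in a small, cache-friendly prefix of the array -- relevant to the locality points below.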
Matters of locality, proximity, and affinity in many-core
architectures -- usual these days since the likes of AMD "Bulldozer"
after dual-core -- mean that modern architectures like Arm/Thumb,
Intel/AMD, or MIPS/RISC-V are much alike: the neighborhood topology
matters, and each is effectively a model of a distributed system.
About "data" and "summary", and, the "tractable", tractable
summary, begins usually enough with "counts", and for example
about checksums, yet though that "counts" indicate the size,
where the size is the usual idea for Big-O, Big-Theta, and
Little-Theta notation, of the "asymptotics", since also size
is always assumed to grow "unboundedly" in the consideration
of the "asymptotics" or the "amortization" over time, which
is usually given as the only factor of cost. So, that's not
a complete account, and, for example, then whether a linked-list
or an array is the better backing for any sort of iterative
access and organization and maintenance of a data structure,
has that it varies.
Indeed, if the distribution (as of what would be a random sample) is
completely known, then a demonstrably optimal data structure can be
built for it. That the _library algorithms_ and _library data
structures_ are instead very common and unspecialized is, in a sense,
a matter of specializations and types, and of probabilistic accounts.
About the massive-wide and the parallel: making it so that data is
stored in summary, usually for partitions with "uniform random access"
by the discrimination of partitions, means the organization of data --
for example, hierarchical naming and directory interfaces, or
hierarchical time-series data, near-opposites yet always in terms of
each other -- is much a matter of summary and statistic.
Then, at least one idea has any data structure being half data / half
summary, per the usual ideas of state and scope and the locality of
reference -- for example, a system that constantly updates those and
is always re-organizing / re-structuring / re-sorting / re-building,
so that reads and writes have the same cost.
A usual account aims to make reads free and consistent, and writes
atomic and independent, vis-a-vis the granular and the bulk, items
and batches.
So, vis-a-vis the school-boy or class-room accounts, for something
like Aho, Sethi, and Ullman and "a parser" vis-a-vis "parsing", modern
architectures are themselves models of self-contained distributed
systems.
"Show me where Knuth addresses any of these real-world issues..."

There's nothing that says how large a book can be. Or, how many.
These very big, very late books would be even bigger later if Knuth
were to add more considerations to them.