• Comparing algorithm costs

    From Don Y@blockedofcourse@foo.invalid to sci.electronics.design on Wed Apr 15 13:44:31 2026
    From Newsgroup: sci.electronics.design

    I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".

    It's relatively easy to sort out the cost of constructing it -- esp
    if that occurs prior to load time.

    But, I'm trying to come up with a model that reflects dynamic changes
    to the structure "at runtime". So I can sort out a break-even point
    for it vs. other algorithms.

    Of course, it depends on the ACTUAL keys inserted/removed and the
    current context. But, so would every other algorithm!

    Pointers? Suggestions?
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Ross Finlayson@ross.a.finlayson@gmail.com to sci.electronics.design on Wed Apr 15 13:57:00 2026
    From Newsgroup: sci.electronics.design

    On 04/15/2026 01:44 PM, Don Y wrote:
    I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".

    It's relatively easy to sort out the cost of constructing it -- esp
    if that occurs prior to load time.

    But, I'm trying to come up with a model that reflects dynamic changes
    to the structure "at runtime". So I can sort out a break-even point
    for it vs. other algorithms.

    Of course, it depends on the ACTUAL keys inserted/removed and the
    current context. But, so would every other algorithm!

    Pointers? Suggestions?

    Usually called "maintenance" of the data structure, the
    "organization" and "re-organization".

Then, "adaptive algorithms" or "adaptive containers" is about
data structures reorganizing themselves according to feedback-directed
optimization and the like.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From joegwinn@joegwinn@comcast.net to sci.electronics.design on Wed Apr 15 18:19:50 2026
    From Newsgroup: sci.electronics.design

    On Wed, 15 Apr 2026 13:44:31 -0700, Don Y
    <blockedofcourse@foo.invalid> wrote:

I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".

    It's relatively easy to sort out the cost of constructing it -- esp
    if that occurs prior to load time.

    But, I'm trying to come up with a model that reflects dynamic changes
    to the structure "at runtime". So I can sort out a break-even point
    for it vs. other algorithms.

    Of course, it depends on the ACTUAL keys inserted/removed and the
    current context. But, so would every other algorithm!

    Pointers? Suggestions?

    It's O[log n] no matter what, so you won't do better, so just use it
    and be happy.

    Ref: <https://en.wikipedia.org/wiki/AVL_tree>

    Joe
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From john larkin@jl@glen--canyon.com to sci.electronics.design on Wed Apr 15 16:26:31 2026
    From Newsgroup: sci.electronics.design

    On Wed, 15 Apr 2026 13:44:31 -0700, Don Y
    <blockedofcourse@foo.invalid> wrote:

I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".

    It's relatively easy to sort out the cost of constructing it -- esp
    if that occurs prior to load time.

    But, I'm trying to come up with a model that reflects dynamic changes
    to the structure "at runtime". So I can sort out a break-even point
    for it vs. other algorithms.

    Of course, it depends on the ACTUAL keys inserted/removed and the
    current context. But, so would every other algorithm!

    Pointers? Suggestions?

    Post a schematic.


    John Larkin
    Highland Tech Glen Canyon Design Center
    Lunatic Fringe Electronics
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Buzz McCool@buzz_mccool@yahoo.com to sci.electronics.design on Wed Apr 15 17:32:17 2026
    From Newsgroup: sci.electronics.design

    On 4/15/26 1:44 PM, Don Y wrote:
    I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".
    ...
Pointers? Suggestions?

Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or
memory intensive rather than having to experimentally find this out for
yourself.

    Knuth's TAOCP might be a place to start.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Don Y@blockedofcourse@foo.invalid to sci.electronics.design on Wed Apr 15 17:56:06 2026
    From Newsgroup: sci.electronics.design

    On 4/15/2026 3:19 PM, joegwinn@comcast.net wrote:
    Of course, it depends on the ACTUAL keys inserted/removed and the
    current context. But, so would every other algorithm!

    Pointers? Suggestions?

    It's O[log n] no matter what, so you won't do better, so just use it
    and be happy.

    Ref: <https://en.wikipedia.org/wiki/AVL_tree>

    You assume a (potentially) relatively high cost of rebalancing the tree
    on each insertion or deletion. So, the win for search is offset by
    that hit -- which only pays off if you then do other searches.

    E.g., if you are adding/removing lots of entries in creating a tree, there
    is a relatively high cost that may never pay off (as you may later only
    opt to do one "lookup").

    And, of course, the order in which you add entries has an optimal case
    (as well as pathological).

    I.e., naive structures can prove to be more efficient (time/space),
    depending on their contents and manner of construction.

    E.g., locating file descriptor 38 is likely more efficient than
    locating file "38".

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Don Y@blockedofcourse@foo.invalid to sci.electronics.design on Wed Apr 15 18:06:27 2026
    From Newsgroup: sci.electronics.design

    On 4/15/2026 5:32 PM, Buzz McCool wrote:
    On 4/15/26 1:44 PM, Don Y wrote:
    I'm trying to sort out the "nominal" cost of using AVL trees to store a
    "dictionary".
    ...
Pointers? Suggestions?

    Perhaps a book on tree data structures would compare/contrast different
    types of trees and already have determined which is least compute and/or memory
    intensive rather than having to experimentally find this out for yourself.

That's not the issue. Search/sort/lookup algorithms have all been *generally*
characterized as to performance (typical and worst case). But, in practice,
how they are actually *used* dramatically alters ACTUAL performance.

Imagine creating a directory (folder) on a storage medium. How should
its "list of contents" be constructed to make for the most efficient
    use? *You*, the creator, likely know what that usage pattern (and
    list of potential contents) will likely be. Yet, you (the creator)
    aren't given any input into the actual implementation chosen for
    a folder with 4 entries or 400,000. Or, a folder in which some
    particular entries are "favored" (in use) over others.

    You want the structure to be optimized by the *access* order (weighted by
    the types of accesses), not some arbitrary order that humans conceive of
    in structuring data (e.g., alphabetic, etc.)

    Knuth's TAOCP might be a place to start.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From joegwinn@joegwinn@comcast.net to sci.electronics.design on Thu Apr 16 12:05:43 2026
    From Newsgroup: sci.electronics.design

    On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
    <buzz_mccool@yahoo.com> wrote:

    On 4/15/26 1:44 PM, Don Y wrote:
    I'm trying to sort out the "nominal" cost of using AVL trees to store a
    "dictionary".
    ...
Pointers? Suggestions?

    Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or memory intensive rather than having to experimentally find this out for yourself.

    Knuth's TAOCP might be a place to start.

That's exactly right.
.<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>

    I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
    much use of it.

    Joe
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Don Y@blockedofcourse@foo.invalid to sci.electronics.design on Thu Apr 16 09:39:57 2026
    From Newsgroup: sci.electronics.design

    On 4/16/2026 9:05 AM, joegwinn@comcast.net wrote:
    On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
    <buzz_mccool@yahoo.com> wrote:

    On 4/15/26 1:44 PM, Don Y wrote:
    I'm trying to sort out the "nominal" cost of using AVL trees to store a
    "dictionary".
    ...
Pointers? Suggestions?

    Perhaps a book on tree data structures would compare/contrast different
    types of trees and already have determined which is least compute and/or
    memory intensive rather than having to experimentally find this out for
    yourself.

    Knuth's TAOCP might be a place to start.

That's exactly right.
.<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>

    I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
    much use of it.

All computer science textbooks suffer from a lack of practicality.
Why isn't Raita used universally -- if it is so much more efficient?
(esp wrt brute force). Or, Red-Black trees? B trees? H trees?
AVL trees? Why so many solutions, if their performance characteristics
(in the GENERAL sense) can easily be ranked?

    As I said, "in practice, how they are actually *used* dramatically alters ACTUAL performance."

    Pesky little details like:
    - NUMA (let alone NORMA!)
    - multiple threads concurrently accessing LARGE data structures
    - cost of the "compare" operator relative to the structure itself
    - failure resilience
    etc.

    Things that make a "school boy" approach perform terribly in a real-world application.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From John R Walliker@jrwalliker@gmail.com to sci.electronics.design on Thu Apr 16 19:20:43 2026
    From Newsgroup: sci.electronics.design

    On 16/04/2026 17:39, Don Y wrote:
    On 4/16/2026 9:05 AM, joegwinn@comcast.net wrote:
    On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
    <buzz_mccool@yahoo.com> wrote:

    On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".
    ...
Pointers? Suggestions?

    Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or memory intensive rather than having to experimentally find this out for
    yourself.

    Knuth's TAOCP might be a place to start.

    That's exactly right.
    .<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>

I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
    much use of it.

    All computer science text books suffer from a lack of practicality.
    Why isn't Raita used universally -- if it so much more efficient?
(esp wrt brute force). Or, Red-Black trees? B trees? H trees?
AVL trees? Why so many solutions as their performance characteristics
    (in the GENERAL sense) can easily be ranked.

    As I said, "in practice, how they are actually *used* dramatically alters ACTUAL performance."

    Pesky little details like:
    - NUMA (let alone NORMA!)
    - multiple threads concurrently accessing LARGE data structures
    - cost of the "compare" operator relative to the structure itself
    - failure resilience
    etc.

    Things that make a "school boy" approach perform terribly in a real-world application.

    Knuth documented a huge amount a very long time ago.
    How much more has really been done since then in terms
    of algorithm development?
    I don't have my own copy, but I know several others whose
    bookshelves are bending under the weight of Knuth.

    Back in the real world, I discovered the hard way that unless
    it is a huge project it is better to keep things simple
and not worry too much about efficiency. CPU cycles are cheap
    compared with the time spent on designing a clever algorithm
    and making sure that it really works reliably.
    John

    --- Synchronet 3.21f-Linux NewsLink 1.2
• From Nioclás Pól Caileán de Ghloucester@thanks-to@Taf.com to sci.electronics.design on Thu Apr 16 19:08:47 2026
    From Newsgroup: sci.electronics.design

Don Y <blockedOFCOURSE@foo.invalid> wrote:
|-------------------------------------------------------------------------|
|"Imagine creating a directory (folder)"
|-------------------------------------------------------------------------|

    Electronic engineers of course do not need to be told that "directory"
    is "folder"!

|-------------------------------------------------------------------------|
|"How should
|it's "list of contents" be constructed to make for the most efficient
|use? *You*, the creator, likely know what that usage pattern (and
|list of potential contents) will likely be. Yet, you (the creator)
|aren't given any input into the actual implementation chosen for
|a folder with 4 entries or 400,000. [. . .]
|[. . .]
|
|You want the structure to be optimized by the *access* order (weighted by
|the types of accesses)[. . .]"
|-------------------------------------------------------------------------|

    I used Wget or whatever in circa 2003 to download the comp.arch.fpga
    archive that SIGDA used to host. It is not an mbox of many posts -
    instead each post therein is a single file called like 10001.txt and
    10002.txt etc.

    They are in many subdirectories, so I decided to burn a version to a
    CD but with them all in one directory.

    This reorganization does not transpire to be an
    improvement. Contrarily it slows down accesses via e.g.
    ls *
    by many MINUTES. So I decided to burn another copy - preserving these subdirectories of SIGDA.

    (I actually could not copy them all into the same directory (excluding subdirectories) without renamings, because after I started I detected
    that a few subdirectories had identically named different posts. SIGDA
    uniquely named most of them, so I copied most of them into a single
    directory.)
    (S. HTTP://Gloucester.Insomnia247.NL/ fuer Kontaktdaten!)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From joegwinn@joegwinn@comcast.net to sci.electronics.design on Thu Apr 16 15:13:53 2026
    From Newsgroup: sci.electronics.design

    On Thu, 16 Apr 2026 09:39:57 -0700, Don Y
    <blockedofcourse@foo.invalid> wrote:

    On 4/16/2026 9:05 AM, joegwinn@comcast.net wrote:
    On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
    <buzz_mccool@yahoo.com> wrote:

    On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".
    ...
Pointers? Suggestions?

    Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or memory intensive rather than having to experimentally find this out for
    yourself.

    Knuth's TAOCP might be a place to start.

    That's exactly right.
    .<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>

    I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
    much use of it.

    All computer science text books suffer from a lack of practicality.
    Why isn't Raita used universally -- if it so much more efficient?
    (esp wrt brute force). Or, Red-Black trees? B trees? H trees?
    AVL trees? Why so many solutions as their performance characteristics
    (in the GENERAL sense) can easily be ranked.

As I said, "in practice, how they are actually *used* dramatically alters ACTUAL performance."

    Pesky little details like:
    - NUMA (let alone NORMA!)
    - multiple threads concurrently accessing LARGE data structures
    - cost of the "compare" operator relative to the structure itself
    - failure resilience
    etc.

Things that make a "school boy" approach perform terribly in a real-world application.

    I take it you had not heard of Knuth. Not a school boy at all. Full
    professor and known to be the top of that field, and has personally
    written many very useful applications.

    TAOCP covers the foundations at a deep level, so all kinds and classes
    of solutions are there, compared and contrasted.

    Anyway, the various sections are now available online in pdf form, so
    there is no reason why you cannot read the section about binary trees
    and the like. This is often good for creating new approaches.

    .<https://www-cs-faculty.stanford.edu/~knuth/taocp.html#:~:text=These%20volumes%20are%20now%20available%20also%20in%20portable>

    As for searching binary trees, here is the Wiki summary: .<https://en.wikipedia.org/wiki/Optimal_binary_search_tree>

    Joe
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Don Y@blockedofcourse@foo.invalid to sci.electronics.design on Thu Apr 16 13:07:15 2026
    From Newsgroup: sci.electronics.design

    On 4/16/2026 12:13 PM, joegwinn@comcast.net wrote:
    On Thu, 16 Apr 2026 09:39:57 -0700, Don Y
    <blockedofcourse@foo.invalid> wrote:

    On 4/16/2026 9:05 AM, joegwinn@comcast.net wrote:
    On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
    <buzz_mccool@yahoo.com> wrote:

    On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".
    ...
Pointers? Suggestions?

Perhaps a book on tree data structures would compare/contrast different types of trees and already have determined which is least compute and/or memory intensive rather than having to experimentally find this out for yourself.

    Knuth's TAOCP might be a place to start.

    That's exactly right.
    .<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>

    I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
    much use of it.

    All computer science text books suffer from a lack of practicality.
    Why isn't Raita used universally -- if it so much more efficient?
    (esp wrt brute force). Or, Red-Black trees? B trees? H trees?
    AVL trees? Why so many solutions as their performance characteristics
    (in the GENERAL sense) can easily be ranked.

    As I said, "in practice, how they are actually *used* dramatically alters
    ACTUAL performance."

    Pesky little details like:
    - NUMA (let alone NORMA!)
    - multiple threads concurrently accessing LARGE data structures
    - cost of the "compare" operator relative to the structure itself
    - failure resilience
    etc.

    Things that make a "school boy" approach perform terribly in a real-world
    application.

    I take it you had not heard of Knuth.

I have first editions (beige linen hardcover) of 1-3. He started on 4
while I was in school but then refocused on TeX. (Of course, I also
have all of the TeX books ("Computers and Typesetting") in hardcover --
and those that weren't available as such in paperback.)

    Not a school boy at all. Full
    professor and known to be the top of that field, and has personally
    written many very useful applications.

    But disconnected with real-world issues encountered in "appliances".
    The algorithms are largely described in a sterile, "workstation" (MIX) environment.

    Ask yourself how two competing threads would access a Red-Black tree
    when one is inserting/searching/deleting while the other is performing
    any of the above operations. Do you put a big lock on the entire
    structure? And, prevent the task holding the lock from being preempted?
    What if the tree is stored in FLASH where a write/update is costly (wear)
    and time consuming?

    How does paging affect the performance of various sort algorithms?
    Caching?

    These are real world issues that are completely ignored in the texts.

    An algorithm running on an abstract machine is just a teaching tool;
    it doesn't provide practical information.

    [Remember when "loop unrolling" was an optimization technique? And
    then came caches... <frown> It's not enough to be able to get a piece
    of code to produce an answer if HOW it produces it is "uninspired".
    Witness the floating point number thread.]

    TAOCP covers the foundations at a deep level, so all kinds and classes
    of solutions are there, compared and contrasted.

    Anyway, the various sections are now available online in pdf form, so
    there is no reason why you cannot read the section about binary trees
    and the like. This is often good for creating new approaches.

    .<https://www-cs-faculty.stanford.edu/~knuth/taocp.html#:~:text=These%20volumes%20are%20now%20available%20also%20in%20portable>

    As for searching binary trees, here is the Wiki summary: .<https://en.wikipedia.org/wiki/Optimal_binary_search_tree>

    Joe

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Don Y@blockedofcourse@foo.invalid to sci.electronics.design on Thu Apr 16 13:53:15 2026
    From Newsgroup: sci.electronics.design

    On 4/16/2026 11:20 AM, John R Walliker wrote:
    Things that make a "school boy" approach perform terribly in a real-world
    application.

    Knuth documented a huge amount a very long time ago.
    How much more has really been done since then in terms
    of algorithm development?

    There are things that are now commercially viable/practical
    that weren't really part of the conversation 50 years ago.
    Public key cryptography post-dated Knuth's work. As did most
    AI. (My copy of Winston was 8.5x11" photocopied sheets as
    he was revising the text as he was teaching it -- he welcomed
    bug reports in class)

    E.g., it is now common for cheap processors to have memory
    onboard, large register files, cache, address spaces with
    different access characteristics, etc. Memory management
    units, ECC syndrome handling, floating point and cryptographic
    accelerators, vector/SIMD processors, etc.

    These capabilities alter how you approach problems.

    I don't have my own copy, but I know several others whose
    bookshelves are bending under the weight of Knuth.

    I've scanned all of my textbooks -- bits are much lighter! :>

    Back in the real world, I discovered the hard way that unless
    it is a huge project it is better to keep things simple
and not worry too much about efficiency. CPU cycles are cheap
    compared with the time spent on designing a clever algorithm
    and making sure that it really works reliably.

    The bigger risk is the effort subsequent developers will have
    to invest to understand what you have done and how it can
    be expected to behave. People "look at" code but seldom
    with a critical eye (witness the number of latent bugs found
    in FOSS where dozens/scores/hundreds of eyes have had a chance
    to discover them -- but failed!)

    [I designed a front panel for a Nova minicomputer that
    used 7-segment decoders as a hack to implement the required
    logic *AND* O.C. output stages. My boss was so afraid of
its magic that he made me redesign it with a handful of
    devices instead of the one or two I'd originally used]

    CPU cycles (and memory, and other resources) should be spent
    making code more accurate and reliable. Along with adding
    features and mechanisms that make the *product* more reliable.

    [E.g., I treat memory much like a disk drive. If a page
    starts throwing too many ECC errors, then I mark it as
    defective and map a "working" page in its place. If
    too many pages are retired in this way, I alert the user.
    Much like SMART does.]

    A lot depends on scale. Unfortunately, how you start often
    limits the extent to which you can realistically scale a
    solution. And, inertia tends to constrain subsequent
    projects, as well -- as folks try to reuse code and designs.
    The belief (delusion?) that "it works" is a strong motivator
    for leaving things as they are.

    E.g., this post sought information to help in the design of
    namespaces -- specifically, (key, capability) tuples.
    If you expect to have only a few entries in a "Context"
    (the equivalent of a "folder" within a namespace), then
    you can use a simple array to store the capabilities
    with the key acting as an index. And, do a lookup in
    constant time (so much for the "efficiency" of AVL trees!)

    Sharing such an object is considerably easier (less expensive)
    as you can create a critical region that just includes
    the actual array access -- any other work can be done before
    (or after) the critical region. Compare that effort to
    storing, say, just *10* items in a tree (ANY sort of tree).

    [Remember, the thread holding a lock can be preempted so
    you REALLY want to keep the critical region small to
    minimize the chance of preemption WHILE the lock is held]

    I have a "camera" namespace (managed by the camera server
and distributed to individual camera clients). It
    has keys like:
    - NW looking East
    - NW looking South
    - NE looking West
    - NE looking South
    - SW looking East
    - SW looking North
    ...
    - Facing Front Door
    - Facing Visitor
    - Rooftop PTZ
    - Back yard PTZ
    - Front yard PTZ
    - Garage Door Riser, Left Side
    - Garage Door Riser, Right Side
    - Garage Door Traveler, Left Side
    - Garage Door Traveler, Right Side
    ...
    - Watching Sky
    - Watching Mailbox
    - Watching Driveway

    (I use a lot of cameras as once you can process video imagery,
    you can do a *lot* with a cheap camera!)

    Searching this namespace requires matching strings, even though there
    are only a few dozen entries. So, operations that access that
    structure likely take longer than accessing an indexed array.

    But, once a particular entry is located, the controller can
    spawn a "Camera Monitor" with a namespace that consists of just:
    - Camera
    - PTZ Controller
    - IR illuminator
    binding the corresponding resources derived from the controller's
    namespace.

    And, the "Camera Monitor" *knows* that there will be a successful
    result when it tries to resolve any of these three names! Not
    a guarantee that can be made of the controller's namespace.
    I could just as easily call them 1, 2 and 3 (with a more efficient
    namespace implementation as an array) with:
    enum { Camera = 1, PTZController, IRIlluminator };
    to let the developer dress up his code.

    OTOH, imagine having 100,000 tuples in the dictionary.
    And, keys that want to have special meaning to a developer
    (beyond simple integers/indices). Now, you need an efficient
    way to examine those 100,000 keys to locate the tuple of
    interest (or, determine that it doesn't exist). So, how
    you share *that* structure becomes a crucial issue (unless
    you subscribe to the "intermittent, hard to track down,
    bug" way of coding!)

    If the namespace is effectively read-only, then you would
    approach its design differently than if you expected a large
    number of insert/delete operations.

    If the namespace is stored, *scattered* on rotating rust,
    there is a cost to accessing it in its entirety and a
    risk of loss if those operations are interrupted or
    corrupted.

    The developer knows what the range of names/keys will likely
    be and how they will be accessed. So, *he* should determine how
    the NameSpace object is instantiated instead of one-size-fits-all.
    Show me where Knuth addresses any of these real-world issues...
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Don Y@blockedofcourse@foo.invalid to sci.electronics.design on Thu Apr 16 14:01:20 2026
    From Newsgroup: sci.electronics.design

|-------------------------------------------------------------------------|
|"How should
|it's "list of contents" be constructed to make for the most efficient
|use? *You*, the creator, likely know what that usage pattern (and
|list of potential contents) will likely be. Yet, you (the creator)
|aren't given any input into the actual implementation chosen for
|a folder with 4 entries or 400,000. [. . .]
|[. . .]
|
|You want the structure to be optimized by the *access* order (weighted by
|the types of accesses)[. . .]"
|-------------------------------------------------------------------------|

    I used Wget or whatever in circa 2003 to download the comp.arch.fpga
    archive that SIGDA used to host. It is not an mbox of many posts -
    instead each post therein is a single file called like 10001.txt and 10002.txt etc.

    They are in many subdirectories, so I decided to burn a version to a
    CD but with them all in one directory.

Often, large datasets are broken into a series of smaller subgroups
to reduce the size of each directory along the path:
    1997/January/firstweek/   (entries for Jan 1-7)
    1997/January/secondweek/  (entries for Jan 8-14)
etc.

    This takes some of the complexity out of the final directory level and
    handles it in higher level directories in the hierarchy.

    The risk with such an approach is the temptation to name the actual
    entries "1, 2, 3, 4..." in each of the "week" directories. Then,
    trying to combine them results in a January with five "1" entries
    (one for the 1st-7th, another for the 8th-14th, etc.)

    This reorganization does not transpire to be an
    improvement. Contrarily it slows down accesses via e.g.
    ls *
    by many MINUTES. So I decided to burn another copy - preserving these subdirectories of SIGDA.

    Many of the BSDs operate repositories of third party applications
    from which users can fetch "packages" (source or binary). Sadly,
    these all reside in a ".../distfiles" directory -- thousands of
    them! So, access to its contents is time consuming (and don't
    even think of "ls -al"!)

Of course, if you want to mirror said contents, you are stuck with the
same constraint!

    (I actually could not copy them all into the same directory (excluding subdirectories) without renamings, because after I started I detected
    that a few subdirectories had identically named different posts. SIGDA uniquely named most of them, so I copied most of them into a single directory.)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Ross Finlayson@ross.a.finlayson@gmail.com to sci.electronics.design on Fri Apr 17 08:46:16 2026
    From Newsgroup: sci.electronics.design

    On 04/15/2026 05:56 PM, Don Y wrote:
    On 4/15/2026 3:19 PM, joegwinn@comcast.net wrote:
    Of course, it depends on the ACTUAL keys inserted/removed and the
    current context. But, so would every other algorithm!

    Pointers? Suggestions?

    It's O[log n] no matter what, so you won't do better, so just use it
    and be happy.

    Ref: <https://en.wikipedia.org/wiki/AVL_tree>

    You assume a (potentially) relatively high cost of rebalancing the tree
    on each insertion or deletion. So, the win for search is offset by
    that hit -- which only pays off if you then do other searches.

    E.g., if you are adding/removing lots of entries in creating a tree, there
    is a relatively high cost that may never pay off (as you may later only
    opt to do one "lookup").

    And, of course, the order in which you add entries has an optimal case
    (as well as pathological).

    I.e., naive structures can prove to be more efficient (time/space),
    depending on their contents and manner of construction.

    E.g., locating file descriptor 38 is likely more efficient than
    locating file "38".


    If the tree is of a bounded size, you could put it into an array
    instead of off the heap.

    A usual idea is handling attributes of unbounded, yet usually
    _small_, size. For example, something like map-reduce expects
    the count of keys to be roughly uniform and not large, while
    often there are modes: principals, majors, minors, and the
    long tail.

    So, if the attribute usually has count 1 and sometimes count 2-3,
    it might make sense to just make the attribute size 4 inline,
    with the convention that if the first or last pointer is null,
    the other side is an offset to a store on the heap (for example,
    a balanced binary tree of entries), rather than each entry
    living on the heap.
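    That inline-until-overflow scheme can be sketched as follows. The
    SmallMap name, the spill threshold of 4, and the dict standing in
    for a heap-side balanced tree are all illustrative, not from any
    particular library:

```python
# Small-buffer optimization sketch: keep up to SPILL entries in a
# fixed-size inline slot; fall back to a heap-allocated mapping
# (a dict here, standing in for a balanced tree) only on overflow.
SPILL = 4

class SmallMap:
    def __init__(self):
        self._inline = []      # list of (key, value), at most SPILL long
        self._overflow = None  # dict once the inline slot overflows

    def put(self, key, value):
        if self._overflow is not None:
            self._overflow[key] = value
            return
        for i, (k, _) in enumerate(self._inline):
            if k == key:                      # update in place
                self._inline[i] = (key, value)
                return
        if len(self._inline) < SPILL:
            self._inline.append((key, value))
        else:                                 # spill to the heap store
            self._overflow = dict(self._inline)
            self._overflow[key] = value
            self._inline = []

    def get(self, key, default=None):
        if self._overflow is not None:
            return self._overflow.get(key, default)
        for k, v in self._inline:
            if k == key:
                return v
        return default
```

    For the common 1-3 entry case this never touches the heap store
    at all, which is the whole point of the trick.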



    Matters of locality, proximity, and affinity in many-core
    architectures (usual these days, since the likes of AMD
    "Bulldozer" after dual-core) make modern architectures like
    Arm/Thumb, Intel/AMD, or MIPS/RISC-V much alike: the neighborhood
    topology matters, so each is effectively a model of a
    distributed system.



    About "data" and "summary": a tractable summary usually begins
    with "counts" (and, for example, checksums). Counts indicate the
    size, and size is the usual parameter for Big-O, Big-Theta, and
    little-o notation, the "asymptotics", since size is assumed to
    grow unboundedly when considering "asymptotics" or "amortization"
    over time, which is usually taken as the only factor of cost.
    So that's not a complete account; for example, whether a
    linked-list or an array is the better backing for any sort of
    iterative access, organization, and maintenance of a data
    structure varies.


    Indeed, if the distribution of what would be a random sample is
    completely known, then a demonstrably optimal data structure can
    be built for it. That the _library algorithms_ and the _library
    data structures_ are instead very common and unspecialized is,
    in a sense, a matter of specializations and types, and of
    probabilistic accounts.
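    That claim is classical: with known access probabilities, Knuth's
    1971 dynamic program builds a statically optimal binary search
    tree. A minimal sketch, computing only the minimal expected search
    cost and ignoring unsuccessful searches:

```python
# Optimal static BST cost (Knuth, "Optimum binary search trees", 1971).
# weights[i] is the access weight of the i-th key in sorted order;
# the result is the minimal total weighted comparison count.
def optimal_bst_cost(weights):
    n = len(weights)
    pre = [0] * (n + 1)            # prefix sums for O(1) range weight
    for i, w in enumerate(weights):
        pre[i + 1] = pre[i] + w
    # cost[i][j] = min expected cost of a subtree over keys i..j-1
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # try every key r in the range as the subtree root
            best = min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
            cost[i][j] = best + (pre[j] - pre[i])
    return cost[0][n]
```

    For equal weights this reproduces the balanced tree; for a skewed
    distribution it hoists the hot key to the root, which is exactly
    the specialization a generic library container cannot do for you.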

    For the massively wide and parallel case, making it so that the
    data is stored alongside its summary, usually in partitions with
    "uniform random access" by discriminating among partitions, means
    that the organization of data (for example, hierarchical naming
    and directory interfaces, or hierarchical time-series data, near
    opposites yet always defined in terms of each other) is largely
    a matter of summary and statistics.


    Then, at least one idea has any data structure being half data /
    half summary, per the usual ideas of state and scope and the
    locality of reference; for example, a system that constantly
    updates those and is always re-organizing / re-structuring /
    re-sorting / re-building, so that reads and writes have the
    same cost.

    A usual account aims to make reads free and consistent, and
    writes atomic and independent, whether granular or bulk,
    items or batches.

    So, vis-a-vis the schoolboy or classroom accounts, for something
    like Aho, Sethi, and Ullman and "a parser" vis-a-vis "parsing",
    modern architectures are themselves models of self-contained
    distributed systems.





    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Ross Finlayson@ross.a.finlayson@gmail.com to sci.electronics.design on Fri Apr 17 09:16:36 2026
    From Newsgroup: sci.electronics.design

    On 04/17/2026 08:46 AM, Ross Finlayson wrote:
    [...]

    Some ideas from distributed systems, in "space and time", of
    "accessors and mutators", of "vertical and horizontal scaling",
    of the "inherently serial and embarrassingly parallel", are the
    "eventually consistent" vis-a-vis the "optimistic locking",
    about "vending universally unique IDs" and "retry logic":
    issues that arise in any model of coherency and consistency,
    about "sources of truth and a source of truth" and "points
    of change and propagation of change".
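    A minimal sketch of the "optimistic locking" plus "retry logic"
    pattern mentioned above, in a single-process stand-in; the Cell
    class and its method names are illustrative, not from any
    particular library:

```python
# Optimistic concurrency: readers never block; a writer commits only
# if the version it read is still current, and retries otherwise.
import threading

class Cell:
    def __init__(self, value):
        self._lock = threading.Lock()   # guards the (version, value) pair
        self._version = 0
        self._value = value

    def read(self):
        with self._lock:
            return self._version, self._value

    def compare_and_set(self, expected_version, new_value):
        with self._lock:
            if self._version != expected_version:
                return False            # someone committed first
            self._version += 1
            self._value = new_value
            return True

def update(cell, fn, max_retries=100):
    """Apply fn to the current value, retrying on write conflicts."""
    for _ in range(max_retries):
        version, value = cell.read()
        if cell.compare_and_set(version, fn(value)):
            return True
    return False                        # persistent contention
```

    The retry loop is where "eventual consistency" sneaks in: a
    writer's view may be stale, but a stale write can never commit.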


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Nioclás Pól Caileán de Ghloucester@thanks-to@Taf.com to sci.electronics.design on Fri Apr 17 18:30:56 2026
    From Newsgroup: sci.electronics.design

    Don Y <blockedofcourse@foo.invalid> wrote:

    |----------------------------------------------------------------|
    |"The bigger risk is the effort subsequent developers will have  |
    |to invest to understand what you have done and how it can       |
    |be expected to behave. People "look at" code but seldom         |
    |with a critical eye (witness the number of latent bugs found    |
    |in FOSS where dozens/scores/hundreds of eyes have had a chance  |
    |to discover them -- but failed!)"                               |
    |----------------------------------------------------------------|

    True. Cf. Knuth's advice to write a paper assuming that the
    reader will not bother to read its equations until the second
    time through.

    |----------------------------------------------------------------|
    |"CPU cycles (and memory, and other resources) should be spent   |
    |making code more accurate and reliable. Along with adding       |
    |features and mechanisms that make the *product* more reliable.  |
    |                                                                |
    |[E.g., I treat memory much like a disk drive. If a page         |
    |starts throwing too many ECC errors, then I mark it as          |
    |defective and map a "working" page in its place. If             |
    |too many pages are retired in this way, I alert the user.       |
    |Much like SMART does.]"                                         |
    |----------------------------------------------------------------|

    Good.

    |-----------------------------------------------------------------|
    |"Show me where Knuth addresses any of these real-world issues..."|
    |-----------------------------------------------------------------|

    These very big very late books would be even bigger later if Knuth
    would add more considerations to them.
    (See HTTP://Gloucester.Insomnia247.NL/ for contact details!)
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Don Y@blockedofcourse@foo.invalid to sci.electronics.design on Fri Apr 17 15:36:39 2026
    From Newsgroup: sci.electronics.design

    On 4/17/2026 11:30 AM, Nioclás Pól Caileán de Ghloucester wrote:

    |-----------------------------------------------------------------|
    |"Show me where Knuth addresses any of these real-world issues..."|
    |-----------------------------------------------------------------|

    These very big very late books would be even bigger later if Knuth
    would add more considerations to them.

    There's nothing that says how large a book can be. Or, how many
    "spines" it can have.

    Knuth published collections of lecture notes, much later in his
    career. But, they touch on issues that aren't really related
    to real implementations. I.e., a more "school boy" approach where
    your hands never really get dirty.
    --- Synchronet 3.21f-Linux NewsLink 1.2