I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary".
It's relatively easy to sort out the cost of constructing it -- especially
if that occurs prior to load time.
But I'm trying to come up with a model that reflects dynamic changes
to the structure "at runtime", so I can sort out a break-even point
for it vs. other algorithms.
Of course, it depends on the ACTUAL keys inserted/removed and the
current context. But so would every other algorithm!
Pointers? Suggestions?
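One way to start on such a break-even model is pure arithmetic -- a rough sketch, not measured numbers. The function names below are my own, and the costs are assumptions: unit-cost comparisons, a flat O(1) amortized rotation charge per AVL insert, and an unsorted list as the naive alternative.

```python
import math

def avl_cost(n_inserts, n_lookups, c_rotate=1.0):
    """Nominal AVL cost: each insert/lookup walks ~log2(n) nodes;
    each insert is charged an assumed O(1) amortized rotation cost."""
    cost = 0.0
    for n in range(1, n_inserts + 1):
        cost += math.log2(n + 1) + c_rotate   # insert: search path + rebalance
    cost += n_lookups * math.log2(n_inserts + 1)
    return cost

def unsorted_list_cost(n_inserts, n_lookups):
    """Naive alternative: append is O(1); each lookup scans ~n/2 entries."""
    return n_inserts * 1.0 + n_lookups * (n_inserts / 2.0)

def break_even_lookups(n_inserts):
    """Smallest lookup count at which the AVL model starts to win."""
    k = 0
    while avl_cost(n_inserts, k) >= unsorted_list_cost(n_inserts, k):
        k += 1
    return k
```

Under these assumed unit costs, a 1000-entry dictionary pays for its AVL construction after only a couple dozen lookups; the real answer of course depends on the actual relative costs of compare, allocate, and rotate.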
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary"....
Pointers? Suggestions?
Of course, it depends on the ACTUAL keys inserted/removed and the
current context. But, so would every other algorithm!
Pointers? Suggestions?
It's O[log n] no matter what, so you won't do better, so just use it
and be happy.
Ref: <https://en.wikipedia.org/wiki/AVL_tree>
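A caveat worth making concrete: the O(log n) guarantee is precisely what AVL rebalancing buys; a plain BST with no rotations degenerates on sorted input. A small illustration (toy `Node`/`bst_insert` names of my own, and the ~1.44*log2(n) AVL height bound used only as an approximation):

```python
import math

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    # Plain BST insert: no balance factors, no rotations.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root

def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

# Sorted insertion order degenerates the plain BST into a linked list:
root = None
for k in range(256):
    root = bst_insert(root, k)
print(height(root))                      # 256: one long right spine
# AVL rotations would instead bound the height near 1.44*log2(n):
print(math.ceil(1.44 * math.log2(256)))  # 12
```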
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a...
"dictionary".
Pointers? Suggestions?
Perhaps a book on tree data structures would compare/contrast different
types of trees, having already determined which is least compute- and/or
memory-intensive, rather than your having to find this out experimentally
for yourself.
Knuth's TAOCP might be a place to start.
On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
<buzz_mccool@yahoo.com> wrote:
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a...
"dictionary".
Pointers? Suggestions?
Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or
memory intensive rather than having to experimentally find this out for
yourself.
Knuth's TAOCP might be a place to start.
That's exactly right. <https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>
I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
much use of it.
On 4/16/2026 9:05 AM, joegwinn@comcast.net wrote:
On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
<buzz_mccool@yahoo.com> wrote:
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary"....
Pointers? Suggestions?
Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or
memory intensive rather than having to experimentally find this out for
yourself.
Knuth's TAOCP might be a place to start.
That's exactly right.
.<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>
I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
much use of it.
All computer science textbooks suffer from a lack of practicality.
Why isn't Raita used universally -- if it is so much more efficient
(especially wrt brute force)? Or, Red-Black trees? B-trees? H-trees?
AVL trees? Why so many solutions, if their performance characteristics
(in the GENERAL sense) can so easily be ranked?
As I said, "in practice, how they are actually *used* dramatically alters ACTUAL performance."
Pesky little details like:
- NUMA (let alone NORMA!)
- multiple threads concurrently accessing LARGE data structures
- cost of the "compare" operator relative to the structure itself
- failure resilience
etc.
Things that make a "school boy" approach perform terribly in a real-world application.
On Thu, 16 Apr 2026 09:39:57 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:
On 4/16/2026 9:05 AM, joegwinn@comcast.net wrote:
On Wed, 15 Apr 2026 17:32:17 -0700, Buzz McCool
<buzz_mccool@yahoo.com> wrote:
On 4/15/26 1:44 PM, Don Y wrote:
I'm trying to sort out the "nominal" cost of using AVL trees to store a "dictionary"....
Pointers? Suggestions?
Perhaps a book on tree data structures would compare/contrast different
types of trees and already have determined which is least compute and/or
memory intensive rather than having to experimentally find this out for
yourself.
Knuth's TAOCP might be a place to start.
That's exactly right.
.<https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>
I have a paper copy of Volumes 1-3 bought in the late 1970s. I made
much use of it.
All computer science text books suffer from a lack of practicality.
Why isn't Raita used universally -- if it so much more efficient?
(esp wrt brute force). Or, Red-Black trees? B trees? H trees?
AVL trees? Why so many solutions as their performance characteristics
(in the GENERAL sense) can easily be ranked.
As I said, "in practice, how they are actually *used* dramatically alters
ACTUAL performance."
Pesky little details like:
- NUMA (let alone NORMA!)
- multiple threads concurrently accessing LARGE data structures
- cost of the "compare" operator relative to the structure itself
- failure resilience
etc.
Things that make a "school boy" approach perform terribly in a real-world
application.
I take it you had not heard of Knuth. Not a school boy at all: a full
professor, known to be at the top of that field, who has personally
written many very useful applications.
TAOCP covers the foundations at a deep level, so all kinds and classes
of solutions are there, compared and contrasted.
Anyway, the various sections are now available online in pdf form, so
there is no reason why you cannot read the section about binary trees
and the like. This is often good for creating new approaches.
<https://www-cs-faculty.stanford.edu/~knuth/taocp.html#:~:text=These%20volumes%20are%20now%20available%20also%20in%20portable>
As for searching binary trees, here is the Wiki summary: <https://en.wikipedia.org/wiki/Optimal_binary_search_tree>
Joe
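The "optimal binary search tree" idea behind that link can be made concrete: given access probabilities for keys in sorted order, a dynamic program computes the minimum expected comparison count. What follows is a rough sketch of the textbook O(n^3) recurrence (no Knuth speedup, successful searches only, no dummy keys); the function name and simplifications are my own.

```python
def optimal_bst_cost(p):
    """Minimum expected comparisons for a BST over sorted keys,
    where p[i] is the probability of searching for key i.
    cost[i][j] = optimal expected cost of a subtree over keys i..j-1."""
    n = len(p)
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            w = sum(p[i:j])  # every key in the range costs one extra level
            cost[i][j] = w + min(cost[i][r] + cost[r + 1][j]
                                 for r in range(i, j))  # try each root r
    return cost[0][n]
```

For example, with a heavily skewed distribution like [0.8, 0.1, 0.1] the optimum puts the popular key at the root (expected cost 1.3) rather than balancing the tree (expected cost 1.9) -- the "access order matters" point made elsewhere in this thread.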
Things that make a "school boy" approach perform terribly in a real-world
application.
Knuth documented a huge amount a very long time ago.
How much more has really been done since then in terms
of algorithm development?
I don't have my own copy, but I know several others whose
bookshelves are bending under the weight of Knuth.
Back in the real world, I discovered the hard way that unless
it is a huge project it is better to keep things simple
and not worry too much about efficiency. CPU cycles are cheap
compared with the time spent designing a clever algorithm
and making sure that it really works reliably.
"How should its "list of contents" be constructed to make for the most
efficient use? *You*, the creator, likely know what that usage pattern
(and list of potential contents) will likely be. Yet, you (the creator)
aren't given any input into the actual implementation chosen for
a folder with 4 entries or 400,000. [. . .]
You want the structure to be optimized by the *access* order (weighted
by the types of accesses) [. . .]"
I used Wget or whatever, circa 2003, to download the comp.arch.fpga
archive that SIGDA used to host. It is not an mbox of many posts;
instead, each post is a single file named like 10001.txt, 10002.txt, etc.
They are in many subdirectories, so I decided to burn a version to a
CD but with them all in one directory.
This reorganization turned out not to be an improvement. On the
contrary, it slowed down accesses via e.g.
ls *
by many MINUTES. So I decided to burn another copy, preserving
SIGDA's subdirectories.
(I actually could not copy them all into the same directory (excluding
subdirectories) without renaming, because after I started I detected
that a few subdirectories had identically named but different posts.
SIGDA uniquely named most of them, so I copied most of them into a
single directory.)
On 4/15/2026 3:19 PM, joegwinn@comcast.net wrote:
Of course, it depends on the ACTUAL keys inserted/removed and the
current context. But, so would every other algorithm!
Pointers? Suggestions?
It's O[log n] no matter what, so you won't do better, so just use it
and be happy.
Ref: <https://en.wikipedia.org/wiki/AVL_tree>
You assume a (potentially) relatively high cost of rebalancing the tree
on each insertion or deletion. So, the win for search is offset by
that hit -- which only pays off if you then do other searches.
E.g., if you are adding/removing lots of entries in creating a tree, there
is a relatively high cost that may never pay off (as you may later only
opt to do one "lookup").
And, of course, the order in which you add entries has an optimal case
(as well as pathological).
I.e., naive structures can prove to be more efficient (time/space),
depending on their contents and manner of construction.
E.g., locating file descriptor 38 is likely more efficient than
locating file "38".
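The "constructing it prior to load time" case mentioned at the top of the thread is indeed the cheap one: from already-sorted keys you can build a perfectly balanced tree in a single recursive pass, with no rotations at all. A minimal sketch, using (left, key, right) tuples as a toy node representation of my own:

```python
def build_balanced(sorted_keys):
    """Build a perfectly height-balanced BST from sorted keys by
    recursive midpoint splitting -- no rebalancing work is ever done."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return (build_balanced(sorted_keys[:mid]),   # smaller keys on the left
            sorted_keys[mid],                    # median at the root
            build_balanced(sorted_keys[mid + 1:]))

def tree_height(t):
    return 0 if t is None else 1 + max(tree_height(t[0]), tree_height(t[2]))

tree = build_balanced(list(range(1023)))
print(tree_height(tree))  # 10: already as balanced as AVL could make it
```

(The list slices cost extra copying; passing index bounds instead would make the build O(n). Either way, all the rebalancing cost of incremental insertion is avoided.)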
On 04/15/2026 05:56 PM, Don Y wrote:
On 4/15/2026 3:19 PM, joegwinn@comcast.net wrote:
Of course, it depends on the ACTUAL keys inserted/removed and the
current context. But, so would every other algorithm!
Pointers? Suggestions?
It's O[log n] no matter what, so you won't do better, so just use it
and be happy.
Ref: <https://en.wikipedia.org/wiki/AVL_tree>
You assume a (potentially) relatively high cost of rebalancing the tree
on each insertion or deletion. So, the win for search is offset by
that hit -- which only pays off if you then do other searches.
E.g., if you are adding/removing lots of entries in creating a tree, there
is a relatively high cost that may never pay off (as you may later only
opt to do one "lookup").
And, of course, the order in which you add entries has an optimal case
(as well as pathological).
I.e., naive structures can prove to be more efficient (time/space),
depending on their contents and manner of construction.
E.g., locating file descriptor 38 is likely more efficient than
locating file "38".
If the tree is of a bounded size, you could put it into an array
instead of off the heap.
A usual problem is attributes of unbounded yet usually _small_ size.
For example, something like map-reduce expects the count of keys to be
roughly uniform and not large, while often there are "modes":
principals, majors, minors, and the long tail.
So, if the attribute usually has count 1 and sometimes count 2-3,
it might make sense to just make the attribute size 4, arranging that
if the first or last pointer is null, the other side is an offset to a
store on the heap -- for example, a balanced binary tree of those --
vis-a-vis each entry being on (off) the heap individually.
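The "bounded tree in an array" idea above can be sketched as an implicit layout with no child pointers at all: node i's children live at indices 2*i+1 and 2*i+2. The helper names are hypothetical, and the sketch assumes pre-sorted keys and a fixed capacity.

```python
def build_implicit(sorted_keys):
    """Place sorted keys into a flat array so that a root-to-leaf walk
    (indices 0 -> 2i+1 / 2i+2) is a binary search over a pointer-free tree."""
    out = [None] * len(sorted_keys)
    it = iter(sorted_keys)
    def fill(i):
        # In-order traversal of the implicit shape assigns keys in sorted
        # order, which makes the array a valid BST.
        if i < len(out):
            fill(2 * i + 1)   # left subtree gets the smaller keys
            out[i] = next(it)
            fill(2 * i + 2)   # right subtree gets the larger keys
    fill(0)
    return out

def contains(tree, key):
    i = 0
    while i < len(tree):
        if key == tree[i]:
            return True
        i = 2 * i + 1 if key < tree[i] else 2 * i + 2
    return False

t = build_implicit(list(range(0, 30, 2)))  # 15 even keys, 0..28
print(contains(t, 8), contains(t, 9))      # True False
```

Besides avoiding per-node heap allocations, this layout keeps a search's early probes in a small, cache-friendly prefix of the array -- relevant to the locality points below.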
Matters of locality, proximity, and affinity in many-core
architectures -- usual these days since the likes of AMD "Bulldozer"
after dual-core -- mean that modern architectures like Arm/Thumb,
Intel/AMD, or MIPS/RISC-V are much alike: the neighborhood topology
matters, and each is effectively a model of a distributed system.
About "data" and "summary", and, the "tractable", tractable
summary, begins usually enough with "counts", and for example
about checksums, yet though that "counts" indicate the size,
where the size is the usual idea for Big-O, Big-Theta, and
Little-Theta notation, of the "asymptotics", since also size
is always assumed to grow "unboundedly" in the consideration
of the "asymptotics" or the "amortization" over time, which
is usually given as the only factor of cost. So, that's not
a complete account, and, for example, then whether a linked-list
or an array is the better backing for any sort of iterative
access and organization and maintenance of a data structure,
has that it varies.
Indeed, if the distribution (as of what would be a random sample) is
completely known, then a demonstrably optimal data structure can be
built for it. That the _library algorithms_ and _library data
structures_ are instead very common and unspecialized is, in a sense,
a matter of specializations and types, and of probabilistic accounts.
About the massive-wide and the parallel: making it so that data is
stored in summary, usually for partitions with "uniform random access"
by the discrimination of partitions, means the organization of data --
for example, hierarchical naming and directory interfaces, or
hierarchical time-series data, near-opposites yet always in terms of
each other -- is much a matter of summary and statistic.
Then, at least one idea has any data structure being half data / half
summary, per the usual ideas of state and scope and the locality of
reference -- for example, a system that constantly updates those and
is always re-organizing / re-structuring / re-sorting / re-building,
so that reads and writes have the same cost.
A usual account aims to make reads free and consistent, and writes
atomic and independent, vis-a-vis the granular and the bulk, items
and batches.
So, vis-a-vis the school-boy or class-room accounts, for something
like Aho, Sethi, and Ullman and "a parser" vis-a-vis "parsing", modern
architectures are themselves models of self-contained distributed
systems.
"Show me where Knuth addresses any of these real-world issues..."

There's nothing that says how large a book can be. Or, how many.
These very big, very late books would be even bigger later if Knuth
were to add more considerations to them.