• Fwd: Re: Meta: a usenet server just for sci.math (AATU)

    From Ross Finlayson@ross.a.finlayson@gmail.com to sci.electronics.design,sci.math on Thu Apr 23 09:39:31 2026
    From Newsgroup: sci.math

    On 12/01/2025 12:34 PM, Ross Finlayson wrote:
    [ The "Meta: a usenet server just for sci.math" and "Archive Any And All
    Text Usenet" threads transcribed. ]

    [ page break 1]
    [2016/12/01]


    I have an idea here to build a usenet server
    only for sci.math and sci.logic. The idea is
    to find archives of sci.math and sci.logic and
    to populate a store of the articles in a more
    or less enduring form (say, "on the cloud"),
    then to offer some usual news server access
    then to, say, 1 month 3 month 6 month retention,
    and then some cumulative retention (with a goal
    of unlimited retention of sci.math and sci.logic
    articles). The idea would be to have basically
    various names of servers then reflect those
    retentions for various uses for a read-only
    archival server and a read-only daily server
    and a read-and-write posting server. I'm willing
    to invest time and effort to write the necessary
    software and gather existing archives and integrate
    with existing usenet providers to put together these
    things.

    Then, where basically it's in part an exercise
    in vanity, I've been cultivating some various
    notions of how to generate some summaries or
    reports of various posts, articles, threads, and
    authors, toward the specialization of the cultivation
    of summary for reporting and research purposes.


    So, I wonder others' idea about such a thing and
    how they might see it as a reasonably fruitful
    thing, basically for the enjoyment and for the
    most direct purposes of the authors of the posts.


    I invite comment, as I have begun to carry this out.

    [2016/12/02]

    So far I've read through the NNTP specs and looked
    a bit at the INN (innd) code. Then, the general idea is
    to define a filesystem layout convention, that then
    would be used for articles, then for having those
    on virtual disks (eg, "EBS volumes") or cloud storage
    (eg, "S3") in essentially a Write-Once-Read-Many
    configuration, where the goal is to implement data
    structures that have a forward state machine so that
    they remain consistent with unreliable computing
    resources (eg, "runtimes on EC2 hosts"), and that
    are readily cacheable (and horizontally scaleable).

    Then, the runtimes are of the collection and maintenance
    of posts ("infeeds" and "outfeeds", backfills), about
    summary generation (overview, metadata, key extraction,
    information content, working up auto-correlation), then
    reader servers, then some maintenance and admin. As a
    usual software design principle there is a goal of the
    both "stack-on-a-box" and also "abstraction of resources"
    and a usual separation of domain, library, routine, and
    runtime logic.

    So basically it looks like:
    1) gather mbox files of sci.math and sci.logic
    2) copy those to archive inputs
    3) break those out into a filesystem layout for each article
    (there are various filesystems that support this many files
    these days)
    4) generate partition and overview summaries
    5) generate various revisioning schemes (the "article numbers"
    of the various servers)
    6) figure out the incremental addition and periodic truncation
    7) establish a low-cost but high-availability endpoint runtime
    8) make elastic/auto-scaling service routine behind that
    9) have opportunistic / low cost periodic maintenance
    10) emit that as a configuration that anybody can run
    as "stack-on-a-box" or with usual "free tier" cloud accounts


    [2016/12/04]

    I've looked into this a bit more and the implementation is
    starting to look along these lines.

    First there's the ingestion side, or "infeed", basically
    the infeed connects and pushes articles. Here then the
    basic store of the articles will be an object store (or
    here "S3" as an example object store). This is durable
    and the object keys are the article's "unique" message-id.

    If the message-id already exists in the store, then the
    infeed just continues.

    The article is stored with matching the message-id, noting
    the body offset, and counting the lines, and storing that
    with the object. Then, the message-id, pushed to
    a queue, can also have the headers as extracted from
    the article, that are relevant to the article and overview,
    and the arrival date or effective arrival date. The slow-
    and-steady database worker (or, distributed data structure
    on "Dynamo tables") then retrieves a queue item, at some
    metered rate, and gets an article number for each of the
    newsgroups (by some conditional update that might starve a thread)
    for each group that is in the newsgroups of the article and
    some "all" newsgroup, so that each article also has a (sequential) number.

    Assigning a sequence is a bit the wicket, because, here
    there's basically "eventual consistency" and "forward safe"
    operations. Any of the threads, connections, or boxes
    could die at any time, then the primary concern is "no
    drops, then, no dupes". So, there isn't really a transactional
    context to make atomic "for each group, give it the next
    sequence value, doing that together for each groups' numbering
    of articles in an atomic transaction". Luckily, while NNTP
    requires strictly increasing values, it allows gaps in the
    sequences. So, here, when mapping article-number to message-id
    and message-id to article-number, if some other thread has
    already stored a value for that article-number, then it can
    be re-tried until there is an unused article-number. Updating
    the high-water mark can fail if it was updated by another thread,
    then to re-try again with the new, which could lead to starvation.
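    [ The retry-until-unused numbering described above might look like
    the following in a local, in-memory form. It is a sketch under
    assumptions: GroupNumbers and its methods are hypothetical names,
    and ConcurrentHashMap stands in for the conditional updates of a
    distributed table. ]

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for one group's number table (hypothetical names).
class GroupNumbers {
    private final ConcurrentMap<Long, String> numberToId = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, Long> idToNumber = new ConcurrentHashMap<>();
    private final AtomicLong highWater;

    GroupNumbers(long firstNumber) {
        this.highWater = new AtomicLong(firstNumber - 1);
    }

    // Idempotent: a redelivered message-id gets the number it already has.
    long assign(String messageId) {
        Long existing = idToNumber.get(messageId);
        if (existing != null) {
            return existing;
        }
        while (true) {
            long candidate = highWater.get() + 1;
            // Conditional write: succeeds only if the number is unused.
            if (numberToId.putIfAbsent(candidate, messageId) != null) {
                advance(candidate);   // taken by another worker, try the next
                continue;
            }
            Long raced = idToNumber.putIfAbsent(messageId, candidate);
            advance(candidate);
            if (raced != null) {
                // Another worker numbered this same message first; the
                // candidate is abandoned and becomes a gap in the sequence.
                numberToId.remove(candidate, messageId);
                return raced;
            }
            return candidate;
        }
    }

    // Raises the high-water mark; never lowers it.
    private void advance(long to) {
        long seen;
        while ((seen = highWater.get()) < to && !highWater.compareAndSet(seen, to)) {
            // lost the race on the mark, re-read and retry
        }
    }

    long highWater() {
        return highWater.get();
    }
}
```

    [ Numbers only ever go up, and a duplicate delivery costs at most a
    gap, which matches "no drops, then, no dupes". ]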

    (There's a notion then, when an article-number is assigned, to
    toss that back onto queue for the rest of the transaction to
    be carried out.)

    Then, this having established a data structure for the message
    store, these are basically the live data structures, distributed,
    highly available, fault-tolerant and maintenance free, this
    implements the basic function for getting feeds (or new articles)
    and also the reader capability, which is basically a protocol
    listener that maintains the reader's current group and article.

    To implement then some further features of NNTP, there's an idea
    to store the article numbers for each group and "all" basically
    a bucket for each time period (eg, 1 day), then, that scans over
    the articles by their numbers find those as the partitions, then
    that sequentially (or rather, increasingly) the rest follow.

    To omit or remove articles or expire them for no-archive, that
    is basically ignored, but the idea is to maintain for the all
    group series of 1000 or 10000 articles then for what offsets in
    those series are cancelled. Basically the object store is
    write-once, immutable, and flat, where it's yet to be determined
    how to backfill the article store from archive files or suck
    feeds from live servers with long retentions. Then there's an
    idea to start the numbering at 1 000 000 or so and then have
    plenty of ranges where to fill in articles as archived or
    according to their receipt date header.

    Then, as the primary data stores would basically just implement
    a simple news server, there are two main notions of priority,
    to implement posting and to implement summaries and reports.

    Then, as far as I can tell, this pretty much fits within the
    "free tier" then that it's pretty economical.

    [2016/12/04]

    It's a matter of scale and configuration.

    It should scale quite well enough, though at some point
    it would involve some money. In rough terms, it looks
    like storing 1MM messages is ~$25/month, and supporting
    readers is a few cents a day but copying it would be
    twenty or thirty dollars. (I can front that.)

    I'm for it where it might be useful, where I hope to
    establish an archive with the goal of indefinite retention,
    and basically to present an archive and for my own
    purposes to generate narratives and timelines.

    The challenge will be to get copies of archives of these
    newsgroups. Somebody out of news.admin.peering might
    have some insight into who has the Dejanews CDs or what
    there might be in the Internet Archive Usenet Archive,
    then in terms of today's news servers which claim about
    ten years retention. Basically I'm looking for twenty
    plus years of retention.

    Now, some development is underway, and in no real hurry.
    Basically I'm looking at the runtimes and a software
    library to be written, (i.e., interfaces for the components
    above and local file-system versions for stack-on-a-box,
    implementing a subset of NNTP, in a simple service runtime
    that idles really low).

    Then, as above, it's kind of a vanity project or author-centric,
    about making it so that custom servers could be stood up with
    whatever newsgroups you want with the articles filtered
    however you'd so care, rendered variously.

    [2016/12/06]

    I've been studying this a bit more.

    I set up a linux development environment
    by installing ubuntu to a stick PC, then
    installing vim, gcc, java, mvn, git. While
    ubuntu is a debian distribution and Amazon
    Linux (a designated target) is instead along
    the lines of RedHat/Yellowdog (yum, was rpm,
    instead of apt-get, for component configuration),
    then I'm pretty familiar with these tools.

    Looking to the available components, basically
    the algorithm is being designed with data
    structures that can be local or remote. Then,
    these are usually that much more complicated
    than just the local or just the remote, and
    here also besides the routine or state machine
    also the exception or error handling and the
    having of the queues everywhere for both
    throttling and delay-retries (besides the
    usual inline re-tries and about circuit
    breaker). So, this is along the lines of
    "this is an object/octet store" (and AWS
    has an offering "Elastic File System" which
    is an NFS Networked File System that looks
    quite the bit more economical than S3 for
    this purpose), "this is a number allocator"
    (without sequence.nextVal in an RDBMS, the
    requirements allow some gaps in the sequence,
    here to use some DynamoDB table attribute's
    "atomic counter"), then along the lines of
    "this is a queue" and separately "I push to
    queues" and "I pop queues", and about "queue
    this for right now" and "queue this for later".
    Then, there's various mappings, like id to number
    and number to id, where again for no-drops / no-dupes
    / Murphy's-law that the state of the mappings is
    basically "forward-safe" and that retries make
    the system robust and "self-healing". Other mappings
    include a removed/deleted bag, this basically looks
    like a subset of a series or range of the assigned
    numbers, of the all-table and each group-table,
    basically numbers are added as attributes to the
    item for the series or range.

    Octet Store
    Queue
    Mapping
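    [ The three abstractions listed above could be written down as
    interfaces with a local edition each, roughly as below. The
    interface and method names are assumptions for illustration; the
    post names only the three concepts. ]

```java
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Write-once store of octets keyed by, e.g., message-id.
interface OctetStore {
    boolean putIfAbsent(String key, byte[] octets);   // false if key exists
    Optional<byte[]> get(String key);
}

// "I push to queues" and "I pop queues".
interface WorkQueue<T> {
    void push(T item);
    Optional<T> pop();                                // empty when drained
}

// Forward-safe mapping: entries are added, never overwritten.
interface Mapping<K, V> {
    boolean putIfAbsent(K key, V value);
    Optional<V> get(K key);
}

// Local (in-memory) editions; remote editions would wrap network calls.
final class LocalOctetStore implements OctetStore {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();
    public boolean putIfAbsent(String key, byte[] octets) {
        return objects.putIfAbsent(key, octets.clone()) == null;
    }
    public Optional<byte[]> get(String key) {
        byte[] found = objects.get(key);
        return found == null ? Optional.empty() : Optional.of(found.clone());
    }
}

final class LocalQueue<T> implements WorkQueue<T> {
    private final Queue<T> items = new ConcurrentLinkedQueue<>();
    public void push(T item) { items.add(item); }
    public Optional<T> pop() { return Optional.ofNullable(items.poll()); }
}

final class LocalMapping<K, V> implements Mapping<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    public boolean putIfAbsent(K key, V value) {
        return entries.putIfAbsent(key, value) == null;
    }
    public Optional<V> get(K key) { return Optional.ofNullable(entries.get(key)); }
}
```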

    Then, as noted above, with Murphy's law, any of the
    edges of the flowgraph can break at any time, about
    the request/response each that defines the boundary
    (and a barrier), there is basically defined an abstract
    generic exception "TryableException" that has only two
    subclasses, "Retryable" and "Nonretryable". Then, the
    various implementations of the data structures in the
    patterns of their use variously throw these in puking
    back the stack trace, then for inline re-tries, delay
    re-tries, and fails. Here there's usually a definition
    of "idempotence" for methods that are re-tryable besides
    exceptions that might go away. The idea is to build
    this into the procedure, so it's all built at compile-
    time the correctness of the composition of the steps
    of the flowgraph of the procedure.
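    [ A minimal sketch of that exception pair with an inline re-try
    helper follows. TryableException, Retryable and Nonretryable are the
    names from the text; Step, Retries and the attempt limit are
    assumptions added for illustration. Making the exceptions checked is
    one way to have the compiler insist that each edge is handled. ]

```java
// Abstract root: every failure at a request/response boundary is one of
// exactly two kinds.
abstract class TryableException extends Exception {
    TryableException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class Retryable extends TryableException {
    Retryable(String message, Throwable cause) {
        super(message, cause);
    }
}

final class Nonretryable extends TryableException {
    Nonretryable(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical: one idempotent step of a workflow.
interface Step<T> {
    T run() throws Retryable, Nonretryable;
}

// Hypothetical helper for inline re-tries; delay re-tries would instead
// push the item back onto a "queue this for later" queue.
final class Retries {
    static <T> T inline(int maxAttempts, Step<T> step) throws Nonretryable {
        Retryable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.run();
            } catch (Retryable e) {
                last = e;   // might go away, so try again
            }
        }
        throw new Nonretryable("gave up after " + maxAttempts + " attempts", last);
    }
}
```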


    Then, for the runtime, basically it will be some Java
    container on the host or in a container, with basically
    a cheap simple watchdog/heartbeat that uses signals on
    unix (posix) to be keeping the service/routine nodes
    (that can fail) up, to bounce (restart) them with signals,
    and to reasonably fail and alarm if thrashing of the
    child process of the watchdog/nanny, with maybe some
    timer update up to the watchdog/heartbeat. Then basically
    this runner executes the routine/workflow logic in the jar,
    besides that then a mount of the NFS being the only admin
    on the box, everything else being run up out of the
    environment from the build artifact.

    The build artifact then looks that I'd use Spring for
    wiring a container and also configuration profiles and
    maybe Spring AOP and this kind of thing, i.e., just
    spring-core (toward avoiding "all" of spring-boot).

    Then, with local (in-memory and file) and remote
    (distributed) implementations, basically the
    design is to the distributed components, making
    abstract those patterns then implementing for the
    usual local implementation as standard containers
    and usual remote implementation as building transactions
    and defined behavior over the network.

    [2016/12/09]

    Having been researching this a bit more, and
    tapping at the code, I've written out most of
    the commands then to build a state machine of
    the results, and, having analyzed the algorithm
    of article ingestion and group and session state,
    have defined interfaces suitable either for local
    or remote operation, with the notion that local
    operation would be self-contained (with a quite
    simple file backing) while remote operation would
    be quite usually durable and horizontally scalable.

    I've written up a message reader/writer interface
    (or "Scanner" and "Printer") for non-blocking I/O
    and implementing reading Commands and writing Results
    via non-blocking I/O. This should allow connection
    scaling, with threads on accepter/closer and reader/
    writer and an execution pool for the commands. The
    Scanner and Printer use some BufferPool (basically
    about 4*1024 or 4K buffers), with an idea that that's
    pretty much all the I/O usage of RAM and is reasonably
    efficient, and that if RAM is hogged it's simple enough
    to self-throttle the reader for the writer to balance
    out.
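    [ A bounded pool of 4K buffers with that self-throttling behaviour
    might be sketched as below. The BufferPool name is from the text;
    the methods, the Semaphore, and returning null on exhaustion are
    assumptions for illustration. ]

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

final class BufferPool {
    static final int PAGE = 4 * 1024;

    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
    private final Semaphore permits;   // caps total buffers handed out

    BufferPool(int maxBuffers) {
        this.permits = new Semaphore(maxBuffers);
    }

    // Returns a cleared 4K buffer, or null when the pool is exhausted.
    // A null is the reader's cue to stop reading until the writer has
    // drained and released some buffers.
    ByteBuffer tryAcquire() {
        if (!permits.tryAcquire()) {
            return null;
        }
        ByteBuffer recycled = free.poll();
        return recycled != null ? recycled : ByteBuffer.allocate(PAGE);
    }

    // Hands a buffer back for reuse.
    void release(ByteBuffer buffer) {
        buffer.clear();
        free.add(buffer);
        permits.release();
    }
}
```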

    About the runtime, basically the idea is to have it
    installable as a "well-known service" for "socket
    activation" as via inetd or systemd. The runtime is
    really rather lean and starts quickly, here on-demand,
    that it can be configured as "on-demand" or "long-running".
    For some container without systemd or the equivalent,
    it could have a rather lean nanny. There's some notion
    of integrating heartbeat or status about Main.main(),
    then that it runs as "java -jar nntp.jar".

    Where the remote backing store or article file system
    is some network file system, it also seems that the
    runtime would so configure dependency on its file system
    resource with quite usual system configuration tools,
    for a fault-tolerant and graceful box that reboots as activable.

    It interests me that SMTP is quite similar to NNTP. With
    an idea of an on-demand server, which is quite rather usual,
    these service nodes run on the smallest cloud instances
    (here the "t2.nano") and scale to traffic, with a very low
    idle or simply the "on-demand" (then for "containerized").


    About usenet then I've been studying what it would mean to
    be compliant and for example what to do with some "control" or
    "junk" (sideband) groups and otherwise what it would mean
    and take to make a horizontally scalable elastic cloud
    usenet server (and persistent store). This is where the
    service node is quite lean, the file store and database
    (here of horizontally scalable "tables") is basically unbounded.


    [2016/12/11]

    I've collected what RFC's or specs there are for usenet,
    then having surveyed the most of the specified use cases,
    have cataloged descriptions of the commands about the protocol
    that they are self-contained descriptions within the protocol
    of each command. Then, for where there is the protocol and
    perhaps any exchange or change of the protocol, for example
    for TLS, then that is also being worked into the state machine
    of sorts (simply enough a loop over the input buffer to generate
    command values from the input given the command descriptions),
    for that then as commands are generated (and maintained in their
    order) that the results (eg, in the parallel) are thus computed
    and returned (again back in the order).
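    [ Computing results in parallel while returning them in command
    order can be sketched with a FIFO of futures, as below. This is a
    simplification under assumptions: OrderedSession is a hypothetical
    name, drain() blocks where the design described is non-blocking, and
    the "403" text is only a placeholder failure response. ]

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class OrderedSession {
    private final ExecutorService pool;
    // Futures are queued in the order the commands were scanned.
    private final Queue<Future<String>> pending = new ArrayDeque<>();

    OrderedSession(ExecutorService pool) {
        this.pool = pool;
    }

    // The command may run at any time on any pool thread.
    void submit(Callable<String> command) {
        pending.add(pool.submit(command));
    }

    // Results come back strictly in submission order, whichever
    // command happened to finish first.
    List<String> drain() {
        List<String> results = new ArrayList<>();
        Future<String> next;
        while ((next = pending.poll()) != null) {
            try {
                results.add(next.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                results.add("403 interrupted");
            } catch (ExecutionException e) {
                results.add("403 " + e.getCause());
            }
        }
        return results;
    }
}
```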

    Then, within the protocol, and basically for encryption and
    compression, these are established within the protocol instead
    of, for example, externally to the protocol. So, there is
    basically a filter between the I/O reader and I/O writer and
    the scanner and the printer, as it were, that scans input data
    to commands and writes command results to output data. This is
    again with the "non-blocking I/O" then about that the blocks or
    buffers I've basically settled to 4 kibibyte (4KiB) buffers, where,
    basically an entire input or output in the protocol (here a message
    body or perhaps a list of up to all the article numbers) would be
    buffered (in RAM), so I'm looking to spool that off to disk if it
    so results that essentially unbounded inputs and outputs are to be
    handled gracefully in the limited CPU, RAM, I/O, and disk resources
    of the usually quite reliable but formally unreliable computing node
    (and at cost).

    The data structures for access and persistence evolve as the
    in-memory and file-based local editions and networked or cloud
    remote editions. The semantics are built out to the remote editions,
    as then they can be erased in the difference for efficiencies of the
    local editions. The in-memory structures (with the article bodies
    themselves yet actually written to a file store) are quite efficient
    and bounded by RAM or the heap, the file-based structures which
    makes use of the memory-mapped files as you may well know comprise
    all the content of "free" RAM caching the disk files may be mostly
    persistent with a structure that can be bounded by disk size, then
    the remote network-based structures here have a usual expectation of
    being highly reliable (i.e., that the remote files, queues, and
    records have a higher reliability than any given component in their
    distributed design, at the corresponding cost in efficiency and
    direct performance, but of course, this is design for correctness).

    So, that said, then I'm tapping away at the implementation of a
    queue of byte buffers, or the I/O RAM convention. Basically, there
    is some I/O, and it may or may not be a complete datum or event in
    the protocol, which is 1-client-1-server or a stateful protocol. So,
    what is read off the I/O buffer, so the I/O controller can service
    that and other I/O lines, is copied to a byte buffer. Then, this is
    to be filtered as above as necessary, that it is copied to a list of
    byte buffers (a double ended queue or linked list). These buffers
    maintain their current position and limit, from their beginning, the
    "buffer" is these pointers and the data itself. So, that's their
    concrete type already, then the scanner or printer also maintains
    its scan or print position, that the buffer can be filled and holds
    some data, then that as the scan pointer moves past a buffer
    boundary, that buffer can be reclaimed, with only moving the scan
    pointer when a complete datum is read (here as defined for the
    scanner in small constant terms by the command descriptions as
    above).
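    [ The scan over a double-ended queue of byte buffers can be sketched
    for the simplest datum, one CRLF-terminated command line. LineScanner
    and its methods are hypothetical names; the real scanner is driven
    by the command descriptions rather than a fixed line rule. ]

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

final class LineScanner {
    private final Deque<ByteBuffer> buffers = new ArrayDeque<>();

    // Takes a buffer whose position..limit range holds unread data.
    void offer(ByteBuffer readable) {
        buffers.addLast(readable);
    }

    // Returns the next line without its CRLF, or null when no complete
    // line is buffered yet. Positions move only when a line completes.
    String nextLine() {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int previous = -1;
        for (ByteBuffer b : buffers) {
            for (int i = b.position(); i < b.limit(); i++) {
                int current = b.get(i);   // absolute get, position untouched
                if (previous == '\r' && current == '\n') {
                    consumeThrough(b, i + 1);
                    byte[] bytes = line.toByteArray();
                    // The last collected byte is the CR; leave it out.
                    return new String(bytes, 0, bytes.length - 1, StandardCharsets.US_ASCII);
                }
                line.write(current);
                previous = current;
            }
        }
        return null;
    }

    // Drops every buffer wholly before the scan pointer, then advances
    // the pointer inside the buffer where the datum ended.
    private void consumeThrough(ByteBuffer last, int newPosition) {
        while (buffers.peekFirst() != last) {
            buffers.removeFirst();   // fully scanned, reclaimable
        }
        last.position(newPosition);
        if (!last.hasRemaining()) {
            buffers.removeFirst();
        }
    }

    int buffered() {
        return buffers.size();
    }
}
```

    [ A line split across two buffers, or even a CR and LF split across
    two buffers, is handled the same as one that arrives whole. ]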

    So, that is pretty much sorted out, then about that basically it
    should ingest articles just fine and be a mostly compliant NNTP
    server.

    Then, generating the overview and such is another bit to get figured
    out, which is summary.

    Another thing in this design to get figured out is how to implement
    the queue and database action for the remote, where, the cost
    efficiency of the (managed, durable, redundant) remote database, is
    on having a more-or-less constant (and small) rate of reads and
    writes. Then the distributed queue will hold the backlog, but, the
    queue consumer is to be constant rate not for the node but for the
    fleet, so I'm looking at how to implement some leader election
    (fault-tolerance) or otherwise to have loaner threads of the runtime
    for any service of the queue. This is where, ingestion is de-coupled
    from inbox, so, there's an idea of having a sentinel queue consumer
    (because this data might be high volume or low or zero) on a
    publish/subscribe, it listens to the queue and if it gets an item it
    refuses it and wakes up the constant-rate (or spiking) queue
    consumer workers, that then proceed with the workflow items and then
    retire themselves if and when traffic drops to zero again, standing
    back up the sentinel consumer.


    Anyways that's just about how to handle variable load but here
    there's that it's OK for the protocol to separate ingestion and
    inbox, otherwise establishing the completion of the workflow item
    from the initial request involves usual asynchronous completion
    considerations.


    So, that said, then, the design is seeming pretty flexible, then
    about, what extension commands might be suitable. Here the idea is
    about article transfer and which articles to transfer to other
    servers. The idea is to add some X-RETRANSFER-TO command or along
    these lines,

    X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]

    then that this simply has the host open a connection to the other host
    and offer via IHAVE/CHECK/TAKETHIS all the articles so in the range
    or until the connection is closed. This way then, for example, if this
    NNTP system was running, and, someone wanted a subset of the articles,
    then this command would have them sent out-of-band, or, "automatic
    out-feed".
    Figuring out how to re-distribute or message routing besides simple
    message store and retrieval is its own problem.
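    [ Parsing the proposed command line could be sketched as below. The
    command and its argument order are from the text; the RetransferTo
    class, keeping the dates as opaque strings (no date format is
    given), and answering malformed input with a 501-style message are
    assumptions. ]

```java
import java.util.Optional;

// Parsed form of: X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]
final class RetransferTo {
    final String host;
    final Optional<String> group;
    final Optional<String> dateBegin;   // format unspecified, kept as text
    final Optional<String> dateEnd;

    private RetransferTo(String host, Optional<String> group,
                         Optional<String> dateBegin, Optional<String> dateEnd) {
        this.host = host;
        this.group = group;
        this.dateBegin = dateBegin;
        this.dateEnd = dateEnd;
    }

    static RetransferTo parse(String line) {
        String[] t = line.trim().split("\\s+");
        boolean keywordOk = t[0].equalsIgnoreCase("X-RETRANSFER-TO");
        if (!keywordOk || t.length < 2 || t.length > 5) {
            throw new IllegalArgumentException("501 syntax error: " + line);
        }
        return new RetransferTo(t[1], arg(t, 2), arg(t, 3), arg(t, 4));
    }

    // Optional positional argument: present only if the line has it.
    private static Optional<String> arg(String[] tokens, int index) {
        return index < tokens.length ? Optional.of(tokens[index]) : Optional.empty();
    }
}
```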

    Another issue is expiry, I don't really intend to delete anything,
    because the purpose is archival, but people still use usenet in some
    corners of the internet for daily news, again that's its own
    problem. Handling out-of-order ingestion with the backfilling or
    archives as they can be discovered is another issue, with that
    basically being about filling a corpus of the messages, then trying
    to organize them that the message date is effectively the original
    injection date.


    Anyways, it proceeds along these lines.

    [2016/12/13]

    One of the challenges of writing this kind of system
    is vending the article-id's (or article numbers) for
    each newsgroup of each message-id. The message-id is
    received with the article as headers and body, or set
    as part of the injection info when the article is posted.
    So, vending a number means that there is known a previous
    number to give the next. Now, this is clear and simple
    in a stand-alone environment, with integer increment or
    "x = i++". It's not so simple in a distributed environment,
    with that the queuing system does not "absolutely guarantee"
    no dupes, with the priority being no drops, and also, the
    independent workers A and B can't know the shared value of
    x to make and take atomic increments, without establishing
    a synchronization barrier, here over the network, which is
    to be avoided (eg, blocking and locking on a database's
    critical transactional atomic sequence.nextval, with, say,
    a higher guarantee of no gaps). So, there is a database
    for vending strictly increasing numbers, each group of
    an article has a current number and there's an "atomic
    increment" feature thus that A working on A' will get
    i+1 and B working on B' will get i+2 (or maybe i+3, if
    for example the previous edition of B died). If A working
    on A' and B working on A' duplicated from the queue get
    i+1 and i+2, then, there is as mentioned above a conditional
    update to make sure the article number always increases,
    so there is a gap from the queue dupe or a gap from the
    worker drop, but then A or B has a consistent view of the
    article-id of A' or B'.

    So, then with having the number, once that's established,
    then all's well and good to associate the message-id, and
    the article-id.

    group: article-id -> message-id
    message: groups -> article-ids

    Then, looking at the performance, this logical association
    is neatly maintainable in the DB tables, with consistent
    views for A and B. But it's a limited resource, in this
    implementation, there are actually only so many reads and
    writes per period. So, workers can steadily chew away the
    intake queue, assigning numbers, but then querying for the
    numbers is also at a cost, which is primarily what the
    reader connections do.

    Then, the idea is to maintain the logical associations, of
    the message-id <-> article-id, also in a growing file, with
    a write-once read-many file about the NFS file system. There's
    no file locking, and, writes to the file that are disordered
    or contentious could (and by Murphy's law, would) write corrupt
    entries to the file. There are various notions of leader election
    or straw-pulling for exactly one of A or B to collect the numbers
    in order and write them to the article-ids file, one "row" (or 64
    byte fixed length record) per number, at the offset 64*number
    (as from some 0 or the offset from the first number). But,
    consensus and locking for serialization of tasks couples A and B
    which are otherwise running entirely independently. So, then
    the idea is to identify the next offset for the article-ids file,
    and collect a batch of numbers as make a block-sized block of
    the NFS implementation (eg 4KB or 8KB and hopefully configurably
    and not 1MB which is about 16K records of 64B each). So, as
    A and B each collect the numbers (and detect if there were gaps
    now) then either (or both) completes a segment to append to the
    file. There aren't append modes of the NFS files, which is fine
    because actually the block now is written to the computed offset,
    which is the same for A and B. In the off chance A and B both
    make writes, file corruption doesn't follow because it's the
    same content, and it's block size, and it's an absolute offset.
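    [ Writing fixed 64-byte records at an absolute offset, so that a
    duplicate write by A and B is harmless, might look like the sketch
    below. ArticleIdFile and its methods are hypothetical; it uses a
    plain local file, zero padding, and rejects oversize message-ids,
    all of which are assumptions. Whether NFS gives the same guarantee
    is the open question, not something this shows. ]

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class ArticleIdFile {
    static final int RECORD = 64;

    // Writes records for numbers first, first+1, ... at 64*(number-base).
    // Same input gives the same bytes at the same place every time.
    static void writeBlock(Path file, long base, long first, List<String> messageIds) {
        ByteBuffer block = ByteBuffer.allocate(RECORD * messageIds.size());
        for (String id : messageIds) {
            byte[] bytes = id.getBytes(StandardCharsets.US_ASCII);
            if (bytes.length > RECORD) {
                throw new IllegalArgumentException("oversize message-id: " + id);
            }
            block.put(bytes);
            // Skip to the next record; the skipped bytes stay zero.
            block.position(block.position() + RECORD - bytes.length);
        }
        block.flip();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long offset = RECORD * (first - base);
            while (block.hasRemaining()) {
                offset += ch.write(block, offset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the message-id stored for the number, or null if absent.
    static String read(Path file, long base, long number) {
        ByteBuffer record = ByteBuffer.allocate(RECORD);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long offset = RECORD * (number - base);
            while (record.hasRemaining()) {
                if (ch.read(record, offset + record.position()) < 0) {
                    break;   // past end of file
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int length = 0;
        while (length < record.position() && record.get(length) != 0) {
            length++;
        }
        return length == 0 ? null : new String(record.array(), 0, length, StandardCharsets.US_ASCII);
    }
}
```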

    So, in this way, it seems that over time, the contents of the DB
    are written out to the sequence by article-id of message-id for
    each group

    group: article-id -> message-id

    besides that the message-id folder contains the article-ids

    message-id: groups -> article-id

    the content of which is known when the article-id numbers for
    the groups of the message are vended.


    Then, in the usual routine of looking up the message-id or
    article-id given the group, the DB table is authoritative,
    but, the NFS file is also correct, where a value exists.
    (Also it's immutable or constant and conveniently a file.)
    So, readers can map into memory the file, and consult the
    offset in the file, to find the message-id for the requested
    article-id, if that's not found, then the DB table, where it
    would surely be, as the message-id had vended an article-id,
    before the groups article-id range was set to include the
    new article.
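    [ The lookup order just described, file first and table second, can
    be sketched as below. ArticleLookup is a hypothetical name and both
    sources are passed in as plain functions; in the design the first
    would read the memory-mapped file and the second the DB table. ]

```java
import java.util.Optional;
import java.util.function.LongFunction;

final class ArticleLookup {
    private final LongFunction<Optional<String>> fileIndex;   // immutable, cheap
    private final LongFunction<Optional<String>> dbTable;     // authoritative, metered
    int dbReads = 0;   // counts how often the costly path was taken

    ArticleLookup(LongFunction<Optional<String>> fileIndex,
                  LongFunction<Optional<String>> dbTable) {
        this.fileIndex = fileIndex;
        this.dbTable = dbTable;
    }

    // Finds the message-id for an article number within a group.
    Optional<String> messageId(long articleNumber) {
        Optional<String> fromFile = fileIndex.apply(articleNumber);
        if (fromFile.isPresent()) {
            return fromFile;   // correct wherever a value exists
        }
        dbReads++;
        return dbTable.apply(articleNumber);
    }
}
```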

    When a range of the article numbers is passed, then effectively,
    the lookup will always be satisfied by the file lookup instead
    of the DB table lookup, so there won't be the cost of the DB
    table lookup. In some off chance the open files of the NFS
    (also a limited resource, say 32K) are all exhausted, there's
    still a DB table to read, that is a limited and expensive
    resource, but also elastic and autoscalable.

    Anyways, this design issue also has the benefit of keeping it
    so that the file system has a convention with that all the data
    remains in the file system, with then usual convenience in
    backup and durability concerns, while still keeping it correct
    and horizontally scalable, basically with the notion of then
    even being able to truncate the database in any lull of traffic,
    for that the entire state is consistent on the file system.

    It remains to be figured out that NFS is OK with writing duplicate
    copies of a file block, toward having this highly reliable workflow
    system.


    That is basically the design issue then, I'm tapping away on this.


    [ page break 2 ]

    [2016/12/14]

    Tapping away at this idea of a usenet server system,
    I've written much of the read routine that is the
    non-blocking I/O with the buffer passing and for the
    externally coded data and any different coded data
    like the unencrypted or uncompressed. I've quite
    settled on 4KiB (2^12B) as the usual buffer page,
    and it looks that the NFS offering can be so tuned
    that its wsize (write size) is 4096 and with an
    async NFS write option that that page size will
    have that writes are incorruptible (though for
    whatever reason they may be lost), and that 4096B
    or 64 entries of 64B (2^6B) for a message-id or oversize-
    message-id entry will spool off the message-id's of
    the group's articles at an offset in the file that
    is article-id * (1 << 6). The MTU of Ethernet packets
    is often 1500 so having a wsize of 1KiB is not
    nonsensible, as many of the writes are of this
    granularity, the MTU might be 9001 or jumbo, which
    would carry 2 4KiB NFS packets in one Ethernet packet.
    Having the NFS rsize (read size) say 32KiB seems not
    unreasonable, with that the reads will be pages of the
    article-id's, or, the article contents themselves (split
    to headers, xrefs, body) from the filesystem that are
    mostly some few KiB and mostly quite altogether < 32 KiB,
    which is quite a lot considering that's less than a JPEG
    the size of "this". (99+% of Internet traffic was JPEG
    and these days is audio/video traffic, often courtesy JPEG.)

    Writing the read routine is amusing me with training the
    buffers and it amuses me to write code with quite the
    few +1 and -1 in the offsets. Usually having +-1 in
    the offset computations is a good or a bad thing, rarely
    good, with that often it's a sign that the method signature
    just isn't being used quite right in terms of the locals,
    if not quite as bad as "build a fence a mile then move it
    a foot". When +-1 offsets is a good thing, here the operations
    on the content of the buffers are rather agnostic the bounds
    and amount of the buffers, thus that I/O should be quite
    expedient in the routine.

    (Written in Java, it should run quite the same on any
    runtime with Java 1.4+.)

    That said then next I'm looking to implement the Executor pool.

    Acceptor -> Reader -> Scanner -> Executor -> Printer -> Writer

    The idea of the Executor pool is that there are many connections
    or sessions (the protocol is stateful), then that for one session,
    its command's results are returned in order, but, that doesn't say
    that the commands are executed in order, just that their results
    are returned in order. (For some commands, which affect the state
    of the session like current group or current article, that being
    pretty much it, those also have to be executed sequentially for
    consistency's sake.) So, I'm looking to have the commands be
    executed in any possible order, for the usual idea of saturating
    the bandwidth of the horizontally scalable backend. (Yeah, I
    know NFS has limits, but it's unbounded and durable, and there's
    overall a consistent, non-blocking toward lock-free view.)
    Anyways, basically the Session has a data structure of its
    outstanding commands, as they're enqueued to the task executor,
    then whether it can go into the out-of-order pool or must stay
    in the serial pool. Then, as the commands complete, or for
    example timeout after retries on some network burp, those are
    queued back up as the FIFO of the Results and as those arrive
    the Writer is re-registered with the SocketChannel's Selector
    for I/O notifications and proceeds to fill the socket's output
    buffer and retire the Command and Result. One aspect of this
    is that the Printer/Writer doesn't necessarily get the data on
    the heap, the output for example an article is composed from
    the FileChannels of the message-id's header, xref, body. Now,
    these days, the system doesn't have much of a limit in open
    file handles, but as mentioned above there are limits on NFS
    file handles. Basically then the data is retrieved as from the
    object store (or here an octet store but the entire contents of
    the files are written to the output with filesystem transfer
    direct to memory or the I/O channel). Then, releasing the
    NFS file handles expeditiously basically is to be figured out
    with caching the contents, for any retransmission or simply
    serving copies of the current articles to any number of
    connections. As all these are, read-only, it looks like the
    filesystems' built-in I/O caching with, for example, a read-only
    client view and no timeout, basically turns the box into a file
    cache, because that is what it is.
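
    As a rough sketch of "results in order, execution in any
    order": the session keeps a FIFO of its outstanding commands,
    and each command goes to either an out-of-order pool or a
    single-threaded serial pool. The names here (OrderedSession,
    submit, drain) are made up for illustration, and drain()
    simply blocks on each Future in turn, where the design above
    would instead re-register the Writer with the Selector. It
    also does not model a stateless command that depends on an
    earlier stateful one.

    ```java
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Queue;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical per-connection session: results retire in arrival order.
    final class OrderedSession {
        private final ExecutorService parallelPool;   // commands may finish in any order
        private final ExecutorService serialPool;     // one thread: stateful commands stay sequential
        private final Queue<Future<String>> outstanding = new ArrayDeque<>();

        OrderedSession(ExecutorService parallelPool, ExecutorService serialPool) {
            this.parallelPool = parallelPool;
            this.serialPool = serialPool;
        }

        // Enqueue a command; the FIFO remembers the order it arrived in.
        void submit(Callable<String> command, boolean affectsSessionState) {
            ExecutorService pool = affectsSessionState ? serialPool : parallelPool;
            outstanding.add(pool.submit(command));
        }

        // Retire commands strictly in arrival order, whatever order they completed in.
        List<String> drain() throws InterruptedException, ExecutionException {
            List<String> results = new ArrayList<>();
            Future<String> next;
            while ((next = outstanding.poll()) != null) {
                results.add(next.get());
            }
            return results;
        }

        // Demonstration: the first command is the slowest but is still returned first.
        static List<String> demo() {
            ExecutorService parallel = Executors.newFixedThreadPool(4);
            ExecutorService serial = Executors.newSingleThreadExecutor();
            try {
                OrderedSession session = new OrderedSession(parallel, serial);
                session.submit(() -> { Thread.sleep(50); return "first"; }, false);
                session.submit(() -> "second", false);
                session.submit(() -> "third", true);
                return session.drain();
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                parallel.shutdown();
                serial.shutdown();
            }
        }

        public static void main(String[] args) {
            System.out.println(demo());   // prints [first, second, third]
        }
    }
    ```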

    Then, it looks like there is a case for separate reader and
    writer implementations altogether of the NFS or octet store
    (that here is an object store for the articles and their
    sections, and an octet store for the pages of the tables).
    This is with the goal of minimizing network access while
    maintaining the correct view. But, an NFS export can't
    be mounted twice from the same client (one for reads and
    one for writes), and, while ingesting the message can be
    done separately the client, intake has to occur from the
    client, then what with a usual distributed cloud queue
    implementation having size and content limits, it seems
    like it'll be OK.

    [2016/12/17]

    The next thing I'm looking at is how to describe the "range",
    as a data structure or in algorithms.

    Here a "range" class in the runtime library is usually a
    "bounds" class. I'm talking about a range, basically a
    1-D range, about basically a subset of the integers,
    then that the range is iterating over the subset in order,
    about how to maintain that in the most maintainable and
    accessible terms (in computational complexity's space and time
    terms).

    So, I'm looking to define a reasonable algebra of individuals,
    subsets, segments, and rays (and their complements) that
    naturally compose to objects with linear maintenance and linear
    iteration and constant access of linear partitions of time-
    series data, dense or sparse, with patterns and scale.

    This then is to define data structures as so compose that
    given a series of items and a predicate, establish the
    subset of items as a "range", that then so compose as
    above (and also that it has translations and otherwise
    is a fungible iterator).

    I don't have one of those already in the runtime library.

    punch-out <- punches have shapes, patterns? eg 1010
    knock-out <- knocks have area
    pin-out <- just one
    drop-out <-
    fall-out <- range is out

    Then basically there's a coalescence of all these,
    that they have iterators or mark bounds, of the
    iterator of the natural range or sequence, for then
    these being applied in order

    push-up <- basically a prioritization
    fill-in <- for a "sparse" range, like the complement upside-down
    pin-in
    punch-in
    knock-in

    Then all these have the basic expectation that a range
    is the combination of each of these that are expressions
    then that they are expressions only of the value of the
    iterator, of a natural range.

    Then, for the natural range being time, then there is about
    the granularity or fine-ness of the time, then that there is
    a natural range either over or under the time range.

    Then, for the natural range having some natural indices,
    the current and effective indices are basically one and
    zero based, that all the features of the range are shiftable
    or expressed in terms of these offsets.

    0 - history

    a - z

    -m,n

    Whether there are pin-outs or knock-outs rather varies on
    whether removals are one-off or half-off.

    Then, pin-outs might build a punch-out,
    While knock-outs might build a scaled punch-out

    Here the idea of scale then is to apply the notions
    of stride (stripe, stribe, striqe) to the range, about
    where the range is for example 0, 1, .., 4, 5 .., 8, 9
    that it is like 1, 3, 5, 7 scaled out.

    Then, "Range" becomes quite a first-class data structure,
    in terms of linear ranges, to implement usual iterators
    like forward ranges (iterators).

    Then, for time-forward searches, or to compose results in
    ranges from time-forward searches, without altogether loading
    into memory the individuals and then sorting them and then
    detecting their ranges, there is to be defined how ranges
    compose. So, the Range includes a reference to its space
    and the Bounds of the Space (in integers then extended
    precision integers).

    "Constructed via range, slices, ..." (gslices), ....



    Then, basically I want that the time series is a range,
    that expressions matching elements are dispatched to
    partitions in the range, that the returned or referenced
    composable elements are ranges, that the ranges compose
    basically pair-wise in constant time, thus linearly over
    the time series, then that iteration over the elements
    is linear in the elements in the range, not in the time
    series. Then, it's still linear in the time series,
    but sub-linear in the time series, also in space terms.

    Here, sparse or dense ranges should have the same small-
    linear space terms, with there being maintenance on the
    ranges, about there being hysteresis or "worst-case 50/50"
    (then basically some inertia for where a range is "dense"
    or "sparse" when it has gt or lt .5 elements, then about
    where it's just organized that way because there is a re-
    organization).

    So, besides composing, then the elements should have very
    natural complements, basically complementing the range by
    taking the complement of the range's parts, that each
    sub-structure has a natural complement.

    Then, pattern and scale are rather related, about figuring
    that out some more, and leaving the general purpose, while
    identifying the true primitives of these.

    Then eventually there is attachment or reference to values
    under the range, and general-purpose expressions to return
    an iteration or build a range, about the collectors that
    establish where range conditions are met and then collapse
    after the iteration is done, as possible.

    So, there is the function of the range, to iterate, then
    there is the building of the range, by iterating. The
    default of the range and the space is its bounds (or, in
    the extended, that there are none). Then, segments are
    identified by beginning and end (and perhaps a scale, about
    rigid translations and about then that the space is
    unsigned, though unbounded both left and right see
    some use). These are dense ranges, then for whether the
    range is "naturally" or initially dense or sparse. (The
    usual notion is "dense/full" but perhaps that's as
    "complement of sparse/empty".) Then, as elements are
    added or removed in the space, if they are added range-wise
    then that goes to a stack of ranges that any forward
    iterator checks before it iterates, about whether the
    natural space's next is in or out, or, whether there is
    a skip or jump, or a flip then to look for the next item
    that is in instead of out.

    This is where, the usual enough organization of the data
    as collected in time series will be bucketed or partitioned
    or sharded into some segment of the space of the range,
    that building range or reading range has the affinity to
    the relevant bucket, partition, or shard. (This is all
    1-D time series data, no need to make things complicated.)

    Then, the interface basically "builds" or "reads" ranges,
    building given an expression and reading as a read-out
    (or forward iteration), about that then the implementation
    is to compose the ranges of these various elements of a
    topological sort about the bounds/segments and scale/patterns
    and individuals.

    https://en.wikipedia.org/wiki/Allen%27s_interval_algebra

    This is interesting, for an algebra of intervals, or
    segments, but here so far I'd been having that the
    segments of contiguous individuals are eventually
    just segments themselves, but composing those would
    see the description as of this algebra. Clearly the
    goal is the algebra of the contents of sets of integers
    in the integer spaces.

    An algebra of sets and segments of integers in integer spaces

    An integer space defines elements of a type that are ordered.

    An individual integer is an element of this space.

    A set of integers is a set of integers, a segment of integers
    is a set containing a least and greatest element and all elements
    between. A ray of integers is a set containing a least element
    and all greater elements or containing a greatest element and
    all lesser elements.

    A complement of an individual is all the other individuals,
    a complement of a set is the intersection of all other sets,
    a complement of a segment is all the elements of the ray less
    than and the ray greater than all individuals of the segment.
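
    A minimal sketch of these definitions, under my own
    assumptions: individuals, segments, and rays are all one
    closed "Span" type, Long.MIN_VALUE and Long.MAX_VALUE stand
    in for the unbounded ends of a ray (so they are not usable as
    ordinary elements), and a set is a sorted list of disjoint,
    non-adjacent spans. The names IntRange and Span are
    hypothetical.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    final class IntRange {
        static final long NEG_INF = Long.MIN_VALUE;   // sentinel: unbounded below
        static final long POS_INF = Long.MAX_VALUE;   // sentinel: unbounded above

        // Closed span [lo, hi]; a ray has one sentinel end.
        static final class Span {
            final long lo, hi;
            Span(long lo, long hi) {
                if (lo > hi) throw new IllegalArgumentException("lo > hi");
                this.lo = lo;
                this.hi = hi;
            }
            @Override public String toString() {
                return (lo == NEG_INF ? "(-inf" : "[" + lo) + ", "
                     + (hi == POS_INF ? "+inf)" : hi + "]");
            }
        }

        static Span individual(long x)         { return new Span(x, x); }
        static Span segment(long lo, long hi)  { return new Span(lo, hi); }
        static Span rayUp(long least)          { return new Span(least, POS_INF); }
        static Span rayDown(long greatest)     { return new Span(NEG_INF, greatest); }

        // Complement of a sorted list of disjoint, non-adjacent spans.
        // One pass: emit the gap before each span, then the tail ray if any.
        static List<Span> complement(List<Span> spans) {
            List<Span> out = new ArrayList<>();
            long next = NEG_INF;      // least element not yet accounted for
            boolean tail = true;      // false once a span reaches POS_INF
            for (Span s : spans) {
                if (s.lo > next) out.add(new Span(next, s.lo - 1));
                if (s.hi == POS_INF) { tail = false; break; }
                next = s.hi + 1;
            }
            if (tail) out.add(new Span(next, POS_INF));
            return out;
        }

        public static void main(String[] args) {
            List<Span> seg = Arrays.asList(segment(3, 5));
            System.out.println(complement(seg));               // [(-inf, 2], [6, +inf)]
            System.out.println(complement(complement(seg)));   // [[3, 5]]
        }
    }
    ```

    So the complement of a segment comes out as the two rays on
    either side, as described, and complementing twice gets the
    segment back, in time linear in the number of spans rather
    than the number of integers.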

    What are the usual algebras of the compositions of individuals,
    sets, segments, and rays?

    https://en.wikipedia.org/wiki/Region_connection_calculus



    Then basically all kinds of things that are about subsets
    of things in a topological or ordered space should basically
    have a first-class representation as (various kinds of)
    elements in the range algebra.

    So, I'm wondering what there is already for
    "range algebra" and "range calculus".

    [2016/12/18]

    Some of the features of these subsets of a
    range of integers are available as a usual
    bit vector, eg with ffs ("find-first-set")
    memory scan instructions,
    and as well usual notions of compressed bitmap
    indices, with some notion of random access to
    the value of a bit by its index and variously
    iterating over the elements. Various schemes
    to compress the bitmaps down to uncompressed
    regions with representing words' worths of bits
    may suit parts of the implementation, but I'm
    looking for a "pyramidal" or "multi-resolution"
    organization of efficient bits, and also flags,
    about associating various channels of bits with
    the items or messages.

    https://en.wikipedia.org/wiki/Bitmap_index
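
    For the plain (uncompressed) bit vector case, the runtime's
    java.util.BitSet already gives the find-first-set style scan
    through nextSetBit and nextClearBit. A small sketch, where
    the class BitRanges and its methods are just illustrative
    names, and the "pyramidal" organization is not attempted:

    ```java
    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;

    final class BitRanges {
        // Maximal runs of set bits, each as {start, endInclusive}.
        static List<int[]> runs(BitSet bits) {
            List<int[]> out = new ArrayList<>();
            int start = bits.nextSetBit(0);
            while (start >= 0) {
                int end = bits.nextClearBit(start);   // first clear bit after the run
                out.add(new int[] { start, end - 1 });
                start = bits.nextSetBit(end);         // -1 when no set bits remain
            }
            return out;
        }

        // Complement within the bounded space [0, size).
        static BitSet complement(BitSet bits, int size) {
            BitSet flipped = (BitSet) bits.clone();
            flipped.flip(0, size);
            return flipped;
        }

        public static void main(String[] args) {
            BitSet bits = new BitSet();
            bits.set(1, 4);   // sets 1, 2, 3
            bits.set(7);
            for (int[] run : runs(bits)) {
                System.out.println(run[0] + ".." + run[1]);
            }
        }
    }
    ```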

    Then, with having narrowed down the design for
    what syntax to cover, and, mostly selected data
    structures for the innards, then I've been looking
    to the data throughput, then some idea of support
    of client features.

    Throughput is basically about how to keep the
    commands moving through. For this, there's a
    single thread that reads off the network interface's
    I/O buffers, it was also driving the scanner, but
    adding encryption and compression layers, then there's
    also adding a separate thread to drive the scanner
    thus that the network interface is serviced on demand.
    Designing a concurrent data structure basically has
    a novel selector (as of the non-blocking I/O) to
    then pick off a thread from the pool to run the
    scanner. Then, on the "printer" side and writing
    off to the network interface, it is similar, with
    having the session or connection's resources run
    the compression and encryption, then for the I/O
    thread as servicing the network interface. Basically
    this is having put a collator/relay thread between
    the I/O threads and the scanner/printer threads
    (where the commands are run by the executor pool).


    Then, a second notion has been the support of TLS.
    It looks like I would simply sign a certificate and expect
    users to check and install it themselves in their
    trust-store for SSL/TLS. That said, it isn't really
    a great solution, because, if someone compromises any
    of the CA's, certificate authorities, in the trust
    store (any of them), then a man-in-the-middle could
    sign a cert, and it would be on the server to check
    that the content hash reflected the server cert from
    the handshake. What might be better would be to have
    that each client, signs their own certificate, for the
    server to present. This way, the client and server
    each sign a cert, and those are exchanged. When the
    server gets the client cert, it restarts the negotiation
    now with using the client-signed cert as the server
    cert. This way, there's only a trust anchor of depth
    1 and the trust anchors are never exchanged and can
    not be cross-signed nor otherwise would ever share
    a trust root. Similarly the server gets the server-
    signed cert back from the client then that TLS could
    proceed with a session ticket and that otherwise there
    would be a stronger protection from compromised CA
    certs. Then, this could be pretty automatic with
    a simple enough browser interface or link to set up TLS.
    Then the server and client would only trust themselves
    and each other (and keep their secrets private).

    Then, for browsing, a reading of IMAP, the Internet
    Message Access Protocol, shows a strong affinity with
    the organization of Usenet messages, with newsgroups
    as mailboxes. As well, implementing an IMAP server
    that is backed by the NNTP server has then that the
    search artifacts and etcetera (and this was largely
    a reason why I need this improved "range" pattern)
    would build for otherwise making deterministic date-
    oriented searches over the messages in the NNTP server.
    IMAP has a strong affinity with NNTP, and is a very
    similar protocol and is implemented much the same
    way. Then it would be convenient for users with
    an IMAP client to simply point to "usenet.science"
    or what and get usenet through their email browser.


    [2016/12/23]

    About implementing usenet with reasonably
    modern runtimes and an eye toward
    unlimited retention, basically looking
    into "microtasks" for the routine or
    workflow instances, as are driven with
    non-blocking I/O throughout, basically
    looking to memoize the steps as through
    a finite state machine, for restarts as
    of a thread, then to go from "service
    oriented" to "message oriented".


    This involves writing a bit of an
    HTTP client for rather usual web
    service calls, but with high speed
    non-blocking I/O (less threads, more
    connections). Also this involves a
    sufficient abstraction.


    [ page break 3 ]

    [2017/01/06]

    This writing some software for usenet service
    is coming along with the idea of how to implement
    the fundamentally asynchronous non-blocking routine.
    This is crystallizing in pattern as a: re-routine,
    in reference to computing's usual: co-routine.

    The idea of the re-routine is that there are only
    so many workers, threads, of the runtime. The usual
    runtimes (and this one, Java, say) support preemptive
    multithreading as a means of implementing cooperative
    multithreading, with the maintenance of separate stacks
    (of, the stack machine of usual C-like procedural runtimes)
    and some thread-per-connection model. This is somewhat
    reasonable for the composition of blocking APIs, but
    not so much for the composition of non-blocking APIs
    and about how to not have many thread-per-connection
    resources with essentially zero duty cycle that instead
    could maintain for themselves the state machine of their
    routine (with simplified forward states and a general
    exception and error routine), for cooperative multi-threading.

    The idea of this re-routine then is to connect functions,
    there's a scope for variables in the scope, there is
    execution of the functions (or here the routines, as
    the "re-routines") then the instance of the re-routine
    is re-entrant in the sense that as partial results are
    accumulated the trace of the routine is marked out, with
    leaving in the scope the current or partial or intermediate
    results. Then, the asynchronous workers that fulfill each
    routine (eg, with a lookup, a system call, or a network
    call) are separate worker units dedicated to their domain
    (of the routine, not the re-routine, and they can be blocking,
    polling for their fleet, or callback with the ticket).

    Then, this is basically a network machine and protocol,
    here about NNTP and IMAP, and its resources are often
    then of network machines and protocols (eg networked
    file systems, web services). Then, these "machines"
    of the "re-routine" being built (basically for the
    streaming model instead of the batch model if you
    know what I'm talking about) defining the logical
    outcomes of the composition of the inputs and the
    resulting outputs in terms of scopes as a model of
    the cooperative multithreading, these re-routines
    then are seeing for the pattern then that the
    source template is about implicitly establishing
    the scope and the passing and calling convention
    (without a bunch of boilerplate or "callback confusion",
    "async hell"). This is where the re-routine, when
    a routine worker fills in a partial result and resubmits
    the re-routine (with the responsibility/ownership of
    the re-routine) that it is re-evaluated from the beginning,
    because it is constant linear in reading forward for the
    item the state of its overall routine, thusly implicit
    without having to build a state machine, as it is
    declaratively the routine.

    So, I am looking at this as my solution as to how to
    establish a very efficient (in resource and performance
    terms) formally correct protocol implementation (and
    with very simple declarative semantics of usual forward,
    linear routines).

    This "re-routine" pattern then as a model of cooperative
    multithreading sees the complexity and work into the
    catalog of blocking, polling, and callback support,
    then for usual resource injection of those as all
    supported with references to usual sequential processes
    (composition of routine).


    [2017/01/21]

    I've about sorted out how to implement the re-routine.

    Basically a re-routine is a suspendable composite
    operation, with normal declarative flow-of-control
    syntax, that memo-izes its partial results, and
    re-executes the same block of statements then to
    arrive at its pause, completion, or exit.

    Then, the command and executor are passed to the
    implementation that has its own (or maybe the
    same) execution resources, eg a thread or connection
    pool. This resolves the value of the asynchronous
    operation, and then re-submits the re-routine to
    its originating executor. The re-routine re-runs
    (it runs through the branching or flow-of-control
    each time, but that's small in the linear and all
    the intermediate products are already computed,
    and the syntax is usual and in the language).
    The re-routine then either re-suspends (as it
    launches the next task) or completes or exits (errors).
    Whether it suspends, completes or exits, the
    re-routine just returns, and the executor then
    is specialized and just checks the re-routine
    whether it's suspended (and just drops it, the
    new responsible launched will re-submit it),
    or whether it's completed or errored (to call
    back to the originating commander the result of
    the command).


    In this manner, it seems like a neat way to basically
    establish the continuation, for this "non-blocking
    asynchronous operation", while at the same time
    the branching and flow of control is all in the
    language, with the usual unsurprising syntax and
    semantics, for cooperative multi-threading. The
    cost is in wrapping the functional callers of the
    routine and setting up their factories and otherwise
    as via injection (and they can block the calling
    thread, or have their own threads and block, or
    be asynchronous, without changing the definition
    of the routine).
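
    To make the shape of this concrete, here is a minimal sketch
    of a re-routine, with several assumptions of mine: step
    results are memoized under string keys, results are non-null,
    the body unwinds with a private Pending exception when a step
    is not yet resolved (the description above only says the
    re-routine "just returns"), and CompletableFuture stands in
    for the asynchronous implementations. All the names
    (ReRoutine, step, Pending) are hypothetical, and it needs a
    newer Java than 1.4.

    ```java
    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executor;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.function.Function;
    import java.util.function.Supplier;

    final class ReRoutine<R> {
        // Signals "suspended": no message, no stack trace, it is only flow of control.
        static final class Pending extends RuntimeException {
            Pending() { super(null, null, false, false); }
        }

        private final Map<String, Object> memo = new ConcurrentHashMap<>();
        private final Function<ReRoutine<R>, R> body;   // the plain, linear routine
        private final Executor executor;                // the originating executor
        private final CompletableFuture<R> result = new CompletableFuture<>();
        final AtomicInteger runs = new AtomicInteger(); // how often the body was (re-)run

        ReRoutine(Function<ReRoutine<R>, R> body, Executor executor) {
            this.body = body;
            this.executor = executor;
        }

        CompletableFuture<R> start() {
            executor.execute(this::run);
            return result;
        }

        // One asynchronous step: return the memoized value, or launch and suspend.
        @SuppressWarnings("unchecked")
        <T> T step(String key, Supplier<CompletableFuture<T>> launch) {
            Object done = memo.get(key);
            if (done != null) return (T) done;
            launch.get().whenComplete((value, error) -> {
                if (error != null) { result.completeExceptionally(error); return; }
                memo.put(key, value);
                executor.execute(this::run);   // re-submit to the originating executor
            });
            throw new Pending();
        }

        // Re-executes the body from the beginning; memoized steps return at once.
        private void run() {
            runs.incrementAndGet();
            try {
                result.complete(body.apply(this));
            } catch (Pending suspended) {
                // dropped here; the launched step re-submits when it resolves
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        }

        static String demo() {
            ExecutorService exec = Executors.newSingleThreadExecutor();
            ExecutorService workers = Executors.newFixedThreadPool(2);
            try {
                ReRoutine<String> r = new ReRoutine<String>(rr -> {
                    String group = rr.step("group",
                        () -> CompletableFuture.supplyAsync(() -> "sci.math", workers));
                    Integer count = rr.step("count",
                        () -> CompletableFuture.supplyAsync(() -> group.length(), workers));
                    return group + ":" + count;
                }, exec);
                return r.start().get(5, TimeUnit.SECONDS) + " runs=" + r.runs.get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                exec.shutdown();
                workers.shutdown();
            }
        }

        public static void main(String[] args) {
            System.out.println(demo());   // prints sci.math:8 runs=3
        }
    }
    ```

    The body in demo() reads as two ordinary sequential
    statements, and the executor never blocks: with two
    asynchronous steps the body runs three times, twice to a
    suspension and once to completion.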

    So, having sorted this mostly out, then the usual
    work as of implementing the routines for the protocol
    can so proceed then with a usual notion of a framework
    of support for both the simple declaration of routine
    and the high performance (and low resource usage) of
    the delegation of routine, and support for injection
    for test and environment, and all in the language
    with minimal clutter, no byte-code modification,
    and a ready wrapper for libraries of arbitrary
    run-time characteristic.

    This solves some problems.


    [2017/01/22]

    Thanks for your interest, if you read the thread,
    I'm talking about an implementation of usenet,
    with modern languages and runtimes, but, with
    a filesystem convention, and a distributed redundant
    store, and otherwise of very limited hardware and
    distributed software resources or the "free tier"
    of cloud computing (or, any box).

    When it comes to message formats, usenet isn't
    limited to plain text, it's as simply usual
    MIME multimedia. (The user-agent can render
    text however it would so care.)

    A reputation system is pretty simply implemented
    with forwarding posts to various statistics groups
    that over time build profiles of authors that
    readers may adopt.

    Putting an IMAP interface in front of a NNTP gateway
    makes it pretty simple to have cross-platform user
    interfaces from any IMAP (eg, email) client.

    Then, my requirements include backfilling a store
    with the groups of interest for implementing summary
    and search for archival and research purposes.


    [2017/01/22]

    (About the 2nd law of thermodynamics, Moore's
    law, and the copper process with regards to the
    cross-talk about the VLSI or "ultra" VLSI or
    the epoch these days, and burning bits, what
    you might find of interest is the development of
    the "reversible computing", which basically
    recycles the bits, and then also that besides
    the usual electronic transistor, and besides that
    today there can be free-form 3-D IC's or "custom
    logic", instead of just the planar systolic clock-
    driven chip, there are also "systems on chip" with
    regards to electron, photon, and heat pipes as
    about the photo-electric and Seebeck/Peltier,
    with various remarkably high efficiency models
    of computation, this besides the very novel
    serial and parallel computational units and
    logical machines afforded by 3-D IC's and optics.

    About "reasonably simple declaration of routine
    in commodity languages on commodity hardware
    for commodity engineers for enduring systems",
    at cost, see above.)


    [2017/02/07]

    Not _too_ much progress, has basically seen the adaptation
    of this re-routine pattern to the command implementations,
    with basically usual linear procedural logic then the
    automatic and agnostic composition of the asynchronous
    tasks in the usual declarative syntax that then the
    pooled (and to be metered) threads are possibly by
    design entirely non-blocking and asynchronous, and
    possibly by design blocking or otherwise agnostic of
    implementation, with then the design of the state
    machine of the routine as "eventually consistent"
    or forward and making efficient use of the computational
    and synchronization resources.

    The next part has been about implementing a client "machine"
    as complement to the server "machine", where a machine here
    is an assembly as it were of threads and executors about the
    "reactive" (or functional, event-driven) handling of the
    abstract system resources (small pojos, file name, and
    linked lists of 4K buffers). The server basically starts
    up listening on a port then accepts and starts a session
    for any connection and then a reader fills and moves buffers
    to each of the sessions of the connections, and signals the
    relay then for the scanning of the inputs and then composing
    the commands and executing those as these re-routines, that
    as they complete, then the results of the commands are then
    printed out to buffers (eg, encoded, compressed, encrypted)
    then the writer sends that back on the wire. The client
    machine then is basically a model of asynchronous and
    probably serial computation or a "web service call", these
    days often and probably on pooled HTTP connections. This
    then is pretty simple with the callbacks and the addressing/
    routing of the response back to the re-routine's executor
    to then re-submit the re-routine to completion.

    I've been looking at other examples of continuations, the
    "reactive" programming or these days' "streaming model"
    (where the challenge is much in the aggregations), that
    otherwise non-blocking or asynchronous programming is
    often rather ... recursively ... rolled out where this
    re-routine gains even though the flow-of-control is
    re-executed over the memoized contents of the re-routines
    as they are so composed declaratively, that this makes
    what would be "linear" at worst "n squared", but that is
    only on how many commands there are in the procedure,
    not combined over their execution because all the
    intermediate results are memoized (as needed, because
    if the implementation is local or a mock instead, the
    re-routine is agnostic of asynchronicity and just runs
    through linearly, but the relevant point is that the
    number of composable units is a small constant thus
    that its square is a small constant, particularly
    as otherwise being a free model of cooperative multi-
    threading, here toward a lock-free design). All the
    live objects remain on the heap, but just the objects
    and not for example the stack as a serialized continuation.
    (This could work out to singleton literals or "coding"
    but basically it will have to auto-throttle off heap-max.)
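
    To put a number on the "n squared" remark, a small worked
    count, assuming the worst case where each of the k steps of a
    routine suspends exactly once (k is my notation here). Run i
    passes i - 1 memoized steps and suspends at step i, so it
    visits i steps, and the final run visits all k:

    ```latex
    \text{step visits}
      \;=\; \underbrace{\sum_{i=1}^{k} i}_{\text{suspended runs}}
      \;+\; \underbrace{k}_{\text{final run}}
      \;=\; \frac{k(k+1)}{2} + k
      \;=\; \frac{k(k+3)}{2}
      \;=\; O(k^{2})
    ```

    For k = 2 that is 1 + 2 + 2 = 5 visits, of which only 2
    launch any real work, the rest being memo lookups. If nothing
    suspends (a local or mock implementation), there is one run
    and k visits.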

    So, shuffling and juggling the identifiers and organizations
    around and sifting and sorting what elements of the standard
    concurrency and functional libraries (of, the "Java" language)
    to settle on for usual neat and concise (and re-usable and
    temporally agnostic) declarative flow-of-control (i.e., with
    "Future"'s everywhere and as about reasonable or least-surprising
    semantics, if any, with usual and plain code also being "in
    the convention"), then it is settling on a style.

    Well, thanks for reading, it's a rather stream-of-consciousness
    narrative, here about the design of pretty re-usable software.

    [2017/02/07]

    Sure, I'll limit this.

    There is plenty of usenet server software, but it is mostly
    INND or BNews/CNews, or a few commercial cousins. The design
    of those systems is tied to various economies that don't so much
    apply these days. (The use-case, of durable distributed message-
    passing, is still quite relevant, and there are many ecosystems
    and regimes small and large as about it.) In the days of managed
    commodity network and compute resources or "cloud computing", here
    as above about requirements, then a modernization is relevant, and
    for some developers with the skills, not so distant.

    Another point is that the eventual goal is archival, my goal isn't
    to start an offshoot, instead to build the system as a working
    model of an archive, basically from the author's view as a working
    store for extracting material, and from the developer's view as
    an example in design with low or no required maintenance and
    "scalable" operation for a long time.


    You mention comp.ai.philosophy, these days there's a lot more
    automated reasoning (or, mockingbird generators), as computing
    and development affords more and different forms of automated
    reasoning, here again the point is for an archival setting to
    give them something to read.

    Thanks, then, I'll limit this.

    [2017/03/21]

    I continued tapping away at this.

    The re-routines now sit beyond a module or domain definition.
    This basically defines the modules' value types like session,
    message, article, group, content, wildmat. Then, it also
    defines a service layer, as about the relations of the elements
    of the domain, so that then the otherwise simple value types
    have natural methods as relate them, all implemented behind
    a service layer, that implemented with these re-routines is
    agnostic of synchronous or asynchronous convention, and
    is non-blocking throughout with cooperative multithreading.
    This has a factory of factories or industry pattern that provides
    the object graph wiring and dynamic proxying to the routine
    implementations, that are then defined as traits, that the re-
    routine composes the routines as mixins (of the domain's
    services).

    (This is all "in the language" in Java, with no external dependencies.)

    The transport mechanism is basically having abstracted the
    attachment for a usual non-blocking I/O framework for the
    transport types as of the scattering/gathering or vector I/O
    as about then the interface between transport and protocol
    (here NNTP, but, generally). Basically in a land of 4K byte buffers,
    then those are fed from the Reader/Writer that is the endpoint to
    a Feeder/Scanner that is implemented for the protocol and usual
    features like encryption and compression, then making Commands
    and Results out of those (and modelling transactions or command
    sequences as state machines which are otherwise absent), those
    systolically carrying out as primitive or transport types to a
    Printer/Hopper, that also writes the response (or rather, consumes
    the buffers in a highly concurrent highly efficient event and
    selection hammering). The selector is another bounded resource, so
    the SelectorAssignment is configurable and there might be a thread
    for each group of selectors about FD_SETSIZE, but that's not really
    at issue as select went to epoll, but provides an option for that
    eventuality.
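
    As a rough illustration of the Feeder/Scanner step above (not the
    actual code; LineScanner and feed are hypothetical names), this
    sketch takes byte buffers as they come off a non-blocking read and
    emits complete CRLF-terminated command lines, carrying a partial line
    across buffer boundaries.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner: assembles CRLF-terminated lines from buffers. */
final class LineScanner {
    private final ByteArrayOutputStream partial = new ByteArrayOutputStream();
    private boolean sawCr = false; // a CR was read and is waiting for its LF

    /** Consumes the buffer from position to limit; returns completed lines. */
    List<String> feed(ByteBuffer buf) {
        List<String> lines = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (sawCr && b == '\n') {
                // CR LF seen: emit the line without its terminator.
                lines.add(new String(partial.toByteArray(), StandardCharsets.UTF_8));
                partial.reset();
                sawCr = false;
            } else {
                if (sawCr) partial.write('\r'); // a lone CR is ordinary data
                sawCr = (b == '\r');
                if (!sawCr) partial.write(b);
            }
        }
        return lines;
    }
}
```

    The state (the partial line and the pending CR) lives in the scanner,
    so it would be one scanner per session or attachment, and the buffer
    size (4K or otherwise) doesn't matter to it.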

    The transport and protocol routines are pretty well decoupled this
    way, and then the protocol domain, modules, and routines are as
    well so decoupled (and fall together pretty naturally), much using
    quite usual software design patterns (if not necessarily so formally,
    quite directly).

    The protocol (here NNTP) then is basically in a few files detailing
    the semantics of the commands to the scanner as overriding methods
    of a Command class, and implementing the action in the domain from
    extending the TraitedReRoutine then for a single definition in the
    NNTP domain that is implemented in various modules or as collections
    of services.


    [2017/04/09]

    I'm still tapping away at this if rather more slowly (or, more
    sporadically).

    The "re-routine" async completion pattern is more than less
    figured out (toward high concurrency as a model of cooperative
    multi-threading, behind also a pattern of a domain layer, with
    mix-in nyms that is also some factory logic), a simple non-blocking
    I/O socket service routine is more than less figured out (the server
    not the client, toward again high concurrency and flexible and
    efficient use of machine or virtualized resources as they are), the
    commands and their bodies are pretty much typed up, then I've been
    trying to figure out some data structures basically in I/O
    (Input/Output), or here mostly throughput as it is about the streams.

    I/O datum FIFOs and holders:

    buffer queue
    handles queue
    buffer+handles queue
    buffer/buffer[] or buffer[]/buffer in loops
    byte[]/byte[] in steps
    Input/Output in Streams

    Basically any of the filters or adapters is specialized to these
    input/output data holders. Then, there are logically enough queues
    or FIFOs as there are really implicitly between any communicating
    sequential processes that are rate-limited or otherwise non-systolic
    ("real-time"), here for some ideas about data structures, as either
    implement or adapt unbounded single producer/single consumer (SPSC)
    queues.

    One idea is making the linked container with sentinel nodes and
    otherwise making it thread-safe (for a single producer and single
    consumer). This is where the queue (or, "monohydra" or "slique") is
    rather generally a container, and that here iterations are usually
    consuming the queue, but sometimes there are aggregates collected
    then to go over the queue. The idea then is that the producer and
    consumer have separate views of the queue that the producer does
    atomic swap on the tail of the queue and that a consumer's iterator
    of elements (as iterable and not just a queue, for using the queue as
    a holder and not just a FIFO) returns a marker to the end of the
    iteration, for example in computing bounds over the buffers then
    re-iterating and flipping the buffers then given the bounds moving
    the buffers' references to an output array thus consuming the FIFO.
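
    A minimal sketch of such a queue, under assumptions: the class and
    method names here (Slique, offer, peekAll, consumeThrough) are made
    up for illustration and this is not the implementation described. It
    shows the sentinel node, the producer's atomic swap on the tail, and
    the consumer iterating without consuming, then consuming up to a
    marker.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical unbounded SPSC linked queue with a sentinel head node. */
final class Slique<T> {
    static final class Node<T> {
        T item;
        volatile Node<T> next; // volatile: publishes the item to the consumer
        Node(T item) { this.item = item; }
    }

    private Node<T> head = new Node<>(null); // sentinel, touched only by the consumer
    private final AtomicReference<Node<T>> tail = new AtomicReference<>(head);

    /** Producer: atomically swap in the new tail, then link the old tail to it. */
    void offer(T item) {
        Node<T> n = new Node<>(item);
        Node<T> prev = tail.getAndSet(n);
        prev.next = n;
    }

    /** Consumer: copy out what is visible without consuming; returns the marker. */
    Node<T> peekAll(List<? super T> out) {
        Node<T> last = head;
        for (Node<T> n = head.next; n != null; n = n.next) {
            out.add(n.item);
            last = n;
        }
        return last;
    }

    /** Consumer: drop everything up to and including the marker node. */
    void consumeThrough(Node<T> marker) {
        marker.item = null; // the marker node becomes the new sentinel
        head = marker;
    }

    /** Consumer: true when nothing is linked after the sentinel. */
    boolean isEmpty() { return head.next == null; }
}
```

    The marker lets the consumer make one pass to compute bounds and a
    second pass to move references out, while the producer keeps
    appending past the marker in the meantime.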

    This then combines with the tasks, in that the tasks driving the I/O
    (as events drive the tasks) are basically constant tasks or runnables
    (constant to the session or attachment) that just have incremented a
    count of times to run, thus that there's always a service of the FIFO
    after the atomic append.
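
    The counted constant task might look like the following sketch.
    CountedTask and signal are hypothetical names, and the Executor
    stands in for whatever runs the tasks; the point is only that the
    task is scheduled on the 0-to-1 transition of the count and keeps
    servicing until the count returns to zero.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical constant task: one instance per session, re-armed by a counter. */
final class CountedTask implements Runnable {
    private final AtomicInteger pending = new AtomicInteger();
    private final Executor executor;
    private final Runnable service; // services the FIFO once per call

    CountedTask(Executor executor, Runnable service) {
        this.executor = executor;
        this.service = service;
    }

    /** Called after each append: schedule only on the 0 -> 1 transition. */
    void signal() {
        if (pending.getAndIncrement() == 0) {
            executor.execute(this);
        }
    }

    /** Runs the service once per recorded signal, including late arrivals. */
    @Override
    public void run() {
        do {
            service.run();
        } while (pending.decrementAndGet() > 0);
    }
}
```

    A signal that arrives while the task is running doesn't schedule it
    again, it only raises the count, and the running loop picks it up, so
    every append is followed by at least one service of the FIFO.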

    Another idea is this hybrid or serial mix-and-match (SPSC FIFO), of
    buffers and handles. This is where the buffer is the data in-line,
    the handle is a reference to the data. This is about passing through
    the handles where the channels support their transfer, and converting
    them to inline data where they don't. That's then about all the
    combined cases as the above I/O datum FIFOs and holders, with
    adapting them so the filter chain blasts (eg specialized operation),
    loops (transferring in and out of buffers), steps (statefully filling
    and levelling data), or moves (copying the references, the data in or
    out or on or off, then to perform the I/O operations) over them.

    It seems rather simpler to just adapt the data types to the boundary
    I/O data types which are byte buffers (here size-4K pooled memory
    buffers) and for that the domain shouldn't know concrete types so
    much as interfaces, but the buffers and handles (file handles) and
    arrays as they are are pretty much fungible to the serialization of
    the elements of the domain, that can then specialize how they build
    logical inputs and outputs of the commands.

    [2017/07/16]

    Implementing search is rather a challenge.

    Besides accepter/rejector and usual notions of matching
    (eg the superscalar on closed categories), find and query
    seems for where besides usual notions of object hashes
    as indices that there is to be built up from the accepter/
    rejector all sorts of indices as do/don't/don't-matter the
    machines of the accepters and rejectors, vis-a-vis going
    over input data and the corpus and finding relations (to
    the input, or here space of inputs), of the corpus.

    That's where, after finding an event for AP, whether
    you're interested in the next for him or the first
    for someone else. There are quite various ways to
    achieve those quite various goals, besides computing
    the first goal. Just as an example that's, for example,
    the first reasonable AP Maxwell equation (or reference)
    or for everybody else, like, who knows about the Maxwell
    equation(s).

    Search is a challenge, NNTP rather puts it off to IMAP first
    for free text search, then for the concept search or
    "call by meaning" you reference, basically refining
    estimates of the scope of what it takes to find out
    what that is.

    Then for events in time-series data there's a usual general
    model for things as they occur. That could be rather
    rich and where causal is separate from associative
    (though of course causality is associative).

    With the idea of NNTP as a corpus, then a usual line
    for establishing tractability of search is to associate
    its contents some document then semantic model i.e.,
    then to generate and maintain that besides otherwise
    that the individual items or posts and their references
    in the meta-data besides the data are made tractable
    then for general ideas of things.

    I'm to get to this, the re-routine particularly amuses
    me as a programming idiom in the design of more-or-less
    detached service routine from the corpus, then about
    what body of data so more-than-less naturally results,
    with rather default and usual semantics.


    Such "natural language" meaning as can be compiled for
    efficiency to the very direct in storage and reference,
    almost then asks "what will AP come up with, next".

    [ page break 4 ]

    [2020/06/29]

    I haven't much worked on this. The idea of the industry
    pattern and for the re-routine makes for quite a bit simply
    the modules in memory or distributed and a default free-threaded
    machine.

    Search you mentioned and for example HTTP is adding the SEARCH verb,
    for example simple associative conditions that naturally only combine,
    and run in parallel, there are of course any number of whatever is the
    HTTP SEARCH implementations one might consider, here usenet's is
    rudimentary where for example IMAP over it is improved, what for
    contextual search and content representation.

    Information retrieval and pattern recognition and all that is
    plenty huge, here that terms define the corpus.

    My implementation of the high-performance selector routine,
    the networking I/O selector, with this slique I implemented,
    runs up and fine and great up to thousands of connections,
    but, it seems like running the standard I/O and non-blocking
    I/O in the same actual container, makes that I implemented
    the selecting hammering non-blocking I/O toward the 10KC,
    though it is in small blocks because here the messages are
    small, then for under what conditions it runs server class.

    With the non-blocking networking I/O, the scanning and parsing
    that assembles messages off the I/O, and that's after compression
    and encryption in the layers, that it's implemented in Java and
    Java does that, then inside that all the commands in the protocol
    then have their implementations in the re-routine, that all
    non-blocking itself and free-threaded, makes sense for
    co-operative multithreading, of an efficient server runtime
    with here the notion of a durable back-end (or running in memory).


    [2020/11/16]

    In traffic there are two kinds of usenet users,
    viewers and traffic through Google Groups,
    and, USENET. (USENET traffic.)

    Here now Google turned on login to view their
    Google Groups - effectively closing the Google Groups
    without a Google login.

    I suppose if they're used at work or whatever though
    they'd be open.



    Where I got with the C10K non-blocking I/O for a usenet server,
    it scales up though then I think in the runtime is a situation where
    it only runs epoll or kqueue that the test scales up, then at the end
    or in sockets there is a drop, or it fell off the driver. I've
    implemented the code this far, what has all of NNTP in a file and
    then the "re-routine, industry-pattern back-end" in memory, then for
    that running usually.

    (Cooperative multithreading on top of non-blocking I/O.)

    Implementing the serial queue or "monohydra", or slique,
    makes for that then when the parser is constantly parsing,
    it seems a usual queue like data structure with parsing
    returning its bounds, consuming the queue.

    Having the file buffers all down small on 4K pages,
    has that a next usual page size is the megabyte.

    Here though it seems to make sense to have a natural
    4K alignment the file system representation, then that
    it is moving files.

    So, then with the new modern Java, it that runs in its own
    Java server runtime environment, it seems I would also
    need to see whether the cloud virt supported the I/O model
    or not, or that the cooperative multi-threading for example
    would be single-threaded. (Blocking abstractly.)

    Then besides I suppose that could be neatly with basically
    the program model, and its file model, being well-defined,
    then for NNTP with IMAP organization search and extensions,
    those being standardized, seems to make sense for an efficient
    news file organization.

    Here then it seems for serving the NNTP, and for example
    their file bodies under the storage, with the fixed headers,
    variable header or XREF, and the message body, then under
    content it's same as storage.

    NNTP has "OVERVIEW" then from it is built search.
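
    As a hedged sketch of building search from the overview: the OVER
    response in RFC 3977 is one tab-separated line per article (number,
    Subject, From, Date, Message-ID, References, :bytes, :lines). The
    Overview and SubjectIndex classes below are hypothetical, and the
    index is only a toy word-to-article-number map over subjects, not a
    claim about how search is to be implemented here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** One line of an NNTP OVER response, in the RFC 3977 default field order. */
final class Overview {
    final long number;
    final String subject, from, date, messageId, references;
    final long bytes, lines;

    private Overview(String[] f) {
        number = num(f[0]);
        subject = f[1]; from = f[2]; date = f[3];
        messageId = f[4]; references = f[5];
        bytes = num(f[6]); lines = num(f[7]);
    }

    static Overview parse(String line) {
        String[] f = line.split("\t", -1); // -1 keeps empty trailing fields
        if (f.length < 8) throw new IllegalArgumentException("short overview line");
        return new Overview(f);
    }

    // Assumption: an empty numeric field is treated as 0.
    private static long num(String s) { return s.isEmpty() ? 0 : Long.parseLong(s); }
}

/** Hypothetical inverted index: lower-cased subject word -> article numbers. */
final class SubjectIndex {
    private final Map<String, SortedSet<Long>> postings = new HashMap<>();

    void add(Overview o) {
        for (String w : o.subject.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                postings.computeIfAbsent(w, k -> new TreeSet<>()).add(o.number);
            }
        }
    }

    SortedSet<Long> find(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT),
                                     Collections.emptySortedSet());
    }
}
```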

    Let's see here then, if I get the load test running, or,
    just put a limit under the load while there are no load test
    errors, it seems the algorithm then scales under load to be
    making usually the algorithm serial in CPU, with: encryption,
    and compression (traffic). (Block ciphers instead of serial transfer.)

    Then, the industry pattern with re-routines, has that the
    re-routines are naturally co-operative in the blocking,
    and in the language, including flow-of-control and exception scope.


    So, I have a high-performance implementation here.

    [2020/11/16]

    It seems like for NFS, then, and having the separate read and write
    of the client, a default filesystem, is an idea for the system
    facility: mirroring the mounted file locally, and, providing the read
    view from that via a different route.


    A next idea then seems for the organization, the client views
    themselves organize over the durable and available file system
    representation, this provides anyone a view over the protocol with a
    group file convention.

    I.e., while usual continuous traffic was surfing, individual reads
    over group files could have independent views, for example collating
    contents.

    Then, extracting requests from traffic and threads seems usual.

    (For example a specialized object transfer view.)

    Making protocols for implementing internet protocols in groups and
    so on, here makes for giving usenet example views to content generally.

    So, I have designed a protocol node and implemented it mostly,
    then about designed an object transfer protocol, here the idea
    is how to make it so people can extract data, for example their own
    data, from a large durable store of all the usenet messages,
    making views of usenet running on usenet, eg "Feb. 2016: AP's
    Greatest Hits".

    Here the point is to figure that usenet, these days, can be operated
    in cooperation with usenet, and really for its own sake, for leaving
    messages in usenet and here for usenet protocol stores as there's
    no reason it's plain text the content, while the protocol supports it.

    Building personal view for example is a simple matter of very many
    service providers any of which sells usenet all day for a good deal.

    Let's see here, $25/MM, storage on the cloud last year for about
    a million messages for a month is about $25. Outbound traffic is
    usually the metered cloud traffic, here for example that CDN traffic
    support the universal share convention, under metering. What that
    the algorithm is effectively tunable in CPU and RAM, makes for under
    I/O that it's "unobtrusive" or the cooperative in routine, for CPU,
    I/O and RAM, then that there is for seeking that Network Store or
    Database Time instead effectively becomes File I/O time, as what may
    be faster, and more durable. There's a faster database time for
    scaling the ingestion here with that the file view is eventually
    consistent. (And reliable.)

    Checking the files would be over time for example with "last checked"
    and "last dropped" something along the lines of, finding wrong
    offsets, basically having to make it so that it survives neatly
    corruption of the store (by being more-or-less stored in-place).

    Content catalog and such, catalog.

    [2021/12/06]

    Then I wonder and figure the re-routine can scale.

    Here for the re-routine, the industry factory pattern,
    and the commands in the protocols in the templates,
    and the memory module, with the algorithm interface,
    in the high-performance computer resource, it is here
    that this simple kind of "writing Internet software"
    makes pretty rapidly for adding resources.

    Here the design is basically of a file I/O abstraction,
    that the computer reads data files with mmap to get
    their handlers, what results that for I/O map the channels
    result transferring the channels in I/O for what results,
    in mostly the allocated resource requirements generally,
    and for the protocol and algorithm, it results then that
    the industry factory pattern and making for interfaces,
    then also here the I/O routine as what results that this
    is an implementation, of a network server, mostly is making
    for that the re-routine, results very neatly a model of
    parallel cooperation.
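
    For the file I/O side, a small sketch with the standard
    java.nio.channels.FileChannel calls: map gives a read-only mapped
    view of a data file, and transferTo hands the file to an output
    channel, which the platform may do without copying through a user
    buffer. The FileViews class name and its two methods are
    hypothetical, and the send loop assumes a blocking output channel.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helpers over FileChannel for serving stored article files. */
final class FileViews {
    /** Maps a whole (small) file read-only; the mapping outlives the channel. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Transfers the file to the output channel; returns the bytes sent. */
    static long send(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0;
            long size = ch.size();
            while (pos < size) {
                // transferTo may move fewer bytes than asked, so loop.
                pos += ch.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }
}
```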

    I think computers still have file systems and file I/O but
    in abstraction just because PAGE_SIZE is still relevant for
    the network besides or I/O, if eventually, here is that the
    value types are in the commands and so on, it is besides
    that in terms of the resources so defined it still is in a filesystem
    convention that a remote and unreliable view of it suffices.

    Here then the source code also being "this is only 20-50k",
    lines of code, with basically an entire otherwise library stack
    of the runtime itself, only the network and file abstraction,
    this makes for also that modularity results. (Factory Industry
    Pattern Modules.)

    For a network server, here, that, mostly it is high performance
    in the sense that this is about the most direct handle on the channels
    and here mostly for the text layer in the I/O order, or protocol layer,
    here is that basically encryption and compression usually in the layer,
    there is besides a usual concern where encryption and compression
    are left out, there is that text in the layer itself is commands.

    Then, those being constants under the resources for the protocol,
    it's what results usual protocols like NNTP and HTTP and other
    protocols with usually one server and many clients, here is for that
    these protocols are defined in these modules, mostly there NNTP and
    IMAP, ..., HTTP.

    These are here defined "all Java" or "Pure Java", i.e. let's be clear
    that in terms of the reference abstraction layer, I think computers
    still use the non-blocking I/O and filesystems and network to RAM, so
    that as the I/O is implemented in those it actually has those besides
    instead for example defaulting to byte-per-channel or character I/O.
    I.e. the usual semantics for servicing the I/O in the accepter routine
    and what makes for that the platform also provides a reference
    encryption implementation, if not so relevant for the block encoder
    chain, besides that for example compression has a default
    implementation, here the I/O model is as simply in store for handles,
    channels, ..., that it results that data especially delivered from a
    constant store can anyways be mostly compressed and encrypted already
    or predigested to serve, here that it's the convention, here is for
    resulting that these client-server protocols, with usually reads >
    postings then here besides "retention", basically here is for what it
    is.

    With the re-routine and the protocol layer besides, having written
    the routines in the re-routine, what there is to write here is this
    industry factory, or a module framework, implementing the
    re-routines, as they're built from the linear description of a
    routine, makes for as the routine progresses that it's "in the
    language" and that more than less in the terms, it makes for
    implementing the case of logic for values, in the logic's
    flow-of-control's terms.

    Then, there is that actually running the software is different than
    just writing it, here in the sense that as a server runtime, it is to
    be made a thing, by giving it a name, and giving it an authority, to
    exist on the Internet.

    There is basically that for BGP and NAT and so on, and, mobile fabric
    networks, IP and TCP/IP, of course IPv4 and IPv6 are the coarse
    fabric main space, with respect to what are CIDR and 24 bits rule and
    what makes for TCP/IP, here entirely the course is using the TCP/IP
    stack and Java's TCP/IP stack, with respect to that TCP/IP is so
    provided or in terms of process what results ports mostly and
    connection models where it is exactly the TCP after the IP, the
    Transmission Control Protocol and Internet Protocol, have here both
    this socket and datagram connection orientation, or stateful and
    stateless or here that in terms of routing it's defined in addresses,
    under that names and routing define sources, routes, destinations,
    ..., that routine numeric IP addresses result in the usual sense of
    the network being behind an IP and including IPv4 network fabric with
    respect to local routers.

    I.e., here to include a service framework is "here besides the
    routine, let's make it clear that in terms of being a durable
    resource, there needs to be some lockbox filled with its sustenance
    that in some locked or constant terms results that for the duration
    of its outlay, say five years, it is held up, then, it will be so
    again, or, let down to result the carry-over that it invested to
    archive itself, I won't have to care or do anything until then".


    About the service activation and the idea that, for a port, the
    routine itself needs only run under load, i.e. there is effectively
    little traffic on the old archives, and usually only the some other
    archive needs any traffic. Here the point is that for the Java
    routine there is the system port that was accepted for the request,
    that inetd or the systemd or means the network service was accessed,
    made for that much as for HTTP the protocol is client-server also for
    IP the protocol is client-server, while the TCP is packets. This is a
    general idea for system integration while here mostly the routine is
    that being a detail: the filesystem or network resource that results
    that the re-routines basically make very large CPU scaling.

    Then, it is basically containerized this sense of "at some domain
    name, there is a service, it's HTTP and NNTP and IMAP besides, what
    cares the world".

    I.e. being built on connection oriented protocols like the socket
    layer, HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to
    certificates, it's more than less sensible that most users have no
    idea of installing some NNTP browser or pointing their email to IMAP
    so that the email browser browses the newsgroups and for postings,
    here this is mostly only talk about implementing NNTP then IMAP and
    HTTP that happens to look like that, besides for example SMTP or NNTP
    posting.

    I.e., having "this IMAP server, happens to be this NNTP module", or
    "this HTTP server, happens to be a real simple mailbox these groups",
    makes for having partitions and retentions of those and that basically
    NNTP messages in the protocol can be more or less the same content
    in media, what otherwise is of a usual message type.

    Then, the NNTP server-server routine is the propagation of messages
    besides "I shall hire ten great usenet retention accounts and gently
    and politely draw them down and back-fill Usenet, these ten groups".

    By then I would have to have made for retention in storage, such
    contents, as have a reference value, then for besides making that
    independent in reference value, just so that it suffices that it
    basically results "a usable durable filesystem that happens you can
    browse it like usenet". I.e. as the pieces to make the backfill are
    dug up, they get assigned reference numbers of their time to make for
    what here is that in a grand schema of things, they have a reference
    number in numerical order (and what's also the server's
    "message-number" besides its "message-id") as noted above this gets
    into the storage for retention of a file, while, most services for
    this are instead for storage and serving, not necessarily or at all
    retention.
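
    One way to read "reference numbers of their time" is numbering the
    recovered articles in date order, keyed by message-id so that a
    message dug up twice is only counted once. That reading is an
    assumption, and the Backfill class below is a hypothetical sketch of
    it, not the scheme as designed.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical backfill numbering: message-id -> message-number by date. */
final class Backfill {
    /**
     * Takes message-id -> article date (the map keys already deduplicate
     * repeats) and returns message-id -> number, counting from 1 in date
     * order, with ties broken by message-id so the result is repeatable.
     */
    static Map<String, Long> number(Map<String, Instant> datesById) {
        List<Map.Entry<String, Instant>> sorted = new ArrayList<>(datesById.entrySet());
        sorted.sort(Comparator
                .comparing((Map.Entry<String, Instant> e) -> e.getValue())
                .thenComparing(e -> e.getKey()));
        Map<String, Long> numbers = new LinkedHashMap<>();
        long next = 1;
        for (Map.Entry<String, Instant> e : sorted) {
            numbers.put(e.getKey(), next++);
        }
        return numbers;
    }
}
```

    Numbering a batch at a time this way only stays in numerical order if
    later batches are no older than earlier ones, which is part of why
    an orderly retention scan matters.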

    I.e., the point is that as the groups are retained from retention,
    there is an approach what makes for an orderly archeology, as for
    what convention some data arrives, here that this server-server
    routine is besides the usual routine which is "here are new posts,
    propagate them", it's "please deliver as of a retention scan, and
    I'll try not to repeat it, what results as orderly as possible a
    proof or exercise of what we'll call afterward entire retention",
    then will be for as of writing a file that "as of the date, from
    start to finish, this site certified these messages as best-effort
    retention".

    It seems then besides there is basically "here is some mbox file,
    serve it like it was an NNTP group or an IMAP mailbox", ingestion, in
    terms of that what is ingestion, is to result for the protocol that
    "for this protocol, there is actually a normative filesystem
    representation that happens to be pretty much also altogether defined
    by the protocol", the point is that ingestion would result in command
    to remain in the protocol, that a usual file type that "presents a
    usual abstraction, of a filesystem, as from the contents of a file",
    here with the notion of "for all these threaded discussions, here
    this system only cares some approach to these ten particular
    newsgroups that already have mostly their corpus though it's not in
    perhaps their native mbox instead consulted from services".
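
    The first step of "here is some mbox file, serve it like a group"
    is splitting the file into messages. A minimal sketch, assuming the
    plain mbox convention that each message starts at a line beginning
    with "From " (the Mbox class is hypothetical, and escaped ">From "
    body lines are left as they are):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mbox splitter: one string per message, separators dropped. */
final class Mbox {
    static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder cur = null; // null until the first separator line is seen
        for (String line : mbox.split("\n")) {
            if (line.startsWith("From ")) {
                if (cur != null) messages.add(cur.toString());
                cur = new StringBuilder(); // the "From " line itself is not kept
            } else if (cur != null) {
                cur.append(line).append('\n');
            }
        }
        if (cur != null) messages.add(cur.toString());
        return messages;
    }
}
```

    Each resulting message then has its own headers (Message-ID, Date,
    Newsgroups and so on) from which the group and article numbering
    would follow.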

    Then, there's for storing and serving the files, and there is the
    usual notion that moving the data, is to result, that really these
    file organizations are not so large in terms of resources, being
    "less than gigabytes" or so, still there's a notion that as a durable
    resource they're to be made fungible here the networked file approach
    in the native filesystem, then that with respect to it's a backing
    store, it's to make for that the entire enterprise is more or less to
    be made in terms of account, that then as a facility on the network
    then a service in the network, it's basically separated the facility
    and service, while still of course that the service is basically
    defined by its corpus.


    Then, to make that fungible in a world of account, while with an exit
    strategy so that the operation isn't not abstract, is mostly about
    the domain name, then that what results the networking, after trusted
    network naming and connections for what result routing, and then the
    port, in terms of that there are usual firewalls in ports though that
    besides usually enough client ports are ephemeral, here the point is
    that the protocols and their well-known ports, here it's usually
    enough that the Internet doesn't concern itself so much with
    protocols but with respect to proxies, here that for example NNTP and
    IMAP don't have so much anything so related that way after STARTTLS.
    For the world of account, is basically to have for a domain name, an
    administrator, and, an owner or representative. These are to
    establish authority for changes and also accountability for usage.

    Basically they're to be persons and there is a process to get to be
    an administrator of DNS, most always there are services that a usual
    person implementing the system might use, besides for example the
    numerical.

    More relevant though to DNS is getting servers on the network, with
    respect to listening ports and that they connect to clients what so
    discover them as via DNS or configuration, here as above the usual
    notion that these are standard services and run on well-known ports
    for inetd or systemd. I.e. there is basically that running a server
    and dedicated networking, and power and so on, and some notion of the
    limits of reliability, is then as very much in other aspects of the
    organization of the system, i.e. its name, while at the same time,
    the point that a module makes for that basically the provision of a
    domain name or well-known or ephemeral host, is the usual notion that
    static IP addresses are a limited resource and as about the various
    networks in IPv4 and how they route traffic, is for that these
    services have well-known sections in DNS for at least that the most
    usual configuration is none.

    For a usual global reliability and availability, is some notion
    basically that each region and zone has a service available on the IP
    address, for that "hostname" resolves to the IP addresses. As well,
    in reverse, for the IP address and about the hostname, it should
    resolve reverse to hostname.

    About certificates mostly for identification after mapping to port,
    or multi-home Internet routing, here is the point that whether the
    domain name administration is "epochal" or "regular", is that epochs
    are defined by the ports behind the numbers and the domain name
    system as well, where in terms of the registrar, the domain names are
    epochal to the registrar, with respect to owners of domain names.

    Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
    and also BGP and NAT and routing and what are local and remote
    addresses, here is for not-so-much "implement DNS the protocol
    also while you're at it", rather for what results that there is a durable
    and long-standing and proper doorman, for some usenet.science.

    Here then the notion seems to be whether the doorman basically
    knows well-known services, is a multi-homing router, or otherwise
    what is the point that it starts the lean runtime, with respect to that
    it's a container and having enough sense of administration its operation
    as contained. I.e. here given a port and a hostname and always
    running makes for that as long as there is the low (preferably no)
    idle for services running that have no clients, is here also for the
    cheapest doorman that knows how to stand up the client sentinel.
    (And put it back away.)
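
    The "put it back away" part can be sketched as a small gate that
    tracks open connections and the last activity, and says when the
    service may stand down. The IdleGate class, its method names, and
    the idea of passing the clock in as a millisecond value are all
    assumptions for illustration; how the doorman actually starts and
    stops the runtime (inetd, systemd socket activation, or otherwise)
    is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical idle gate for a doorman that starts and stops a service. */
final class IdleGate {
    private final long idleMillis;
    private final AtomicInteger open = new AtomicInteger();
    private final AtomicLong lastActivity;

    IdleGate(long idleMillis, long nowMillis) {
        this.idleMillis = idleMillis;
        this.lastActivity = new AtomicLong(nowMillis);
    }

    /** A client connection was accepted. */
    void opened(long nowMillis) {
        open.incrementAndGet();
        lastActivity.set(nowMillis);
    }

    /** A client connection was closed. */
    void closed(long nowMillis) {
        open.decrementAndGet();
        lastActivity.set(nowMillis);
    }

    /** True when there are no clients and the idle period has elapsed. */
    boolean mayStandDown(long nowMillis) {
        return open.get() == 0 && nowMillis - lastActivity.get() >= idleMillis;
    }
}
```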

    Probably the most awful thing in the cloud services is the cost for
    data ingress and egress. What that means is that for example using
    a facility that is bound by that as a cost instead of under some
    constant cost, is basically why there is the approach that the
    containers need a handle to the files, and they're either local
    files or network files, here with the some convention above in
    archival a shared consistent view of all the files, or abstractly
    consistent, is for making that the doorman can handle lots of
    starting and finishing connections, while it is out of the way when
    usually it's client traffic and opening and closing connections, and
    the usual abstraction is that the client sentinel is never off and
    doorman does nothing, here is for attaching the one to some lower
    constant cost, where for example any long-running cost is more than
    some low constant cost.

    Then, this kind of service is often represented by nodes, in the
    usual sense "here is an abstract container with you hope some native
    performance under the hypervisor where it lives on the farm on its
    rack, it basically is moved the image to wherever it's requested from
    and lives there, have fun, the meter is on". I.e. that's just "this
    Jar has some config conventions and you can make the container
    associate it and watchdog it with systemd for example and use the
    cgroups while you're at it and make for tmpfs quota and also the best
    network file share, which you might be welcome to cache if you care
    just in the off-chance that this file-mapping is free or constant
    cost as long as it doesn't egress the network", is for here about the
    facilities that work, to get a copy of the system what with respect
    to its usual operation is a piece of the Internet.

    For the different reference modules (industry factories) in their
    patterns then and under combined configuration "file + process +
    network + fare", is that the fare of the service basically reflects a
    daily coin, in the sense that it represents an annual or epochal fee,
    what results for the time there is what is otherwise all defined the
    "file + process + network + name", what results it perpetuates in
    operation more than less simply and automatically.

    Then, the point though is to get it to where "I can go to this service, and administer it more or less by paying an account, that it thus lives in its budget and quota in its metered world".

    That though is very involved with identity, that in terms of "I the account as provided this sum make this sum paid with respect to an agreement",
    is that authority to make agreements must make that it results that the operation of the system, is entirely transparent, and defined in terms of
    the roles and delegation, conventions in operation.

    I.e., I personally don't want to administer a copy of usenet, but, it's
    here pretty much sorted out that I can administer one once then that
    it's to administer itself in the following, in terms of it having
    resources to allocate and resources to disburse. Also if nobody's using
    it it should basically work itself out to dial its lights down (while
    maintaining availability).

    Then a point seems "maintain and administer the operation in effect,
    what arrangement sees via delegation, that a card number and a phone
    number and an email account and more than less a responsible entity,
    is so indicated for example in cryptographic identity thus that the
    operation of this system as a service, effectively operates itself out
    of a kitty, what makes for administration and overhead, an entirely
    transparent model of a miniature business the system as a service".

    "... and a mailing address and mail service."

    Then, for accounts and accounts, for example is the provision of the
    component as simply an image in cloud algorithms, where basically as
    above here it's configured that anybody with any cloud account could
    basically run it on their own terms, there is for here sorting out
    "after this delegation to some business entity what results a
    corporation in effect, the rest is business-in-a-box and more-than-less
    what makes for its administration in state, is for how it basically
    limits and replicates its service, in terms of its own assets here as
    what administered is abstractly "durable forever mailboxes with private
    ownership if on public or managed resources".

    A usual notion of a private email and usenet service offering and
    business-in-a-box, here what I'm looking at is that besides archiving
    sci.math and copying out its content under author line, is to make such
    an industry for example here that "once having implemented an Internet
    service, an Internet service of them results Internet".

    I.e. here the point is to make a corporation and a foundation in effect,
    what in terms of then about the books and accounts, is about accounts
    for the business accounts that reflect a persistent entity, then what
    results in terms of computing, networking, and internetworking, with a
    regular notion of "let's never change this arrangement but it's in
    monthly or annual terms", here for that in overall arrangements, it
    results what the entire system more than less runs in ways then to
    either run out its limits or make itself a sponsored effort, about
    more-or-less a simple and responsible and accountable set of operations
    what effect the business (here that in terms of service there is
    basically the realm of agreement) that basically this sort of
    business-in-a-box model, is then besides itself of accounts, toward the
    notion as pay-as-you-go and "usual credits and their limits".

    Then for a news://usenet.science, or for example
    sci.math.usenet.science, is the idea that the entity is "some assemblage
    what is so that in DNS, and, in the accounts payable and receivable,
    and, in the material matters of arrangement and authority for
    administration, of DNS and resources and accounts what result durably
    persisting the business, is basically for a service then of what these
    are usual enough tasks, as that are interactive workflows and for
    mechanical workflows.

    I.e. the point is for having the service as an on/off button and more
    or less what is for a given instance of the operation, what results from
    some protocol that provides a "durable store" of a sort of the business,
    that at any time basically some re-routine or "eventually consistent"
    continuance of the operation of the business, results basically a
    continuity in its operations, what is entirely granular, that here for
    example the point is to "pick a DNS name, attach an account service,
    go" it so results that in the terms, basically there are the
    placeholders of the interactive workflows in that, and as what in terms
    are often for example simply card and phone number terms, account terms.

    I.e. a service to replenish accounts as kitties for making accounts only
    and exactly limited to the one service, its transfers, basically results
    that there is the notion of an email address, a phone number, a credit
    card's information, here a fixed limit debit account that works as of a
    kitty, there is a regular workflow service that will read out the
    durable stores and according to the timeliness of their events, affect
    the configuration and reconciliation of payments for accounts
    (closed loop scheduling/receiving).

    https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
    https://www.rfc-editor.org/rfc/rfc9022.txt

    Basically for dailies, monthlies, and annuals, what make weeklies,
    is this idea of Internet-from-an-account, what is services.

    [ page break 5 ]


    [2023/03/08]

    After implementing a store, and the protocol for getting messages, then
    what seems relevant here in the context of the SEARCH command, is a
    fungible file-format, that is derived from the body of the message in a
    normal form, that is a data structure that represents an index and
    catalog and dictionary and summary of the message, a form of a data
    structure of a "search index".

    These types of files should naturally compose, and result a data
    structure that according to some normal forms of search and summary
    algorithms, result that a data structure results, that makes for
    efficient search of sections of the corpus for information retrieval,
    here that "information retrieval is the science of search algorithms".

    Now, for what and how people search, or what is the specification of a
    search, is in terms of queries, say, here for some brief forms of
    queries that advise what's definitely included in the search, what's
    excluded, then perhaps what's maybe included, or yes/no/maybe, which
    makes for a predicate that can be built, that can be applied to results
    that compose and build for the terms of a filter with yes/no/maybe or
    sure/no/yes, with predicates in values.
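
    A minimal sketch of such a yes/no/maybe predicate follows. The names
    Verdict, Query and match are hypothetical, and it is an assumption here
    that a message has already been reduced to a set of terms: "include"
    terms must all be present, "exclude" terms must all be absent, and
    "maybe" terms only raise a passing match from MAYBE to YES.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    NO = 0      # an excluded term is present, or a required term is missing
    MAYBE = 1   # passes the filter, but none of the optional terms appear
    YES = 2     # passes the filter (and an optional term appears, if any given)


@dataclass(frozen=True)
class Query:
    include: frozenset = field(default_factory=frozenset)  # must all appear
    exclude: frozenset = field(default_factory=frozenset)  # must not appear
    maybe: frozenset = field(default_factory=frozenset)    # optional terms

    def match(self, terms):
        """Apply the predicate to one message's terms."""
        terms = set(terms)
        if terms & self.exclude or not self.include <= terms:
            return Verdict.NO
        if self.maybe and not (terms & self.maybe):
            return Verdict.MAYBE
        return Verdict.YES

    def __and__(self, other):
        """Compose two queries by taking the union of each term set."""
        return Query(self.include | other.include,
                     self.exclude | other.exclude,
                     self.maybe | other.maybe)
```

    Composition with "&" is one possible reading of predicates that
    "compose and build"; other combinations (for example "or") would need
    their own rules for the three values.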

    Here there is basically "free text search" and "matching summaries",
    where text is the text and summary is a data structure, with attributes
    as paths the leaves of the tree of which match.

    Then, the message has text, its body, and headers, key-value pairs
    or collections thereof, where as well there are default summaries like
    "a histogram of words by occurrence" or for example default text like
    "the MIME body of this message has a default text representation".

    So, the idea developing here is to define what are "normal" forms of
    data structures that have some "normal" forms of encoding that result
    that these "normalizing" after "normative" data structures define
    well-behaved algorithms upon them, which provide well-defined bounds in
    resources that return some quantification of results, like
    any/each/every/all, "hits".

    This is where usually enough search engines' or collected search
    algorithms ("find") usually enough have these de-facto forms, "under
    the hood", as it were, to make it first-class that for a given message
    and body that there is a normal form of a "catalog summary index" which
    can be compiled to a constant when the message is ingested, that then
    basically any filestore of these messages has alongside it the
    filestore of the "catsums" or as on-demand, then that any algorithm has
    at least well-defined behavior under partitions or collections or
    selections of these messages, or items, for various standard algorithms
    that separate "to find" from "to serve to find".

    So, ..., what I'm wondering are what would be sufficient normal forms in brief that result that there are
    defined for a given corpus of messages, basically at the granularity of messages, how is defined how
    there is a normal form for each message its "catsum", that catums have a natural algebra that a
    concatenation of catums is a catsum and that some standard algorithms naturally have well-defined
    results on their predicates and quantifiers of matching, in serial and parallel, and that the results
    combine in serial and parallel.
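
    One way to picture that algebra, as a sketch only: take the catsum to be
    a message count plus a histogram of terms, so that combining two catsums
    is adding the counts. The class name Catsum and the tokenizing rule are
    assumptions for illustration, not a settled format.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Catsum:
    """A toy catalog-summary: message count plus a histogram of terms."""
    messages: int = 0
    terms: Counter = field(default_factory=Counter)

    @classmethod
    def of(cls, body):
        """Compile the catsum of one message body (deterministic)."""
        words = re.findall(r"[a-z0-9']+", body.lower())
        return cls(1, Counter(words))

    def __add__(self, other):
        """The catsum of a concatenation is the sum of the catsums."""
        return Catsum(self.messages + other.messages,
                      self.terms + other.terms)
```

    Because addition of counts is associative and commutative, and the empty
    Catsum() acts as identity, partial results can be combined in any
    grouping, which is what lets the same computation run in serial or in
    parallel over partitions of a corpus.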

    The results should be applicable to any kind of data but here it's
    more or less about usenet groups.

    [2023/03/08]

    So I start browsing the Information Retrieval section in Wikipedia and
    more or less get to reading Luhn's 1958 "automatic coding of document
    summaries" or "The Automatic Creation of Literature Abstracts". Then,
    what I figure, is that the histogram, is an associative array of keys
    to counts, and what I figure is to compute both the common terms, and,
    the rare terms, so that there's both "common-weight" and "rare-weight"
    computed, off of the count of the terms, and the count of distinct
    terms, where it is working up that besides catums, or catsums, it would
    result a relational algebra of terms in, ..., terms, of counts and
    densities and these type things. This is where, first I would figure
    the catsum would be deterministic before it's at all probabilistic,
    because the goal is match-find not match-guess, while still it's to
    support the less deterministic but more opportunistic at the same time.

    Then, the "index" is basically like a usual book's index, for each term
    that's not a common term in the language but is a common term in the
    book, what page it's on, here that that is a read-out of a histogram of
    the terms to pages. Then, compound terms, basically get into grammar,
    and in terms of terms, I don't so much care to parse glossolalia as what
    result mostly well-defined compound terms in usual natural languages,
    for the utility of a dictionary and technical dictionaries. Here "pages"
    are both according to common message threads, and also the surround of
    messages in the same time period, where a group is a common message
    thread and a usenet is a common message thread.

    (I've had a copy of "the information retrieval book" before, also
    borrowed one "data logic".)

    "Spelling mistakes considered adversarial."

    https://en.wikipedia.org/wiki/Subject_indexing#Indexing_theory

    Then, there's lots to be said for "summary" and "summary in statistic".


    A first usual data structure for efficiency is the binary tree or
    bounding tree. Then, there's
    also what makes for divide-and-conquer or linear speedup.


    About the same time as Luhn's monograph or 1956, there was published a
    little book called "Logic and Language", Huppe and Kaminsky. It details
    how according to linguistics there are certain usual regular patterns of
    words after phonemes and morphology what result then for stems and
    etymology that then for vocabulary that grammar or natural language
    results above. Then there are also gentle introductions to logic. It's
    very readable and quite brief.


    [2023/04/29]

    I haven't much been tapping away at this,
    but it's pretty simple to stand up a usenet peer,
    and pretty simple to slurp a copy,
    of the "Big 8" usenet text groups, for example,
    or particularly just for a few.

    [2023/12/22]

    Well, I've been thinking about this, and there are some ideas.

    One is about a system of reputation, the idea being
    New/Old/Off/Bad/Bot/Non, basically figuring that reputation is
    established by action.

    Figuring how to categorize spam, UCE, vice, crime, and call that Bad,
    then gets into basically two editions, with a common backing, Cur
    (curated) and Raw, with Old and New in curated, and Off and Bot a filter
    off that, and Bad and Non excluded, though in the raw feed. Then there's
    only to forward what's curated, or current.

    Here the idea is that New graduates to Old, Non might be a
    false-negative New, but is probably a negative Bad or Off, and then Bot
    is a sort of honor system, and Old might wander to Off and vice-versa,
    then that Off and Old can vacillate.
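
    The six labels and the two editions as described in this entry can be
    written down as a small table. This is a sketch of this entry's reading
    only; the names Rep, in_curated and can_move are hypothetical, and the
    transition table lists just the moves mentioned in the text.

```python
from enum import Enum


class Rep(Enum):
    NEW = "New"
    OLD = "Old"
    OFF = "Off"
    BAD = "Bad"
    BOT = "Bot"
    NON = "Non"


CURATED = {Rep.NEW, Rep.OLD}      # always in the curated edition
FILTERABLE = {Rep.OFF, Rep.BOT}   # in curated, but "a filter off that"
RAW_ONLY = {Rep.BAD, Rep.NON}     # excluded from curated, kept in raw

# Moves named in the text: New graduates to Old, Old and Off vacillate,
# and Non might turn out to be a false-negative New.
TRANSITIONS = {
    Rep.NEW: {Rep.OLD},
    Rep.OLD: {Rep.OFF},
    Rep.OFF: {Rep.OLD},
    Rep.NON: {Rep.NEW},
}


def in_curated(rep, show_filtered=True):
    """Whether a post by an author with this label reaches the curated feed."""
    return rep in CURATED or (show_filtered and rep in FILTERABLE)


def can_move(src, dst):
    """Whether the text describes a move from one label to another."""
    return dst in TRANSITIONS.get(src, set())
```

    Every label is in the raw feed by construction, so only the curated
    side needs a predicate.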

    Then for renditions, is basically that the idea is that it's the same
    content
    behind NNTP, with IMAP, then also an HTTP gateway, Atom/RDF feed, ....

    (It's pretty usually text-only but here is MIME.)

    There are various ways to make for posting that's basically for that Old
    can post what they want, and Off, then for something like that New,
    gets an email in reply to their post, that they reply to that, to
    round-trip a post.

    (Also mail-to-news and news-to-mail are pretty usual. Also there are
    notions of humanitarian inputs.)

    Similarly there are the notions above about using certificates and TLS to
    use technology and protocol to solve technology protocol abuse problems.

    For surfacing the items then is about technologies like robots.txt and
    Dublin Core metadata, and similar notions with respect to uniqueness.
    If you have other ideas about this, please chime in.

    Then for having a couple sorts of organizations of both the domain name
    and the URL's as resources, makes for example for sub-domains for
    groups, for example then with certificate conventions in that, then
    usual sorts of URL's that are, you know, URL's, and URN's, then, about
    URL's, URI's, and URN's.

    Luckily it's all quite standardized so quite stock NNTP, IMAP, and HTTP
    browsers, and about SMTP and IMAP, and with TLS, make of course a
    fungible sort of system.


    How to pay for it all? At about $500 a year for all text usenet,
    about a day's golf foursome and a few beers can stand up a new Usenet peer.

    [2024/01/22]

    Basically thinking about a "backing file format convention".

    The message ID's are universally unique. File-systems support various
    counts and depths of sub-directories. The message ID's aren't
    necessarily opaque structurally as file-names. So, the first thing is a
    function that given a message-ID, results a message-ID-file-name.

    Then, as it's figured that groups, are, separable, is about how, to,
    either have all the messages in one store, or, split it out by groups.
    Either way the idea is to convert the message-ID-file-name, to a given
    depth of directories, also legal in file names, so it results that the
    messages get uniformly distributed in sub-directories of approximately
    equal count and depth.

    A....D...G <- message-ID

    ABCDEFG <- message-ID-file-name

    /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path

    So, the idea is that the backing file format convention, basically
    results uniform lookup of a file's existence, then about ingestion and
    constructing a message, then, moving that directory as a link in the
    filesystem, so it results atomicity in the file system that supports
    that the existence of a message-ID-directory-path is a function of
    message-ID, and usual filesystem guarantees.
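
    A minimal sketch of the mapping and the atomic ingest follows. Several
    things in it are assumptions and not part of the convention above: the
    file name is taken as the SHA-256 hex digest of the message-ID (one way
    to get names that are legal everywhere and spread evenly), the depth is
    3 instead of 6, and the part names "header" and "body" are placeholders.
    Atomicity relies on renaming a fully written staging directory into
    place on the same filesystem.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path

DEPTH = 3  # assumed number of single-character directory levels


def file_name(message_id):
    """Message-ID -> message-ID-file-name (hex digest, legal as a file name)."""
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()


def directory_path(root, message_id, depth=DEPTH):
    """Message-ID -> message-ID-directory-path, e.g. root/a/b/c/abc..."""
    name = file_name(message_id)
    return Path(root).joinpath(*name[:depth], name)


def ingest(root, message_id, header, body):
    """Write the parts to a staging directory, then rename it into place.

    Returns True if the message was stored, False if it already existed.
    """
    final = directory_path(root, message_id)
    if final.exists():
        return False
    final.parent.mkdir(parents=True, exist_ok=True)
    staging = Path(tempfile.mkdtemp(dir=root, prefix=".incoming-"))
    (staging / "header").write_bytes(header)
    (staging / "body").write_bytes(body)
    try:
        os.rename(staging, final)  # atomic on POSIX within one filesystem
    except OSError:
        shutil.rmtree(staging, ignore_errors=True)  # another writer won
        return False
    return True
```

    With this layout, "does the server have this message" is a single
    existence check on a path computed from the message-ID alone.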



    About the storage of the files, basically each message is only "header + body". Then,
    when the message is served, then it has appended to its header the
    message numbers
    according to the group, "header + numbers + body".

    So, the idea is to store the header and body compressed with deflate,
    then that there's a pretty simple implementation of a first-class
    treatment of deflated data, to compute the deflated "numbers" on demand,
    and result that concatenation results "header + numbers + body". It's
    figured that clients would either support deflated, compressed data
    natively, or, that the server would instead decompress data if
    compression's not supported, then figuring that otherwise the data's
    stored at-rest as compressed. There's an idea that the entire backing
    could be stored partially encrypted also, at-rest, but that would be
    special-purpose.
    The usual idea that the backing-file-format-convention, is a physical
    interface for all access, and also results that tar'ing that up to a
    file results a transport file also, and that, simply the
    backing-file-formats can be overlaid or made into symlink farms together
    and such.


    There's an idea then to make metadata, of, the, message-date, basically
    to have partitions by day, where Jan 1 2020 = Jan 1 1970 + 18262 days,

    YYYY/MM/DD/A/B/C/D/E/F/ABCDEFG -> symlink to /A/B/C/D/E/F/ABCDEFG/
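
    The day arithmetic and the partition path can be checked with a short
    sketch; the function names are hypothetical, and the depth of 6 follows
    the A/B/C/D/E/F example above.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def epoch_day(d):
    """Whole days elapsed from Jan 1 1970 to the given date."""
    return (d - EPOCH).days


def partition_path(d, name, depth=6):
    """YYYY/MM/DD/A/B/C/D/E/F/ABCDEFG for a message-ID-file-name."""
    return "/".join([f"{d:%Y/%m/%d}", *name[:depth], name])


day = epoch_day(date(2020, 1, 1))
path = partition_path(date(2020, 1, 1), "ABCDEFG")
```

    Here day comes out as 18262 (fifty years of 365 days plus twelve leap
    days), and path is the relative name under which the symlink to
    /A/B/C/D/E/F/ABCDEFG/ would be created.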


    This is where, the groups' file, which relate their message-numbers to message-ID's, only
    has the message-numbers, vis-a-vis, browsing by date, in terms of,
    taking the intersection
    of message-numbers' message-ID's and time-partitions' message-ID's.


    Above, the idea of the groups file, is that message-ID's have a limit,
    and that, the groups file,
    would have a fixed-size or fixed-length record, with the index and message-number being the offset,
    and the record being the message-ID, then its header and body accessed
    as the message-ID-directory-path.
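
    A sketch of such a groups file with fixed-length records, where the
    message number is the record's position: the record size of 256 bytes is
    an assumption (the Netnews article format limits a message-ID to 250
    octets, so it fits), as are NUL padding, numbering from 1, and the
    function names.

```python
import io

RECORD = 256  # assumed record length in bytes; message-IDs are NUL-padded


def append(groups_file, message_id):
    """Append a message-ID; its record position is its message number."""
    raw = message_id.encode("ascii")
    if len(raw) > RECORD:
        raise ValueError("message-ID longer than the fixed record")
    groups_file.seek(0, io.SEEK_END)
    number = groups_file.tell() // RECORD + 1  # numbers start at 1
    groups_file.write(raw.ljust(RECORD, b"\0"))
    return number


def lookup(groups_file, number):
    """Message number -> message-ID by seeking to a computed offset."""
    if number < 1:
        return None
    groups_file.seek((number - 1) * RECORD)
    raw = groups_file.read(RECORD)
    if len(raw) < RECORD:
        return None  # past the end of the group
    return raw.rstrip(b"\0").decode("ascii")
```

    Lookup by number is then one seek and one read, and the message-ID that
    comes back leads to the header and body through the
    message-ID-directory-path.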

    So, toward working out a BFF convention is to make it possible that file operation tools
    like tar and cp and deflate and other usual command line tools, or facilities, make it so that
    then while there should be a symlink free approach, also then as to how
    to employ symlinks,
    with regards to usual indexes from axes of access to enumeration.

    As above then I'm wondering to figure out how to make it so, that for something like a mailbox format,
    then to have that round-trip from BFF format, but mostly how to make it
    so that any given collection
    of messages, given each has a unique ID, and according to its headers
    its groups and injection date,
    it results an automatic sort of building or rebuilding then the groups
    files.

    Another key sort of thing is the threading. Also, there is to be
    considered the multi-post or cross-post.


    Then, for metadata, is the idea of basically into supporting the
    protocol's overview and wildmat,
    then for the affinity with IMAP, then up into usual notions of
    key-attribute filtering, and as with
    regards to full content search, about a sort of "search file format", or indices, again with the goal
    of that being fungible variously, and constructible according to simple bounds, and, resulting
    that the goal is to reduce the size of the files at rest, figuring
    mostly the files at rest aren't accessed,
    or when they are, they're just served raw as compressed, because
    messages once authored are static.

    That said, the groups their contents grow over time, and also there is
    for notions of no-archive
    and retention, basically about how to consider that in those use cases,
    to employ symlinks,
    which result natural partitions, then to have usual rotation of
    truncation as deleting a folder,
    invalidating all the symlinks to it, then a usual handler of ignoring
    broken symlinks, or deleting them,
    so that maintenance is simple along the lines of "rm group" or "rm year".

    So, there's some thinking involved to make it so the messages each, have their own folders,
    and then parts in those, as above, this is the thinking here along the
    lines of "BFF/SFF",
    then for setting up C10+K servers in front of that for NNTP, IMAP, and a simple service
    mechanism for surfacing HTTP, these kinds of things. Then, the idea is
    that metadata
    gets accumulated next to the messages in their folders, then those also
    to be concatenable,
    to result that then for search, that corpuses or corpi are built off
    those intermediate data,
    for usual searches and specialized searches and these kinds things.

    Then, the idea is to make for this BFF/SFF convention, then to start gathering "certified corpi"
    of groups over time, making for those then being pretty simply
    distributable like the old
    idea of an mbox mailbox format, with regards to that being one file that results the entire thing.

    Then, threads and the message numbers, where threading by message number
    is the

    header + numbers + body

    the numbers part, sort of is for open and closed threads, here though of course that threads
    are formally always open, or about composing threads of those as over
    them being partitioned
    in usual reasonable times, for transient threads and long-winded threads
    and recurring threads.



    Then, besides "control" and "junk" and such or relating administration,
    is here for the sort
    of minimal administration that results this NOOBNB curation. This and
    matters of relay
    ingestion and authoring ingestion and ingestion as concatenation of BFF files,
    is about these kinds of things.

    [2024/01/22]

    The idea of "NOOBNB curation" seems a reasonable sort of simple-enough
    yet full-enough way to start building a usual common open messaging system, with as well the omission of the overall un-wanted and illicit.

    The idea of NOOBNB curation, is that it's like "Noob NB: Nota Bene for Noobs",
    with splitting New/Old/Off or "NOO" and Bot/Non/Bad or BNB, so that the curation
    delivers NOO, or Nu, while the raw includes be-not-back, BNB.

    So, the idea for New/Old/Off, is that there is Off traffic, but, "caveat lector",
    reader be aware, figuring that people can still client-side "kill file"
    the curated feed.

    Then, Bot/Non/Bad, basically includes that Bot would include System Bot,
    and Free Bot,
    sort of with the idea of that if Bots want feed then they get raw, while System Bot can
    post metadata of what's Bot/Non/Bad and it gets simply excluded from the curated.

    Then, for this it seems the axis of metadata is the Author, about the relation of Authors
    to posts. I.e. it's the principal metadata axis of otherwise posts, individual messages.

    Here the idea is that generally that once some author's established as
    "Old", then
    they always go into NOO, as either Old or Off, while "New" is the establishment
    of this maturity, to at least follow the charter and otherwise for take-it-or-leave-it.


    Then, "Non" is basically that "New", according to Author, basically
    either gets accepted,
    or not, according to what must be some "objective standards of
    topicality and etiquette".

    Then "Bad" is pretty much that anybody who results Bad basically gets
    marked Bad.

    Now, it's a temporal thing, and it's possible that attacks would result
    false positives
    and false negatives, a.k.a. Type I and Type II errors. There's a general
    idea to attenuate
    "Off" and even "Bad", figuring "Off" reverts to "Old" and "Bad" reverts
    to "Non", according
    to Author, or for example "Injection Site".


    Then, for the posting side, there are some things involved. There are
    legal things involved,
    illicit content or contraband, have some safe harbor provisions in usual first-world countries,
    vis-a-vis, for example, the copyright claim. Responsiveness to copyright claims, would basically
    be marking spammers of warez as Bad, and not including them in the
    curated, that being figured
    the extent of responsibility.

    There's otherwise a usual good-faith expectation of fair-use, intellectual-property wise.


    Otherwise then it's that "Usenet the protocol relies on email identity".
    So, the idea to implement
    that posts round-trip through email, is considered the bar.

    Here then furthermore is considered how to make a general sort of Injection-Site algorithm,
    in terms of peering or peerages, and compeering, as with respect to
    Sites, their policies, and then
    here with respect to the dual outfeeds, curated and raw, figuring
    curated is good-faith and raw,
    includes garbage, or for example to just pipe raw to /dev/null, and for automatically curating in-feed.

    The idea is to support establishment of association of an e-mail
    identity, so that a usual sort of general-purpose responsible algorithm,
    can work up various factors of authentication, in the usual notions of
    authentication AuthN and authorization AuthZ, with respect to login and
    "posting allowed", or as via delegation in what's called Federated
    identity, that resulting being the responsibility of peers, their hosts,
    and so on.

    Then, about that for humanitarian and free-press sorts of reasons,
    "anonymity", well first off anonymity is not part of the charter, and
    indeed the charter says to use your real name and your real e-mail
    address. I.e., anonymity on the one hand has a reasonable sort of
    misdirection from miscreants attacking anybody, on the other hand those
    same sorts of miscreants abuse anonymity, so, here it's basically the
    idea that "NOOBNB" is a very brief system of reputation as of the
    vouched identity of an author by email address, or the opaque value that
    results gets posted in the sender field by whatever peer injects
    whatever.

    How then to automatically characterize spam and the illicit is sort of a thing,
    while that the off-topic but otherwise according to charter including
    the spirit
    of the charter as free press, with anonymity to protect while not
    anonymity to attack,
    these are the kinds of things that help make for that "NOOBNB curation",
    is to result
    a sort of addendum to Usenet charter, that results though same as the
    old Usenet charter.

    Characterization could include for example "MIME banned", "glyph ranges banned",
    "subjects banned", "injection sites banned", these being open then so
    that legitimate
    posters run not afoul, that while bad actors could adapt, then they
    would get funneled
    into "automatic semantic text characterization bans".

    The idea then is that responsible injection sites will have measures in
    place to prevent
    "Non" authors from becoming "New" authors, those maturing, "Old" and
    "Off" post freely,
    that among "Bot" is "System Bot" and "Tag Bot", then that according to algorithms in
    data in the raw Bot feed, is established relations that attenuate to Bad
    and Non,
    so that it's a self-describing sort of data set, and peers pick up
    either or both.


    Then the other key notion is to reflect an ID generator, so that, every
    post, gets exactly and uniquely, one ID, identifier, a global and
    universally unique identifier. This was addressed as above and it's a
    usual notion of a common facility, UUID dispenser. The idea of
    identifying those over times, is for that over the corpus, is
    established a sort of digit-by-digit stamp generator, to check for IDs
    over the entire corpus, or here a compact and efficient representation
    of same, then for issuing ranges, for usual expectations of the order of
    sites on the order of posters the order of posts.

    Luckily it's sort of already the case that all the messages already do
    have unique ID's.

    "Usenet: it has a charter."

    [2024/01/23]

    About build-time and run-time, here the idea is to make some specifications what reflect the BFF/SFF filesystem and file-format conventions, then to
    make it so that algorithms and servers run on those, as then with respect
    to reference implementations, and specification conformance, of the client protocols, and the server and service protocols, what are all pretty much standardized, inside and outside, usual sorts Internet text protocols,
    and usual sorts data facilities.

    I figure the usual sort of milieu these days for common, open systems,
    is something like "Git Web", or otherwise in terms of git hosting,
    in terms of that it's an idea that setting up a git server, makes it
    pretty simple to clone code and so on. I'm most familiar with this
    tooling compared to RCS, CVS, svn, hg, tla, arch, or other sorts usual "source control", systems. Most people might know: git.


    So, the idea is to make reference implementations in various editions of tooling,
    that result the establishment of the common backing, this filesystem convention
    or BFF the backing file-format, best friends forever, then basically
    about making
    for their being cataloged archives of groups their messages in
    time-series data,
    then to simply start a Usenet archive by concatenating those together as overlaying
    them, then as to generating the article numbers, as where the article
    numbers are
    specific to the installation, where there are globally unique IDs of message-IDs,
    then article numbers indicate the server's handles to messages by group.

    The sources of reference implementations of services and algorithms are
    sources and go in source control, but the notion of archives fungibly in
    BFF files, represent static assets for where a given corpus of a month's
    messages basically represent the entirety, or what "25 million messages"
    is, vis-a-vis low-volume groups like Big 8 text Usenet, and here curated
    and raw feeds after NOOBNB.

    So, there's a general idea to surface the archive files, those being
    fungible anyways,
    then some bootstrap scripts in terms of data-install and code-install,
    for config/code/data,
    so that anybody can rent a node, clone these scripts, download a year's Usenet,
    run some scripts if to setup SFF files, then launch a Usenet service.

    So, that is about common sources and provisioning of code and data.

    The compeering then is the other idea about the usual idea of pull and
    push feeds,
    and suck feeds, where NNTP is mostly push feeds, and compeers are
    expected to
    be online and accept CHECK, IHAVE, and TAKETHIS, and these kinds
    use-cases of
    ingestion, of the propagation of posts.

    There's a notion of a sort of compeering topology, basically in terms of
    "the lot of us
    will hire each some introductory resources, and use them up, passing
    around the routing
    according to DNS, what serves making ingress and egress, from a named Internet protocol port".

    https://datatracker.ietf.org/doc/html/rfc3977
    https://datatracker.ietf.org/doc/html/rfc4644


    (Looking at WILDMAT, it's cool that it's a sort of yes/no/maybe or
    sure/no/yes, which is a sort of very composable filtering. I sort of
    invented one of those for rich front-end data tables since looking at
    the specs here, "filterPredicate", composable, front-end/back-end,
    yes/no/maybe.)
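
    A minimal sketch of that yes/no/maybe reading: in RFC 3977 a wildmat is
    a comma-separated list of patterns, "!" negates one, and the right-most
    pattern that matches decides. Returning None for "nothing matched" and
    the compose() helper are assumptions made here for illustration, and
    Python's fnmatch also gives "[...]" a meaning that RFC 3977 wildmat
    does not.

```python
from fnmatch import fnmatchcase


def wildmat(pattern, name):
    """Return True (yes), False (no), or None (maybe: nothing matched).

    The right-most matching sub-pattern decides; a leading "!" negates it.
    """
    verdict = None
    for part in pattern.split(","):
        negated = part.startswith("!")
        glob = part[1:] if negated else part
        if fnmatchcase(name, glob):
            verdict = not negated
    return verdict


def compose(*filters):
    """Chain filters: the first definite yes/no wins, otherwise maybe."""
    def combined(name):
        for f in filters:
            verdict = f(name)
            if verdict is not None:
                return verdict
        return None
    return combined


# Hypothetical site policy built from two wildmats.
site_policy = compose(
    lambda g: wildmat("sci.*,!sci.math.moderated", g),
    lambda g: wildmat("!alt.binaries.*", g),
)
```

    Because "maybe" falls through to the next filter, policies of this
    shape can be stacked without any one of them needing to know the rest.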

    I.e., NNTP has a static (network) topology, expecting peers to be online usually, while here
    the idea is that "compeering", will include push and pull, about the "X-RETRANSFER-TO",
    and along the lines of the Message Transfer Agent, queuing messages for opportunistic
    delivery, and in-line with the notions of e-mail traditionally and the functions of DNS and
    the Internet protocols.

    https://datatracker.ietf.org/doc/html/rfc4642 https://datatracker.ietf.org/doc/html/rfc1036 https://datatracker.ietf.org/doc/html/rfc2980 https://datatracker.ietf.org/doc/html/rfc4644 https://datatracker.ietf.org/doc/html/rfc4643

    This idea of compeering sort of results that as peers come online, then
    to start in the time-series data of the last transmission, then to
    launch a push feed up to currency. It's similar with that simply being
    periodic in real-time (clock time), or message-driven, pushing messages
    as they arrive.
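
    A minimal sketch of "start at the last transmission and push up to
    currency", assuming the feed keeps an arrival log sorted by time (the
    function name backlog and the integer timestamps are illustrative
    only):

```python
from bisect import bisect_right


def backlog(arrivals, last_sent):
    """Return message-IDs that arrived strictly after `last_sent`.

    `arrivals` is a list of (arrival_time, message_id) sorted by arrival_time.
    """
    times = [t for t, _ in arrivals]
    start = bisect_right(times, last_sent)
    return [message_id for _, message_id in arrivals[start:]]


log = [(100, "<a@example.invalid>"), (200, "<b@example.invalid>"),
       (300, "<c@example.invalid>")]
pending = backlog(log, last_sent=100)
```

    The result is what a peer coming back online would be offered, for
    example with CHECK or IHAVE, before switching to message-driven pushes.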

    The message feeds, in-feeds and out-feeds, reflect sorts of system
    accounts or peering agreements, then for the compeering to establish
    what are the topologies, then for something like a message transfer
    agent, to fill a basket with the contents, for caches or a sort of
    lock-box approach, as well aligned with SMTP, POP3, IMAP, and other
    Internet text protocols of messaging.

    The idea is to implement some use-cases of compeering, with e-mail,
    news2mail and mail2news, as the Internet protocols have high affinity
    for each other, and are widely implemented.

    So, besides the runtime (code and data, config), then is also involved
    the infrastructure, resources of the runtime and resources of the
    networking. It's pretty simple to write code and not very difficult to
    get data, then infrastructure gets into cost. This was described above
    as the idea of "business-in-a-box".

    Well, tapping away at this, ....


    [ page break 6 ]

    [2024/01/24]

    Yeah, when there's a single point of ingress, it's pretty much simpler
    than when there's federated ingress, or here NNTP peerage, vis-a-vis a
    site's own postings.

    Here it's uncomplicated when all messages get propagated to all peers,
    with the idea that the NOOBNB pattern is going to ingest raw and result
    curated (curated, cured, cur).


    How to figure out for each incoming item, whether to have System Tag Bot
    result appending another item marking it, or, just storing a stub for
    the item as excluded, gets into "deep inspection", or as related to the
    things.

    Because Usenet is already an ongoing concern, it's sort of easy to identify old posters already, then about the issue of handling New/Non, and as
    with regards to identifying Bad, as what it results Cur is New/Old/Off
    and Raw includes Bot/Non/Bad, or rather that it excludes Bot/Non/Bad,
    with regards to whether the purpose of Bot is to propagate Bans.


    It's sort of expected that the Author field makes for a given Author,
    but some posters for example mutilate the e-mail address or result
    something non-unique. Disambiguating those, then, is for the idea
    that either the full contents of the Author field make a thing or that
    otherwise Authors would need to make some way to disambiguate Sender.

    About propagation and stubbing, the idea is that propagation should
    generally result, then that presence of articles or stubs either way
    results the relevant response code, as with regards to either
    "propagating raw including Non and Bad" or just "propagating Raw
    only Non-Tag and Bad-Tag Tag-Bot, generated messages", basically
    with the idea of semantics of "control" and "junk", or "just ignore it".


    The use case of lots of users of Usenet isn't a copy of Usenet, just
    a few relevant groups. Others for example appreciate all the _belles
    lettres_ of text, and nothing from binaries. Lots of users of Usenet
    have it as mostly a suck-feed of warez and vice. Here I don't much care
    about anything except _belles lettres_.


    So, here NOOBNB is a sort of white-list approach, because Authors are
    much fewer than messages, to relate incoming messages to Authors, per
    group, here that ingestion is otherwise constant-rate for assigning
    numbers in the groups a message is in, then as with regards to threading
    and bucketing, about how to result these sorts of ideas sort of building
    up from "the utility of bitmaps" to this "patterns in range" and "region
    calculus", here though what's to result partially digested intermediate
    results for an overall concatenation strategy then for selection and
    analysis, all entirely write-once-read-many.

    It's figured that Authors will write and somebody will eventually read
    them, with regards to that readings and replies result the Author born
    as New and then maturing to Old, what results after Author infancy, to
    result a usual sort of idea that Authors that read Bad are likely enough
    Bad themselves.

    I.e., there's a sort of hysteresis to arrive at born as New, in a group,
    then a sort of gentle infancy to result Old, or Off, in a group, as
    with regards to the purgatory of Non or banishment of Bad.

    happy case:
    Non -> New -> Old (good)
    Non -> Bad (bad)

    Old -> Off
    Off -> Old


    The idea's that nobody's a moderator, but anybody's a reviewer,
    and correspondent, then that correspondents to spam or Bad get
    the storage of a signed quantity, about the judgment, of what
    is spam, in the error modes.

    error modes:
    Non -> false New
    Non -> false not Bad


    New -> Bad
    Old -> Bad

    (There's that reviewers and correspondents
    Old <-> Old
    Off <-> Old
    Old <-> Off
    Off <-> Off
    result those are all same O <-> O.)
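
    The transitions listed above can be written down as a small table. This
    is only a sketch: the enum and function names are made up here, and the
    allowed set is exactly the arrows from the happy-case and error-mode
    lists, nothing more.

```python
from enum import Enum


class Author(Enum):
    NON = "Non"
    NEW = "New"
    OLD = "Old"
    OFF = "Off"
    BAD = "Bad"


# Transitions copied from the lists above; anything else is rejected.
ALLOWED = {
    (Author.NON, Author.NEW),
    (Author.NEW, Author.OLD),
    (Author.NON, Author.BAD),
    (Author.OLD, Author.OFF),
    (Author.OFF, Author.OLD),
    (Author.NEW, Author.BAD),
    (Author.OLD, Author.BAD),
}


def transition(current, target):
    """Return the new state, or raise ValueError if the move is not listed."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


def same_for_review(a, b):
    """Old and Off count as one class 'O' for reviewers and correspondents."""
    established = {Author.OLD, Author.OFF}
    return a in established and b in established
```

    Note that nothing leads out of Bad in this table; whether there is a
    path back is left to site policy.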

    The idea's that nobody's a moderator, and furthermore then all
    the rules of the ignorance of Non and banishment of Bad,
    then though are as how to arrive at that Non's, get a chance
    to be reviewed by Old/Off and New, with respect to New and New
    resulting also the conditions of creation, of a group, vis-a-vis,
    the conditions of continuity, of a group.


    I.e. the relations should so arise that creating a group and posting
    to it, should result "Originator" or a sort of class of Old, about these
    ideas of the best sort of reasonable performance and long-lived
    scalability and horizontal scalability, that results interpreting any
    usual sort of messaging with message-IDs and authors, in a reference
    algorithm and error-detection and error-correction, "NOOBNB".

    There's an idea that Bot replies to new posters, "the Nota Bene",
    but, another that Bot replies to Non and Bad, and another that
    there's none of that at all, or not guaranteed.


    Then, the idea is that these are matters of convention and site policy,
    what results exactly the same as a conformant Usenet peer,
    in "NOOBNB compeering: slightly less crap".


    Then, getting into relating readings (reviews) and correspondence
    as a matter of site policy in readings or demonstration in
    correspondence, results largely that correspondence discriminates Old
    from Bad, and New from Non.

    Then as "un-moderated" there's still basically "site-policy",
    basically in layers that result "un-abuse", "dis-abuse".

    I.e. the disabusement of abuse, is of this Old <-> Off for the venial,
    and about the ceremony of infancy via some kind of interaction
    or the author's own origination, about gating New, then figuring
    that New matures to Old and then the compute cost is on News,
    that long-running conversations result constants, called stability.

    Well, I'm curious about your opinion of this sort of approach, it's
    basically as of defining conventions of common messaging, what result a
    simplest and most-egalitarian common resource of correspondents in
    _belles lettres_.

    [2024/01/24]

    Then it seems the idea is to have _three_ editions,

    Cur: current, curated, New/Old/Off
    Pur: purgatory, Non/New/Old/Off
    Raw: raw, Non/New/Old/Off/Bot/Bad
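
    As a small sketch of the three editions, taking the class lists exactly
    as written above (the names EDITIONS and editions_for are illustrative
    only): each edition is a set of author classes, and the nesting of the
    sets is what makes each edition contained in the next.

```python
EDITIONS = {
    "Cur": {"New", "Old", "Off"},
    "Pur": {"Non", "New", "Old", "Off"},
    "Raw": {"Non", "New", "Old", "Off", "Bot", "Bad"},
}


def editions_for(author_class):
    """List the editions that carry posts from the given author class."""
    return [name for name, classes in EDITIONS.items() if author_class in classes]


# Each edition is a subset of the next one, so Cur <= Pur <= Raw.
nested = EDITIONS["Cur"] <= EDITIONS["Pur"] <= EDITIONS["Raw"]
```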

    Then, the idea for bot, seems to be for system, to have delegations,
    of Bot to Old, with respect to otherwise usually the actions of Old,
    to indicate correspondence.

    Then, with regards to review, it would sort of depend on some Old
    or Off authors reviewing Pur, with regards to review and/or
    correspondence, what results graduating Non to New, then that it results
    that there's exactly a sort of usual write-once-read-many, common
    backing store well-defined by presence in access (according to
    filesystem).



    Then, for the groups files, it's figured there's the main message-IDs,
    as with respect to cur/pur/raw, then with regards to authors on the
    groups, presence in the authors files indicating Old, then with regards
    to graduation Non to New and New to Old.

    Keeping things simple, then the idea is to make it so that usual New
    have a way to graduate from Non, where there is or isn't much traffic
    or is or isn't much attention paid to Pur.

    The idea is that newbies log on to Pur, then post there on their own
    or in replies to New/Old/Off, that thus far this is entirely of a monadic
    or pure function the routine, which is thusly compile-able and parallelizable,
    and about variables in effect, what result site policy, and error modes.


    There's an idea that Non's could reply to their own posts,
    as to eventually those graduating altogether, or for example
    just that posting is allowed, to Pur, until marked either New or Bad.


    The ratio of Bad+Non+Bot to Old+Off+New, basically has that it's figured
    that due to attacks like the one currently underway from Google Groups,
    would be non-zero. The idea then is whether to grow the groups file,
    in the sequence of all message-IDs, and whether to maintain one edition
    of the groups file, and ever modify it in place, that here the goal is instead
    growing files of write-once-read-many, and because propagation is
    permanent.

    Raw >= Pur >= Cur

    I.e., every message-id gets a line in the raw feed, that there is one,
    then as with regards to whether the line has reserved characters, where
    otherwise it's a fixed-length record up above the maximum length of
    message-id, the line, of the groups file, the index of its
    message-numbers.
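
    A minimal sketch of such a fixed-length groups file, where the message
    number is just the record's position. RFC 3977 limits a message-ID to
    250 octets; the 256-octet record, the space padding, and the function
    names are assumptions made here, and an in-memory buffer stands in for
    the growing file.

```python
import io

RECORD_LEN = 256  # assumed: a round size above the 250-octet message-ID limit
PAD = b" "


def append_record(f, message_id):
    """Append one fixed-length line; return its 1-based message number."""
    raw = message_id.encode("ascii")
    if len(raw) > RECORD_LEN - 1:
        raise ValueError("message-ID too long for the record")
    f.seek(0, io.SEEK_END)
    number = f.tell() // RECORD_LEN + 1
    f.write(raw.ljust(RECORD_LEN - 1, PAD) + b"\n")
    return number


def read_record(f, number):
    """Seek straight to message number `number` and return its message-ID."""
    f.seek((number - 1) * RECORD_LEN)
    record = f.read(RECORD_LEN)
    if len(record) < RECORD_LEN:
        raise IndexError("no such message number")
    return record.rstrip(b"\n").rstrip(PAD).decode("ascii")


groups_file = io.BytesIO()  # stands in for a growing write-once file on disk
n1 = append_record(groups_file, "<a@example.invalid>")
n2 = append_record(groups_file, "<b@example.invalid>")
```

    Lookup by number is one seek and one read, and the file only ever
    grows, which fits the write-once-read-many goal.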


    See, the idea here is a sort of reference implementation, and a
    normative implementation, in what are fungible and well-defined
    resources, here files, with reasonable performance and horizontal
    scale-ability and long-time performance with minimal or monotone
    maintenance.

    Then the files are sort of defined as either write-once and final or
    write-once and growing, given that pretty much unbounded file resources
    result a quite most usual runtime.



    Don't they already have one of these somewhere?


    [2024/01/26]

    I suppose the idea is to have that Noobs post to alt.test, then as with regards to
    various forms to follow, like:

    I read the charter
    I demonstrated knowledge of understanding the charter's definitions and intent
    I intend to follow the charter

    How I do or don't is my own business, how others do or don't is their
    own business

    I can see the exclusion rules
    I understand not to post against the exclusion rules
    I understand that the exclusion rules are applied unconditionally to all

    ... is basically for a literacy test and an etiquette assertion.


    Basically making for shepherding Noobs through alt.test, or that people
    who post in alt.test aren't Noobs, yet still I'm not quite sure how to
    make it for usual first-time posters, how to get them out of Purgatory
    to New. (Or ban them to Bad.)

    This is where federated ingestion basically will have that in-feeds are either

    these posts are good,
    these posts are mixed,
    these posts are bad,

    with regards then to putting them variously in Cur, Pur, Raw.

    Then, there's sort of exclusions and bans, with regards to posts, and
    authors. This is that posts are omitted by exclusion, authors' posts are
    omitted by ban.

    Then, trying to associate all the authors of a mega-nym, in this case
    the Google's spam flood to make a barrier-to-entry of having open
    communications, is basically attributing those authors as a class to a
    banned mega-nym.

    Yet, then there is the use case of identity fraud's abuses, disabusing
    an innocent dupe, where logins basically got hacked, or the path to
    return to innocence.


    This sort of results a yes/no/maybe for authors, sort of like:

    yes, it's a known author, it's unlikely they are really bad
    (... these likely frauds are Non's?)

    no, it's a known excluded post, open rules
    no, it's a known excluded author, criminal or a-topical solicitation
    no, it's a new excluded author, associated with an abstract criminal or a-topical solicitation

    maybe (yes), no reason why not

    that a "rules engine" is highly efficient deriving decisions yes/no/maybe,
    in both execution and maintenance of the rules (data plane / control
    plane).

    Groups like sci.math have a very high bar to participation, literacy
    in mostly English and the language of mathematics. Groups have
    a very low bar to pollution, all else.

    So, figuring out a common "topicality standard", here is the idea to
    associate concepts with charter with topicality, then for of course a
    very loose and egalitarian approach to participation, otherwise free.

    (Message integrity, irrepudiability, free expression, free press, free speech,
    not inconsequence, nor the untrammeled.)


    [2024/01/28]

    Well, "what is spam", then, I suppose sort of follows from the
    "spam is a word coined on Usenet for unsolicited a-topical posts",
    then the ideas about how to find spam, basically make for that
    there are some ways to identify these things.

    The ideas of
    cohort: a group, a thread, a poster
    cliques: a group, posts that reply to each other

    Then
    content: words and such
    clicks: links

    Here the idea is to categorize content according to cohorts and cliques,
    and content and clicks,

    It's figured that all spam has clicks in it, then though that of course clicks
    are the greatest sort of thing for hypertext, with regards to

    duplicate links
    duplicate domains

    and these sorts of things.

    The idea is that it costs resources to categorize content, is according
    to the content, or the original idea that "spam must be identified by
    its subject header alone", vis-a-vis the maintenance of related data,
    and the indicator of matching various aspects of relations in data.

    So, clicks seem the first way to identify spam, basically that a
    histogram of links by their domain and path, results duplicates are
    spam, vis-a-vis, that clicks in a poster's sig or repeated many times in
    a long thread, are not.
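
    A minimal sketch of that histogram, assuming posts arrive as
    (message_id, body) pairs. The regular expression, the threshold of 3,
    and the function names are illustrative choices, and the sig and
    long-thread exceptions mentioned above are not handled here.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"]+")


def link_keys(body):
    """Yield (domain, path) for each http(s) link found in a post body."""
    for url in URL_RE.findall(body):
        parts = urlsplit(url)
        yield (parts.netloc.lower(), parts.path)


def duplicate_links(posts, threshold=3):
    """Return links appearing in at least `threshold` distinct posts.

    `posts` is an iterable of (message_id, body); each post counts a link
    once, so one link repeated within a single post is not a duplicate.
    """
    histogram = Counter()
    for _message_id, body in posts:
        histogram.update(set(link_keys(body)))
    return {key for key, count in histogram.items() if count >= threshold}


sample = [
    ("<1@example.invalid>", "buy http://spam.example/deal now"),
    ("<2@example.invalid>", "cheap http://SPAM.example/deal today"),
    ("<3@example.invalid>", "see http://spam.example/deal and http://spam.example/deal"),
    ("<4@example.invalid>", "paper at https://arxiv.org/abs/0000.00000 is relevant"),
]
flagged = duplicate_links(sample)
```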

    In this sense there's that posts are collections of their context,
    about how to make an algorithm in best effort to relate context
    to the original posts, usually according to threading.

    The idea here is that Non's can be excluded when first of all they
    have links, then for figuring that each group has usual sites that
    aren't spam, like their youtube links or their doc repo links or their
    wiki links or their arxiv or sep or otherwise, usual sorts of good
    links, while that mostly it's the multiplicity of links that represent a
    spam attack, then just to leave all those in Purgatory.

    It's figured then that good posters when they reach Old, pretty much
    are past spamming, then about that posters are New for quite a while,
    and have some readings or otherwise mature into Old, about that
    simply Old and Off posters posts go right through, New posters posts
    go right through, then to go about categorizing for spam, excluding spam.


    I.e., the "what is spam", predicate, is to be an open-rules sort of
    composition, that basically makes it so that spamverts would be
    ineffective, because spammers exploit the lazy, and if their links don't
    go through, they get nothing.

    Then, there's still "what is spam" with regards to just link-less spam,
    about that mostly it would be about "repeated junk", that "spam is not unique".
    This is the usual notion of "signal to noise", basically finding whether
    it's just noise in Purgatory, that signal in Purgatory is a good sign of
    New.

    So, "what is spam" is sort of "what is not noise". Again, the goal is
    open-rules normative algorithms that operate on write-once-read-many
    graduated feeds, what result that the Usenet compeering curates its
    federated ingress, then as for feeding its out-feed, with regards to
    other Usenet compeers following the same algorithm, then would get the
    same results.

    Then, the file-store might still have copies of all the spams, with the
    idea then that it's truncatable, because spam-campaigns are not
    long-running for archival, then to drop the partitions of Purgatory and
    Raw, according to retention. This then also is for fishing out what are
    Type I / Type II errors, about promoting from Non to New or also about
    the banishment of Non to Bad, or, Off to Bad. I.e., there's not so much
    "cancel", yet there's still for "no-archive", about how to make it open
    and normative how these kinds of things are.

    Luckily the availability of unbounded in size filesystems is pretty
    large these days, and, implementing things write-once-read-many, makes
    for pretty simple routines that make maintenance.


    It's like "whuh how do I monetize that?" and it's like "you don't", and
    "you figure that people will buy into free speech, free association, and
    free press". You can make your own front-end and decorate it with what
    spam you want, it just won't get federated back in the ingress of this
    Usenet Compeerage.

    Then it's like "well I want to only see Archimedes Plutonium and his
    cohorts", then there's the idea that there's to be generated some files
    with relations, the summaries and histograms, then for those to be
    according to time-series buckets, making tractable sorts of metadata
    partially digested, then for making digests of those, again according to
    normative algorithms with well-defined access patternry and run-times,
    according to here pretty much a hierarchical file-system. Again it's
    sort of a front-end thing, with surfacing either the back-end files or
    the summaries and digests, for making search tractable in many
    dimensions.

    So, for the cohort, seems for sort of accumulated acceptance and
    rejection, about accepters and rejectors and the formal language of
    hierarchical data that's established by its presence and maintenance,
    about "what is spam" according to the entire cohort, and cliques, then
    with regards to Old/Off and spam or Non, with regards to spam and Bad.

    So, "what is spam" is basically that whatever results excluded was spam.


    [ page break 7 ]


    [2024/02/03]


    Well, with the great spam-walling of 2024 well underway, it's a bit too
    late to set up very easy personal Internet, but, it's still pretty
    simple, the Internet text protocols, and implementing standards-based
    network-interoperable systems, and there are still even some places
    where you can plug into the network and run your own code.

    So anyways the problem with the Internet today is that anything that's
    public facing can expect to get mostly not-want-traffic, where the
    general idea is to only get want-traffic.

    So, it looks like that any sort of public facing port, where TCP/IP
    sockets for the connection-oriented protocols like here the Internet
    protocols are basically as for the concept that the two participants in
    a client-server or two-way communication are each "host" and "port",
    then as for protocol, and as with respect to binding of the ports and so
    on or sockets or about the 7-layer OSI model of networking abstraction,
    here it's hosts and ports or what result IP addresses and packets
    destined for ports, those multiplexed and reassembled by the TCP/IP
    protocols' stacks on the usual commodity hardware's operating systems,
    otherwise as with respect to network devices, their addresses as in
    accords with the network topology's connection and routing logic, and
    that otherwise a connection-oriented protocol is in terms of listening
    and ephemeral ports, with respect to the connection-oriented protocols,
    their sockets or Address Family UNIX sockets, and, packets and the
    TCP/IP protocol semantics of the NICs and their UARTs, as with regards
    to usual intrusive middleware like PPP, NAT, BGP, and other stuff in the
    way of IP, IPv4, and IPv6.


    Thus, for implementing a server, is basically the idea then that as
    simply accepting connections, then is to implement for the framework,
    that it has at least enough knowledge of the semantics of TCP/IP, and
    the origin of requests, then as with regards to implementing a sort of
    "Load Shed" or "Load Hold", where Load Shedding is to dump
    not-want-traffic and Load Holding is to feed it very small packets at
    very infrequent intervals within socket timeouts, while dropping
    immediately anything it sends and using absolutely minimal resources
    otherwise in the TCP/IP stack, to basically give unwanted traffic a
    connection that never completes, as a sort of passive-aggressive
    response to unwanted traffic. "This light never changes."


    So, for Linux it's sockets and Windows it's like WSASocket and Java it's
    java.nio.channels.SocketChannel, about that the socket basically has
    responsibilities for happy-case want-traffic, and enemy-case
    not-want-traffic.


    Then, where in my general design for Internet protocol network
    interfaces, what I have filled in here is basically this sort of

    Reader -> Scanner -> Executor -> Printer -> Writer

    where the notions of the "home office equipment" like the multi-function
    device has here that in metaphor it basically considers the throughput
    as like a combination scanner/printer fax-machine, then the idea is that
    there needs to be some sort of protection mostly on the front, basically
    that the "Hopper" then has about the infeed and outfeed Hoppers, or with
    the Stamper at the end, figuring the Hopper does Shed/Hold, or
    Shed/Fold/Hold, while, the Stamper does the encryption and compression,
    about that Encryption and Compression are simply regular concerns what
    result plain Internet protocol text (and, binary) commands in the
    middle.

    Hopper -> Reader -> Scanner -> Executor -> Printer -> Writer
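
    The chain above can be sketched as a pipeline of generators, one per
    stage. This is only a toy under stated assumptions: the want-list is a
    set of peer addresses, the only command understood is QUIT (answered
    205, anything else 500, as in RFC 3977), and the Stamper for encryption
    and compression is left out.

```python
def hopper(connections, wanted):
    """Front gate: pass only connections whose peer is on the want-list."""
    for peer, data in connections:
        if peer in wanted:
            yield data


def reader(chunks):
    """Split raw bytes into CRLF-terminated lines."""
    for chunk in chunks:
        for line in chunk.split(b"\r\n"):
            if line:
                yield line


def scanner(lines):
    """Tokenize each line into (keyword, arguments)."""
    for line in lines:
        words = line.decode("utf-8").split()
        if not words:
            continue
        yield words[0].upper(), words[1:]


def executor(commands):
    """Run each command; yields (code, text) pairs."""
    for keyword, _args in commands:
        if keyword == "QUIT":
            yield 205, "Connection closing"
        else:
            yield 500, "Unknown command"


def printer(responses):
    """Format responses as protocol text lines."""
    for code, text in responses:
        yield f"{code} {text}\r\n"


def writer(lines):
    """Encode the lines to bytes, as would be written to the socket."""
    return b"".join(line.encode("utf-8") for line in lines)


incoming = [("198.51.100.7", b"FOO\r\nQUIT\r\n"), ("203.0.113.9", b"QUIT\r\n")]
wanted = {"198.51.100.7"}
output = writer(printer(executor(scanner(reader(hopper(incoming, wanted))))))
```

    Each stage only sees what the previous one yields, so the Hopper
    dropping a connection means none of the later stages spend anything on
    it.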

    Then, for Internet protocols like SMTP, NNTP, IMAP, HTTP, usual sorts of
    request/response client/server protocols, then I suppose I should wonder
    about multiplexing connections, though, HTTP/2 really is just about
    multiple calls with pretty much the same session, and getting into the
    affinity of sessions, about client/server protocols, logins,
    requests/responses, and sessions, here with the idea of pretty much
    implementing a machine, for implementing protocol, for the half-dozen
    usual messaging and web-service protocols mentioned above, and a
    complement of their usual options, implementing a sort of usual process
    designed to be exposed on its own port, resulting a sort of
    shatter-proof protocol implementation, figuring the Internet is an ugly
    place and the Hopper is regularly clearing the shit out of the front.

    So anyways, then about how to go about implementing a want-traffic feed
    is basically the white-list approach, from the notion that there is want
    and not want, but not to be racist, basically a want-list approach, and
    a drop-list. The idea is that you expect to get email from people you've
    sent email, or their domain, and then, sometimes when you plan to expect
    an email, then the idea is to just maintain a window and put in terms
    what you expect to get or expect to have recently gotten, then to fish
    those out from all the trash, basically over time to put in the matches
    for the account, that messages to the account, given matches surface the
    messages, otherwise pretty much just maintaining a rotating queue of
    junk that dumps off the junk when it rotates, while basically having a
    copy of the incoming junk, for as necessary looking through the junk for
    the valuable message.
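
    A minimal sketch of the want-list with a rotating junk queue. The class
    name Mailbox and its methods are made up for illustration, and the
    window is bounded here by a message count rather than by time, which is
    a simplification.

```python
from collections import deque


class Mailbox:
    """Surface want-list matches; keep the rest in a bounded junk queue."""

    def __init__(self, junk_capacity=1000):
        self.want = set()  # expected sender addresses or domains
        self.inbox = []
        self.junk = deque(maxlen=junk_capacity)  # oldest junk falls off

    def _wanted(self, sender):
        sender = sender.lower()
        domain = sender.rpartition("@")[2]
        return sender in self.want or domain in self.want

    def expect(self, sender_or_domain):
        """Add a match, then fish earlier arrivals out of the junk queue."""
        self.want.add(sender_or_domain.lower())
        kept = [m for m in self.junk if not self._wanted(m[0])]
        self.inbox.extend(m for m in self.junk if self._wanted(m[0]))
        self.junk = deque(kept, maxlen=self.junk.maxlen)

    def deliver(self, sender, message):
        """File an incoming message under inbox or junk."""
        if self._wanted(sender):
            self.inbox.append((sender, message))
        else:
            self.junk.append((sender, message))


box = Mailbox(junk_capacity=2)
box.expect("friend@example.org")
box.deliver("friend@example.org", "hi")
box.deliver("a@spam.example", "1")
box.deliver("b@spam.example", "2")
box.deliver("late@example.net", "3")  # pushes the oldest junk out
junk_before = list(box.junk)
box.expect("example.net")  # a late match is fished out of the junk
```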


    The Internet protocols then for what they are the messaging level or
    user land, of the user-agents, have a great affinity and common
    implementation.

    SMTP -> POP|IMAP

    IMAP -> NNTP

    NNTP
    HTTP -> NNTP
    HTTP -> IMAP -> NNTP

    SMTP -> NNTP
    NNTP -> SMTP


    I'm really quite old-fashioned, and sort of rely on natural written
    language, while, still, there's the idea that messages are arbitrarily
    large and of arbitrary format and of arbitrary volume over an arbitrary
    amount of time, or 'unbounded' if '-trary' sounds too much like
    'betrayedly', with the notion that there's basically for small storage
    and large storage, and small buffers and large buffers, and bounds,
    called quota or limits, so to result that usual functional message
    passing systems among small groups of people using modest amounts of
    resources can distance themselves from absolute buffoons HDTV'ing
    themselves picking their nose.

    So, back to the Hopper, or Bouncer, then the idea is that everything
    gets in an input queue, because, spam-walls can't necessarily be
    depended on to let in the want-traffic. Then the want-list (guest-list)
    is used to bring those in to sort of again what results this, "NOOBNB",
    layout, so it sort of results again a common sort of "NOOBNB BFF/SFF",
    layout, that it results the layout can be serialized and torn down and
    set back up and commenced same, serialized.

    Then, this sort of "yes/no/maybe" (sure/no/yes, "wildmat"), has the idea
    of that still there can be consulted any sorts of accepters/rejectors,
    and it builds a sort of easy way to make for the implementation, that it
    can result an infeed and conformant agent, on the network, while either
    employing opt-in sort of spam-wall baggage, or, just winging it and
    picking ham deliberately.

    In this manner NOOBNB is sort of settling into the idea of the physical
    layout, then for the idea of this Load: Roll/Fold/Shed/Hold, is for
    sorts of policies of "expect happy case", "expect usual case", "forget
    about it", and "let them think about it".

    The idea here is sort of to design modes of the implementation of the
    protocols, in simple and easy-to-remember terms like "NOOBNB",
    "BFF/SFF", "Roll/Fold/Shed/Hold", what results pragmatic and usual
    happy-case Internet protocols, on an Internet full of fat-cats
    spam-walling each other, getting in the way of the ham. (That "want" is
    ham, and "not-want" is spam.) "Ham is not spam, spam is spiced canned
    ham."


    Then, after the Internet protocols sitting behind a port on a host with
    an address, and that the address is static or dynamic in the usual
    sense, but that every host has one, vis-a-vis networks and routing, then
    the next thing to figure out is DNS, the name of the host, with respect
    to the overall infrastructure of the implementation of agents, in the
    protocols, on the network, in the world.

    Then, I don't know too much about DNS, as with respect to that in the
    old days it was sort of easy to register in DNS, that these days
    becoming a registrar is pretty involved, so after hiring some
    CPU+RAM+DISK+NET sitting on a single port (then for its ephemeral
    connections as up above that, but ports entirely in the protocol), with
    an address, is how to get traffic pointed at the address, by surfacing
    its address in DNS, or, just making an intermediary service for the
    discovery of addresses and ports and configuring one's own DNS resolver,
    but here of course to keep things simple for publicly-facing services
    that are good actors on the network and in Internet protocols.

    So I don't know too much about DNS, and it deserves some more study.
    Basically the DNS resolver algorithm makes lookups into a file called
    "the DNS file" and thusly a DNS resolver results addresses or lookup
    hosts for addresses and sorts of DNS records, like the "Mail Exchanger"
    record, or "the A record", "the CNAME", "various text attributes",
    "various special purpose attributes", then that DNS resolvers will
    mostly look those up to point their proxies they insert to it, then
    present those as addresses at the DNS resolver. (Like I said, the
    Internet protocols are pretty simple.)

    So, for service discovery pretty much, it looks like the DNS
    "authoritative name server", basically is to be designed for the idea
    that there are two user-agents that want to connect, over the Internet,
    and they're happy, then anything else that connects, is usual, so
    there's basically the idea that the authoritative name server, is to
    work itself up in the DNS protocols, so it results that anybody using
    the addresses of its names will have found itself with some reverse
    lookups or something like that, helping meet in the middle.

    https://en.wikipedia.org/wiki/Domain_Name_System

    RR Resource Records
    SOA Start of Authority
    A, AAAA IP addresses
    MX, Mail Exchanger
    NS, Name Server
    PTR, Reverse DNS Lookups
    CNAME, domain name aliases

    RP Responsible Person
    DNSSEC
    TXT ...

    ("Unsolicited email"? You mean lawyers and whores won't even touch them?)

    So, DNS runs over both UDP and TCP, so, there's for making that the Name
    Server, is basically that anybody who comes looking for a domain, it
    should result that then there's the high-availability Name Server,
    special-purpose for managing address resolution, and as within the
    context of name cache-ing, with regards to personal Internet services
    designed to run reliably and correctly in a more-or-less very modest and
    ad-hoc fashion. (Of primary importance of any Internet protocol
    implementation is to remain a good actor on the network, of course among
    other important things like protecting the users, the agents, their
    persons.)

    https://en.wikipedia.org/wiki/BIND

    "BIND 9 is intended to be fully compliant with the IETF DNS standards
    and draft standards."

    https://datatracker.ietf.org/wg/dnsop/documents/

    Here the point seems to be to make it mostly so that responses fit in a
    single user datagram or packet, with regards to UDP implementation,
    while TCP implementation is according to this sort of "HRSEPW"
    throughput model.

    I.e. mostly the role here is for personal Internet services, not
    surfacing a vended layer of a copy of the Internet for a wide proxy all
    snuffling the host. (Though, that has also its role, for example
    creating wide and deep traffic sniffing, and for example buddy-checking
    equivalent views of the network, twisting up TLS exercises and such. If
    you've read the manuals, ....)


    Lots of the DNS standards these days are designed to aid the giants,
    from clobbering each other, here the goal mostly is effective
    industrious ants, effective industrious and idealistic ants, dedicated
    to their gents.


    So, "dnsop" is way too many specifications to worry about; instead,
    just reading through those to arrive at what's functionally correct,
    and peeling away to be correct backwards.

    https://datatracker.ietf.org/doc/draft-ietf-dnsop-rfc8499bis/

    "The Domain Name System (DNS) is defined in literally dozens of
    different RFCs."

    Wow, imagine the reading, ....

    "This document updates RFC 2308 by clarifying the definitions of
    "forwarder" and "QNAME"."


    "In this document, the words "byte" and "octet" are used interchangeably."


    "Any path of a directed acyclic graph can be
    represented by a domain name consisting of the labels of its
    nodes, ordered by decreasing distance from the root(s) (which is
    the normal convention within the DNS, including this document)."

    The goal seems implementation of a Name Server with quite correct
    cache-ing and currency semantics, TTLs, and with regards to
    particularly the Mail Exchanger, reflecting on a usual case of mostly
    receiving in a spam-filled spam-walled world, while occasionally
    sending or posting in a modest and personal fashion, while in accords
    with what protocols, result well-received ham.
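
    The cache-ing and currency semantics of TTLs can be sketched roughly
    as below. This is only a minimal illustration and not a conforming
    resolver cache: the class name TtlCache, its methods, and the injected
    clock are all assumptions made up for the example.

```python
import time


class TtlCache:
    """Keep record sets until their time-to-live runs out (hypothetical sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # (name, rtype) -> (expires_at, records)

    def put(self, name, rtype, records, ttl):
        # DNS names compare case-insensitively, so normalize the key.
        self._entries[(name.lower(), rtype)] = (self._clock() + ttl, list(records))

    def get(self, name, rtype):
        key = (name.lower(), rtype)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, records = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # stale: drop it and report a miss
            return None
        return records
```

    A miss (None) is the signal to go ask upstream again; negative
    cache-ing and per-record TTL decrement are left out here.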

    "The header of a DNS message is its first 12 octets."

    "There is no formal definition of "DNS server", but RFCs generally
    assume that it is an Internet server that listens for queries and
    sends responses using the DNS protocol defined in [RFC1035] and its successors."
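
    The 12-octet header quoted above is six big-endian 16-bit fields in
    RFC 1035. A small sketch of packing and unpacking it follows; the
    function names are made up for illustration, and only some of the
    flag bits are pulled apart.

```python
import struct

# RFC 1035 section 4.1.1: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
HEADER = struct.Struct(">HHHHHH")


def pack_header(msg_id, flags, qdcount=0, ancount=0, nscount=0, arcount=0):
    return HEADER.pack(msg_id, flags, qdcount, ancount, nscount, arcount)


def unpack_header(message):
    msg_id, flags, qd, an, ns, ar = HEADER.unpack_from(message, 0)
    return {
        "id": msg_id,
        "qr": (flags >> 15) & 1,        # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,
        "tc": (flags >> 9) & 1,         # truncated: retry over TCP
        "rd": (flags >> 8) & 1,         # recursion desired
        "rcode": flags & 0xF,
        "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar,
    }


# A standard query with recursion desired and one question.
query = pack_header(0x1234, 0x0100, qdcount=1)
```

    The TC bit is the one that matters for the single-datagram goal: a
    response that does not fit is marked truncated and the client retries
    over TCP.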

    So, it seems that for these sorts of personal Internet services, then
    the idea is that a DNS Name Server is the sort of long-running and
    highly-available thing to provision, with regards to it being
    exceedingly small and fast, and brief in implementation, then as with
    regards to it tenanting the lookups for the various and varying,
    running on-demand or under-expectations. (E.g., with the sentinel
    pattern or accepting a very small amount of traffic while starting up
    a larger dedicated handler, or making for the sort of
    sentinel-to-wakeup or wakeup-on-service pattern.)

    https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization

    https://en.wikipedia.org/wiki/Incident_Object_Description_Exchange_Format


    Then it looks like I'm supposed to implement Session Initiation
    Protocol, and have it do service discovery and relation or Dynamic
    DNS, but I sort of despise Session Initiation Protocol as it's so
    abused and twisted, yet, there's some idea to make a localhost server
    that fronts personal Internet agents that could drive off either SIP
    or DDNS, vis-a-vis starting up the agents on demand, as with respect
    to running the agents essentially locally and making peer-to-peer.

    https://en.wikipedia.org/wiki/Zero-configuration_networking#DNS-based_service_discovery


    But, it's simplest to just have a static IP and then run the agents
    as an MTA, here given that the resources are so cheap that personal
    Internet agents are economical, or as where anything resolves to a
    host and a well-known port, to virtualize that to well-known ports
    at an address.

    PIA: in the interests of PII.

    [2024/02/08]

    So, if you know all about old-fashioned
    Internet protocols like DNS, then NNTP,
    IMAP, SMTP, HTTP, and so on, then where
    it's at is figuring out these various sorts
    conventions then to result a sort-of, the
    sensible, fungible, and tractable, conventions
    of the data structures and algorithms, in
    the protocols, what result keeping things
    simple and standing up a usual Internet
    messaging agentry.


    BFF: backing-file formats, "Best friends forever"

    Message files
    Group files

    Thread link files
    Date link files

    SFF: search-file formats, "partially digested metadata"



    NOOBNB: Noob Nota Bene: Cur/Pur/Raw

    Load Roll/Fold/Shed/Hold: throughput/offput



    Then, the idea is to make it so that by constructing
    the files or a logical/physical sort of distinction,
    that then results a neat tape archive then that
    those can just be laid down together and result
    a corpus, or filtered on down and result a corpus,
    where the existing standard is sort of called "mailbox"
    or "mbox" format, with the idea muchly of
    "converting mbox to BFF".


    Then, for enabling search, basically the idea or a
    design principle of the FF is that they're concatenable
    or just overlaid and all write-once-read-many, then
    with regards to things like merges, which also should
    result as some sort of algorithm in tools, what results,
    that of course usual sorts tools like textutils, working
    on these files, would make it so that usual extant tools,
    are native on the files.
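
    For the "converting mbox to BFF" step mentioned above, one rough way
    to split an mbox into per-message pieces uses Python's standard
    mailbox module. The function name split_mbox and the (message-id,
    head, body) tuple it yields are assumptions for illustration, not a
    defined BFF interface.

```python
import mailbox


def split_mbox(path):
    """Yield (message_id, head, body) for each message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            message_id = message.get("Message-ID")
            # Re-serialize headers one per line; original folding is not kept.
            head = "".join(f"{name}: {value}\n" for name, value in message.items())
            # For a non-multipart message this is the body text as stored.
            body = message.get_payload()
            yield message_id, head, body
    finally:
        box.close()
```

    A real converter would want to keep the raw header bytes rather than
    re-serializing them, so that hashes over the original message stay
    reproducible.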

    So for metadata, the idea is that there are standard
    metadata attributes like the closed categories of
    headers and so on, where the primary attributes sort
    of look like

    message-id
    author

    delivery-path
    delivery-metadata (account, GUID, ...)

    destinations

    subject
    size
    content

    hash-raw-id <- after message-id
    hash-invariant-id <- after removing inconstants
    hash-uncoded-id <- after uncoding out to full

    Because messages are supposed to be unique,
    there's an idea to sort of detect differences.
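
    One way the raw and invariant hashes might be computed is sketched
    below. Which headers count as "inconstants" is an assumption here
    (Path and Xref are rewritten by relays; the rest of the set is a
    guess), SHA-256 is an arbitrary choice, and the uncoded hash is left
    out because it needs a full MIME decode.

```python
import hashlib

# Assumption: headers that relays add or rewrite in transit.
INCONSTANT_HEADERS = {"path", "xref", "received", "nntp-posting-host"}


def invariant_bytes(raw):
    """Return the message with the inconstant header fields removed."""
    head, _, body = raw.partition(b"\r\n\r\n")
    kept = []
    skipping = False
    for line in head.split(b"\r\n"):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line: follows the fate of its field.
            if not skipping:
                kept.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower().decode("ascii", "replace")
        skipping = name in INCONSTANT_HEADERS
        if not skipping:
            kept.append(line)
    return b"\r\n".join(kept) + b"\r\n\r\n" + body


def message_hashes(raw):
    return {
        "hash-raw-id": hashlib.sha256(raw).hexdigest(),
        "hash-invariant-id": hashlib.sha256(invariant_bytes(raw)).hexdigest(),
    }
```

    Two copies of one article that arrived over different paths then
    differ in hash-raw-id but agree in hash-invariant-id, while a true
    conflict under the same message-id differs in both.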


    The idea is to sort of implement NNTP's OVERVIEW
    and WILDMAT, then there's IMAP, figuring that the
    first goal of SFF is to implement the normative
    commands, then with regards to implementations,
    basically working up for HTTP SEARCH, a sort of
    normative representation of messages, groups,
    threads, and so on, sort of what results a neat sort
    standard system for all sorts purposes these, "posts".


    Anybody know any "normative RFC email's in HTTP"?
    Here the idea is basically that a naive server
    simply gets pointed at BFF files for message-id
    and loads any message there as an HTTP representation,
    with regards to HTTP, HTML, and so on, about these
    sorts "sensible, fungible, tractable" conventions.


    It's been a while since I studied the standards,
    so I'm looking to get back tapping at the C10K server
    here, basically with hi-po full throughput then with
    regards to the sentinel/doorman bit (Load R/F/S/H).

    So, I'll be looking for "partially digested and
    composable search metadata formats" and "informative
    and normative standards-based message and content".

    They already have one of those, it's called "Internet".


    [2024/02/09]

    Reading up on anti-spam, it seems that Usenet messages have
    a pretty simple format, then with regards to all of Internet
    messages, or Email and MIME and so on, gets into basically
    the nitty-gritty of the Internet Protocols like SMTP, IMAP, NNTP,
    and HTTP, about figuring out what's the needful then for things
    like Netnews messages, Email messages, HTTP messages,
    and these kinds of things, basically for message multi-part.

    https://en.wikipedia.org/wiki/MIME

    (DANE, DKIM, DMARC, ....)

    It's kind of complicated to implement correctly the parsing
    of Internet messages, so, it should be done up right.

    The compeering would involve the conventions of INND.
    The INND software is very usual, vis-a-vis Tornado or some
    commercial cousins, these days.

    The idea seems to be "run INND with cleanfeed", in terms
    of control and junk and the blood/brain barrier or here
    the text/binaries barrier, I'm only interested in setting up
    for text and then maybe some "richer text" or as with
    regards to Internet protocols for messaging and messages.

    Then the idea is to implement this "clean-room", so it results
    a sort of plain description of data structures logical/physical
    then a reference implementation.

    The groups then accepted/rejected for compeering basically
    follow the WILDMAT format, which is pretty reasonable
    in terms of yes/no/maybe or sure/no/yes sorts of filters.
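
    The yes/no filter can be sketched as below, under the reading that a
    wildmat is a comma-separated list where "!" negates and the rightmost
    matching pattern decides. This leans on Python's fnmatch, which also
    accepts [...] classes, so it is a loose approximation and not INN's
    own matcher; the function name is made up.

```python
from fnmatch import fnmatchcase


def wildmat_match(name, wildmat):
    """Return True if `name` is selected by the comma-separated wildmat."""
    selected = False                  # no pattern matched yet means "no"
    for pattern in wildmat.split(","):
        pattern = pattern.strip()
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if pattern and fnmatchcase(name, pattern):
            selected = not negate     # later matches override earlier ones
    return selected
```

    For example, "sci.*,!sci.math.*" takes sci.math and sci.logic but
    turns away anything under sci.math.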

    https://www.eyrie.org/~eagle/software/inn/docs-2.6/newsfeeds.html

    https://www.eyrie.org/~eagle/software/inn/docs-2.6/libstorage.html

    https://www.eyrie.org/~eagle/software/inn/docs-2.6/storage.conf.html#S2

    It refers to the INND storageApi token so I'll be curious about
    that and BFF. The tradspool format, here as it partitions under
    groups, is that BFF instead partitions under message-ID, that
    then groups files have pointers into those.

    message-id/

    id <- "id"

    hd <- "head"
    bd <- "body"

    td <- "thread", reference, references
    rd <- "replied to", touchfile

    ad <- "author directory", ... (author id)
    yd <- "year to date" (date)

    xd <- "expired", no-archive, ...
    dd <- "dead", "soft-delete"
    ud <- "undead", ...

    The files here basically indicate by presence then content,
    what's in the message, and what's its state. Then, the idea
    is that some markers basically indicate any "inconsistent" state.

    The idea is that the message-id folder should be exactly on
    the order of the message size, only. I.e. besides head and body,
    the other files are only presence indicators or fixed size.
    And, the presence files should be limited to fit in the range
    of the alphabet, as above it results single-letter named files.

    Then the idea is that the message-id folder is created on the
    side with id,hd,bd then just moved/renamed into its place,
    then by its presence the rest follows. (That it's well-formed.)
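
    The build-on-the-side-then-rename step might look like the sketch
    below. Naming the folder by a hash of the message-id is an assumption
    (message-ids can contain characters that are awkward in file names),
    as are the function names; the rename is atomic on POSIX only within
    one file system.

```python
import hashlib
import os
import tempfile


def folder_name(message_id):
    # Assumption: hash the id so the folder name is always file-system safe.
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()


def store_message(root, message_id, head, body):
    """Write id, hd, bd in a staging folder, then rename it into place."""
    final = os.path.join(root, folder_name(message_id))
    if os.path.isdir(final):
        return False                          # write-once: already present
    staging = tempfile.mkdtemp(prefix="incoming-", dir=root)
    parts = (("id", message_id.encode("utf-8")), ("hd", head), ("bd", body))
    for name, data in parts:
        with open(os.path.join(staging, name), "wb") as f:
            f.write(data)
    try:
        os.rename(staging, final)             # folder appears well-formed or not at all
    except OSError:
        # Another writer got there first: throw our staging copy away.
        for name in os.listdir(staging):
            os.remove(os.path.join(staging, name))
        os.rmdir(staging)
        return False
    return True


def mark(root, message_id, flag):
    """Touch a presence file such as "xd" or "dd" in a stored folder."""
    open(os.path.join(root, folder_name(message_id), flag), "ab").close()
```

    Readers only ever see complete folders, so presence of the folder is
    enough to say the message is there.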

    The idea here again is that the storage is just stored deflated already,
    with the idea that then as the message is served up with threading,
    where to litter the thread links, and whether to only litter the
    referring post's folder with the referenced post's ID, or that
    otherwise there's this idea that it's a poor-man's sort of
    write-once-read-many organization, that's horizontally scalable,
    then that any assemblage of messages can be overlaid together,
    then groups files can be created on demand, then that as far as
    files go, the natural file-system cache, caches access to the files.

    The idea that the message is stored compressed is that many messages
    aren't much read, and most clients support compressed delivery,
    and the common deflate format allows "stitching" together in
    a reference algorithm, what results the header + glue + body.
    This will save much space and not be too complicated to assemble,
    where compression and encryption are a lot of the time,
    in Internet protocols.
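
    As a stand-in for the stitching idea, the sketch below uses gzip
    members instead of raw deflate blocks: the gzip format allows
    separately compressed members to be concatenated into one valid
    stream, which is also what lets zcat read them. Whether the reference
    algorithm would stitch at the gzip or the raw deflate level is left
    open; the names are illustrative.

```python
import gzip


def deflate_part(data):
    """Compress one piece (head, glue, or body) as its own gzip member."""
    return gzip.compress(data)


def stitch(*members):
    # Concatenated gzip members are themselves a valid gzip stream.
    return b"".join(members)


head = b"Message-ID: <1@example.invalid>\r\nSubject: test\r\n"
glue = b"\r\n"
body = b"Hello, sci.math.\r\n"

# Each piece is compressed once, at rest; serving only concatenates.
stored = [deflate_part(part) for part in (head, glue, body)]
wire = stitch(*stored)
```

    The cost is a little header and trailer overhead per member, and
    compressing short pieces separately gives up some ratio compared to
    compressing the whole message at once.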

    The message-id is part of the message, so there's some idea that
    it's also related to de-duplication under path, then that otherwise
    when two messages arrive with the same message-id but otherwise
    different content, that's wrong, and it's about what to do when
    there are conflicts in content.

    All the groups files basically live in one folder, then with regards
    to their overviews, as that it sort of results just a growing file,
    where the idea is that "fixed length records" pretty directly relate
    a simplest sort of addressing, in a world where storage has grown
    to be unbounded, if slow, that it also works well with caches and
    mmap and all the usual facilities of the usual general purpose
    scheduler and such.
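
    With fixed-length records, addressing is just multiplication. The
    record layout below (article number, a YYYYMMDD date, a 32-byte hash
    of the message-id) is purely hypothetical, as are the function names;
    it only shows the offset arithmetic on an append-only file.

```python
import io
import struct

# Hypothetical record: 8-byte article number, 4-byte YYYYMMDD, 32-byte id hash.
RECORD = struct.Struct(">QI32s")


def append_record(f, number, yyyymmdd, id_hash):
    f.seek(0, io.SEEK_END)                 # the groups file only ever grows
    f.write(RECORD.pack(number, yyyymmdd, id_hash))


def read_record(f, index):
    f.seek(index * RECORD.size)            # record n lives at n * size
    data = f.read(RECORD.size)
    if len(data) != RECORD.size:
        raise IndexError(index)
    return RECORD.unpack(data)


def record_count(f):
    return f.seek(0, io.SEEK_END) // RECORD.size
```

    The same functions work on an ordinary binary file opened "r+b", and
    the fixed stride is what makes the file friendly to mmap and to the
    page cache.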

    Relating that to time-series data then and currency, is a key sort
    of thing, about here that the idea is to make for time-series
    organization that it's usually enough hierarchical YYYYMMDD,
    or for example YYMMDD, if for example this system's epoch
    is Jan 1 2000, with a usual sort of idea then to either have
    a list of message ID's, or, indices that are offsets to the group
    file, or, otherwise as to how to implement access in partition
    to relations of the items, for browsing and searching by date.
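
    A date partition can be read straight off the Date header, as in the
    sketch below. Normalizing to UTC before partitioning is an assumption
    (otherwise the same instant could land in two different days), and
    the function name and the YYYY/MM/DD path shape are illustrative.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def date_partition(date_header):
    """Map an RFC 5322 Date header to a hierarchical YYYY/MM/DD partition."""
    dt = parsedate_to_datetime(date_header)
    if dt.tzinfo is None:
        # A "-0000" zone parses as naive; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y/%m/%d")
```

    A post dated late in the evening in a western time zone thus files
    under the next UTC day.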

    Then it seems for authors there's a sort of "author-id" to get
    sorted, so that basically like threads is for making the
    set-associativity of messages and threads, and groups, to authors,
    then also as with regards to NOOBNB that there are
    New/Old/Off authors and Bot/Non/Bad authors,
    keeping things simple.

    Here the idea is that authors, who reply to other authors,
    are related variously, people they reply to and people who
    reply to them, and also the opposite, people who they
    don't reply to and people who don't reply to them.
    The idea is that common interest is reflected in replies,
    and that can be read off the messages, then also as
    for "direct" and "indirect" replies, either down the chain
    or on the same thread, or same group.
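
    Reading the reply relations off the messages might go as in the
    sketch below, which takes the last entry of References as the direct
    reply and the earlier entries as indirect ones "down the chain". The
    input shape, a list of (message-id, author, references) tuples, and
    the function name are assumptions.

```python
from collections import defaultdict


def reply_graph(messages):
    """messages: list of (message_id, author, references), oldest reference first."""
    author_of = {mid: author for mid, author, _ in messages}
    direct = defaultdict(set)      # author -> authors replied to directly
    indirect = defaultdict(set)    # author -> authors further up the chain
    for _, author, refs in messages:
        for depth, ref in enumerate(reversed(refs)):
            other = author_of.get(ref)
            if other is None or other == author:
                continue           # unknown article, or replying to oneself
            (direct if depth == 0 else indirect)[author].add(other)
    return direct, indirect
```

    The "people who they don't reply to" relation is then the complement
    of these sets within a thread or group.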

    (Cliques after Kudos and "Frenemies" after "Jabber",
    are about same, in "tendered response" and "tendered reserve",
    in groups, their threads, then into the domain of context.)

    So, the first part of SFF seems to be making OVERVIEW,
    which is usual key attributes, then relating authorships,
    then as about content. As well for supporting NNTP and IMAP,
    is for some default SFF supporting summary and retrieval.

    groups/group-id/

    ms <- messages

    <- overview ?
    <- thread heads/tails ?
    <- authors ?
    <- date ranges ?

    It's a usual idea that BFF, the backing file-format, and
    SFF, the search file-format, has that they're distinct
    and that SFF is just derived from BFF, and on-demand,
    so that it works out that search algorithms are implemented
    on BFF files, naively, then as with regards to those making
    their own plans and building their own index files as then
    for search and pointing those back to groups, messages,
    threads, authors, and so on.


    The basic idea of expiry or time-to-live is basically
    that there isn't one, yet, it's basically to result that
    the message-id folders get tagged in usual rotations
    over the folders in the arrival and date partitions,
    then marked out or expunged or what, as with regards
    to the write-once-read-many or regenerated groups
    files, and the presence or absence of messages by their ID.
    (And the state of authors, in time and date ranges.)

    [ page break 8 ]

    [2024/02/10]

    About TLS again, encryption, one of the biggest costs
    of serving data in time (CPU time), is encryption, the
    other usually being compression, here with regards
    to what are static assets or already generated and
    sort of digested.

    So, looking at the ciphersuites of TLS, is basically
    that after the handshake and negotiation, and
    as above there's the notion of employing
    renegotiation in 1.2 to share "closer certificates",
    that 1.3 cut out, that after negotiation then is
    the shared secret of the session that along in
    the session the usual sort of symmetric block-cipher
    converts the plain- or compressed-data, to,
    the encrypted and what results the wire data.
    (In TLS the client and server share the same
    "master secret", from which the symmetric block/stream
    cipher keys for each direction are derived.)

    So what I'm wondering is about how to make it
    so, that the data is stored first compressed at
    rest, and in pieces, with the goal to make it so
    that usual tools like zcat and zgrep work on
    the files at rest, and for example inflate them
    for use with textutils. Then, I also wonder about
    what usual ciphersuites result, to make it so that
    there's scratch/discardable/ephemeral/ad-hoc/
    opportunistic derived data, that's at least already
    "partially encrypted", so that then serving it for
    the TLS session, results a sort of "block-cipher's
    simpler-finishing encryption".

    Looking at the ChaCha algorithm, it employs
    "addition, rotation, and XOR" (ARX).
    (Most block and streaming ciphers aim to
    have the same size of the output as the input
    with respect to otherwise a usual idea that
    padding output reduces available information.)

    https://en.wikipedia.org/wiki/Block_cipher

    https://en.wikipedia.org/wiki/Stream_cipher

    So, as you can imagine, block-ciphers are
    a very minimal subset of ciphers altogether.

    There's a basic idea that the server just always
    uses the same symmetric keys so that then
    it can just encrypt the data at rest with those,
    and, serve them right up. But, it's a matter of
    the TLS Handshake establishing the "PreMaster
    secret" (or, lack thereof) and its "pseudo-random function",
    what with regards to the server basically making
    for contriving its "random number" earlier in
    the handshake to arrive at some "predetermined
    number".

    Then the idea is for example just to make it
    so for each algorithm that the data's stored
    encrypted then that it kind of goes in and out
    of the block cipher, so that then it sort of results
    that it's already sort of encrypted and takes fewer
    rounds to line up with the session secret.

    https://datatracker.ietf.org/doc/html/rfc8446

    "All the traffic keying material is recomputed
    whenever the underlying Secret changes
    (e.g., when changing from the handshake
    to Application Data keys or upon a key update)."

    TLS 1.3: "The key derivation functions have
    been redesigned. The new design allows
    easier analysis by cryptographers due to
    their improved key separation properties.
    The HMAC-based Extract-and-Expand Key
    Derivation Function (HKDF) is used as an
    underlying primitive."

    https://en.wikipedia.org/wiki/HKDF

    So, the idea is "what goes into HKDF so
    that it results a known value, then
    having the data already encrypted for that."
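
    For reference, HKDF itself is short. The sketch below follows the
    extract-then-expand construction of RFC 5869 with SHA-256 as the
    default hash; it is not the full TLS 1.3 key schedule, which wraps
    this in labeled derivations. It makes the difficulty visible: the
    output is easy to compute from the inputs, but choosing inputs to hit
    a chosen output means inverting HMAC.

```python
import hashlib
import hmac


def hkdf_extract(salt, ikm, hash_name="sha256"):
    """Step 1: concentrate the input keying material into a pseudorandom key."""
    if not salt:
        salt = bytes(hashlib.new(hash_name).digest_size)   # default: zeros
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk, info, length, hash_name="sha256"):
    """Step 2: stretch the pseudorandom key to `length` bytes of output."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(n) = HMAC(PRK, T(n-1) | info | n)
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]
```

    The salt is the one input that could carry a server-chosen value, but
    it passes through HMAC before it touches the output.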

    I'm not much interested in actual _strength_
    of encryption, just making it real simple in
    the protocol to have static data ready to
    send right over the wire according to the
    server indicating in the handshake how it will be.

    And that that can change on demand, ....

    "Values are defined in Appendix B.4."

    https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.4

    So, I'm looking at GCM, CCM, and POLY1305,
    with respect to how to compute values that
    it results the HKDF is a given value.

    https://en.wikipedia.org/wiki/Cipher_suite

    Then also there's for basically TLS 1.2, just
    enough backward and forward that the server
    can indicate the ciphersuite, and the input to
    the key derivation function, for which its data is
    already ready.

    It's not the world's hardest problem to arrive
    at what inputs will make for a given hash
    algorithm that it will arrive at a given hash,
    but it's pretty tough. Here though it would
    allow this weak encryption (and caching of them)
    the static assets, then serving them in protocol,
    figuring that man-in-the-middle is already broken
    anyways, with regards to the usual 100's of
    "root CAs" bundled with usual User-Agentry.

    I.e., the idea here is just to conform with TLS,
    while, having the least cost to serve it, while, using
    standard algorithms, and not just plain-text,
    then, being effectively weak, and, not really
    expecting any forward privacy, but, saving
    the environment by using less watts.

    Then what it seems results is that the server just
    indicates ciphersuites that have that the resulting
    computed key can be made so for its hash,
    putting the cost on the handshake, then
    that the actual block cipher is a no-op.


    You like ...?

    [2024/02/11]

    So I'm looking at my hi-po C10K low-load/constant-load
    Internet text protocol server, then with respect to
    encryption and compression as usual, then I'm looking
    to make that in the framework, to have those basically
    be out-of-band, with respect to things like
    encryption and compression, or things like
    transport and HTTP or "upgrade".

    I.e., the idea here is to implement the servers first
    in "TLS-terminated" or un-encrypted, then as with
    respect to having enough aware in the protocol,
    to make for adapting to encrypting and compressing
    and upgrading front-ends, with regards to the
    publicly-facing endpoints and the internally-facing
    endpoints, which you would know about if you're
    usually enough familiar with client-server frameworks
    and server-oriented architecture and these kinds of
    things.

    The idea then is to offload the TLS-termination
    to a sort of dedicated layer, then as with regards
    to a generic sort of "out-of-band" state machine
    the establishment and maintenance of the connections,
    where still I'm mostly interested in "stateful" protocols
    or "connection-oriented" vis-a-vis the "datagram"
    protocols, or about endpoints and sockets vis-a-vis
    endpoints and datagrams, those usually enough sharing
    an address family while variously their transport (packets).

    Then there's sort of whether to host TLS-termination
    inside the runtime as usually, or next to it as sort of
    either in-process or out-of-process, similarly with
    compression, and including for example concepts
    of cache-ing, and upgrade, and these sorts things,
    while keeping it so that the "protocol module" is
    all self-contained and behaves according to protocol,
    for the great facility of the standardization and deployment
    of Internet protocols in a friendly sort of environment,
    vis-a-vis the DMZ to the wider Internet, as basically with
    the idea of only surfacing one well-known port and otherwise
    abstracting away the rest of the box altogether,
    to reduce the attack surface and its vectors, for
    a usual goal of threat-modeling, reducing it.


    So people would usually enough just launch a proxy,
    but I'm mostly interested only in supporting TLS and
    perhaps compression in the protocol as only altogether
    a pass-through layer, then as with regards to connecting
    that in-process as possible, so passing I/O handles,
    otherwise with a usual notion of domain sockets
    or just plain Address Family UNIX sockets.

    There's basically whether the publicly-facing actually
    just serves on the usual un-encrypted port, for the
    insensitive types of things, and the usual encrypted
    port, or whether it's mostly in the protocol that
    STARTTLS or "upgrade" occurs, "in-band" or "out-of-band",
    and with respect to usually there's no notion at all
    of STREAMS or "out-of-band" in STREAMS, sockets,
    Address Family UNIX.


    The usual notion here is making it like so:

    NNTP
    IMAP -> NNTP
    HTTP -> IMAP -> NNTP

    for a Usenet service, then as with respect to
    that there's such high affinity of SMTP, then
    as with regards to HTTP more generally as
    the most usual fungible de facto client-server
    protocol, is connecting those locally after
    TLS-termination, while still having TLS-layer
    between the Internet and the server.

    So in this high-performance implementation it
    sort of relies directly on the commonly implemented
    and ubiquitously available non-blocking I/O of
    the runtime, here as about keeping it altogether
    simple, with respect to the process model,
    and the runtime according to the OS/virt/scheduler's
    login and quota and bindings, and back-end,
    that in some runtimes like an app-container,
    that's supposed to live all in-process, while with
    respect to off-loading load to right-sized resources,
    it's sort of general.

    Then I've written this mostly in Java and plan to
    keep it this way, where the Direct Memory for
    the service of non-blocking I/O, is pretty well
    understood, vis-a-vis actually just writing this
    closer to the user-space libraries, here as with
    regards to usual notions of cross-compiling and
    so on. Here it's kind of simplified because this
    entire stack has no dependencies outside the
    usual Virtual Machine, it compiles and runs
    without a dependency manager at all, then
    though that it gets involved the parsing the content,
    while simply the framework of ingesting, storing,
    and moving blobs is just damn fast, and
    very well-behaved in the resources of the runtime.

    So, setting up TLS termination for these sorts
    protocols where the protocol either does or
    doesn't have an explicit STARTTLS up front
    or always just opens with the handshake,
    basically has where I'm looking at how to
    instrument and connect that for the Hopper
    as above and how besides passing native
    file and I/O handles and buffers, what least
    needful results a useful approach for TLS on/off.

    So, this is a sort of approach, figuring for
    "nesting the protocols", where similarly is
    the goal of having the fronting of the backings,
    sort of like so, ...

    NNTP
    IMAP -> NNTP
    HTTP -> NNTP
    HTTP -> IMAP -> NNTP

    with the front being in the protocol, then
    that HTTP has a sort of normative protocol
    for IMAP and NNTP protocols, and IMAP
    has as for NNTP protocols, treating groups
    like mailboxes, and commands as under usual
    sorts HTTP verbs and resources.
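
    A front that treats NNTP commands as HTTP verbs and resources could
    start from a routing table like the one below. The URL layout
    (/groups, /groups/NAME, /articles/ID) is entirely made up for
    illustration; only the NNTP command names on the right are real.

```python
import re
from urllib.parse import unquote

# Hypothetical mapping of (HTTP verb, path) to an NNTP command line.
ROUTES = [
    ("GET", re.compile(r"^/groups$"), lambda m: "LIST ACTIVE"),
    ("GET", re.compile(r"^/groups/([\w.+-]+)$"), lambda m: "GROUP " + m.group(1)),
    ("GET", re.compile(r"^/articles/(<[^<>\s/]+>)$"), lambda m: "ARTICLE " + m.group(1)),
    ("HEAD", re.compile(r"^/articles/(<[^<>\s/]+>)$"), lambda m: "HEAD " + m.group(1)),
    ("POST", re.compile(r"^/groups/([\w.+-]+)$"), lambda m: "POST"),
]


def to_nntp(method, path):
    """Translate an HTTP request line into an NNTP command, or None."""
    path = unquote(path)              # message-ids arrive percent-encoded
    for verb, pattern, build in ROUTES:
        if verb == method.upper():
            match = pattern.match(path)
            if match:
                return build(match)
    return None
```

    The same table shape would serve for an IMAP front, with groups
    standing in for mailboxes.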

    Similarly the same server can just serve each
    the relevant protocols on each the relevant ports.

    If you know these things, ....

    [2024/02/12]

    Looking at how Usenet moderated groups operate,
    well first there's PGP and control messages then
    later it seems there's this sort Stump/Webstump
    setup, or as with regards to moderators.isc.org,
    what is usual with regards to control messages
    and usual notions of control and cancel messages
    and as with regards to newsgroups that actually
    want to employ Usenet moderation sort of standardly.

    (Usenet trust is mostly based on PGP, or
    'Philip Zimmermann's Pretty Good Privacy',
    though there are variations and over time.)

    http://tools.ietf.org/html/rfc5537

    http://wiki.killfile.org/projects/usenet/faqs/nam/


    Reading into RFC5537 gets into some detail like
    limits in the headers field with respect to References
    or Threads:

    https://datatracker.ietf.org/doc/html/rfc5537#section-3.4.4

    https://datatracker.ietf.org/doc/html/rfc5537#section-3.5.1

    So, the agents are described as

    Posting
    Injecting
    Relaying
    Serving
    Reading

    Moderator
    Gateway

    then with respect to these sorts separations duties,
    the usual notions of Internet protocols their agents
    and behavior in the protocol, old IETF MUST/SHOULD/MAY
    and so on.

    So, the goal here seems to be to define a
    profile of "connected core services" of sorts
    of Internet protocol messaging, then this
    "common central storage" of this BFF/SFF
    and then reference implementations then
    for reference editions, these sorts things.

    Of course there already is one, it's called
    "Internet mail and news".

    [ page break 9 ]


    [2024/02/14]

    So one thing I want here is to make it so that data can
    be encrypted very weakly at rest, then, that, the SSL
    or TLS for TLS 1.2 or TLS 1.3, results that the symmetric
    key bits for the records is always the same as this what
    is the very-weak key.

    This way pretty much the entire CPU load of TLS is
    eliminated, while still the data is encrypted very-weakly
    which at least naively is entirely inscrutable.

    The idea is that in TLS 1.2 there's this

    client random cr ->
    <- server random sr
    client premaster cpm ->

    these going into PRF (cpm, 'blah', cr + sr, [48]), then
    whether renegotiation keeps the same client random
    and client premaster, then that the server can compute
    the server random to make it so derived the very-weakly
    key, or for example any of what results least-effort.

    Maybe not, sort of depends.

    Then the TLS 1.3 has this HKDF, HMAC Key Derivation Function,
    it can again provide a salt or server random, then as with
    regards to that filling out in the algorithm to result the
    very-weakly key, for a least-effort block cipher that's also
    zero-effort and being a pass-through no-op, so the block cipher
    stays out the way of the data already concatenably-compressed
    and very-weakly encrypted at rest.


    Then it looks like I'd be trying to make hash collisions which
    is practically intractable, about what goes into the seeds
    whether it can result things like "the server random is
    zero minus the client random, their sum is zero" and
    this kind of thing.


    I suppose it would be demonstrative to setup a usual
    sort of "TLS man-in-the-middle" Mitm just to demonstrate
    that given the client trusts any of Mitm's CAs and the
    server trusts any of Mitm's CAs that Mitm sits in the middle
    and can intercept all traffic.

    So, the TLS 1.2 PRF, or pseudo-random function, is as of
    "a secret, a seed, and an identifying label". It's SHA-256
    by default in TLS 1.2 (a cipher suite can name another hash).
    Then it's iterative over the seed: the seed is HMAC'd under the
    secret, that result is HMAC'd again, and so on, each round's
    output concatenated ++ until there are enough bytes
    to result the key material. Then in TLS the seed is defined
    as 'blah' ++ seed, so, to figure out how to figure to make it
    so that 'blah' ++ (client random + server random) makes it
    possible to make a spigot of the hash algorithm, of zeros,
    or an initial segment long enough for all key sizes,
    to split out of that the server write MAC and encryption keys,
    then to very-weakly encrypt the data at rest with that.
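
    Written out, the TLS 1.2 PRF with SHA-256 is small. The sketch below
    follows the P_hash definition in RFC 5246 section 5, where the label
    and the two randoms are concatenated as raw bytes; the function names
    are mine. It shows why contriving the server random is hard: every
    output block is an HMAC keyed by the secret.

```python
import hashlib
import hmac


def p_sha256(secret, seed, length):
    """P_hash from RFC 5246: HMAC chained over the seed until enough bytes."""
    out = b""
    a = seed                                                   # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()       # A(i)
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret, label, seed, length):
    # PRF(secret, label, seed) = P_hash(secret, label + seed)
    return p_sha256(secret, label + seed, length)


def master_secret(premaster, client_random, server_random):
    # 48 bytes needs two 32-byte rounds, with the tail of the second dropped.
    return prf(premaster, b"master secret", client_random + server_random, 48)
```

    Since the server picks its random after seeing the client's, it does
    control 32 of the seed bytes, but the premaster secret keys the HMAC
    and so steering the output to a chosen value is a preimage search.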

    Then the client would still be sending up with the client
    MAC and encryption keys, about whether it's possible
    to setup part of the master key or the whole thing.
    Whether a client could fabricate the premaster secret
    so that the data resulted very-weakly encrypted on its
    own terms, doesn't seem feasible as the client random
    is sent first, but cooperating could help make it so,
    with regards to the client otherwise picking a weak
    random secret overall.

    (Figuring TLS interception is all based on Mitm,
    not "cryptanalysis and the enigma cipher", and
    even the very-weakly just look like 0's and 1's.)

    So, P_SHA256 is being used to generate 48 bytes,
    so that's two rounds, where the first round is
    32 bytes then second 32 bytes half those dropped,
    then if the client/server MAC/encrypt
    are split up into those, ..., or rather only the first
    32 bytes, then only the first SHA 256 round occurs,
    if the Initialization Vector IV's are un-used, ...,
    results whether it's possible to figure out
    whether "master secret" ++ (client random + server random),
    makes for any way for such a round of SHA-256,
    given an arbitrary input to result a contrived value.

    Hm..., reading the Web suggests that "label + seed"
    is the concatenation of the bytes of the 'blah' label and
    the bytes of client random + server random (raw octets,
    not character digits).

    Let's see, a random then looks like so,

    struct {
    uint32 gmt_unix_time;
    opaque random_bytes[28];
    } Random;

    thus that's quite a bit to play with, but I'm
    not sure at all how to make it so that round after
    round of SHA-256, settles on down to a constant,
    given that 28 bytes' worth of seed
    can be contrived, while the first 4 bytes of the
    resulting 32 bytes is a gmt_unix_time, with the
    idea that they may be scrambled, as it's not mentioned
    anywhere else to check the time in the random.
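
    The Random structure above is 4 + 28 = 32 bytes on the wire, with the
    time as a big-endian unsigned 32-bit integer. A small sketch of
    building and splitting one follows; the function names and the
    optional arguments are assumptions made for the example.

```python
import os
import struct
import time


def make_random(now=None, random_bytes=None):
    """Build a 32-byte TLS 1.2 Random: gmt_unix_time then 28 opaque bytes."""
    if now is None:
        now = int(time.time())
    if random_bytes is None:
        random_bytes = os.urandom(28)
    if len(random_bytes) != 28:
        raise ValueError("random_bytes must be exactly 28 bytes")
    return struct.pack(">I", now & 0xFFFFFFFF) + random_bytes


def parse_random(data):
    if len(data) != 32:
        raise ValueError("a Random is exactly 32 bytes")
    (gmt_unix_time,) = struct.unpack(">I", data[:4])
    return gmt_unix_time, data[4:]
```

    Passing random_bytes explicitly is where a contrived server random
    would go in, if any such contrivance were found.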

    "Clocks are not required to be set correctly
    by the basic TLS protocol; higher-level or
    application protocols may define additional
    requirements."

    So, the server-random can be contrived,
    what it results the 13 + 32 bytes that are
    the seed for the effectively 1-round SHA-256
    hash of an arbitrary input, that the 32 bytes
    can be contrived, then is for wondering
    about how to make it so that results a
    contrived very-weakly SHA-256 output.

    So the premaster secret is decrypted with
    the server's private key, or as with respect
    to the exponents of DH or what, then that's
    padded to 64 bytes, which is also the SHA-256
    chunk size, then the output of the first round
    the used keys and second the probably un-used
    initialization vectors, ...

    https://en.wikipedia.org/wiki/SHA-2#Pseudocode


    "The SHA-256 hash algorithm produces hash values
    that are hard to predict from the input."

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From john larkin@jl@glen--canyon.com to sci.electronics.design,sci.math on Thu Apr 23 12:23:11 2026
    From Newsgroup: sci.math

    On Thu, 23 Apr 2026 09:39:31 -0700, Ross Finlayson
    <ross.a.finlayson@gmail.com> wrote:


    a 4000-line post!



    On 12/01/2025 12:34 PM, Ross Finlayson wrote:
    [ The "Meta: a usenet server just for sci.math" and "Archive Any And All
    Text Usenet" threads transcribed. ]

    [ page break 1]
    [2016/12/01]


    I have an idea here to build a usenet server
    only for sci.math and sci.logic. The idea is
    to find archives of sci.math and sci.logic and
    to populate a store of the articles in a more
    or less enduring form (say, "on the cloud"),
    then to offer some usual news server access
    then to, say, 1 month 3 month 6 month retention,
    and then some cumulative retention (with a goal
    of unlimited retention of sci.math and sci.logic
    articles). The idea would be to have basically
    various names of servers then reflect those
    retentions for various uses for a read-only
    archival server and a read-only daily server
    and a read-and-write posting server. I'm willing
    to invest time and effort to write the necessary
    software and gather existing archives and integrate
    with existing usenet providers to put together these
    things.

    Then, where basically it's in part an exercise
    in vanity, I've been cultivating some various
    notions of how to generate some summaries or
    reports of various posts, articles, threads, and
    authors, toward the specialization of the cultivation
    of summary for reporting and research purposes.


    So, I wonder others' idea about such a thing and
    how they might see it as a reasonably fruitful
    thing, basically for the enjoyment and for the
    most direct purposes of the authors of the posts.


    I invite comment, as I have begun to carry this out.

    [2016/12/02]

    So far I've read through the NNTP specs and looked
    a bit at the INND code. Then, the general idea is
    to define a filesystem layout convention, that then
    would be used for articles, then for having those
    on virtual disks (eg, "EBS volumes") or cloud storage
    (eg, "S3") in essentially a Write-Once-Read-Many
    configuration, where the goal is to implement data
    structures that have a forward state machine so that
    they remain consistent with unreliable computing
    resources (eg, "runtimes on EC2 hosts"), and that
    are readily cacheable (and horizontally scaleable).

    Then, the runtimes are of the collection and maintenance
    of posts ("infeeds" and "outfeeds", backfills), about
    summary generation (overview, metadata, key extraction,
    information content, working up auto-correlation), then
    reader servers, then some maintenance and admin. As a
    usual software design principle there is a goal of the
    both "stack-on-a-box" and also "abstraction of resources"
    and a usual separation of domain, library, routine, and
    runtime logic.

    So basically it looks like:
    1) gather mbox files of sci.math and sci.logic
    2) copy those to archive inputs
    3) break those out into a filesystem layout for each article
    (there are various filesystems that support this many files
    these days)
    4) generate partition and overview summaries
    5) generate various revisioning schemes (the "article numbers"
    of the various servers)
    6) figure out the incremental addition and periodic truncation
    7) establish a low-cost but high-availability endpoint runtime
    8) make elastic/auto-scaling service routine behind that
    9) have opportunistic / low cost periodic maintenance
    10) emit that as a configuration that anybody can run
    as "stack-on-a-box" or with usual "free tier" cloud accounts
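
    Step 3 above leaves the filesystem layout open. One possible
    convention, purely as an assumption for illustration, is to
    hash the message-id and fan the article directories out on
    the first two digest bytes, so that no single directory
    collects millions of entries. The class and method names
    here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ArticleLayout {
    /** Hypothetical layout: root/ab/cd/abcd...(64 hex digits) for one article. */
    static Path articleDir(Path root, String messageId) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(messageId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // SHA-256 ships with every Java platform
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return root.resolve(hex.substring(0, 2))
                   .resolve(hex.substring(2, 4))
                   .resolve(hex.toString());
    }
}
```

    The path depends only on the message-id, so any worker
    computes the same location without consulting a database.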


    [2016/12/04]

    I've looked into this a bit more and the implementation is
    starting to look along these lines.

    First there's the ingestion side, or "infeed", basically
    the infeed connects and pushes articles. Here then the
    basic store of the articles will be an object store (or
    here "S3" as an example object store). This is durable
    and the object keys are the article's "unique" message-id.

    If the message-id already exists in the store, then the
    infeed just continues.

    The article is stored with matching the message-id, noting
    the body offset, and counting the lines, and storing that
    with the object. Then, the message-id, pushed to
    a queue, can also carry the headers extracted from
    the article that are relevant to the article and overview,
    and the arrival date or effective arrival date. The slow-
    and-steady database worker (or, distributed data structure
    on "Dynamo tables") then retrieves a queue item, at some
    metered rate, and gets an article number for each of the
    newsgroups (by some conditional update that might starve a thread)
    for each group that is in the newsgroups of the article and
    some "all" newsgroup, so that each article also has a (sequential) number.

    Assigning a sequence is a bit the wicket, because, here
    there's basically "eventual consistency" and "forward safe"
    operations. Any of the threads, connections, or boxes
    could die at any time, then the primary concern is "no
    drops, then, no dupes". So, there isn't really a transactional
    context to make atomic "for each group, give it the next
    sequence value, doing that together for each group's numbering
    of articles in an atomic transaction". Luckily, while NNTP
    requires strictly increasing values, it allows gaps in the
    sequences. So, here, when mapping article-number to message-id
    and message-id to article-number, if some other thread has
    already stored a value for that article-number, then it can
    be re-tried until there is an unused article-number. Updating
    the high-water mark can fail if it was updated by another thread,
    then to re-try again with the new, which could lead to starvation.
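
    The retry-until-unused assignment can be sketched in memory
    as below. This is only a stand-in: ConcurrentHashMap's
    putIfAbsent and AtomicLong's compareAndSet play the part of
    the conditional updates a remote table would offer, and the
    class name GroupNumbers is made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class GroupNumbers {
    private final ConcurrentHashMap<Long, String> numberToId = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Long> idToNumber = new ConcurrentHashMap<>();
    private final AtomicLong highWater;

    GroupNumbers(long firstNumber) {
        highWater = new AtomicLong(firstNumber - 1);
    }

    /** Returns the article number for messageId, assigning one if needed; safe to re-run. */
    long assign(String messageId) {
        while (true) {
            Long known = idToNumber.get(messageId);
            if (known != null) return known;              // duplicate delivery from the queue
            long candidate = highWater.get() + 1;
            if (numberToId.putIfAbsent(candidate, messageId) != null) {
                raise(candidate);                         // number already taken: move on, retry
                continue;
            }
            Long earlier = idToNumber.putIfAbsent(messageId, candidate);
            raise(candidate);
            if (earlier == null) return candidate;
            numberToId.remove(candidate, messageId);      // lost a race: leave a gap, no dupe
            return earlier;
        }
    }

    /** Conditional update of the high-water mark; it only ever moves forward. */
    private void raise(long number) {
        long seen = highWater.get();
        while (seen < number && !highWater.compareAndSet(seen, number)) {
            seen = highWater.get();
        }
    }

    long highWater() { return highWater.get(); }

    String messageId(long number) { return numberToId.get(number); }
}
```

    A worker that dies between the two puts leaves at worst an
    unused number, which NNTP tolerates as a gap.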

    (There's a notion then, when an article-number is assigned, to
    toss that back onto queue for the rest of the transaction to
    be carried out.)

    Then, this having established a data structure for the message
    store, these are basically the live data structures, distributed,
    highly available, fault-tolerant and maintenance free, this
    implements the basic function for getting feeds (or new articles)
    and also the reader capability, which is basically a protocol
    listener that maintains the reader's current group and article.

    To implement then some further features of NNTP, there's an idea
    to store the article numbers for each group and "all" basically
    a bucket for each time period (eg, 1 day), then, that scans over
    the articles by their numbers find those as the partitions, then
    that sequentially (or rather, increasingly) the rest follow.

    To omit or remove articles or expire them for no-archive, that
    is basically ignored, but the idea is to maintain for the all
    group series of 1000 or 10000 articles then for what offsets in
    those series are cancelled. Basically the object store is
    write-once, immutable, and flat, where it's yet to be determined
    how to backfill the article store from archive files or suck
    feeds from live servers with long retentions. Then there's an
    idea to start the numbering at 1 000 000 or so and then have
    plenty of ranges where to fill in articles as archived or
    according to their receipt date header.

    Then, as the primary data stores would basically just implement
    a simple news server, there are two main notions of priority,
    to implement posting and to implement summaries and reports.

    Then, as far as I can tell, this pretty much fits within the
    "free tier" then that it's pretty economical.

    [2016/12/04]

    It's a matter of scale and configuration.

    It should scale quite well enough, though at some point
    it would involve some money. In rough terms, it looks
    like storing 1MM messages is ~$25/month, and supporting
    readers is a few cents a day but copying it would be
    twenty or thirty dollars. (I can front that.)

    I'm for it where it might be useful, where I hope to
    establish an archive with the goal of indefinite retention,
    and basically to present an archive and for my own
    purposes to generate narratives and timelines.

    The challenge will be to get copies of archives of these
    newsgroups. Somebody out of news.admin.peering might
    have some insight into who has the Dejanews CDs or what
    there might be in the Internet Archive Usenet Archive,
    then in terms of today's news servers which claim about
    ten years retention. Basically I'm looking for twenty
    plus years of retention.

    Now, some development is underway, and in no real hurry.
    Basically I'm looking at the runtimes and a software
    library to be written, (i.e., interfaces for the components
    above and local file-system versions for stack-on-a-box,
    implementing a subset of NNTP, in a simple service runtime
    that idles really low).

    Then, as above, it's kind of a vanity project or author-centric,
    about making it so that custom servers could be stood up with
    whatever newsgroups you want with the articles filtered
    however you'd so care, rendered variously.

    [2016/12/06]

    I've been studying this a bit more.

    I set up a linux development environment
    by installing ubuntu to a stick PC, then
    installing vim, gcc, java, mvn, git. While
    ubuntu is a debian distribution and Amazon
    Linux (a designated target) is instead along
    the lines of RedHat/Yellowdog (yum, was rpm,
    instead of apt-get, for component configuration),
    then I'm pretty familiar with these tools.

    Looking to the available components, basically
    the algorithm is being designed with data
    structures that can be local or remote. Then,
    these are usually that much more complicated
    than just the local or just the remote, and
    here also besides the routine or state machine
    also the exception or error handling and the
    having of the queues everywhere for both
    throttling and delay-retries (besides the
    usual inline re-tries and about circuit
    breaker). So, this is along the lines of
    "this is an object/octet store" (and AWS
    has an offering "Elastic File System" which
    is an NFS Networked File System that looks
    quite the bit more economical than S3 for
    this purpose), "this is a number allocator"
    (without sequence.nextVal in an RDBMS, the
    requirements allow some gaps in the sequence,
    here to use some DynamoDB table attribute's
    "atomic counter"), then along the lines of
    "this is a queue" and separately "I push to
    queues" and "I pop queues", and about "queue
    this for right now" and "queue this for later".
    Then, there's various mappings, like id to number
    and number to id, where again for no-drops / no-dupes
    / Murphy's-law that the state of the mappings is
    basically "forward-safe" and that retries make
    the system robust and "self-healing". Other mappings
    include a removed/deleted bag, this basically looks
    like a subset of a series or range of the assigned
    numbers, of the all-table and each group-table,
    basically numbers are added as attributes to the
    item for the series or range.

    Octet Store
    Queue
    Mapping

    Then, as noted above, with Murphy's law, any of the
    edges of the flowgraph can break at any time, about
    the request/response each that defines the boundary
    (and a barrier), there is basically defined an abstract
    generic exception "TryableException" that has only two
    subclasses, "Retryable" and "Nonretryable". Then, the
    various implementations of the data structures in the
    patterns of their use variously throw these in puking
    back the stack trace, then for inline re-tries, delay
    re-tries, and fails. Here there's usually a definition
    of "idempotence" for methods that are re-tryable besides
    exceptions that might go away. The idea is to build
    this into the procedure, so it's all built at compile-
    time the correctness of the composition of the steps
    of the flowgraph of the procedure.
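
    A minimal sketch of that exception pair with an inline
    re-try helper follows. Only the three exception names come
    from the description above; the Retry class, the Step
    interface, and the attempt limit are assumptions for the
    example.

```java
abstract class TryableException extends Exception {
    TryableException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class Retryable extends TryableException {
    Retryable(String message, Throwable cause) {
        super(message, cause);
    }
}

final class Nonretryable extends TryableException {
    Nonretryable(String message, Throwable cause) {
        super(message, cause);
    }
}

final class Retry {
    /** One idempotent step of a workflow; the checked exception is part of its signature. */
    interface Step<T> {
        T run() throws TryableException;
    }

    /** Runs the step up to maxAttempts times; a Nonretryable propagates immediately. */
    static <T> T inline(Step<T> step, int maxAttempts) throws TryableException {
        Retryable last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return step.run();
            } catch (Retryable e) {
                last = e;                 // transient: try again right away
            }
        }
        throw new Nonretryable("gave up after " + maxAttempts + " attempts", last);
    }
}
```

    Because TryableException is checked, the compiler forces
    every caller of a Step either to handle the failure or to
    declare it, which is one way to get the compile-time
    composition mentioned above.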


    Then, for the runtime, basically it will be some Java
    container on the host or in a container, with basically
    a cheap simple watchdog/heartbeat that uses signals on
    unix (posix) to be keeping the service/routine nodes
    (that can fail) up, to bounce (restart) them with signals,
    and to reasonably fail and alarm if thrashing of the
    child process of the watchdog/nanny, with maybe some
    timer update up to the watchdog/heartbeat. Then basically
    this runner executes the routine/workflow logic in the jar,
    besides that then a mount of the NFS being the only admin
    on the box, everything else being run up out of the
    environment from the build artifact.

    The build artifact then looks that I'd use Spring for
    wiring a container and also configuration profiles and
    maybe Spring AOP and this kind of thing, i.e., just
    spring-core (toward avoiding "all" of spring-boot).

    Then, with local (in-memory and file) and remote
    (distributed) implementations, basically the
    design is to the distributed components, making
    abstract those patterns then implementing for the
    usual local implementation as standard containers
    and usual remote implementation as building transactions
    and defined behavior over the network.

    [2016/12/09]

    Having been researching this a bit more, and
    tapping at the code, I've written out most of
    the commands then to build a state machine of
    the results, and, having analyzed the algorithm
    of article ingestion and group and session state,
    have defined interfaces suitable either for local
    or remote operation, with the notion that local
    operation would be self-contained (with a quite
    simple file backing) while remote operation would
    be quite usually durable and horizontally scalable.

    I've written up a message reader/writer interface
    (or "Scanner" and "Printer") for non-blocking I/O
    and implementing reading Commands and writing Results
    via non-blocking I/O. This should allow connection
    scaling, with threads on accepter/closer and reader/
    writer and an execution pool for the commands. The
    Scanner and Printer use some BufferPool (basically
    about 4*1024 or 4K buffers), with an idea that that's
    pretty much all the I/O usage of RAM and is reasonably
    efficient, and that if RAM is hogged it's simple enough
    to self-throttle the reader for the writer to balance
    out.
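
    A bounded pool of 4K pages along those lines might look
    like this. It is a sketch under assumptions: the pool size
    is fixed up front, and an exhausted pool returns null as
    the cue for the reader to stop reading until the writer
    gives pages back.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

final class BufferPool {
    static final int PAGE = 4 * 1024;
    private final ArrayBlockingQueue<ByteBuffer> free;

    BufferPool(int pages) {
        free = new ArrayBlockingQueue<>(pages);
        for (int i = 0; i < pages; i++) {
            free.add(ByteBuffer.allocate(PAGE));
        }
    }

    /** Returns a cleared page, or null when every page is in use. */
    ByteBuffer acquire() {
        ByteBuffer page = free.poll();
        if (page != null) {
            page.clear();
        }
        return page;
    }

    /** Hands a page back once the scanner or writer has moved past it. */
    void release(ByteBuffer page) {
        if (page.capacity() != PAGE) {
            throw new IllegalArgumentException("not a pool page");
        }
        free.offer(page);
    }

    int available() {
        return free.size();
    }
}
```

    The total RAM spent on I/O is then pages * 4096 bytes,
    whatever the number of connections.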

    About the runtime, basically the idea is to have it
    installable as a "well-known service" for "socket
    activation" as via inetd or systemd. The runtime is
    really rather lean and starts quickly, here on-demand,
    that it can be configured as "on-demand" or "long-running".
    For some container without systemd or the equivalent,
    it could have a rather lean nanny. There's some notion
    of integrating heartbeat or status about Main.main(),
    then that it runs as "java -jar nntp.jar".

    Where the remote backing store or article file system
    is some network file system, it also seems that the
    runtime would so configure dependency on its file system
    resource with quite usual system configuration tools,
    for a fault-tolerant and graceful box that reboots as activable.

    It interests me that SMTP is quite similar to NNTP. With
    an idea of an on-demand server, which is quite rather usual,
    these service nodes run on the smallest cloud instances
    (here the "t2.nano") and scale to traffic, with a very low
    idle or simply the "on-demand" (then for "containerized").


    About usenet then, I've been studying what it would mean to
    be compliant and, for example, what to do with some "control" or
    "junk" (sideband) groups and otherwise what it would mean
    and take to make a horizontally scalable elastic cloud
    usenet server (and persistent store). This is where the
    service node is quite lean, the file store and database
    (here of horizontally scalable "tables") is basically unbounded.


    [2016/12/11]

    I've collected what RFC's or specs there are for usenet,
    then having surveyed the most of the specified use cases,
    have cataloged descriptions of the commands about the protocol
    that they are self-contained descriptions within the protocol
    of each command. Then, for where there is the protocol and
    perhaps any exchange or change of the protocol, for example
    for TLS, then that is also being worked into the state machine
    of sorts (simply enough a loop over the input buffer to generate
    command values from the input given the command descriptions),
    for that then as commands are generated (and maintained in their
    order) that the results (eg, in the parallel) are thus computed
    and returned (again back in the order).

    Then, within the protocol, and basically for encryption and
    compression, these are established within the protocol instead
    of, for example, externally to the protocol. So, there is
    basically a filter between the I/O reader and I/O writer and
    the scanner and the printer, as it were, that scans input data
    to commands and writes command results to output data. This is
    again with the "non-blocking I/O" then about that the blocks or
    buffers I've basically settled on 4 kibibyte (4KiB) buffers, where,
    basically an entire input or output in the protocol (here a message
    body or perhaps a list of up to all the article numbers) would be
    buffered (in RAM), so I'm looking to spool that off to disk if it
    so results that essentially unbounded inputs and outputs are to be
    handled gracefully in the limited CPU, RAM, I/O, and disk resources
    of the usually quite reliable but formally unreliable computing node
    (and at cost).

    The data structures for access and persistence evolve as the in-memory
    and file-based local editions and networked or cloud remote editions.
    The semantics are built out to the remote editions, as then they can be
    erased in the difference for efficiencies of the local editions.
    The in-memory structures (with the article bodies themselves yet
    actually written to a file store) are quite efficient and bounded
    by RAM or the heap, the file-based structures which makes use of the
    memory-mapped files as you may well know comprise all the content of
    "free" RAM caching the disk files may be mostly persistent with
    a structure that can be bounded by disk size, then the remote network-
    based structures here have a usual expectation of being highly reliable
    (i.e., that the remote files, queues, and records have a higher reliability
    than any given component in their distributed design, at the corresponding
    cost in efficiency and direct performance, but of course, this is design
    for correctness).

    So, that said, then I'm tapping away at the implementation of a queue of
    byte buffers, or the I/O RAM convention. Basically, there is some I/O,
    and it may or may not be a complete datum or event in the protocol, which
    is 1-client-1-server or a stateful protocol. So, what is read off the
    I/O buffer, so the I/O controller can service that and other I/O lines,
    is copied to a byte buffer. Then, this is to be filtered as above as
    necessary, that it is copied to a list of byte buffers (a double ended
    queue or linked list). These buffers maintain their current position
    and limit, from their beginning, the "buffer" is these pointers and the
    data itself. So, that's their concrete type already, then the scanner
    or printer also maintains its scan or print position, that the buffer can
    be filled and holds some data, then that as the scan pointer moves past
    a buffer boundary, that buffer can be reclaimed, with only moving the
    scan pointer when a complete datum is read (here as defined for the scanner
    in small constant terms by the command descriptions as above).
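
    The buffer queue and scan pointer can be sketched as
    follows, taking a CRLF-terminated command line as the
    "complete datum". The class name LineScanner is made up,
    and a real scanner would be driven by the command
    descriptions rather than a hard-coded terminator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

final class LineScanner {
    private final ArrayDeque<ByteBuffer> buffers = new ArrayDeque<>();

    /** Queues a buffer whose position..limit hold unread input. */
    void offer(ByteBuffer readable) {
        buffers.addLast(readable);
    }

    /** Returns the next line without its CRLF, or null if no complete line is buffered. */
    String nextLine() {
        int length = 0;
        boolean sawCr = false;
        for (ByteBuffer b : buffers) {
            for (int i = b.position(); i < b.limit(); i++) {
                byte c = b.get(i);            // absolute get: the scan position stays put
                length++;
                if (sawCr && c == '\n') {
                    return consume(length);
                }
                sawCr = (c == '\r');
            }
        }
        return null;                          // incomplete datum: nothing is consumed
    }

    private String consume(int length) {
        byte[] line = new byte[length];
        int filled = 0;
        while (filled < length) {
            ByteBuffer head = buffers.peekFirst();
            int n = Math.min(head.remaining(), length - filled);
            head.get(line, filled, n);        // relative get: now the position advances
            filled += n;
            if (!head.hasRemaining()) {
                buffers.pollFirst();          // scan pointer passed this buffer: reclaimable
            }
        }
        return new String(line, 0, length - 2, StandardCharsets.US_ASCII);
    }

    int buffered() {
        return buffers.size();
    }
}
```

    A command split across two reads stays queued until its
    terminator arrives, and only then does the scan pointer move.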

    So, that is pretty much sorted out, then about that basically it should
    ingest articles just fine and be a mostly compliant NNTP server.

    Then, generating the overview and such is another bit to get figured out,
    which is summary.

    Another thing in this design to get figured out is how to implement the
    queue and database action for the remote, where, the cost efficiency of
    the (managed, durable, redundant) remote database, is on having a
    more-or-less constant (and small) rate of reads and writes. Then the
    distributed queue will hold the backlog, but, the queue consumer is to be
    constant rate not for the node but for the fleet, so I'm looking at how to
    implement some leader election (fault-tolerance) or otherwise to have
    loaner threads of the runtime for any service of the queue. This is where,
    ingestion is de-coupled from inbox, so, there's an idea of having a
    sentinel queue consumer (because this data might be high volume or low or
    zero) on a publish/subscribe, it listens to the queue and if it gets an
    item it refuses it and wakes up the constant-rate (or spiking) queue
    consumer workers, that then proceed with the workflow items and then
    retire themselves if and when traffic drops to zero again, standing back
    up the sentinel consumer.


    Anyways that's just about how to handle variable load but here there's
    that it's OK for the protocol to separate ingestion and inbox, otherwise
    establishing the completion of the workflow item from the initial request
    involves usual asynchronous completion considerations.


    So, that said, then, the design is seeming pretty flexible, then about,
    what extension commands might be suitable. Here the idea is about article
    transfer and which articles to transfer to other servers. The idea is to
    add some X-RETRANSFER-TO command or along these lines,

    X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]

    then that this simply has the host open a connection to the other host
    and offer via IHAVE/CHECK/TAKETHIS all the articles so in the range
    or until the connection is closed. This way then, for example, if this
    NNTP system was running, and, someone wanted a subset of the articles,
    then this command would have them sent out-of-band, or, "automatic
    out-feed". Figuring out how to re-distribute or message routing besides
    simple message store and retrieval is its own problem.
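
    Parsing the proposed command line is simple enough to
    sketch. X-RETRANSFER-TO is the extension suggested above,
    not a standard NNTP command, and since no date format is
    given the dates are kept as opaque strings here. The class
    name is hypothetical.

```java
final class RetransferTo {
    final String host;
    final String group;       // null when omitted
    final String dateBegin;   // null when omitted
    final String dateEnd;     // null when omitted

    private RetransferTo(String host, String group, String dateBegin, String dateEnd) {
        this.host = host;
        this.group = group;
        this.dateBegin = dateBegin;
        this.dateEnd = dateEnd;
    }

    /** Parses "X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]". */
    static RetransferTo parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 2 || t.length > 5 || !t[0].equalsIgnoreCase("X-RETRANSFER-TO")) {
            throw new IllegalArgumentException(
                    "usage: X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]");
        }
        return new RetransferTo(t[1], arg(t, 2), arg(t, 3), arg(t, 4));
    }

    private static String arg(String[] tokens, int index) {
        return index < tokens.length ? tokens[index] : null;
    }
}
```

    The nesting of the optional arguments falls out of the
    positions: a dateEnd can only be present if a dateBegin and
    a group precede it.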

    Another issue is expiry, I don't really intend to delete anything, because
    the purpose is archival, but people still use usenet in some corners of
    the internet for daily news, again that's its own problem. Handling
    out-of-order ingestion with the backfilling or archives as they can be
    discovered is another issue, with that basically being about filling a
    corpus of the messages, then trying to organize them that the message
    date is effectively the original injection date.


    Anyways, it proceeds along these lines.

    [2016/12/13]

    One of the challenges of writing this kind of system
    is vending the article-id's (or article numbers) for
    each newsgroup of each message-id. The message-id is
    received with the article as headers and body, or set
    as part of the injection info when the article is posted.
    So, vending a number means that there is known a previous
    number to give the next. Now, this is clear and simple
    in a stand-alone environment, with integer increment or
    "x = i++". It's not so simple in a distributed environment,
    with that the queuing system does not "absolutely guarantee"
    no dupes, with the priority being no drops, and also, the
    independent workers A and B can't know the shared value of
    x to make and take atomic increments, without establishing
    a synchronization barrier, here over the network, which is
    to be avoided (eg, blocking and locking on a database's
    critical transactional atomic sequence.nextval, with, say,
    a higher guarantee of no gaps). So, there is a database
    for vending strictly increasing numbers, each group of
    an article has a current number and there's an "atomic
    increment" feature thus that A working on A' will get
    i+1 and B working on B' will get i+2 (or maybe i+3, if
    for example the previous edition of B died). If A working
    on A' and B working on A' duplicated from the queue get
    i+1 and i+2, then, there is as mentioned above a conditional
    update to make sure the article number always increases,
    so there is a gap from the queue dupe or a gap from the
    worker drop, but then A or B has a consistent view of the
    article-id of A' or B'.

    So, then with having the number, once that's established,
    then all's well and good to associate the message-id, and
    the article-id.

    group: article-id -> message-id
    message: groups -> article-ids

    Then, looking at the performance, this logical association
    is neatly maintainable in the DB tables, with consistent
    views for A and B. But it's a limited resource, in this
    implementation, there are actually only so many reads and
    writes per period. So, workers can steadily chew away the
    intake queue, assigning numbers, but then querying for the
    numbers is also at a cost, which is primarily what the
    reader connections do.

    Then, the idea is to maintain the logical associations, of
    the message-id <-> article-id, also in a growing file, with
    a write-once read-many file about the NFS file system. There's
    no file locking, and, writes to the file that are disordered
    or contentious could (and by Murphy's law, would) write corrupt
    entries to the file. There are various notions of leader election
    or straw-pulling for exactly one of A or B to collect the numbers
    in order and write them to the article-ids file, one "row" (or 64
    byte fixed length record) per number, at the offset 64*number
    (as from some 0 or the offset from the first number). But,
    consensus and locking for serialization of tasks couples A and B
    which are otherwise running entirely independently. So, then
    the idea is to identify the next offset for the article-ids file,
    and collect a batch of numbers as make a block-sized block of
    the NFS implementation (eg 4KiB or 8KiB and hopefully configurably
    and not 1MiB, which is 16Ki records of 64B each). So, as
    A and B each collect the numbers (and detect if there were gaps
    now) then either (or both) completes a segment to append to the
    file. There aren't append modes of the NFS files, which is fine
    because actually the block now is written to the computed offset,
    which is the same for A and B. In the off chance A and B both
    make writes, file corruption doesn't follow because it's the
    same content, and it's block size, and it's an absolute offset.

    So, in this way, it seems that over time, the contents of the DB
    are written out to the sequence by article-id of message-id for
    each group

    group: article-id -> message-id

    besides that the message-id folder contains the article-ids

    message-id: groups -> article-id

    the content of which is known when the article-id numbers for
    the groups of the message are vended.


    Then, in the usual routine of looking up the message-id or
    article-id given the group, the DB table is authoritative,
    but, the NFS file is also correct, where a value exists.
    (Also it's immutable or constant and conveniently a file.)
    So, readers can map into memory the file, and consult the
    offset in the file, to find the message-id for the requested
    article-id, if that's not found, then the DB table, where it
    would surely be, as the message-id had vended an article-id,
    before the group's article-id range was set to include the
    new article.
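
    The article-ids file and the file-first lookup can be
    sketched with FileChannel's positional reads and writes.
    Assumptions in this sketch: a 4096-byte block, message-ids
    of at most 64 bytes NUL-padded to the record size (the
    oversize case is simply rejected), and a null return as the
    signal to fall back to the DB table. A plain positional
    read stands in for the memory-mapped file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ArticleIdFile {
    static final int RECORD = 64;                  // bytes per article number
    static final int BLOCK = 4096;                 // assumed write size
    static final int PER_BLOCK = BLOCK / RECORD;   // 64 records per block

    /** Writes one full block at its absolute offset; a repeated call writes identical bytes. */
    static void writeBlock(Path file, long blockIndex, String[] messageIds) {
        if (messageIds.length != PER_BLOCK) {
            throw new IllegalArgumentException("need exactly " + PER_BLOCK + " entries");
        }
        ByteBuffer block = ByteBuffer.allocate(BLOCK);   // zero-filled: gaps stay all-NUL
        for (int i = 0; i < PER_BLOCK; i++) {
            if (messageIds[i] == null) continue;         // a gap in the numbering
            byte[] id = messageIds[i].getBytes(StandardCharsets.US_ASCII);
            if (id.length > RECORD) throw new IllegalArgumentException("oversize message-id");
            block.position(i * RECORD);
            block.put(id);
        }
        block.rewind();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long offset = blockIndex * BLOCK;
            while (block.hasRemaining()) {
                offset += ch.write(block, offset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the message-id stored for number, or null if the file has no value for it. */
    static String lookup(Path file, long firstNumber, long number) {
        ByteBuffer rec = ByteBuffer.allocate(RECORD);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long offset = (number - firstNumber) * RECORD;
            while (rec.hasRemaining()) {
                if (ch.read(rec, offset + rec.position()) < 0) return null;   // past end of file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int len = 0;
        while (len < RECORD && rec.get(len) != 0) len++;
        return len == 0 ? null : new String(rec.array(), 0, len, StandardCharsets.US_ASCII);
    }
}
```

    Two workers that complete the same block write the same
    bytes to the same offset, so the second write changes
    nothing.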

    When a range of the article numbers is passed, then effectively,
    the lookup will always be satisfied by the file lookup instead
    of the DB table lookup, so there won't be the cost of the DB
    table lookup. In some off chance the open files of the NFS
    (also a limited resource, say 32K) are all exhausted, there's
    still a DB table to read, that is a limited and expensive
    resource, but also elastic and autoscalable.

    Anyways, this design issue also has the benefit of keeping it
    so that the file system has a convention with that all the data
    remains in the file system, with then usual convenience in
    backup and durability concerns, while still keeping it correct
    and horizontally scalable, basically with the notion of then
    even being able to truncate the database in any lull of traffic,
    for that the entire state is consistent on the file system.

    It remains to be figured out that NFS is OK with writing duplicate
    copies of a file block, toward having this highly reliable workflow
    system.


    That is basically the design issue then, I'm tapping away on this.


    [ page break 2 ]

    [2016/12/14]

    Tapping away at this idea of a usenet server system,
    I've written much of the read routine that is the
    non-blocking I/O with the buffer passing and for the
    externally coded data and any different coded data
    like the unencrypted or uncompressed. I've quite
    settled on 4KiB (2^12B) as the usual buffer page,
    and it looks that the NFS offering can be so tuned
    that its wsize (write size) is 4096 and with an
    async NFS write option that that page size will
    have that writes are incorruptible (though for
    whatever reason they may be lost), and that 4096B
    or 64 entries of 64B (2^6B) for a message-id or oversize-
    message-id entry will spool off the message-id's of
    the group's articles at an offset in the file that
    is article-id * (1 << 6). The MTU of Ethernet packets
    is often 1500 so having a wsize of 1KiB is not
    nonsensible, as many of the writes are of this
    granularity, the MTU might be 9001 or jumbo, which
    would carry 2 4KiB NFS packets in one Ethernet packet.
    Having the NFS rsize (read size) say 32KiB seems not
    unreasonable, with that the reads will be pages of the
    article-id's, or, the article contents themselves (split
    to headers, xrefs, body) from the filesystem that are
    mostly some few KiB and mostly quite altogether < 32 KiB,
    which is quite a lot considering that's less than a JPEG
    the size of "this". (99+% of Internet traffic was JPEG
    and these days is audio/video traffic, often courtesy JPEG.)

    Writing the read routine is amusing me with training the
    buffers and it amuses me to write code with quite the
    few +1 and -1 in the offsets. Usually having +-1 in
    the offset computations is a good or a bad thing, rarely
    good, with that often it's a sign that the method signature
    just isn't being used quite right in terms of the locals,
    if not quite as bad as "build a fence a mile then move it
    a foot". When +-1 offsets is a good thing, here the operations
    on the content of the buffers are rather agnostic the bounds
    and amount of the buffers, thus that I/O should be quite
    expedient in the routine.

    (Written in Java, it should run quite the same on any
    runtime with Java 1.4+.)

    That said then next I'm looking to implement the Executor pool.

    Acceptor -> Reader -> Scanner -> Executor -> Printer -> Writer

    The idea of the Executor pool is that there are many connections
    or sessions (the protocol is stateful), then that for one session,
    its command's results are returned in order, but, that doesn't say
    that the commands are executed in order, just that their results
    are returned in order. (For some commands, which affect the state
    of the session like current group or current article, that being
    pretty much it, those also have to be executed sequentially for
    consistency's sake.) So, I'm looking to have the commands be
    executed in any possible order, for the usual idea of saturating
    the bandwidth of the horizontally scalable backend. (Yeah, I
    know NFS has limits, but it's unbounded and durable, and there's
    overall a consistent, non-blocking toward lock-free view.)
    Anyways, basically the Session has a data structure of its
    outstanding commands, as they're enqueued to the task executor,
    then whether it can go into the out-of-order pool or must stay
    in the serial pool. Then, as the commands complete, or for
    example timeout after retries on some network burp, those are
    queued back up as the FIFO of the Results and as those arrive
    the Writer is re-registered with the SocketChannel's Selector
    for I/O notifications and proceeds to fill the socket's output
    buffer and retire the Command and Result. One aspect of this
    is that the Printer/Writer doesn't necessarily get the data on
    the heap, the output for example an article is composed from
    the FileChannels of the message-id's header, xref, body. Now,
    these days, the system doesn't have much of a limit in open
    file handles, but as mentioned above there are limits on NFS
    file handles. Basically then the data is retrieved as from the
    object store (or here an octet store but the entire contents of
    the files are written to the output with filesystem transfer
    direct to memory or the I/O channel). Then, releasing the
    NFS file handles expeditiously basically is to be figured out
    with caching the contents, for any retransmission or simply
    serving copies of the current articles to any number of
    connections. As all these are, read-only, it looks like the
    filesystems' built-in I/O caching with, for example, a read-only
    client view and no timeout, basically turns the box into a file
    cache, because that is what it is.
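
    A minimal sketch of the "results in order, execution in any order"
    idea, assuming java.util.concurrent is available (so a newer runtime
    than Java 1.4). OrderedSession and its methods are hypothetical names.
    For brevity the drain blocks on the head of the FIFO, where the design
    above would instead re-register the Writer with the Selector.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedSession {
    private final ExecutorService pool;
    // FIFO of outstanding commands, in the order they were received.
    private final Queue<Future<String>> outstanding = new ArrayDeque<>();

    OrderedSession(ExecutorService pool) { this.pool = pool; }

    /** Enqueue a command; it may start and finish in any order. */
    void submit(Callable<String> command) {
        outstanding.add(pool.submit(command));
    }

    /** Retire results strictly in submission order. */
    List<String> drain() throws InterruptedException, ExecutionException {
        List<String> written = new ArrayList<>();
        while (!outstanding.isEmpty()) {
            written.add(outstanding.remove().get()); // waits for the head only
        }
        return written;
    }

    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            OrderedSession session = new OrderedSession(pool);
            session.submit(() -> { Thread.sleep(50); return "211 group"; }); // slow
            session.submit(() -> "220 article");                            // fast
            session.submit(() -> "205 bye");                                // fast
            return session.drain();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

    The slow first command finishes last, yet its result is still written
    first. Commands that change session state would go to a serial
    executor instead of the shared pool.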

    Then, it looks like there is a case for separate reader and
    writer implementations altogether of the NFS or octet store
    (that here is an object store for the articles and their
    sections, and an octet store for the pages of the tables).
    This is with the goal of minimizing network access while
    maintaining the correct view. But, an NFS export can't
    be mounted twice from the same client (one for reads and
    one for writes), and, while ingesting the message can be
    done separately from the client, intake has to occur from the
    client, then what with a usual distributed cloud queue
    implementation having size and content limits, it seems
    like it'll be OK.

    [2016/12/17]

    The next thing I'm looking at is how to describe the "range",
    as a data structure or in algorithms.

    Here a "range" class in the runtime library is usually a
    "bounds" class. I'm talking about a range, basically a
    1-D range, about basically a subset of the integers,
    then that the range is iterating over the subset in order,
    about how to maintain that in the most maintainable and
    accessible terms (in computational complexity's space and time
    terms).

    So, I'm looking to define a reasonable algebra of individuals,
    subsets, segments, and rays (and their complements) that
    naturally compose to objects with linear maintenance and linear
    iteration and constant access of linear partitions of time-
    series data, dense or sparse, with patterns and scale.

    This then is to define data structures as so compose that
    given a series of items and a predicate, establish the
    subset of items as a "range", that then so compose as
    above (and also that it has translations and otherwise
    is a fungible iterator).

    I don't have one of those already in the runtime library.

    punch-out <- punches have shapes, patterns? eg 1010
    knock-out <- knocks have area
    pin-out <- just one
    drop-out <-
    fall-out <- range is out

    Then basically there's a coalescence of all these,
    that they have iterators or mark bounds, of the
    iterator of the natural range or sequence, for then
    these being applied in order

    push-up <- basically a prioritization
    fill-in <- for a "sparse" range, like the complement upside-down
    pin-in
    punch-in
    knock-in

    Then all these have the basic expectation that a range
    is the combination of each of these that are expressions
    then that they are expressions only of the value of the
    iterator, of a natural range.

    Then, for the natural range being time, then there is about
    the granularity or fine-ness of the time, then that there is
    a natural range either over or under the time range.

    Then, for the natural range having some natural indices,
    the current and effective indices are basically one and
    zero based, that all the features of the range are shiftable
    or expressed in terms of these offsets.

    0 - history

    a - z

    -m,n

    Whether there are pin-outs or knock-outs rather varies on
    whether removals are one-off or half-off.

    Then, pin-outs might build a punch-out,
    While knock-outs might build a scaled punch-out

    Here the idea of scale then is to apply the notions
    of stride (stripe, stribe, striqe) to the range, about
    where the range is for example 0, 1, .., 4, 5 .., 8, 9
    that it is like 1, 3, 5, 7 scaled out.

    Then, "Range" becomes quite a first-class data structure,
    in terms of linear ranges, to implement usual iterators
    like forward ranges (iterators).

    Then, for time-forward searches, or to compose results in
    ranges from time-forward searches, without altogether loading
    into memory the individuals and then sorting them and then
    detecting their ranges, there is to be defined how ranges
    compose. So, the Range includes a reference to its space
    and the Bounds of the Space (in integers then extended
    precision integers).

    "Constructed via range, slices, ..." (gslices), ....



    Then, basically I want that the time series is a range,
    that expressions matching elements are dispatched to
    partitions in the range, that the returned or referenced
    composable elements are ranges, that the ranges compose
    basically pair-wise in constant time, thus linearly over
    the time series, then that iteration over the elements
    is linear in the elements in the range, not in the time
    series. Then, it's still linear in the time series,
    but sub-linear in the time series, also in space terms.
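
    A minimal sketch of composing two ranges pair-wise, assuming each
    range is kept as a sorted list of disjoint inclusive segments
    {lo, hi}. That representation and the name SegmentUnion are
    assumptions. The merge is linear in the number of segments rather
    than in the individuals or in the span of the series.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentUnion {
    /** Union of two sorted lists of disjoint segments; O(|a| + |b|). */
    static List<long[]> union(List<long[]> a, List<long[]> b) {
        List<long[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            // Take whichever input has the segment with the smaller start.
            long[] next;
            if (j >= b.size() || (i < a.size() && a.get(i)[0] <= b.get(j)[0])) {
                next = a.get(i++);
            } else {
                next = b.get(j++);
            }
            long[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && next[0] <= last[1] + 1) {
                // Overlapping or adjacent: coalesce into the last segment.
                last[1] = Math.max(last[1], next[1]);
            } else {
                // Copy, so that the input ranges are never mutated.
                out.add(new long[] { next[0], next[1] });
            }
        }
        return out;
    }

    /** Number of individuals in the range. */
    static long size(List<long[]> range) {
        long n = 0;
        for (long[] seg : range) n += seg[1] - seg[0] + 1;
        return n;
    }
}
```

    For example, the union of {0..4, 10..12} and {5..6, 11..20} collapses
    to the two segments 0..6 and 10..20. Bounds near Long.MAX_VALUE would
    need overflow care that the sketch leaves out.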

    Here, sparse or dense ranges should have the same small-
    linear space terms, with there being maintenance on the
    ranges, about there being hysteresis or "worst-case 50/50"
    (then basically some inertia for where a range is "dense"
    or "sparse" when it has gt or lt .5 elements, then about
    where it's just organized that way because there is a re-
    organization).

    So, besides composing, then the elements should have very
    natural complements, basically complementing the range by
    taking the complement of the ranges parts, that each
    sub-structure has a natural complement.

    Then, pattern and scale are rather related, about figuring
    that out some more, and leaving the general purpose, while
    identifying the true primitives of these.

    Then eventually there is attachment or reference to values
    under the range, and general-purpose expressions to return
    an iteration or build a range, about the collectors that
    establish where range conditions are met and then collapse
    after the iteration is done, as possible.

    So, there is the function of the range, to iterate, then
    there is the building of the range, by iterating. The
    default of the range and the space is its bounds (or, in
    the extended, that there are none). Then, segments are
    identified by beginning and end (and perhaps a scale, about
    rigid translations and about then that the space is
    unsigned, though unbounded both left and right see
    some use). These are dense ranges, then for whether the
    range is "naturally" or initially dense or sparse. (The
    usual notion is "dense/full" but perhaps that's as
    "complement of sparse/empty".) Then, as elements are
    added or removed in the space, if they are added range-wise
    then that goes to a stack of ranges that any forward
    iterator checks before it iterates, about whether the
    natural space's next is in or out, or, whether there is
    a skip or jump, or a flip then to look for the next item
    that is in instead of out.

    This is where, the usual enough organization of the data
    as collected in time series will be bucketed or partitioned
    or sharded into some segment of the space of the range,
    that building range or reading range has the affinity to
    the relevant bucket, partition, or shard. (This is all
    1-D time series data, no need to make things complicated.)

    Then, the interface basically "builds" or "reads" ranges,
    building given an expression and reading as a read-out
    (or forward iteration), about that then the implementation
    is to compose the ranges of these various elements of a
    topological sort about the bounds/segments and scale/patterns
    and individuals.

    https://en.wikipedia.org/wiki/Allen%27s_interval_algebra

    This is interesting, for an algebra of intervals, or
    segments, but here so far I'd been having that the
    segments of contiguous individuals are eventually
    just segments themselves, but composing those would
    see the description as of this algebra. Clearly the
    goal is the algebra of the contents of sets of integers
    in the integer spaces.

    An algebra of sets and segments of integers in integer spaces

    An integer space defines elements of a type that are ordered.

    An individual integer is an element of this space.

    A set of integers is a set of integers, a segment of integers
    is a set containing a least and greatest element and all elements
    between. A ray of integers is a set containing a least element
    and all greater elements or containing a greatest element and
    all lesser elements.

    A complement of an individual is all the other individuals,
    a complement of a set is all the elements not in the set,
    a complement of a segment is all the elements of the ray less
    than and the ray greater than all individuals of the segment.
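
    A minimal sketch of these complements, assuming a bounded integer
    space and a range held as a sorted list of disjoint inclusive segments
    {lo, hi} that lie within the bounds. IntegerSpace and its methods are
    hypothetical names. An individual is a segment with lo == hi, and in a
    bounded space a ray is a segment that ends at a bound.

```java
import java.util.ArrayList;
import java.util.List;

class IntegerSpace {
    final long least, greatest;   // inclusive bounds of the space

    IntegerSpace(long least, long greatest) {
        this.least = least;
        this.greatest = greatest;
    }

    /** Complement of a range within the space: the gaps between segments. */
    List<long[]> complement(List<long[]> range) {
        List<long[]> out = new ArrayList<>();
        long cursor = least;                      // first element not yet covered
        for (long[] seg : range) {
            if (seg[0] > cursor) {
                out.add(new long[] { cursor, seg[0] - 1 });   // gap before seg
            }
            cursor = seg[1] + 1;
        }
        if (cursor <= greatest) {
            out.add(new long[] { cursor, greatest });         // trailing ray
        }
        return out;
    }

    /** Renders a range like "[0,2][6,9]" for inspection. */
    static String show(List<long[]> range) {
        StringBuilder sb = new StringBuilder();
        for (long[] seg : range) {
            sb.append('[').append(seg[0]).append(',').append(seg[1]).append(']');
        }
        return sb.toString();
    }
}
```

    In the space 0..9 the complement of the segment 3..5 is the two
    clipped rays 0..2 and 6..9, and complementing again returns 3..5.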

    What are the usual algebras of the compositions of individuals,
    sets, segments, and rays?

    https://en.wikipedia.org/wiki/Region_connection_calculus



    Then basically all kinds of things that are about subsets
    of things in a topological or ordered space should basically
    have a first-class representation as (various kinds of)
    elements in the range algebra.

    So, I'm wondering what there is already for
    "range algebra" and "range calculus".

    [2016/12/18]

    Some of the features of these subsets of a
    range of integers are available as a usual
    bit vector, eg with ffs ("find-first-set")
    memory scan instructions,
    and as well usual notions of compressed bitmap
    indices, with some notion of random access to
    the value of a bit by its index and variously
    iterating over the elements. Various schemes
    to compress the bitmaps down to uncompressed
    regions with representing words' worths of bits
    may suit parts of the implementation, but I'm
    looking for a "pyramidal" or "multi-resolution"
    organization of efficient bits, and also flags,
    about associating various channels of bits with
    the items or messages.
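
    A minimal sketch of the bit-vector view, using java.util.BitSet from
    the runtime library. BitRange and its methods are hypothetical names,
    and the one-bit-per-word summary is only one assumed reading of a
    "pyramidal" organization: a coarse level that lets a scan skip empty
    64-bit words.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BitRange {
    /** Members of the subset in ascending order. */
    static List<Integer> members(BitSet bits) {
        List<Integer> out = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out.add(i);
        }
        return out;
    }

    /** find-first-set on one word: index of the lowest set bit, or -1. */
    static int ffs(long word) {
        return word == 0 ? -1 : Long.numberOfTrailingZeros(word);
    }

    /** Coarser level: bit k is set when 64-bit word k has any bit set. */
    static BitSet summary(BitSet bits) {
        long[] words = bits.toLongArray();
        BitSet level = new BitSet(words.length);
        for (int k = 0; k < words.length; k++) {
            if (words[k] != 0) level.set(k);
        }
        return level;
    }
}
```

    For the subset {1, 64, 200} the summary level is {0, 1, 3}: word 2
    (bits 128..191) is empty and can be skipped without being read.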

    https://en.wikipedia.org/wiki/Bitmap_index

    Then, with having narrowed down the design for
    what syntax to cover, and, mostly selected data
    structures for the innards, then I've been looking
    to the data throughput, then some idea of support
    of client features.

    Throughput is basically about how to keep the
    commands moving through. For this, there's a
    single thread that reads off the network interface's
    I/O buffers, it was also driving the scanner, but
    adding encryption and compression layers, then there's
    also adding a separate thread to drive the scanner
    thus that the network interface is serviced on demand.
    Designing a concurrent data structure basically has
    a novel selector (as of the non-blocking I/O) to
    then pick off a thread from the pool to run the
    scanner. Then, on the "printer" side and writing
    off to the network interface, it is similar, with
    having the session or connection's resources run
    the compression and encryption, then for the I/O
    thread as servicing the network interface. Basically
    this is having put a collator/relay thread between
    the I/O threads and the scanner/printer threads
    (where the commands are run by the executor pool).


    Then, a second notion has been the support of TLS.
    It looks I would simply sign a certificate and expect
    users to check and install it themselves in their
    trust-store for SSL/TLS. That said, it isn't really
    a great solution, because, if someone compromises any
    of the CA's, certificate authorities, in the trust
    store (any of them), then a man-in-the-middle could
    sign a cert, and it would be on the server to check
    that the content hash reflected the server cert from
    the handshake. What might be better would be to have
    that each client, signs their own certificate, for the
    server to present. This way, the client and server
    each sign a cert, and those are exchanged. When the
    server gets the client cert, it restarts the negotiation
    now with using the client-signed cert as the server
    cert. This way, there's only a trust anchor of depth
    1 and the trust anchors are never exchanged and can
    not be cross-signed nor otherwise would ever share
    a trust root. Similarly the server gets the server-
    signed cert back from the client then that TLS could
    proceed with a session ticket and that otherwise there
    would be a stronger protection from compromised CA
    certs. Then, this could be pretty automatic with
    a simple enough browser interface or link to set up TLS.
    Then the server and client would only trust themselves
    and each other (and keep their secrets private).

    Then, for browsing, a reading of IMAP, the Internet
    Message Access Protocol, shows a strong affinity with
    the organization of Usenet messages, with newsgroups
    as mailboxes. As well, implementing an IMAP server
    that is backed by the NNTP server has then that the
    search artifacts and etcetera (and this was largely
    a reason why I need this improved "range" pattern)
    would build for otherwise making deterministic date-
    oriented searches over the messages in the NNTP server.
    IMAP has a strong affinity with NNTP, and is a very
    similar protocol and is implemented much the same
    way. Then it would be convenient for users with
    an IMAP client to simply point to "usenet.science"
    or what and get usenet through their email browser.


    [2016/12/23]

    About implementing usenet with reasonably
    modern runtimes and an eye toward
    unlimited retention, basically looking
    into "microtasks" for the routine or
    workflow instances, as are driven with
    non-blocking I/O throughout, basically
    looking to memoize the steps as through
    a finite state machine, for restarts as
    of a thread, then to go from "service
    oriented" to "message oriented".


    This involves writing a bit of an
    HTTP client for rather usual web
    service calls, but with high speed
    non-blocking I/O (less threads, more
    connections). Also this involves a
    sufficient abstraction.


    [ page break 3 ]

    [2017/01/06]

    This writing some software for usenet service
    is coming along with the idea of how to implement
    the fundamentally asynchronous non-blocking routine.
    This is crystallizing in pattern as a: re-routine,
    in reference to computing's usual: co-routine.

    The idea of the re-routine is that there are only
    so many workers, threads, of the runtime. The usual
    runtimes (and this one, Java, say) support preemptive
    multithreading as a means of implementing cooperative
    multithreading, with the maintenance of separate stacks
    (of, the stack machine of usual C-like procedural runtimes)
    and some thread-per-connection model. This is somewhat
    reasonable for the composition of blocking APIs, but
    not so much for the composition of non-blocking APIs
    and about how to not have many thread-per-connection
    resources with essentially zero duty cycle that instead
    could maintain for themselves the state machine of their
    routine (with simplified forward states and a general
    exception and error routine), for cooperative multi-threading.

    The idea of this re-routine then is to connect functions,
    there's a scope for variables in the scope, there is
    execution of the functions (or here the routines, as
    the "re-routines") then the instance of the re-routine
    is re-entrant in the sense that as partial results are
    accumulated the trace of the routine is marked out, with
    leaving in the scope the current or partial or intermediate
    results. Then, the asynchronous workers that fulfill each
    routine (eg, with a lookup, a system call, or a network
    call) are separate worker units dedicated to their domain
    (of the routine, not the re-routine, and they can be blocking,
    polling for their fleet, or callback with the ticket).

    Then, this is basically a network machine and protocol,
    here about NNTP and IMAP, and its resources are often
    then of network machines and protocols (eg networked
    file systems, web services). Then, these "machines"
    of the "re-routine" being built (basically for the
    streaming model instead of the batch model if you
    know what I'm talking about) defining the logical
    outcomes of the composition of the inputs and the
    resulting outputs in terms of scopes as a model of
    the cooperative multithreading, these re-routines
    then are seeing for the pattern then that the
    source template is about implicitly establishing
    the scope and the passing and calling convention
    (without a bunch of boilerplate or "callback confusion",
    "async hell"). This is where the re-routine, when
    a routine worker fills in a partial result and resubmits
    the re-routine (with the responsibility/ownership of
    the re-routine) that it is re-evaluated from the beginning,
    because it is constant linear in reading forward for the
    item the state of its overall routine, thusly implicit
    without having to build a state machine, as it is
    declaratively the routine.

    So, I am looking at this as my solution as to how to
    establish a very efficient (in resource and performance
    terms) formally correct protocol implementation (and
    with very simple declarative semantics of usual forward,
    linear routines).

    This "re-routine" pattern then as a model of cooperative
    multithreading sees the complexity and work into the
    catalog of blocking, polling, and callback support,
    then for usual resource injection of those as all
    supported with references to usual sequential processes
    (composition of routine).


    [2017/01/21]

    I've about sorted out how to implement the re-routine.

    Basically a re-routine is a suspendable composite
    operation, with normal declarative flow-of-control
    syntax, that memo-izes its partial results, and
    re-executes the same block of statements then to
    arrive at its pause, completion, or exit.

    Then, the command and executor are passed to the
    implementation that has its own (or maybe the
    same) execution resources, eg a thread or connection
    pool. This resolves the value of the asynchronous
    operation, and then re-submits the re-routine to
    its originating executor. The re-routine re-runs
    (it runs through the branching or flow-of-control
    each time, but that's small in the linear and all
    the intermediate products are already computed,
    and the syntax is usual and in the language).
    The re-routine then either re-suspends (as it
    launches the next task) or completes or exits (errors).
    Whether it suspends, completes or exits, the
    re-routine just returns, and the executor then
    is specialized and just checks the re-routine
    whether it's suspended (and just drops it, the
    new responsible launched will re-submit it),
    or whether it's completed or errored (to call
    back to the originating commander the result of
    the command).


    In this manner, it seems like a neat way to basically
    establish the continuation, for this "non-blocking
    asynchronous operation", while at the same time
    the branching and flow of control is all in the
    language, with the usual unsurprising syntax and
    semantics, for cooperative multi-threading. The
    cost is in wrapping the functional callers of the
    routine and setting up their factories and otherwise
    as via injection (and they can block the calling
    thread, or have their own threads and block, or
    be asynchronous, without changing the definition
    of the routine).
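
    A minimal sketch of the re-routine as described, assuming Java 8+
    lambdas and java.util.concurrent. All names here (ReRoutineSketch,
    step, Suspended, AsyncStep) are hypothetical, and the explicit string
    keys for the memo are a simplification of the implicit convention
    sketched in the text. Null step results are not supported.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

class ReRoutineSketch {
    /** Control-flow signal: a step's result is not memoized yet. */
    static final class Suspended extends RuntimeException {
        Suspended() { super(null, null, false, false); } // no stack trace
    }

    /** An asynchronous step that eventually hands a value to the callback. */
    interface AsyncStep<T> { void start(Consumer<T> callback); }

    static final class ReRoutine<R> implements Runnable {
        private final Map<String, Object> memo = new ConcurrentHashMap<>();
        private final ExecutorService origin;            // originating executor
        private final Function<ReRoutine<R>, R> body;    // plain forward code
        final CompletableFuture<R> result = new CompletableFuture<>();
        final AtomicInteger runs = new AtomicInteger();

        ReRoutine(ExecutorService origin, Function<ReRoutine<R>, R> body) {
            this.origin = origin;
            this.body = body;
        }

        @SuppressWarnings("unchecked")
        <T> T step(String key, AsyncStep<T> step) {
            Object value = memo.get(key);
            if (value != null) return (T) value;         // memoized: read forward
            // Launch the step; its worker fills the memo and re-submits us.
            step.start(v -> { memo.put(key, v); origin.execute(this); });
            throw new Suspended();
        }

        @Override public void run() {
            runs.incrementAndGet();
            try {
                result.complete(body.apply(this));       // ran to the end
            } catch (Suspended s) {
                // Just return: the launched step owns the re-submission.
            } catch (RuntimeException e) {
                result.completeExceptionally(e);         // exit with an error
            }
        }
    }

    static String demo() {
        ExecutorService origin = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            ReRoutine<String> r = new ReRoutine<String>(origin, rr -> {
                String group = rr.step("group",
                        cb -> workers.execute(() -> cb.accept("sci.math")));
                Integer len = rr.step("len",
                        cb -> workers.execute(() -> cb.accept(group.length())));
                return group + ":" + len;
            });
            origin.execute(r);
            return r.result.join() + " after " + r.runs.get() + " runs";
        } finally {
            workers.shutdown();
            origin.shutdown();
        }
    }
}
```

    The body reads as two sequential calls, and it runs three times: once
    to launch each of the two steps and once more to complete. The
    originating executor is single-threaded here so that a re-submission
    never overlaps the run that launched it.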

    So, having sorted this mostly out, then the usual
    work as of implementing the routines for the protocol
    can so proceed then with a usual notion of a framework
    of support for both the simple declaration of routine
    and the high performance (and low resource usage) of
    the delegation of routine, and support for injection
    for test and environment, and all in the language
    with minimal clutter, no byte-code modification,
    and a ready wrapper for libraries of arbitrary
    run-time characteristic.

    This solves some problems.


    [2017/01/22]

    Thanks for your interest, if you read the thread,
    I'm talking about an implementation of usenet,
    with modern languages and runtimes, but, with
    a filesystem convention, and a distributed redundant
    store, and otherwise of very limited hardware and
    distributed software resources or the "free tier"
    of cloud computing (or, any box).

    When it comes to message formats, usenet isn't
    limited to plain text, it's as simply usual
    MIME multimedia. (The user-agent can render
    text however it would so care.)

    A reputation system is pretty simply implemented
    with forwarding posts to various statistics groups
    that over time build profiles of authors that
    readers may adopt.

    Putting an IMAP interface in front of a NNTP gateway
    makes it pretty simple to have cross-platform user
    interfaces from any IMAP (eg, email) client.

    Then, my requirements include backfilling a store
    with the groups of interest for implementing summary
    and search for archival and research purposes.


    [2017/01/22]

    (About the 2nd law of thermodynamics, Moore's
    law, and the copper process with regards to the
    cross-talk about the VLSI or "ultra" VLSI or
    the epoch these days, and burning bits, what
    you might find of interest is the development of
    the "reversible computing", which basically
    recycles the bits, and then also that besides
    the usual electronic transistor, and besides that
    today there can be free-form 3-D IC's or "custom
    logic", instead of just the planar systolic clock-
    driven chip, there are also "systems on chip" with
    regards to electron, photon, and heat pipes as
    about the photo-electric and Seebeck/Peltier,
    with various remarkably high efficiency models
    of computation, this besides the very novel
    serial and parallel computational units and
    logical machines afforded by 3-D IC's and optics.

    About "reasonably simple declaration of routine
    in commodity languages on commodity hardware
    for commodity engineers for enduring systems",
    at cost, see above.)


    [2017/02/07]

    Not _too_ much progress, has basically seen the adaptation
    of this re-routine pattern to the command implementations,
    with basically usual linear procedural logic then the
    automatic and agnostic composition of the asynchronous
    tasks in the usual declarative syntax that then the
    pooled (and to be metered) threads are possibly by
    design entirely non-blocking and asynchronous, and
    possibly by design blocking or otherwise agnostic of
    implementation, with then the design of the state
    machine of the routine as "eventually consistent"
    or forward and making efficient use of the computational
    and synchronization resources.

    The next part has been about implementing a client "machine"
    as complement to the server "machine", where a machine here
    is an assembly as it were of threads and executors about the
    "reactive" (or functional, event-driven) handling of the
    abstract system resources (small pojos, file name, and
    linked lists of 4K buffers). The server basically starts
    up listening on a port then accepts and starts a session
    for any connection and then a reader fills and moves buffers
    to each of the sessions of the connections, and signals the
    relay then for the scanning of the inputs and then composing
    the commands and executing those as these re-routines, that
    as they complete, then the results of the commands are then
    printed out to buffers (eg, encoded, compressed, encrypted)
    then the writer sends that back on the wire. The client
    machine then is basically a model of asynchronous and
    probably serial computation or a "web service call", these
    days often and probably on pooled HTTP connections. This
    then is pretty simple with the callbacks and the addressing/
    routing of the response back to the re-routine's executor
    to then re-submit the re-routine to completion.

    I've been looking at other examples of continuations, the
    "reactive" programming or these days' "streaming model"
    (where the challenge is much in the aggregations), that
    otherwise non-blocking or asynchronous programming is
    often rather ... recursively ... rolled out where this
    re-routine gains even though the flow-of-control is
    re-executed over the memoized contents of the re-routines
    as they are so composed declaratively, that this makes
    what would be "linear" at worst "n squared", but that is
    only on how many commands there are in the procedure,
    not combined over their execution because all the
    intermediate results are memoized (as needed, because
    if the implementation is local or a mock instead, the
    re-routine is agnostic of asynchronicity and just runs
    through linearly, but the relevant point is that the
    number of composable units is a small constant thus
    that its square is a small constant, particularly
    as otherwise being a free model of cooperative multi-
    threading, here toward a lock-free design). All the
    live objects remain on the heap, but just the objects
    and not for example the stack as a serialized continuation.
    (This could work out to singleton literals or "coding"
    but basically it will have to auto-throttle off heap-max.)
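
    As a worked count of that "n squared", assume a re-routine with k
    asynchronous steps, each of which suspends exactly once. It is then
    run k + 1 times, and run j (counting from 0) re-reads j memoized
    results before it either launches the next step or completes, so the
    total number of memo reads is:

```latex
\sum_{j=0}^{k} j \;=\; \frac{k(k+1)}{2} \;=\; O(k^2)
```

    For k = 5 steps that is 15 memo reads over 6 runs, while each step's
    actual work is still performed only once.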

    So, shuffling and juggling the identifiers and organizations
    around and sifting and sorting what elements of the standard
    concurrency and functional libraries (of, the "Java" language)
    to settle on for usual neat and concise (and re-usable and
    temporally agnostic) declarative flow-of-control (i.e., with
    "Future"'s everywhere and as about reasonable or least-surprising
    semantics, if any, with usual and plain code also being "in
    the convention"), then it is settling on a style.

    Well, thanks for reading, it's a rather stream-of-consciousness
    narrative, here about the design of pretty re-usable software.

    [2017/02/07]

    Sure, I'll limit this.

    There is plenty of usenet server software, but it is mostly
    INND or BNews/CNews, or a few commercial cousins. The design
    of those systems is tied to various economies that don't so much
    apply these days. (The use-case, of durable distributed message-
    passing, is still quite relevant, and there are many ecosystems
    and regimes small and large as about it.) In the days of managed
    commodity network and compute resources or "cloud computing", here
    as above about requirements, then a modernization is relevant, and
    for some developers with the skills, not so distant.

    Another point is that the eventual goal is archival, my goal isn't
    to start an offshoot, instead to build the system as a working
    model of an archive, basically from the author's view as a working
    store for extracting material, and from the developer's view as
    an example in design with low or no required maintenance and
    "scalable" operation for a long time.


    You mention comp.ai.philosophy, these days there's a lot more
    automated reasoning (or, mockingbird generators), as computing
    and development affords more and different forms of automated
    reasoning, here again the point is for an archival setting to
    give them something to read.

    Thanks, then, I'll limit this.

    [2017/03/21]

    I continued tapping away at this.

    The re-routines now sit beyond a module or domain definition.
    This basically defines the modules' value types like session,
    message, article, group, content, wildmat. Then, it also
    defines a service layer, as about the relations of the elements
    of the domain, so that then the otherwise simple value types
    have natural methods as relate them, all implemented behind
    a service layer, that implemented with these re-routines is
    agnostic of synchronous or asynchronous convention, and
    is non-blocking throughout with cooperative multithreading.
    This has a factory of factories or industry pattern that provides
    the object graph wiring and dynamic proxying to the routine
    implementations, that are then defined as traits, that the re-
    routine composes the routines as mixins (of the domain's
    services).
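
    A minimal sketch of the dynamic-proxying part, assuming plain
    java.lang.reflect.Proxy and nothing else: each trait is an interface,
    the domain interface extends the traits, and one handler routes a call
    to whichever implementation was registered for the interface declaring
    the method. The names Industry, GroupService, ArticleService and
    NntpDomain are made up for illustration and are not the actual code;
    the re-routine's own non-blocking re-execution is not modelled here.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical traits of the domain: each is a small service interface.
interface GroupService { int articleCount(String group); }
interface ArticleService { String subject(String messageId); }

// The domain is the composition (mix-in) of its traits.
interface NntpDomain extends GroupService, ArticleService { }

// Hypothetical "industry": holds one implementation per trait and wires a proxy.
final class Industry {
    private final Map<Class<?>, Object> impls = new HashMap<>();

    <T> Industry register(Class<T> trait, T impl) {
        impls.put(trait, impl);
        return this;
    }

    <D> D build(Class<D> domain) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Route by the interface that declares the method being called.
            Object target = impls.get(method.getDeclaringClass());
            if (target == null) {
                // Also reached for Object methods such as toString(),
                // which this sketch does not handle.
                throw new UnsupportedOperationException(method.toString());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the implementation's own exception
            }
        };
        Object instance = Proxy.newProxyInstance(
                domain.getClassLoader(), new Class<?>[] { domain }, handler);
        return domain.cast(instance);
    }
}
```

    Usage would be along the lines of registering one implementation per
    trait and then calling build(NntpDomain.class); swapping an in-memory
    module for a distributed one is then only a different registration.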

    (This is all "in the language" in Java, with no external dependencies.)

    The transport mechanism is basically having abstracted the
    attachment for a usual non-blocking I/O framework for the
    transport types as of the scattering/gathering or vector I/O
    as about then the interface between transport and protocol
    (here NNTP, but, generally). Basically in a land of 4K byte buffers,
    then those are fed from the Reader/Writer that is the endpoint to
    a Feeder/Scanner that is implemented for the protocol and usual
    features like encryption and compression, then making Commands
    and Results out of those (and modelling transactions or command
    sequences as state machines which are otherwise absent), those
    systolically carrying out as primitive or transport types to a Printer/
    Hopper, that also writes the response (or rather, consumes the buffers
    in a highly concurrent highly efficient event and selection hammering).
    The selector is another bounded resource, so it's configurable the
    SelectorAssignment and there might be a thread for each group of
    selectors about FD_SETSIZE, but that's not really at issue as select
    went to epoll, but provides an option for that eventuality.
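
    As a rough illustration of the Feeder/Scanner step only, here is a
    sketch that assembles CRLF-terminated command lines out of whatever
    byte buffers arrive, keeping the partial line between buffers. The
    class name LineScanner and its feed method are hypothetical; the real
    scanner also sits behind compression and encryption, which are left
    out here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner: turns a stream of buffers into complete lines.
final class LineScanner {
    private final ByteArrayOutputStream partial = new ByteArrayOutputStream();
    private boolean sawCr; // true when the previous byte was a CR not yet resolved

    /** Consumes the buffer and returns the lines completed by it (without CRLF). */
    List<String> feed(ByteBuffer buf) {
        List<String> lines = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte b = buf.get();
            if (sawCr && b == '\n') {
                lines.add(new String(partial.toByteArray(), StandardCharsets.UTF_8));
                partial.reset();
                sawCr = false;
                continue;
            }
            if (sawCr) {
                partial.write('\r'); // the held CR was data, not a line ending
            }
            sawCr = (b == '\r');
            if (!sawCr) {
                partial.write(b);
            }
        }
        return lines;
    }
}
```

    The point of the shape is that a line may straddle any number of 4K
    buffers, so the scanner carries its state and the caller just keeps
    feeding it.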

    The transport and protocol routines are pretty well decoupled this
    way, and then the protocol domain, modules, and routines are as
    well so decoupled (and fall together pretty naturally), much using
    quite usual software design patterns (if not necessarily so formally,
    quite directly).

    The protocol (here NNTP) then is basically in a few files detailing
    the semantics of the commands to the scanner as overriding methods
    of a Command class, and implementing the action in the domain from
    extending the TraitedReRoutine then for a single definition in the NNTP
    domain that is implemented in various modules or as collections of
    services.
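
    A sketch of what "the semantics of the commands to the scanner as
    overriding methods of a Command class" could look like. The method
    names verb, maxArgs and expectsBody, and the CommandTable lookup, are
    assumptions for illustration; the domain action (the TraitedReRoutine
    side) is not shown.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical base class: defaults describe a command with no arguments and no body.
abstract class Command {
    abstract String verb();
    int maxArgs() { return 0; }
    /** Whether the client goes on to send a dot-terminated multi-line body. */
    boolean expectsBody() { return false; }
}

final class GroupCommand extends Command {
    @Override String verb() { return "GROUP"; }
    @Override int maxArgs() { return 1; } // the group name
}

final class PostCommand extends Command {
    @Override String verb() { return "POST"; }
    @Override boolean expectsBody() { return true; } // the article follows
}

// Hypothetical table the scanner consults for each command line.
final class CommandTable {
    private final Map<String, Command> byVerb = new HashMap<>();

    CommandTable(Command... commands) {
        for (Command c : commands) {
            byVerb.put(c.verb(), c);
        }
    }

    /** Returns the command for a line, or null if unknown or given too many arguments. */
    Command lookup(String line) {
        String[] parts = line.trim().split("\\s+");
        Command c = byVerb.get(parts[0].toUpperCase(Locale.ROOT));
        if (c == null || parts.length - 1 > c.maxArgs()) {
            return null;
        }
        return c;
    }
}
```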


    [2017/04/09]

    I'm still tapping away at this if rather more slowly (or, more
    sporadically).

    The "re-routine" async completion pattern is more than less
    figured out (toward high concurrency as a model of cooperative
    multi-threading, behind also a pattern of a domain layer, with mix-in
    nyms that is also some factory logic), a simple non-blocking I/O socket
    service routine is more than less figured out (the server not the client,
    toward again high concurrency and flexible and efficient use of machine
    or virtualized resources as they are), the commands and their bodies are
    pretty much typed up, then I've been trying to figure out some data
    structures basically in I/O (Input/Output), or here mostly throughput
    as it is about the streams.

    I/O datum FIFOs and holders:

    buffer queue
    handles queue
    buffer+handles queue
    buffer/buffer[] or buffer[]/buffer in loops
    byte[]/byte[] in steps
    Input/Output in Streams

    Basically any of the filters or adapters is specialized to these
    input/output data holders. Then, there are logically enough queues
    or FIFOs as there are really implicitly between any communicating
    sequential processes that are rate-limited or otherwise non-systolic
    ("real-time"), here for some ideas about data structures, as either
    implement or adapt unbounded single producer/single consumer (SPSC)
    queues.

    One idea is the making the linked container with then sentinel nodes
    and otherwise making it thread-safe (for a single producer and single
    consumer). This is where the queue (or, "monohydra" or "slique") is
    rather generally a container, and that here iterations are usually
    consuming the queue, but sometimes there are aggregates collected
    then to go over the queue. The idea then is that the producer and
    consumer have separate views of the queue that the producer does
    atomic swap on the tail of the queue and that a consumer's iterator
    of elements (as iterable and not just a queue, for using the queue as
    a holder and not just a FIFO) returns a marker to the end of the iteration,
    for example in computing bounds over the buffers then re-iterating and
    flipping the buffers then given the bounds moving the buffers' references
    to an output array thus consuming the FIFO.
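
    A minimal sketch of such a linked SPSC container, assuming a sentinel
    head node on the consumer's side and an atomic swap of the tail on
    the producer's side. The class name Slique is borrowed from the text
    but the methods (append, scan, consume) are hypothetical, with scan
    returning a count as the "marker" or bound of what was iterated.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch: one thread calls append, one other thread calls scan/consume.
final class Slique<T> {
    private static final class Node<T> {
        T value;
        volatile Node<T> next;
        Node(T value) { this.value = value; }
    }

    // Consumer's view: a sentinel whose successor is the first live element.
    private Node<T> head = new Node<>(null);
    // Producer's view: the last node, replaced by atomic swap on append.
    private final AtomicReference<Node<T>> tail = new AtomicReference<>(head);

    /** Producer: swap in the new tail, then link the previous tail to it. */
    void append(T value) {
        Node<T> node = new Node<>(value);
        Node<T> prev = tail.getAndSet(node);
        prev.next = node; // volatile write publishes the node and its value
    }

    /** Consumer: visit the linked elements without consuming; returns how many were seen. */
    int scan(Consumer<? super T> visitor) {
        int seen = 0;
        for (Node<T> n = head.next; n != null; n = n.next) {
            visitor.accept(n.value);
            seen++;
        }
        return seen;
    }

    /** Consumer: drop up to count elements from the front; returns how many were dropped. */
    int consume(int count) {
        int taken = 0;
        while (taken < count && head.next != null) {
            head = head.next;  // the consumed node becomes the new sentinel
            head.value = null; // release the payload
            taken++;
        }
        return taken;
    }
}
```

    The two-pass use described above is then scan to compute bounds,
    followed by consume with the bound that the parse settled on.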

    This then combines with the tasks that the tasks driving the I/O (as events
    drive the tasks) are basically constant tasks or runnables (constant to the
    session or attachment) that just have incremented a count of times to run
    thus that there's always a service of the FIFO after the atomic append.
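
    The "count of times to run" can be sketched as below, assuming an
    atomic counter per session: the producer signals after each append,
    only the first signal schedules the task, and the task keeps
    servicing until the count is back to zero. CountedTask and signal
    are hypothetical names, and the executor is whatever runs the tasks.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical constant task: one instance per session or attachment.
final class CountedTask implements Runnable {
    private final AtomicInteger pending = new AtomicInteger();
    private final Executor executor;
    private final Runnable service;

    CountedTask(Executor executor, Runnable service) {
        this.executor = executor;
        this.service = service;
    }

    /** Called after each append: request one more service pass. */
    void signal() {
        // Only the transition from 0 schedules; later signals just raise the count.
        if (pending.getAndIncrement() == 0) {
            executor.execute(this);
        }
    }

    @Override
    public void run() {
        // One pass per signal, so there is always a pass after the last append.
        do {
            service.run();
        } while (pending.decrementAndGet() > 0);
    }
}
```

    This way at most one instance of the task is scheduled or running at
    a time, which is what lets the consumer side stay single-threaded.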

    Another idea is this hybrid or serial mix-and-match (SPSC FIFO), of buffers
    and handles. This is where the buffer is the data in-line, the handle is a
    reference to the data. This is about passing through the handles where
    the channels support their transfer, and converting them to inline data
    where they don't. That's then about all the combined cases as the above
    I/O datum FIFOs and holders, with adapting them so the filter chain blasts
    (eg specialized operation), loops (transferring in and out of buffers),
    steps (statefully filling and levelling data), or moves (copying the
    references, the data in or out or on or off, then to perform the I/O
    operations) over them.

    It seems rather simpler to just adapt the data types to the boundary I/O
    data types which are byte buffers (here size-4K pooled memory buffers) and
    for that the domain shouldn't know concrete types so much as interfaces, but
    the buffers and handles (file handles) and arrays as they are are pretty
    much fungible to the serialization of the elements of the domain, that can
    then specialize how they build logical inputs and outputs of the commands.
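
    A sketch of the buffer-or-handle idea behind one interface, assuming
    blocking channels for brevity: an inline datum writes its bytes, a
    handle datum refers to a region of a file and lets
    FileChannel.transferTo move it. Datum, InlineDatum and HandleDatum
    are hypothetical names, not the actual types.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical common interface so the domain need not know the concrete type.
interface Datum {
    /** Writes this datum to the channel and returns the number of bytes written. */
    long writeTo(WritableByteChannel out) throws IOException;
}

// The data itself, in-line in a buffer.
final class InlineDatum implements Datum {
    private final ByteBuffer data;

    InlineDatum(ByteBuffer data) { this.data = data; }

    @Override
    public long writeTo(WritableByteChannel out) throws IOException {
        ByteBuffer view = data.duplicate(); // leave the original position untouched
        long written = 0;
        while (view.hasRemaining()) {
            written += out.write(view); // assumes a blocking channel
        }
        return written;
    }
}

// A reference to the data: a region of an open file.
final class HandleDatum implements Datum {
    private final FileChannel file;
    private final long position;
    private final long count;

    HandleDatum(FileChannel file, long position, long count) {
        this.file = file;
        this.position = position;
        this.count = count;
    }

    @Override
    public long writeTo(WritableByteChannel out) throws IOException {
        long done = 0;
        while (done < count) {
            long n = file.transferTo(position + done, count - done, out);
            if (n <= 0) {
                break; // nothing more to transfer (for example, past end of file)
            }
            done += n;
        }
        return done;
    }
}
```

    A FIFO of Datum can then mix the two freely, and whether the handle
    passes through or is copied is left to the platform's transferTo.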

    [2017/07/16]

    Implementing search is rather a challenge.

    Besides accepter/rejector and usual notions of matching
    (eg the superscalar on closed categories), find and query
    seems for where besides usual notions of object hashes
    as indices that there is to be built up from the accepter/
    rejector all sorts of indices as do/don't/don't-matter the
    machines of the accepters and rejectors, vis-a-vis going
    over input data and the corpus and finding relations (to
    the input, or here space of inputs), of the corpus.

    That's where, after finding an event for AP, whether
    you're interested in the next for him or the first
    for someone else. There are quite various ways to
    achieve those quite various goals, besides computing
    the first goal. Just as an example that's, for example,
    the first reasonable AP Maxwell equation (or reference)
    or for everybody else, like, who knows about the Maxwell
    equation(s).

    Search is a challenge, NNTP rather puts it off to IMAP first
    for free text search, then for the concept search or
    "call by meaning" you reference, basically refining
    estimates of the scope of what it takes to find out
    what that is.

    Then for events in time-series data there's a usual general
    model for things as they occur. That could be rather
    rich and where causal is separate from associative
    (though of course causality is associative).

    With the idea of NNTP as a corpus, then a usual line
    for establishing tractability of search is to associate
    its contents some document then semantic model i.e.,
    then to generate and maintain that besides otherwise
    that the individual items or posts and their references
    in the meta-data besides the data are made tractable
    then for general ideas of things.

    I'm to get to this, the re-routine particularly amuses
    me as a programming idiom in the design of more-or-less
    detached service routine from the corpus, then about
    what body of data so more-than-less naturally results,
    with rather default and usual semantics.


    Such "natural language" meaning as can be compiled for
    efficiency to the very direct in storage and reference,
    almost then asks "what will AP come up with, next".

    [ page break 4 ]

    [2020/06/29]

    I haven't much worked on this. The idea of the industry
    pattern and for the re-routine makes for quite a bit simply
    the modules in memory or distributed and a default free-threaded
    machine.

    Search you mentioned and for example HTTP is adding the SEARCH verb,
    for example simple associative conditions that naturally only combine,
    and run in parallel, there are of course any number of whatever is the
    HTTP SEARCH implementations one might consider, here usenet's is
    rudimentary where for example IMAP over it is improved, what for
    contextual search and content representation.

    Information retrieval and pattern recognition and all that is
    plenty huge, here that terms define the corpus.

    My implementation of the high-performance selector routine,
    the networking I/O selector, with this slique I implemented,
    runs up and fine and great up to thousands of connections,
    but, it seems like running the standard I/O and non-blocking
    I/O in the same actual container, makes that I implemented
    the selecting hammering non-blocking I/O toward the 10KC,
    though it is in small blocks because here the messages are
    small, then for under what conditions it runs server class.

    With the non-blocking networking I/O, the scanning and parsing
    that assembles messages off the I/O, and that's after compression
    and encryption in the layers, that it's implemented in Java and
    Java does that, then inside that all the commands in the protocol
    then have their implementations in the re-routine, that all
    non-blocking itself and free-threaded, makes sense for
    co-operative multithreading, of an efficient server runtime
    with here the notion of a durable back-end (or running in memory).


    [2020/11/16]

    In traffic there are two kinds of usenet users,
    viewers and traffic through Google Groups,
    and, USENET. (USENET traffic.)

    Here now Google turned on login to view their
    Google Groups - effectively closing the Google Groups
    without a Google login.

    I suppose if they're used at work or whatever though
    they'd be open.



    Where I got with the C10K non-blocking I/O for a usenet server,
    it scales up though then I think in the runtime is a situation where
    it only runs epoll or kqueue that the test scales up, then at the end
    or in sockets there is a drop, or it fell off the driver. I've implemented
    the code this far, what has all of NNTP in a file and then the "re-routine,
    industry-pattern back-end" in memory, then for that running usually.

    (Cooperative multithreading on top of non-blocking I/O.)

    Implementing the serial queue or "monohydra", or slique,
    makes for that then when the parser is constantly parsing,
    it seems a usual queue like data structure with parsing
    returning its bounds, consuming the queue.

    Having the file buffers all down small on 4K pages,
    has that a next usual page size is the megabyte.

    Here though it seems to make sense to have a natural
    4K alignment the file system representation, then that
    it is moving files.

    So, then with the new modern Java, it that runs in its own
    Java server runtime environment, it seems I would also
    need to see whether the cloud virt supported the I/O model
    or not, or that the cooperative multi-threading for example
    would be single-threaded. (Blocking abstractly.)

    Then besides I suppose that could be neatly with basically
    the program model, and its file model, being well-defined,
    then for NNTP with IMAP organization search and extensions,
    those being standardized, seems to make sense for an efficient
    news file organization.

    Here then it seems for serving the NNTP, and for example
    their file bodies under the storage, with the fixed headers,
    variable header or XREF, and the message body, then under
    content it's same as storage.

    NNTP has "OVERVIEW" then from it is built search.
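
    As a small sketch of building search off the overview data, assuming
    the tab-separated overview line of the OVER command (article number,
    subject, from, date, message-id, references, bytes, lines): index the
    subject terms to article numbers and answer a query as the
    intersection of the terms' hits. OverviewIndex is a hypothetical
    name, and indexing only the subject is a simplification.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical inverted index from subject terms to article numbers.
final class OverviewIndex {
    private final Map<String, SortedSet<Long>> bySubjectTerm = new HashMap<>();

    /** Adds one tab-separated overview line; field 0 is the number, field 1 the subject. */
    void add(String overviewLine) {
        String[] fields = overviewLine.split("\t", -1);
        long number = Long.parseLong(fields[0]);
        String subject = fields[1].toLowerCase(Locale.ROOT);
        for (String term : subject.split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                bySubjectTerm.computeIfAbsent(term, k -> new TreeSet<>()).add(number);
            }
        }
    }

    /** Returns the article numbers whose subject contains every given term. */
    SortedSet<Long> search(String... terms) {
        SortedSet<Long> result = null;
        for (String term : terms) {
            SortedSet<Long> hits = bySubjectTerm.getOrDefault(
                    term.toLowerCase(Locale.ROOT), Collections.<Long>emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits); // conditions only combine (AND)
            }
        }
        return result == null ? new TreeSet<Long>() : result;
    }
}
```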

    Let's see here then, if I get the load test running, or,
    just put a limit under the load while there are no load test
    errors, it seems the algorithm then scales under load to be
    making usually the algorithm serial in CPU, with: encryption,
    and compression (traffic). (Block ciphers instead of serial transfer.)

    Then, the industry pattern with re-routines, has that the
    re-routines are naturally co-operative in the blocking,
    and in the language, including flow-of-control and exception scope.


    So, I have a high-performance implementation here.

    [2020/11/16]

    It seems like for NFS, then, and having the separate read and write of
    the client, a default filesystem, is an idea for the system facility:
    mirroring the mounted file locally, and, providing the read view from
    that via a different route.


    A next idea then seems for the organization, the client views themselves
    organize over the durable and available file system representation, this
    provides anyone a view over the protocol with a group file convention.

    I.e., while usual continuous traffic was surfing, individual reads over
    group files could have independent views, for example collating contents.

    Then, extracting requests from traffic and threads seems usual.

    (For example a specialized object transfer view.)

    Making protocols for implementing internet protocols in groups and
    so on, here makes for giving usenet example views to content generally.

    So, I have designed a protocol node and implemented it mostly,
    then about designed an object transfer protocol, here the idea
    is how to make it so people can extract data, for example their own
    data, from a large durable store of all the usenet messages,
    making views of usenet running on usenet, eg "Feb. 2016: AP's
    Greatest Hits".

    Here the point is to figure that usenet, these days, can be operated
    in cooperation with usenet, and really for its own sake, for leaving
    messages in usenet and here for usenet protocol stores as there's
    no reason it's plain text the content, while the protocol supports it.

    Building personal view for example is a simple matter of very many
    service providers any of which sells usenet all day for a good deal.

    Let's see here, $25/MM, storage on the cloud last year for about
    a million messages for a month is about $25. Outbound traffic is
    usually the metered cloud traffic, here for example that CDN traffic
    support the universal share convention, under metering. What that
    the algorithm is effectively tunable in CPU and RAM, makes for under
    I/O that it's "unobtrusive" or the cooperative in routine, for CPU, I/O
    and RAM, then that there is for seeking that Network Store or Database
    Time instead effectively becomes File I/O time, as what may be faster,
    and more durable. There's a faster database time for scaling the ingestion
    here with that the file view is eventually consistent. (And reliable.)

    Checking the files would be over time for example with "last checked"
    and "last dropped" something along the lines of, finding wrong offsets,
    basically having to make it so that it survives neatly corruption of the
    store (by being more-or-less stored in-place).

    Content catalog and such, catalog.

    [2021/12/06]

    Then I wonder and figure the re-routine can scale.

    Here for the re-routine, the industry factory pattern,
    and the commands in the protocols in the templates,
    and the memory module, with the algorithm interface,
    in the high-performance computer resource, it is here
    that this simple kind of "writing Internet software"
    makes pretty rapidly for adding resources.

    Here the design is basically of a file I/O abstraction,
    that the computer reads data files with mmap to get
    their handlers, what results that for I/O map the channels
    result transferring the channels in I/O for what results,
    in mostly the allocated resource requirements generally,
    and for the protocol and algorithm, it results then that
    the industry factory pattern and making for interfaces,
    then also here the I/O routine as what results that this
    is an implementation, of a network server, mostly is making
    for that the re-routine, results very neatly a model of
    parallel cooperation.

    I think computers still have file systems and file I/O but
    in abstraction just because PAGE_SIZE is still relevant for
    the network besides or I/O, if eventually, here is that the
    value types are in the commands and so on, it is besides
    that in terms of the resources so defined it still is in a filesystem
    convention that a remote and unreliable view of it suffices.

    Here then the source code also being "this is only 20-50k",
    lines of code, with basically an entire otherwise library stack
    of the runtime itself, only the network and file abstraction,
    this makes for also that modularity results. (Factory Industry
    Pattern Modules.)

    For a network server, here, that, mostly it is high performance
    in the sense that this is about the most direct handle on the channels
    and here mostly for the text layer in the I/O order, or protocol layer,
    here is that basically encryption and compression usually in the layer,
    there is besides a usual concern where encryption and compression
    are left out, there is that text in the layer itself is commands.

    Then, those being constants under the resources for the protocol,
    it's what results usual protocols like NNTP and HTTP and other protocols
    with usually one server and many clients, here is for that these protocols
    are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.

    These are here defined "all Java" or "Pure Java", i.e. let's be clear that
    in terms of the reference abstraction layer, I think computers still use
    the non-blocking I/O and filesystems and network to RAM, so that as
    the I/O is implemented in those it actually has those besides instead for
    example defaulting to byte-per-channel or character I/O. I.e. the usual
    semantics for servicing the I/O in the accepter routine and what makes
    for that the platform also provides a reference encryption implementation,
    if not so relevant for the block encoder chain, besides that for example
    compression has a default implementation, here the I/O model is as simply
    in store for handles, channels, ..., that it results that data
    especially delivered from a constant store can anyways be mostly
    compressed and encrypted already or predigested to serve, here that it's
    the convention, here is for resulting that these client-server protocols,
    with usually reads > postings then here besides "retention", basically
    here is for what it is.

    With the re-routine and the protocol layer besides, having written the
    routines in the re-routine, what there is to write here is this industry
    factory, or a module framework, implementing the re-routines, as they're
    built from the linear description a routine, makes for as the routine
    progresses that it's "in the language" and that more than less in the
    terms, it makes for implementing the case of logic for values, in the
    logic's flow-of-control's terms.

    Then, there is that actually running the software is different than just
    writing it, here in the sense that as a server runtime, it is to be made a
    thing, by giving it a name, and giving it an authority, to exist on the
    Internet.

    There is basically that for BGP and NAT and so on, and, mobile fabric
    networks, IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric
    main space, with respect to what are CIDR and 24 bits rule and what makes
    for TCP/IP, here entirely the course is using the TCP/IP stack and Java's
    TCP/IP stack, with respect to that TCP/IP is so provided or in terms of
    process what results ports mostly and connection models where it is
    exactly the TCP after the IP, the Transmission Control Protocol and
    Internet Protocol, have here both this socket and datagram connection
    orientation, or stateful and stateless or here that in terms of routing
    it's defined in addresses, under that names and routing define sources,
    routes, destinations, ..., that routine numeric IP addresses result in
    the usual sense of the network being behind an IP and including IPv4
    network fabric with respect to local routers.

    I.e., here to include a service framework is "here besides the routine,
    let's make it clear that in terms of being a durable resource, there needs
    to be some lockbox filled with its sustenance that in some locked or
    constant terms results that for the duration of its outlay, say five
    years, it is held up, then, it will be so again, or, let down to result
    the carry-over that it invested to archive itself, I won't have to care
    or do anything until then".


    About the service activation and the idea that, for a port, the routine
    itself needs only run under load, i.e. there is effectively little
    traffic on the old archives, and usually only the some other archive
    needs any traffic. Here the point is that for the Java routine there is
    the system port that was accepted for the request, that inetd or the
    systemd or means the network service was accessed, made for that much as
    for HTTP the protocol is client-server also for IP the protocol is
    client-server, while the TCP is packets. This is a general idea for
    system integration while here mostly the routine is that being a detail:
    the filesystem or network resource that results that the re-routines
    basically make very large CPU scaling.

    Then, it is basically containerized this sense of "at some domain name,
    there is a service, it's HTTP and NNTP and IMAP besides, what cares the
    world".

    I.e. being built on connection oriented protocols like the socket layer,
    HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to
    certificates, it's more than less sensible that most users have no idea
    of installing some NNTP browser or pointing their email to IMAP so that
    the email browser browses the newsgroups and for postings, here this is
    mostly only talk about implementing NNTP then IMAP and HTTP that happens
    to look like that, besides for example SMTP or NNTP posting.

    I.e., having "this IMAP server, happens to be this NNTP module", or
    "this HTTP server, happens to be a real simple mailbox these groups",
    makes for having partitions and retentions of those and that basically
    NNTP messages in the protocol can be more or less the same content
    in media, what otherwise is of a usual message type.

    Then, the NNTP server-server routine is the propagation of messages
    besides "I shall hire ten great usenet retention accounts and gently
    and politely draw them down and back-fill Usenet, these ten groups".

    By then I would have to have made for retention in storage, such contents,
    as have a reference value, then for besides making that independent in
    reference value, just so that it suffices that it basically results "a
    usable durable filesystem that happens you can browse it like usenet".
    I.e. as the pieces to make the backfill are dug up, they get assigned
    reference numbers of their time to make for what here is that in a grand
    schema of things, they have a reference number in numerical order (and
    what's also the server's "message-number" besides its "message-id") as
    noted above this gets into the storage for retention of a file, while,
    most services for this are instead for storage and serving, not
    necessarily or at all retention.

    I.e., the point is that as the groups are retained from retention, there
    is an approach what makes for an orderly archeology, as for what
    convention some data arrives, here that this server-server routine is
    besides the usual routine which is "here are new posts, propagate them",
    it's "please deliver as of a retention scan, and I'll try not to repeat
    it, what results as orderly as possible a proof or exercise of what we'll
    call afterward entire retention", then will be for as of writing a file
    that "as of the date, from start to finish, this site certified these
    messages as best-effort retention".

    It seems then besides there is basically "here is some mbox file, serve it
    like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
    what is ingestion, is to result for the protocol that "for this protocol,
    there is actually a normative filesystem representation that happens to
    be pretty much also altogether defined by the protocol", the point is
    that ingestion would result in command to remain in the protocol,
    that a usual file type that "presents a usual abstraction, of a filesystem,
    as from the contents of a file", here with the notion of "for all these
    threaded discussions, here this system only cares some approach to
    these ten particular newsgroups that already have mostly their corpus
    though it's not in perhaps their native mbox instead consulted from
    services".

    Then, there's for storing and serving the files, and there is the usual
    notion that moving the data, is to result, that really these file
    organizations are not so large in terms of resources, being "less than
    gigabytes" or so, still there's a notion that as a durable resource
    they're to be made fungible here the networked file approach in the
    native filesystem, then that with respect to it's a backing store, it's
    to make for that the entire enterprise is more or less to be made in
    terms of account, that then as a facility on the network then a service
    in the network, it's basically separated the facility and service, while
    still of course that the service is basically defined by its corpus.


    Then, to make that fungible in a world of account, while with an exit
    strategy so that the operation isn't not abstract, is mostly about the
    domain name, then that what results the networking, after trusted
    network naming and connections for what result routing, and then
    the port, in terms of that there are usual firewalls in ports though that
    besides usually enough client ports are ephemeral, here the point is
    that the protocols and their well-known ports, here it's usually enough
    that the Internet doesn't concern itself so much protocols but with
    respect to proxies, here that for example NNTP and IMAP don't have
    so much anything so related that way after startTLS. For the world of
    account, is basically to have for a domain name, an administrator, and,
    an owner or representative. These are to establish authority for changes
    and also accountability for usage.

    Basically they're to be persons and there is a process to get to be an
    administrator of DNS, most always there are services that a usual person
    implementing the system might use, besides for example the numerical.

    More relevant though to DNS is getting servers on the network, with respect
    to listening ports and that they connect to clients what so discover
    them as via DNS or configuration, here as above the usual notion that
    these are standard services and run on well-known ports for inetd or
    systemd. I.e. there is basically that running a server and dedicated
    networking, and power and so on, and some notion of the limits of
    reliability, is then as very much in other aspects of the organization of
    the system, i.e. its name, while at the same time, the point that a
    module makes for that basically the provision of a domain name or
    well-known or ephemeral host, is the usual notion that static IP
    addresses are a limited resource and as about the various networks in
    IPv4 and how they route traffic, is for that these services have
    well-known sections in DNS for at least that the most usual
    configuration is none.

    For a usual global reliability and availability, is some notion
    basically that each region and zone has a service available on the IP
    address, for that "hostname" resolves to the IP addresses. As well, in
    reverse, for the IP address and about the hostname, it should resolve
    reverse to hostname.

    About certificates mostly for identification after mapping to port, or
    multi-home Internet routing, here is the point that whether the domain
    name administration is "epochal" or "regular", is that epochs are defined
    by the ports behind the numbers and the domain name system as well,
    where in terms of the registrar, the domain names are epochal to the
    registrar, with respect to owners of domain names.

    Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
    and also BGP and NAT and routing and what are local and remote
    addresses, here is for not-so-much "implement DNS the protocol
    also while you're at it", rather for what results that there is a durable
    and long-standing and proper doorman, for some usenet.science.

    Here then the notion seems to be whether the doorman basically
    knows well-known services, is a multi-homing router, or otherwise
    what is the point that it starts the lean runtime, with respect to that
    it's a container and having enough sense of administration its operation
    as contained. I.e. here given a port and a hostname and always running
    makes for that as long as there is the low (preferable no) idle for
    services running that have no clients, is here also for the cheapest
    doorman that knows how to standup the client sentinel. (And put it back
    away.)

    Probably the most awful thing in the cloud services is the cost for
    data ingress and egress. What that means is that for example using
    a facility that is bound by that as a cost instead of under some constant
    cost, is basically why there is the approach that the containers need a
    handle to the files, and they're either local files or network files, here
    with the some convention above in archival a shared consistent view
    of all the files, or abstractly consistent, is for making that the doorman
    can handle lots of starting and finishing connections, while it is out of
    the way when usually it's client traffic and opening and closing
    connections, and the usual abstraction is that the client sentinel is
    never off and doorman does nothing, here is for attaching the one to some
    lower constant cost, where for example any long-running cost is more than
    some low constant cost.

    Then, this kind of service is often represented by nodes, in the usual
    sense "here is an abstract container with you hope some native
    performance under the hypervisor where it lives on the farm on its rack,
    it basically is moved the image to wherever it's requested from and
    lives there, have fun, the meter is on".
    I.e. that's just "this Jar has some config conventions and you can make
    the container associate it and watchdog it with systemd for example and
    use the cgroups while you're at it and make for tmpfs quota and also the
    best network file share, which you might be welcome to cache if you care
    just in the off-chance that this file-mapping is free or constant cost
    as long as it doesn't egress the network", is for here about the
    facilities that work, to get a copy of the system what with respect to
    its usual operation is a piece of the Internet.

    For the different reference modules (industry factories) in their
    patterns then and under combined configuration "file + process +
    network + fare", is that the fare of the service basically reflects a
    daily coin, in the sense that it represents an annual or epochal fee,
    what results for the time there is what is otherwise all defined the
    "file + process + network + name", what results it perpetuates in
    operation more than less simply and automatically.

    Then, the point though is to get it to where "I can go to this service, and
    administer it more or less by paying an account, that it thus lives in its
    budget and quota in its metered world".

    That though is very involved with identity, that in terms of "I the account
    as provided this sum make this sum paid with respect to an agreement",
    is that authority to make agreements must make that it results that the
    operation of the system, is entirely transparent, and defined in terms of
    the roles and delegation, conventions in operation.

    I.e., I personally don't want to administer a copy of usenet, but, it's
    here pretty much sorted out that I can administer one once then that
    it's to administer itself in the following, in terms of it having
    resources to allocate and resources to disburse. Also if nobody's using
    it it should basically work itself out to dial its lights down (while
    maintaining availability).

    Then a point seems "maintain and administer the operation in effect,
    what arrangement sees via delegation, that a card number and a phone
    number and an email account and more than less a responsible entity,
    is so indicated for example in cryptographic identity thus that the
    operation
    of this system as a service, effectively operates itself out of a kitty,
    what makes for administration and overhead, an entirely transparent
    model of a miniature business the system as a service".

    "... and a mailing address and mail service."

    Then, for accounts and accounts, for example is the provision of the
    component
    as simply an image in cloud algorithms, where basically as above here
    it's configured
    that anybody with any cloud account could basically run it on their own
    terms,
    there is for here sorting out "after this delegation to some business
    entity what
    results a corporation in effect, the rest is business-in-a-box and
    more-than-less
    what makes for its administration in state, is for how it basically
    limits and replicates
    its service, in terms of its own assets here as what administered is
    abstractly
    "durable forever mailboxes with private ownership if on public or
    managed resources".

    A usual notion of a private email and usenet service offering and
    business-in-a-box,
    here what I'm looking at is that besides archiving sci.math and copying
    out its content
    under author line, is to make such an industry for example here that
    "once having
    implemented an Internet service, an Internet service of them results
    Internet".

    I.e. here the point is to make a corporation and a foundation in effect,
    what in terms
    of then about the books and accounts, is about accounts for the business
    accounts
    that reflect a persistent entity, then what results in terms of
    computing, networking,
    and internetworking, with a regular notion of "let's never change this
    arrangement
    but it's in monthly or annual terms", here for that in overall
    arrangements,
    it results what the entire system more than less runs in ways then to
    either
    run out its limits or make itself a sponsored effort, about more-or-less
    a simple
    and responsible and accountable set of operations what effect the business
    (here that in terms of service there is basically the realm of agreement)
    that basically this sort of business-in-a-box model, is then besides
    itself of
    accounts, toward the notion as pay-as-you-go and "usual credits and
    their limits".

    Then for a news://usenet.science, or for example sci.math.usenet.science,
    is the idea that the entity is "some assemblage what is so that in DNS,
    and,
    in the accounts payable and receivable, and, in the material matters of
    arrangement and authority for administration, of DNS and resources and
    accounts what result durably persisting the business, is basically for a
    service
    then of what these are usual enough tasks, as that are interactive
    workflows
    and for mechanical workflows.

    I.e. the point is for having the service than an on/off button and more
    or less
    what is for a given instance of the operation, what results from some
    protocol
    that provides a "durable store" of a sort of the business, that at any
    time basically
    some re-routine or "eventually consistent" continuance of the operation
    of the
    business, results basically a continuity in its operations, what is
    entirely granular,
    that here for example the point is to "pick a DNS name, attach an
    account service,
    go" it so results that in the terms, basically there are the
    placeholders of the
    interactive workflows in that, and as what in terms are often for
    example simply
    card and phone number terms, account terms.

    I.e. a service to replenish accounts as kitties for making accounts only
    and
    exactly limited to the one service, its transfers, basically results
    that there
    is the notion of an email address, a phone number, a credit card's
    information,
    here a fixed limit debit account that works as of a kitty, there is a
    regular workflow
    service that will read out the durable stores and according to the
    timeliness of
    their events, affect the configuration and reconciliation of payments
    for accounts
    (closed loop scheduling/receiving).

    https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
    https://www.rfc-editor.org/rfc/rfc9022.txt

    Basically for dailies, monthlies, and annuals, what make weeklies,
    is this idea of Internet-from-an-account, what is services.

    [ page break 5 ]


    [2023/03/08]

    After implementing a store, and the protocol for getting messages, then
    what seems relevant here in the
    context of the SEARCH command, is a fungible file-format, that is
    derived from the body of the message
    in a normal form, that is a data structure that represents an index and
    catalog and dictionary and summary
    of the message, a form of a data structure of a "search index".

    These types files should naturally compose, and result a data structure
    that according to some normal
    forms of search and summary algorithms, result that a data structure
    results, that makes for efficient
    search of sections of the corpus for information retrieval, here that
    "information retrieval is the science
    of search algorithms".

    Now, for what and how people search, or what is the specification of a
    search, is in terms of queries, say,
    here for some brief forms of queries that advise what's definitely
    included in the search, what's excluded,
    then perhaps what's maybe included, or yes/no/maybe, which makes for a
    predicate that can be built,
    that can be applied to results that compose and build for the terms of a
    filter with yes/no/maybe or
    sure/no/yes, with predicates in values.
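
    A minimal sketch of such a yes/no/maybe predicate, in Python. The names
    Match, Query, both and either are hypothetical, and the rules that an
    excluded term always wins and that "maybe" terms only matter when
    nothing is required are assumptions, not something settled above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Match(IntEnum):
    """Three-valued result of applying a query to one message."""
    NO = 0
    MAYBE = 1
    YES = 2


@dataclass(frozen=True)
class Query:
    """Hypothetical query: terms that must, must not, and may appear."""
    include: frozenset = frozenset()
    exclude: frozenset = frozenset()
    maybe: frozenset = frozenset()

    def __call__(self, terms: set) -> Match:
        # Assumption: any excluded term rules the message out.
        if self.exclude & terms:
            return Match.NO
        # If terms are required, all of them must be present.
        if self.include:
            return Match.YES if self.include <= terms else Match.NO
        # With nothing required, an optional term gives only a "maybe".
        return Match.MAYBE if self.maybe & terms else Match.NO


def both(p, q):
    """Compose two predicates: the weaker answer wins."""
    return lambda terms: min(p(terms), q(terms))


def either(p, q):
    """Compose two predicates: the stronger answer wins."""
    return lambda terms: max(p(terms), q(terms))
```

    Because the three values are ordered, composing filters is just min and
    max, so the same predicate can be applied per message or per partition.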

    Here there is basically "free text search" and "matching summaries",
    where text is the text and summary is
    a data structure, with attributes as paths the leaves of the tree of
    which match.

    Then, the message has text, its body, and headers, key-value pairs
    or collections thereof, where as well
    there are default summaries like "a histogram of words by occurrence" or
    for example default text like "the
    MIME body of this message has a default text representation".

    So, the idea developing here is to define what are "normal" forms of
    data structures that have some "normal"
    forms of encoding that result that these "normalizing" after "normative"
    data structures define well-behaved
    algorithms upon them, which provide well-defined bounds in resources
    that return some quantification of results,
    like any/each/every/all, "hits".

    This is where usually enough search engines' or collected search
    algorithms ("find") usually enough have these
    de-facto forms, "under the hood", as it were, to make it first-class
    that for a given message and body that
    there is a normal form of a "catalog summary index" which can be
    compiled to a constant when the message
    is ingested, that then basically any filestore of these messages has
    alongside it the filestore of the "catsums"
    or as on-demand, then that any algorithm has at least well-defined
    behavior under partitions or collections
    or selections of these messages, or items, for various standard
    algorithms that separate "to find" from
    "to serve to find".

    So, ..., what I'm wondering are what would be sufficient normal forms in
    brief that result that there are
    defined for a given corpus of messages, basically at the granularity of
    messages, how is defined how
    there is a normal form for each message its "catsum", that catsums have a
    natural algebra that a
    concatenation of catsums is a catsum and that some standard algorithms
    naturally have well-defined
    results on their predicates and quantifiers of matching, in serial and
    parallel, and that the results
    combine in serial and parallel.
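
    One way to read "a concatenation of catsums is a catsum" is as a
    monoid: an empty catsum plus an associative combine. The sketch below
    assumes a catsum is only a message count and a word histogram; the
    Catsum class and its fields are hypothetical, and a real catsum would
    carry more (catalog, index, dictionary).

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce
import re


@dataclass
class Catsum:
    """Hypothetical catsum: message count plus a histogram of words."""
    messages: int = 0
    terms: Counter = field(default_factory=Counter)

    @staticmethod
    def of(body: str) -> "Catsum":
        # Assumption: terms are lowercase alphanumeric runs.
        words = re.findall(r"[a-z0-9]+", body.lower())
        return Catsum(1, Counter(words))

    def __add__(self, other: "Catsum") -> "Catsum":
        # Combining adds the counts, so grouping and order do not matter.
        return Catsum(self.messages + other.messages,
                      self.terms + other.terms)


def combine(catsums):
    """Fold any collection of catsums into one, starting from the empty one."""
    return reduce(lambda x, y: x + y, catsums, Catsum())
```

    Since the combine is associative, partitions can be summarized in
    parallel and the partial results combined afterwards, with the same
    outcome as a serial pass.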

    The results should be applicable to any kind of data but here it's more
    or less about usenet groups.

    [2023/03/08]

    So I start browsing the Information Retrieval section in Wikipedia and
    more or less get to reading
    Luhn's 1958 "automatic coding of document summaries" or "The Automatic
    Creation of Literature
    Abstracts". Then, what I figure, is that the histogram, is an
    associative array of keys to counts,
    and what I figure is to compute both the common terms, and, the rare
    terms, so that there's both
    "common-weight" and "rare-weight" computed, off of the count of the
    terms, and the count of
    distinct terms, where it is working up that besides catums, or catsums,
    it would result a relational
    algebra of terms in, ..., terms, of counts and densities and these type
    things. This is where, first I
    would figure the catsum would be deterministic before it's at all
    probabilistic, because the goal is
    match-find not match-guess, while still it's to support the less
    deterministic but more opportunistic
    at the same time.
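
    A deterministic sketch of the histogram with the two weights. The
    formulas are assumptions chosen only to use the two quantities named
    above (the count of terms and the count of distinct terms):
    common-weight is a term's share of all occurrences, and rare-weight is
    the mean count per distinct term divided by the term's own count.

```python
from collections import Counter
from fractions import Fraction
import re


def histogram(text: str) -> Counter:
    """Associative array of terms to counts (lowercase alphanumeric runs)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def weights(hist: Counter) -> dict:
    """Map each term to (common_weight, rare_weight), as exact fractions."""
    if not hist:
        return {}
    total = sum(hist.values())       # count of term occurrences
    distinct = len(hist)             # count of distinct terms
    mean = Fraction(total, distinct)
    return {term: (Fraction(n, total), mean / n) for term, n in hist.items()}
```

    Exact fractions keep the result reproducible, in keeping with
    match-find rather than match-guess.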

    Then, the "index" is basically like a usual book's index, for each term
    that's not a common term in
    the language but is a common term in the book, what page it's on, here
    that that is a read-out of
    a histogram of the terms to pages. Then, compound terms, basically get
    into grammar, and in terms
    of terms, I don't so much care to parse glossolalia as what result
    mostly well-defined compound terms
    in usual natural languages, for the utility of a dictionary and
    technical dictionaries. Here "pages" are
    both according to common message threads, and also the surround of
    messages in the same time
    period, where a group is a common message thread and a usenet is a
    common message thread.

    (I've had a copy of "the information retrieval book" before, also
    borrowed one "data logic".)

    "Spelling mistakes considered adversarial."

    https://en.wikipedia.org/wiki/Subject_indexing#Indexing_theory

    Then, there's lots to be said for "summary" and "summary in statistic".


    A first usual data structure for efficiency is the binary tree or
    bounding tree. Then, there's
    also what makes for divide-and-conquer or linear speedup.


    About the same time as Luhn's monograph or 1956, there was published a
    little book
    called "Logic and Language", Huppe and Kaminsky. It details how
    according to linguistics
    there are certain usual regular patterns of words after phonemes and
    morphology what
    result then for stems and etymology that then for vocabulary that
    grammar or natural
    language results above. Then there are also gentle introductions to
    logic. It's very readable
    and quite brief.


    [2023/04/29]

    I haven't much been tapping away at this,
    but it's pretty simple to stand up a usenet peer,
    and pretty simple to slurp a copy,
    of the "Big 8" usenet text groups, for example,
    or particularly just for a few.

    [2023/12/22]

    Well, I've been thinking about this, and there are some ideas.

    One is about a system of reputation, the idea being
    New/Old/Off/Bad/Bot/Non,
    basically figuring that reputation is established by action.

    Figuring how to categorize spam, UCE, vice, crime, and call that Bad, then
    gets into basically two editions, with a common backing, Cur (curated)
    and Raw, with Old and New in curated, and Off and Bot a filter off that,
    and Bad and Non excluded, though in the raw feed. Then there's only to
    forward what's curated, or current.

    Here the idea is that New graduates to Old, Non might be a
    false-negative New,
    but is probably a negative Bad or Off, and then Bot is a sort of honor
    system, and
    Old might wander to Off and vice-versa, then that Off and Old can
    vacillate.
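
    The six labels and the moves between them can be sketched as a small
    state table. This is one reading of the paragraphs above, not a fixed
    design: the Rep enum, the ALLOWED table, and the show_filtered switch
    for Off and Bot are all hypothetical names.

```python
from enum import Enum


class Rep(Enum):
    NEW = "New"
    OLD = "Old"
    OFF = "Off"
    BAD = "Bad"
    BOT = "Bot"
    NON = "Non"


# Assumed moves: New graduates to Old, Old and Off vacillate, and Non
# resolves to New (a false negative), Bad, or Off.
ALLOWED = {
    Rep.NEW: {Rep.OLD},
    Rep.OLD: {Rep.OFF},
    Rep.OFF: {Rep.OLD},
    Rep.NON: {Rep.NEW, Rep.BAD, Rep.OFF},
}


def move(current: Rep, target: Rep) -> Rep:
    """Return the new label, or raise if the move is not in the table."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"{current.value} cannot become {target.value}")
    return target


def in_curated(rep: Rep, show_filtered: bool = True) -> bool:
    """Old and New are curated; Off and Bot can be filtered off; rest excluded."""
    if rep in (Rep.NEW, Rep.OLD):
        return True
    if rep in (Rep.OFF, Rep.BOT):
        return show_filtered
    return False


def in_raw(rep: Rep) -> bool:
    """The raw edition carries everything, including Bad and Non."""
    return True
```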

    Then for renditions, is basically that the idea is that it's the same
    content
    behind NNTP, with IMAP, then also an HTTP gateway, Atom/RDF feed, ....

    (It's pretty usually text-only but here is MIME.)

    There are various ways to make for posting that's basically for that Old
    can post what they want, and Off, then for something like that New,
    gets an email in reply to their post, that they reply to that, to
    round-trip a post.

    (Also mail-to-news and news-to-mail are pretty usual. Also there are
    notions of humanitarian inputs.)

    Similarly there are the notions above about using certificates and TLS to
    use technology and protocol to solve technology protocol abuse problems.

    For surfacing the items then is about technologies like robots.txt and
    Dublin Core metadata, and similar notions with respect to uniqueness.
    If you have other ideas about this, please chime in.

    Then for having a couple sorts of organizations of both the domain name
    and the URL's as resources, makes for example for sub-domains for groups,
    for example then with certificate conventions in that, then usual sorts of
    URL's that are, you know, URL's, and URN's, then, about URL's, URI's,
    and URN's.

    Luckily it's all quite standardized so quite stock NNTP, IMAP, and HTTP
    browsers,
    and about SMTP and IMAP, and with TLS, make of course a fungible sort of
    system.


    How to pay for it all? At about $500 a year for all text usenet,
    about a day's golf foursome and a few beers can stand up a new Usenet peer.

    [2024/01/22]

    Basically thinking about a "backing file format convention".

    The message ID's are universally unique. File-systems support various
    counts and depths
    of sub-directories. The message ID's aren't necessarily opaque
    structurally as file-names.
    So, the first thing is a function that given a message-ID, results a
    message-ID-file-name.

    Then, as it's figured that groups, are, separable, is about how, to,
    either have all the
    messages in one store, or, split it out by groups. Either way the idea
    is to convert the
    message-ID-file-name, to a given depth of directories, also legal in
    file names, so it
    results that the message's get uniformly distributed in sub-directories
    of approximately
    equal count and depth.

    A....D...G <- message-ID

    ABCDEFG <- message-ID-file-name

    /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path

    So, the idea is that the backing file format convention, basically
    results uniform lookup
    of a file's existence, then about ingestion and constructing a message,
    then, moving
    that directory as a link in the filesystem, so it results atomicity in
    the file system that
    supports that the existence of a message-ID-directory-path is a function
    of message-ID,
    and usual filesystem guarantees.
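
    A sketch of the two functions, message-ID to file-name and file-name to
    directory path. Using a SHA-256 hex digest as the file-name is an
    assumption made here so that names are filesystem-legal and spread
    evenly; the function names and the default depth are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def message_id_file_name(message_id: str) -> str:
    """Map a message-ID to a fixed-length, filesystem-legal name."""
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()


def message_id_directory_path(message_id: str, depth: int = 3,
                              root: str = "/") -> PurePosixPath:
    """Use the first `depth` characters as nested directories.

    ABCDEFG at depth 3 becomes /A/B/C/ABCDEFG.
    """
    name = message_id_file_name(message_id)
    return PurePosixPath(root, *name[:depth], name)
```

    The path depends only on the message-ID, so checking for a message is
    one existence test. For the atomic step, a message directory can be
    built under a temporary name and then renamed into place, since a
    rename within one POSIX filesystem is atomic.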



    About the storage of the files, basically each message is only "header +
    body". Then,
    when the message is served, then it has appended to its header the
    message numbers
    according to the group, "header + numbers + body".

    So, the idea is to store the header and body compressed with deflate,
    then that there's
    a pretty simple implementation of a first-class treatment of deflated
    data, to compute
    the deflated "numbers" on demand, and result that concatenation results
    "header + numbers
    + body". It's figured that clients would either support deflated,
    compressed data natively,
    or, that the server would instead decompress data if compression's not
    supported, then
    figuring that otherwise the data's stored at-rest as compressed. There's
    an idea that the
    entire backing could be stored partially encrypted also, at-rest, but
    that would be special-purpose.
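
    One way the "header + numbers + body" concatenation could work on
    deflated data is sketched below with Python's zlib. It assumes raw
    deflate, where every part except the last is ended with a sync flush
    (byte-aligned, no final block) and only the body carries the final
    block. Representing "numbers" as an Xref header line is an assumption,
    as are the helper names.

```python
import zlib


def deflate_part(data: bytes, last: bool = False) -> bytes:
    """Compress one part as raw deflate.

    Non-final parts end with Z_SYNC_FLUSH so they stop on a byte boundary
    without a final block; the last part ends the stream with Z_FINISH.
    """
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15: raw deflate
    mode = zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH
    return compressor.compress(data) + compressor.flush(mode)


def inflate(stream: bytes) -> bytes:
    """Decompress a complete raw deflate stream."""
    return zlib.decompress(stream, wbits=-15)


# Stored at rest: header and body, each compressed once.
header = b"Message-ID: <abc@example.invalid>\r\nSubject: test\r\n"
body = b"\r\nHello, sci.math.\r\n"
stored_header = deflate_part(header)
stored_body = deflate_part(body, last=True)

# Computed on demand per group, then spliced in between.
numbers = deflate_part(b"Xref: news.example.invalid sci.math:42\r\n")
served = stored_header + numbers + stored_body
```

    Each part is compressed independently, so it refers only to its own
    bytes, and the joined stream inflates to the joined text without
    recompressing the stored parts.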

    The usual idea that the backing-file-format-convention, is a physical
    interface for all access,
    and also results that tar'ing that up to a file results a transport file
    also, and that, simply
    the backing-file-formats can be overlaid or make symlinks farms together
    and such.


    There's an idea then to make metadata, of, the, message-date, basically
    to have partitions
    by day, where Jan 1 2020 = Jan 1 1970 + 18262 days,

    YYYY/MM/DD/A/B/C/D/E/F/ABCDEFG -> symlink to /A/B/C/D/E/F/ABCDEFG/
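
    The day arithmetic and the partition path can be checked with a short
    sketch: Jan 1 2020 falls 18262 days after Jan 1 1970. The function
    names are hypothetical, and the link location is given relative to the
    store root.

```python
from datetime import date
from pathlib import PurePosixPath

EPOCH = date(1970, 1, 1)


def day_number(d: date) -> int:
    """Days elapsed since Jan 1 1970."""
    return (d - EPOCH).days


def date_partition_link(d: date, message_path: PurePosixPath) -> PurePosixPath:
    """Where the symlink for a message lives: YYYY/MM/DD/<message path>."""
    return PurePosixPath(f"{d:%Y/%m/%d}") / message_path.relative_to("/")
```

    For example, day_number(date(2020, 1, 1)) gives 18262, and a message at
    /a/b/c/abcdefg dated that day would be linked from
    2020/01/01/a/b/c/abcdefg.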


    This is where, the groups' file, which relate their message-numbers to
    message-ID's, only
    has the message-numbers, vis-a-vis, browsing by date, in terms of,
    taking the intersection
    of message-numbers' message-ID's and time-partitions' message-ID's.


    Above, the idea of the groups file, is that message-ID's have a limit,
    and that, the groups file,
    would have a fixed-size or fixed-length record, with the index and
    message-number being the offset,
    and the record being the message-ID, then its header and body accessed
    as the message-ID-directory-path.
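
    A sketch of such a groups file, with the message number as the record
    index. RFC 3977 limits a message-ID to 250 octets; the 256-byte record
    size, the NUL padding, and the function names are assumptions.

```python
import io

RECORD_SIZE = 256      # assumed fixed record length, in bytes
MAX_MESSAGE_ID = 250   # message-ID limit in octets per RFC 3977


def append_message_id(groups_file, message_id: str) -> int:
    """Append a record and return its message number (1-based)."""
    raw = message_id.encode("ascii")
    if len(raw) > MAX_MESSAGE_ID:
        raise ValueError("message-ID longer than 250 octets")
    groups_file.seek(0, io.SEEK_END)
    number = groups_file.tell() // RECORD_SIZE + 1
    groups_file.write(raw.ljust(RECORD_SIZE, b"\0"))
    return number


def lookup_message_id(groups_file, number: int) -> str:
    """Read the message-ID stored at a message number by seeking to it."""
    if number < 1:
        raise KeyError(number)
    groups_file.seek((number - 1) * RECORD_SIZE)
    record = groups_file.read(RECORD_SIZE)
    if len(record) < RECORD_SIZE:
        raise KeyError(number)
    return record.rstrip(b"\0").decode("ascii")
```

    Lookup by number is then a single seek and read, and the message-ID it
    returns leads to the header and body through the directory path.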

    So, toward working out a BFF convention is to make it possible that file
    operation tools
    like tar and cp and deflate and other usual command line tools, or
    facilities, make it so that
    then while there should be a symlink free approach, also then as to how
    to employ symlinks,
    with regards to usual indexes from axes of access to enumeration.

    As above then I'm wondering to figure out how to make it so, that for
    something like a mailbox format,
    then to have that round-trip from BFF format, but mostly how to make it
    so that any given collection
    of messages, given each has a unique ID, and according to its headers
    its groups and injection date,
    it results an automatic sort of building or rebuilding then the groups
    files.

    Another key sort of thing is the threading. Also, there is to be
    considered the multi-post or cross-post.


    Then, for metadata, is the idea of basically into supporting the
    protocol's overview and wildmat,
    then for the affinity with IMAP, then up into usual notions of
    key-attribute filtering, and as with
    regards to full content search, about a sort of "search file format", or
    indices, again with the goal
    of that being fungible variously, and constructible according to simple
    bounds, and, resulting
    that the goal is to reduce the size of the files at rest, figuring
    mostly the files at rest aren't accessed,
    or when they are, they're just served raw as compressed, because
    messages once authored are static.

    That said, the groups their contents grow over time, and also there is
    for notions of no-archive
    and retention, basically about how to consider that in those use cases,
    to employ symlinks,
    which result natural partitions, then to have usual rotation of
    truncation as deleting a folder,
    invalidating all the symlinks to it, then a usual handler of ignoring
    broken symlinks, or deleting them,
    so that maintenance is simple along the lines of "rm group" or "rm year".
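
    The handler for broken symlinks could be as small as the sketch below,
    run after a folder has been removed. The function name is hypothetical,
    and it assumes the partitions are ordinary directories of symlinks.

```python
import os


def prune_broken_symlinks(root: str) -> int:
    """Delete symlinks under root whose targets no longer exist."""
    removed = 0
    # os.walk does not follow symlinks by default; links to directories
    # show up in dirnames and broken links in filenames, so check both.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                os.unlink(path)
                removed += 1
    return removed
```

    Retention is then "remove the folder, then prune", and a reader that
    meets a broken link before the prune runs can simply ignore it.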

    So, there's some thinking involved to make it so the messages each, have
    their own folders,
    and then parts in those, as above, this is the thinking here along the
    lines of "BFF/SFF",
    then for setting up C10+K servers in front of that for NNTP, IMAP, and a
    simple service
    mechanism for surfacing HTTP, these kinds of things. Then, the idea is
    that metadata
    gets accumulated next to the messages in their folders, then those also
    to be concatenable,
    to result that then for search, that corpuses or corpi are built off
    those intermediate data,
    for usual searches and specialized searches and these kinds things.

    Then, the idea is to make for this BFF/SFF convention, then to start
    gathering "certified corpi"
    of groups over time, making for those then being pretty simply
    distributable like the old
    idea of an mbox mailbox format, with regards to that being one file that
    results the entire thing.

    Then, threads and the message numbers, where threading by message number
    is the

    header + numbers + body

    the numbers part, sort of is for open and closed threads, here though of
    course that threads
    are formally always open, or about composing threads of those as over
    them being partitioned
    in usual reasonable times, for transient threads and long-winded threads
    and recurring threads.



    Then, besides "control" and "junk" and such or relating administration,
    is here for the sort
    of minimal administration that results this NOOBNB curation. This and
    matters of relay
    ingestion and authoring ingestion and ingestion as concatenation of BFF
    files,
    is about these kinds of things.

    [2024/01/22]

    The idea of "NOOBNB curation" seems a reasonable sort of simple-enough
    yet full-enough way to start building a usual common open messaging system,
    with as well the omission of the overall un-wanted and illicit.

    The idea of NOOBNB curation, is that it's like "Noob NB: Nota Bene for
    Noobs",
    with splitting New/Old/Off or "NOO" and Bot/Non/Bad or BNB, so that the
    curation
    delivers NOO, or Nu, while the raw includes be-not-back, BNB.

    So, the idea for New/Old/Off, is that there is Off traffic, but, "caveat
    lector",
    reader be aware, figuring that people can still client-side "kill file"
    the curated feed.

    Then, Bot/Non/Bad, basically includes that Bot would include System Bot,
    and Free Bot,
    sort of with the idea of that if Bots want feed then they get raw, while
    System Bot can
    post metadata of what's Bot/Non/Bad and it gets simply excluded from the
    curated.

    Then, for this it seems the axis of metadata is the Author, about the
    relation of Authors
    to posts. I.e. it's the principal metadata axis of otherwise posts,
    individual messages.

    Here the idea is that generally that once some author's established as
    "Old", then
    they always go into NOO, as either Old or Off, while "New" is the
    establishment
    of this maturity, to at least follow the charter and otherwise for
    take-it-or-leave-it.


    Then, "Non" is basically that "New", according to Author, basically
    either gets accepted,
    or not, according to what must be some "objective standards of
    topicality and etiquette".

    Then "Bad" is pretty much that anybody who results Bad basically gets
    marked Bad.

    Now, it's a temporal thing, and it's possible that attacks would result
    false positives
    and false negatives, a.k.a. Type I and Type II errors. There's a general
    idea to attenuate
    "Off" and even "Bad", figuring "Off" reverts to "Old" and "Bad" reverts
    to "Non", according
    to Author, or for example "Injection Site".


    Then, for the posting side, there are some things involved. There are
    legal things involved,
    illicit content or contraband, have some safe harbor provisions in usual
    first-world countries,
    vis-a-vis, for example, the copyright claim. Responsiveness to copyright
    claims, would basically
    be marking spammers of warez as Bad, and not including them in the
    curated, that being figured
    the extent of responsibility.

    There's otherwise a usual good-faith expectation of fair-use,
    intellectual-property wise.


    Otherwise then it's that "Usenet the protocol relies on email identity".
    So, the idea to implement
    that posts round-trip through email, is considered the bar.

    Here then furthermore is considered how to make a general sort of
    Injection-Site algorithm,
    in terms of peering or peerages, and compeering, as with respect to
    Sites, their policies, and then
    here with respect to the dual outfeeds, curated and raw, figuring
    curated is good-faith and raw,
    includes garbage, or for example to just pipe raw to /dev/null, and for
    automatically curating in-feed.

    The idea is to support establishment of association of an e-mail
    identity, so that a usual sort
    of general-purpose responsible algorithm, can work up various factors
    authentication, in
    the usual notions of authentication AuthN and authorization AuthZ, with
    respect to
    login and "posting allowed", or as via delegation in what's called
    Federated identity,
    that resulting being the responsibility of peers, their hosts, and so on.

    Then, about that for humanitarian and free-press sorts reasons,
    "anonymity", well first
    off, anonymity is not part of the charter, and indeed the charter
    says to use
    your real name and your real e-mail address. I.e., anonymity on the one
    hand has a reasonable
    sort of misdirection from miscreants attacking anybody, on the other
    hand those same
    sorts miscreants abuse anonymity, so, here it's basically the idea that
    "NOOBNB" is a very
    brief system of reputation as of the vouched identity of an author by
    email address,
    or the opaque value that results gets posted in the sender field by
    whatever peer injects whatever.

    How then to automatically characterize spam and the illicit is sort of a
    thing,
    while that the off-topic but otherwise according to charter including
    the spirit
    of the charter as free press, with anonymity to protect while not
    anonymity to attack,
    these are the kinds of things that help make for that "NOOBNB curation",
    is to result
    a sort of addendum to Usenet charter, that results though same as the
    old Usenet charter.

    Characterization could include for example "MIME banned", "glyph ranges
    banned",
    "subjects banned", "injection sites banned", these being open then so
    that legitimate
    posters run not afoul, that while bad actors could adapt, then they
    would get funneled
    into "automatic semantic text characterization bans".

    The idea then is that responsible injection sites will have measures in
    place to prevent
    "Non" authors from becoming "New" authors, those maturing, "Old" and
    "Off" post freely,
    that among "Bot" is "System Bot" and "Tag Bot", then that according to
    algorithms in
    data in the raw Bot feed, is established relations that attenuate to Bad
    and Non,
    so that it's a self-describing sort of data set, and peers pick up
    either or both.


    Then the other key notion is to reflect an ID generator, so that, every
    post, gets
    exactly and uniquely, one ID, identifier, a global and universally
    unique identifier.
    This was addressed as above and it's a usual notion of a common
    facility, UUID dispenser.
    The idea of identifying those over times, is for that over the corpus,
    is established
    a sort of digit-by-digit stamp generator, to check for IDs over the
    entire corpus,
    or here a compact and efficient representation of same, then for issuing
    ranges,
    for usual expectations of the order of sites on the order of posters the
    order of posts.

    Luckily it's sort of already the case that all the messages already do
    have unique ID's.

    "Usenet: it has a charter."

    [2024/01/23]

    About build-time and run-time, here the idea is to make some specifications
    what reflect the BFF/SFF filesystem and file-format conventions, then to
    make it so that algorithms and servers run on those, as then with respect
    to reference implementations, and specification conformance, of the client
    protocols, and the server and service protocols, what are all pretty much
    standardized, inside and outside, usual sorts Internet text protocols,
    and usual sorts data facilities.

    I figure the usual sort of milieu these days for common, open systems,
    is something like "Git Web", or otherwise in terms of git hosting,
    in terms of that it's an idea that setting up a git server, makes it
    pretty simple to clone code and so on. I'm most familiar with this
    tooling compared to RCS, CVS, svn, hg, tla, arch, or other sorts usual
    "source control", systems. Most people might know: git.


    So, the idea is to make reference implementations in various editions of
    tooling,
    that result the establishment of the common backing, this filesystem
    convention
    or BFF the backing file-format, best friends forever, then basically
    about making
    for their being cataloged archives of groups their messages in
    time-series data,
    then to simply start a Usenet archive by concatenating those together as
    overlaying
    them, then as to generating the article numbers, as where the article
    numbers are
    specific to the installation, where there are globally unique IDs of
    message-IDs,
    then article numbers indicate the server's handles to messages by group.

    The sources of reference implementations of services and algorithms are
    sources
    and go in source control, but the notion of archives fungibly in BFF files,
    represent static assets for where a given corpus of a month's messages
    basically represent the entirety, or what "25 million messages" is,
    vis-a-vis
    low-volume groups like Big 8 text Usenet, and here curated and raw feeds
    after NOOBNB.

    So, there's a general idea to surface the archive files, those being
    fungible anyways, then some bootstrap scripts in terms of data-install
    and code-install, for config/code/data, so that anybody can rent a node,
    clone these scripts, download a year's Usenet, run some scripts to set up
    SFF files, then launch a Usenet service.

    So, that is about common sources and provisioning of code and data.

    The compeering then is the other idea about the usual idea of pull and
    push feeds, and suck feeds, where NNTP is mostly push feeds, and compeers
    are expected to be online and accept CHECK, IHAVE, and TAKETHIS, and these
    kinds of use-cases of ingestion, of the propagation of posts.
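
    For the accepting side of a push feed, a small sketch of answering the
    streaming commands with the RFC 4644 response codes. The function names,
    the `busy` flag, and the `accept` policy callable are assumptions here.

```python
def check_response(message_id, have, busy=False):
    """Answer a streaming CHECK (RFC 4644) for one message-ID.

    `have` is any container of message-IDs already stored (articles or
    stubs); `busy` is a hypothetical flag for "cannot take it right now".
    """
    if busy:
        return f"431 {message_id}"    # transfer not possible, try again later
    if message_id in have:
        return f"438 {message_id}"    # article not wanted
    return f"238 {message_id}"        # send the article to be transferred


def takethis_response(message_id, have, accept):
    """Answer TAKETHIS after the article has been read.

    `accept` is a hypothetical site-policy callable returning True or False.
    """
    if message_id in have or not accept(message_id):
        return f"439 {message_id}"    # transfer rejected, do not retry
    have.add(message_id)
    return f"239 {message_id}"        # article transferred OK
```

    IHAVE follows the same shape with the RFC 3977 codes 335, 435, 436 for
    the offer and 235, 436, 437 after the transfer.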

    There's a notion of a sort of compeering topology, basically in terms of
    "the lot of us will hire each some introductory resources, and use them up,
    passing around the routing according to DNS, what serves making ingress and
    egress, from a named Internet protocol port".

    https://datatracker.ietf.org/doc/html/rfc3977
    https://datatracker.ietf.org/doc/html/rfc4644


    (Looking at WILDMAT, it's cool that it's a sort of this yes/no/maybe or
    sure/no/yes, which is a sort of very composable filtering. I sort of
    invented one of those for rich front-end data tables since looking at the
    specs here, "filterPredicate", composable, front-end/back-end,
    yes/no/maybe.)
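
    A sketch of the wildmat matching of RFC 3977 as a three-valued filter:
    comma-separated patterns, "!" to negate, the rightmost matching pattern
    wins. Returning None for "no pattern matched" is this sketch's reading
    of "maybe"; the RFC itself treats that case as no match.

```python
import re


def _to_regex(pattern):
    # RFC 3977 wildmat: "*" matches any run of characters, "?" exactly one.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildmat(name, spec):
    """Return True (yes), False (no), or None (maybe) for `name`."""
    verdict = None
    for pattern in spec.split(","):
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if _to_regex(pattern).fullmatch(name):
            verdict = not negated     # later patterns override earlier ones
    return verdict
```

    For example wildmat("sci.logic", "sci.*,!sci.logic") is a no, while
    wildmat("alt.test", "sci.*,!sci.logic") has no opinion either way.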

    I.e., NNTP has a static (network) topology, expecting peers to be online
    usually, while here the idea is that "compeering", will include push and
    pull, about the "X-RETRANSFER-TO", and along the lines of the Message
    Transfer Agent, queuing messages for opportunistic delivery, and in-line
    with the notions of e-mail traditionally and the functions of DNS and
    the Internet protocols.

    https://datatracker.ietf.org/doc/html/rfc4642
    https://datatracker.ietf.org/doc/html/rfc1036
    https://datatracker.ietf.org/doc/html/rfc2980
    https://datatracker.ietf.org/doc/html/rfc4644
    https://datatracker.ietf.org/doc/html/rfc4643

    This idea of compeering sort of results that as peers come online, then
    to start in the time-series data of the last transmission, then to launch
    a push feed up to currency. It's similar with that simply being periodic
    in real-time (clock time), or message-driven, pushing messages as they
    arrive.
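
    A sketch of that catch-up, under the assumption that the server keeps a
    time-sorted log of (timestamp, message-ID) and a per-peer watermark of
    the last transmission. The names and the `offer` callback are
    hypothetical.

```python
from bisect import bisect_right


def pending(log, last_sent):
    """Entries of the time-sorted `log` strictly after `last_sent`."""
    times = [ts for ts, _ in log]
    return log[bisect_right(times, last_sent):]


def catch_up(log, watermarks, peer, offer):
    """Push everything the peer has not been offered, up to currency.

    `offer(peer, message_id)` stands in for the actual feed (for example a
    CHECK/TAKETHIS exchange); the watermark advances after each offer.
    """
    last = watermarks.get(peer, float("-inf"))   # new peers start from the beginning
    for ts, mid in pending(log, last):
        offer(peer, mid)
        watermarks[peer] = ts
```

    The same routine serves the periodic case (call it on a timer) and the
    message-driven case (call it whenever the log grows).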

    The message feeds in-feeds and out-feeds reflect sorts of system accounts
    or peering agreements, then for the compeering to establish what are the
    topologies, then for something like a message transfer agent, to fill a
    basket with the contents, for caches or a sort of lock-box approach, as
    well aligned with SMTP, POP3, IMAP, and other Internet text protocols of
    messaging.

    The idea is to implement some use-cases of compeering, with e-mail,
    news2mail and mail2news, as the Internet protocols have high affinity
    for each other, and are widely implemented.

    So, besides the runtime (code and data, config), then is also involved
    the infrastructure, resources of the runtime and resources of the
    networking. It's pretty simple to write code and not very difficult to
    get data, then infrastructure gets into cost. This was described above
    as the idea of "business-in-a-box".

    Well, tapping away at this, ....


    [ page break 6 ]

    [2024/01/24]

    Yeah, when there's a single point of ingress, is pretty much simpler than
    when there's federated ingress, or here NNTP peerage, vis-a-vis a site's
    own postings.

    Here it's uncomplicated when all messages get propagated to all peers,
    with the idea that NOOBNB pattern is going to ingest raw and result curated
    (curated, cured, cur).


    How to figure out for each incoming item, whether to have System Tag Bot
    result appending another item marking it, or, just storing a stub for the
    item as excluded, gets into "deep inspection", or as related to the things.

    Because Usenet is already an ongoing concern, it's sort of easy to identify
    old posters already, then about the issue of handling New/Non, and as
    with regards to identifying Bad, as what it results Cur is New/Old/Off
    and Raw includes Bot/Non/Bad, or rather that it excludes Bot/Non/Bad,
    with regards to whether the purpose of Bot is to propagate Bans.


    It's sort of expected that the Author field makes for a given Author,
    but some posters for example mutilate the e-mail address or result
    something non-unique. Disambiguating those, then, is for the idea
    that either the full contents of the Author field make a thing or that
    otherwise Authors would need to make some way to disambiguate Sender.

    About propagation and stubbing, the idea is that propagation should
    generally result, then that presence of articles or stubs either way
    results the relevant response code, as with regards to either
    "propagating raw including Non and Bad" or just "propagating Raw
    only Non-Tag and Bad-Tag Tag-Bot, generated messages", basically
    with the idea of semantics of "control" and "junk", or "just ignore it".


    The use case of lots of users of Usenet isn't a copy of Usenet, just
    a few relevant groups. Others for example appreciate all the _belles
    lettres_ of text, and nothing from binaries. Lots of users of Usenet have
    it as mostly a suck-feed of warez and vice. Here I don't much care about
    anything except _belles lettres_.


    So, here NOOBNB is a sort of white-list approach, because Authors are
    many fewer than messages, to relate incoming messages, to Authors,
    per group, here that ingestion is otherwise constant-rate for assigning
    numbers in the groups a message is in, then as with regards to threading
    and bucketing, about how to result these sorts of ideas sort of building up
    from "the utility of bitmaps" to this "patterns in range" and "region
    calculus", here though what's to result partially digested intermediate
    results for an overall concatenation strategy then for selection and
    analysis, all entirely write-once-read-many.

    It's figured that Authors will write and somebody will eventually read
    them, with regards to that readings and replies result the Author born as
    New and then maturing to Old, what results after Author infancy, to result
    a usual sort of idea that Authors that read Bad are likely enough Bad
    themselves.

    I.e., there's a sort of hysteresis to arrive at born as New, in a group,
    then a sort of gentle infancy to result Old, or Off, in a group, as
    with regards to the purgatory of Non or banishment of Bad.

    happy case:
    Non -> New -> Old (good)
    Non -> Bad (bad)

    Old -> Off
    Off -> Old


    The idea's that nobody's a moderator, but anybody's a reviewer,
    and correspondent, then that correspondents to spam or Bad get
    the storage of a signed quantity, about the judgment, of what
    is spam, in the error modes.

    error modes:
    Non -> false New
    Non -> false not Bad


    New -> Bad
    Old -> Bad
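
    The arrows listed above can be sketched as a small table of allowed
    transitions, per author per group. The `states` mapping and the
    `transition` function are illustrative names, and treating unknown
    authors as Non is an assumption of this sketch.

```python
ALLOWED = {
    ("Non", "New"),   # graduation out of purgatory
    ("New", "Old"),   # maturing after infancy
    ("Non", "Bad"),   # banishment
    ("Old", "Off"),   # lapse
    ("Off", "Old"),   # and return
    ("New", "Bad"),   # correcting a false New
    ("Old", "Bad"),
}


def transition(states, group, author, new_state):
    """Move `author` to `new_state` in `group` if that arrow is listed.

    `states` maps (group, author) to a state name; authors not yet seen
    in the group are treated as "Non".
    """
    current = states.get((group, author), "Non")
    if (current, new_state) not in ALLOWED:
        raise ValueError(f"{current} -> {new_state} is not a listed transition")
    states[(group, author)] = new_state
    return new_state
```

    In a write-once-read-many store the same thing would be an appended
    record per transition rather than an update in place.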

    (There's that reviewers and correspondents
    Old <-> Old
    Off <-> Old
    Old <-> Off
    Off <-> Off
    result those are all same O <-> O.)

    The idea's that nobody's a moderator, and furthermore then all
    the rules of the ignorance of Non and banishment of Bad,
    then though are as how to arrive at that Non's, get a chance
    to be reviewed by Old/Off and New, with respect to New and New
    resulting also the conditions of creation, of a group, vis-a-vis,
    the conditions of continuity, of a group.


    I.e. the relations should so arise that creating a group and posting
    to it, should result "Originator" or a sort of class of Old, about these
    ideas of the best sort of reasonable performance and long-lived scalability
    and horizontal scalability, that results interpreting any usual sort of
    messaging with message-ID's and authors, in a reference algorithm
    and error-detection and error-correction, "NOOBNB".

    There's an idea that Bot replies to new posters, "the Nota Bene",
    but, another that Bot replies to Non and Bad, and another that
    there's none of that at all, or not guaranteed.


    Then, the idea is that this is matters of convention and site policy,
    what it results exactly the same as a conformant Usenet peer,
    in "NOOBNB compeering: slightly less crap".


    Then, getting into relating readings (reviews) and correspondence
    as a matter of site policy in readings or demonstration in correspondence,
    results largely correspondence discriminates Old from Bad, and New from
    Non.

    Then as "un-moderated" there's still basically "site-policy",
    basically in layers that result "un-abuse", "dis-abuse".

    I.e. the disabusement of abuse, is of this Old <-> Off for the venial,
    and about the ceremony of infancy via some kind of interaction
    or the author's own origination, about gating New, then figuring
    that New matures to Old and then the compute cost is on News,
    that long-running conversations result constants, called stability.

    Well I'm curious your opinion of this sort of approach, it's basically
    as of defining conventions of common messaging, what result a simplest
    and most-egalitarian common resource of correspondents in _belles lettres_.

    [2024/01/24]

    Then it seems the idea is to have _three_ editions,

    Cur: current, curated, New/Old/Off
    Pur: purgatory, Non/New/Old/Off
    Raw: raw, Non/New/Old/Off/Bot/Bad
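
    The three editions read naturally as nested sets of author states; a
    minimal sketch, with EDITIONS and editions_for as hypothetical names:

```python
EDITIONS = {
    "cur": frozenset({"New", "Old", "Off"}),
    "pur": frozenset({"Non", "New", "Old", "Off"}),
    "raw": frozenset({"Non", "New", "Old", "Off", "Bot", "Bad"}),
}


def editions_for(author_state):
    """Names of the editions that carry posts from an author in this state."""
    return [name for name in ("cur", "pur", "raw")
            if author_state in EDITIONS[name]]
```

    So a post by an Old author shows up in all three, a post by a Non only
    in Pur and Raw, and a post by a Bad only in Raw.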

    Then, the idea for bot, seems to be for system, to have delegations,
    of Bot to Old, with respect to otherwise usually the actions of Old,
    to indicate correspondence.

    Then, with regards to review, it would sort of depend on some Old
    or Off authors reviewing Pur, with regards to review and/or correspondence,
    what results graduating Non to New, then that it results that
    there's exactly a sort of usual write-once-read-many, common
    backing store well-defined by presence in access (according to filesystem).


    Then, for the groups files, it's figured there's the main message-IDs,
    as with respect to cur/pur/raw, then with regards to authors on the
    groups, presence in the authors files indicating Old, then with regards
    to graduation Non to New and New to Old.

    Keeping things simple, then the idea is to make it so that usual New
    have a way to graduate from Non, where there is or isn't much traffic
    or is or isn't much attention paid to Pur.

    The idea is that newbies log on to Pur, then post there on their own
    or in replies to New/Old/Off, that thus far this is entirely of a monadic
    or pure function the routine, which is thusly compile-able and
    parallelizable,
    and about variables in effect, what result site policy, and error modes.


    There's an idea that Non's could reply to their own posts,
    as to eventually those graduating altogether, or for example
    just that posting is allowed, to Pur, until marked either New or Bad.


    The ratio of Bad+Non+Bot to Old+Off+New, basically has that it's figured
    that due to attacks like the one currently underway from Google Groups,
    would be non-zero. The idea then is whether to grow the groups file,
    in the sequence of all message-IDs, and whether to maintain one edition
    of the groups file, and ever modify it in place, that here the goal is
    instead growing files of write-once-read-many, and because propagation is
    permanent.

    Raw >= Pur >= Cur

    I.e., every message-id gets a line in the raw feed, that there is one,
    then as with regards to whether the line has reserved characters, where
    otherwise it's a fixed-length record up above the maximum length of
    message-id, the line, of the groups file, the index of its message-numbers.
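
    A sketch of such a groups file as fixed-length records, so that the
    article number is just the record index and lookup is a seek. The
    record length of 256 and the space padding are assumptions; RFC 3977
    limits a message-ID to 250 octets, so it fits.

```python
import io

RECORD_LEN = 256          # assumed; above the 250-octet message-ID limit
PAD = b" "                # assumed padding; never occurs inside a message-ID


def append_message_id(f, message_id):
    """Append one fixed-length record; return its article number (1-based)."""
    raw = message_id.encode("ascii")
    if len(raw) > RECORD_LEN - 1:
        raise ValueError("message-ID too long for the record")
    f.seek(0, io.SEEK_END)
    number = f.tell() // RECORD_LEN + 1
    f.write(raw.ljust(RECORD_LEN - 1, PAD) + b"\n")   # one line per record
    return number


def lookup(f, number):
    """Read the message-ID stored for an article number by seeking to it."""
    if number < 1:
        raise KeyError(number)
    f.seek((number - 1) * RECORD_LEN)
    record = f.read(RECORD_LEN)
    if len(record) < RECORD_LEN:
        raise KeyError(number)
    return record.rstrip(b" \n").decode("ascii")
```

    The file only ever grows at the end, which keeps it in the
    write-once-and-growing class; `f` is any seekable binary file object.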


    See, the idea here is a sort of reference implementation, and a normative
    implementation, in what are fungible and well-defined resources, here
    files, with reasonable performance and horizontal scale-ability and
    long-time performance with minimal or monotone maintenance.

    Then the files are sort of defined as either write-once and final or
    write-once and growing, given that pretty much unbounded file resources
    result a quite most usual runtime.



    Don't they already have one of these somewhere?


    [2024/01/26]

    I suppose the idea is to have that Noobs post to alt.test, then as with
    regards to various forms to follow, like:

    I read the charter
    I demonstrated knowledge of understanding the charter's definitions and intent
    I intend to follow the charter

    How I do or don't is my own business, how others do or don't is their own business

    I can see the exclusion rules
    I understand not to post against the exclusion rules
    I understand that the exclusion rules are applied unconditionally to all

    ... is basically for a literacy test and an etiquette assertion.


    Basically making for shepherding Noobs through alt.test, or that people
    who post in alt.test aren't Noobs, yet still I'm not quite sure how to
    make it for usual first-time posters, how to get them out of Purgatory
    to New. (Or ban them to Bad.)

    This is where federated ingestion basically will have that in-feeds are
    either

    these posts are good,
    these posts are mixed,
    these posts are bad,

    with regards then to putting them variously in Cur, Pur, Raw.

    Then, there's sort of exclusions and bans, with regards to posts, and
    authors. This is that posts are omitted by exclusion, authors' posts are
    omitted by ban.

    Then, trying to associate all the authors of a mega-nym, in this case
    the Google's spam flood to make a barrier-to-entry of having open
    communications, is basically attributing those as a class those authors
    to a banned mega-nym.

    Yet, then there is the use case of identity fraud's abuses, disabusing
    an innocent dupe, where logins basically got hacked or the path to return
    to innocence.


    This sort of results a yes/no/maybe for authors, sort of like:

    yes, it's a known author, it's unlikely they are really bad
    (... these likely frauds are Non's?)

    no, it's a known excluded post, open rules
    no, it's a known excluded author, criminal or a-topical solicitation
    no, it's a new excluded author, associated with an abstract criminal or
    a-topical solicitation

    maybe (yes), no reason why not

    that a "rules engine" is highly efficient deriving decisions yes/no/maybe,
    in both execution and maintenance of the rules (data plane / control
    plane).
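
    A minimal sketch of such a rules engine: the rules are data (a list),
    each rule either has an opinion or not, and the default is "maybe".
    The post shape (a dict with "author" and "message_id") and the rule
    constructors are assumptions, and the rule order is a site-policy choice.

```python
def decide(post, rules):
    """Run open rules in order; the first rule with an opinion decides.

    Each rule is a callable returning "yes", "no", or None (no opinion).
    """
    for rule in rules:
        verdict = rule(post)
        if verdict is not None:
            return verdict
    return "maybe"                    # no reason why not


def known_author(known):
    return lambda post: "yes" if post["author"] in known else None


def excluded_author(banned):
    return lambda post: "no" if post["author"] in banned else None


def excluded_post(excluded_ids):
    return lambda post: "no" if post["message_id"] in excluded_ids else None
```

    Executing rules is the data plane; editing the list of rules and the
    sets they close over is the control plane.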

    Groups like sci.math have a very high bar to participation, literacy
    in mostly English and the language of mathematics. Groups have
    a very low bar to pollution, all else.

    So, figuring out a common "topicality standard", here is the idea to
    associate concepts with charter with topicality, then for of course a
    very loose and egalitarian approach to participation, otherwise free.

    (Message integrity, irrepudiability, free expression, free press, free
    speech,
    not inconsequence, nor the untrammeled.)


    [2024/01/28]

    Well, "what is spam", then, I suppose sort of follows from the
    "spam is a word coined on Usenet for unsolicited a-topical posts",
    then the ideas about how to find spam, basically make for that
    there are some ways to identify these things.

    The ideas of
    cohort: a group, a thread, a poster
    cliques: a group, posts that reply to each other

    Then
    content: words and such
    clicks: links

    Here the idea is to categorize content according to cohorts and cliques,
    and content and clicks.

    It's figured that all spam has clicks in it, then though that of course
    clicks are the greatest sort of thing for hypertext, with regards to

    duplicate links
    duplicate domains

    and these sorts of things.

    The idea is that it costs resources to categorize content, is according
    to the content, or the original idea that "spam must be identified by
    its subject header alone", vis-a-vis the maintenance of related data,
    and the indicator of matching various aspects of relations in data.

    So, clicks seem the first way to identify spam, basically that a histogram
    of links by their domain and path, results duplicates are spam, vis-a-vis,
    that clicks in a poster's sig or repeated many times in a long thread,
    are not.
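
    A sketch of that histogram, counting each (domain, path) by the number
    of distinct threads it shows up in, so that a link repeated inside one
    long thread counts once. The (thread_id, body) post shape, the URL
    regex, and the threshold are assumptions; exempting sig links of
    established authors would be a further rule on top of this.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"]+")   # rough; good enough for a sketch


def link_histogram(posts):
    """Map (domain, path) to the set of threads in which the link appears.

    `posts` is an iterable of (thread_id, body) pairs.
    """
    threads_by_link = defaultdict(set)
    for thread_id, body in posts:
        for url in URL_RE.findall(body):
            parts = urlsplit(url)
            threads_by_link[(parts.netloc.lower(), parts.path)].add(thread_id)
    return threads_by_link


def suspicious_links(posts, min_threads=3):
    """Links repeated across at least `min_threads` distinct threads."""
    hist = link_histogram(posts)
    return {link for link, threads in hist.items() if len(threads) >= min_threads}
```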

    In this sense there's that posts are collections of their context,
    about how to make an algorithm in best effort to relate context
    to the original posts, usually according to threading.

    The idea here is that Non's can be excluded when first of all they
    have links, then for figuring that each group has usual sites that
    aren't spam, like their youtube links or their doc repo links or their
    wiki links or their arxiv or sep or otherwise, usual sorts of good links,
    while that mostly it's the multiplicity of links that represent a spam
    attack, then just to leave all those in Purgatory.

    It's figured then that good posters when they reach Old, pretty much
    are past spamming, then about that posters are New for quite a while,
    and have some readings or otherwise mature into Old, about that
    simply Old and Off posters posts go right through, New posters posts
    go right through, then to go about categorizing for spam, excluding spam.


    I.e., the "what is spam", predicate, is to be an open-rules sort of
    composition, that basically makes it so that spamverts would be
    ineffective because spammers exploit the lazy and if their links don't go
    through, get nothing.

    Then, there's still "what is spam" with regards to just link-less spam,
    about that mostly it would be about "repeated junk", that "spam is not
    unique". This is the usual notion of "signal to noise", basically finding
    whether it's just noise in Purgatory, that signal in Purgatory is a good
    sign of New.

    So, "what is spam" is sort of "what is not noise". Again, the goal is
    open-rules normative algorithms that operate on write-once-read-many
    graduated feeds, what result that the Usenet compeering, curates its
    federated ingress, then as for feeding its out-feed, with regards to other
    Usenet compeers following the same algorithm, then would get the same
    results.

    Then, the file-store might still have copies of all the spams, with the
    idea then that it's truncatable, because spam-campaigns are not
    long-running for archival, then to drop the partitions of Purgatory and
    Raw, according to retention. This then also is for fishing out what are
    Type I / Type II errors, about promoting from Non to New or also about the
    banishment of Non to Bad, or, Off to Bad.
    I.e., there's not so much "cancel", yet there's still for "no-archive",
    about how to make it open and normative how these kinds of things are.

    Luckily the availability of unbounded in size filesystems is pretty
    large these days, and, implementing things write-once-read-many, makes
    for pretty simple routines that make maintenance.


    It's like "whuh how do I monetize that?" and it's like "you don't", and
    "you figure that people will buy into free speech, free association, and
    free press". You can make your own front-end and decorate it with what
    spam you want, it just won't get federated back in the ingress of this
    Usenet Compeerage.

    Then it's like "well I want to only see Archimedes Plutonium and his
    cohorts" then there's the idea that there's to be generated some files
    with relations, the summaries and histograms, then for those to be
    according to time-series buckets, making tractable sorts of metadata
    partially digested, then for making digests of those, again according to
    normative algorithms with well-defined access patternry and run-times,
    according to here pretty much a hierarchical file-system.
    Again it's sort of a front-end thing, with surfacing either the back-end
    files or the summaries and digests, for making search tractable in many
    dimensions.

    So, for the cohort, seems for sort of accumulated acceptance and rejection,
    about accepters and rejectors and the formal language of hierarchical data
    that's established by its presence and maintenance, about "what is spam"
    according to the entire cohort, and cliques, then with regards to Old/Off
    and spam or Non, with regards to spam and Bad.

    So, "what is spam" is basically that whatever results excluded was spam.


    [ page break 7 ]


    [2024/02/03]


    Well, with the great spam-walling of 2024 well underway, it's a bit too
    late to set up very easy personal Internet, but, it's still pretty simple,
    the Internet text protocols, and implementing standards-based
    network-interoperable systems, and there are still even some places where
    you can plug into the network and run your own code.

    So anyways the problem with the Internet today is that anything that's
    public facing can expect to get mostly not-want-traffic, where the general
    idea is to only get want-traffic.

    So, it looks like that any sort of public facing port, where TCP/IP sockets
    for the connection-oriented protocols like here the Internet protocols are
    basically as for the concept that the two participants in a client-server
    or two-way communication are each "host" and "port", then as for protocol,
    and as with respect to binding of the ports and so on or sockets or about
    the 7-layer ISO/OSI model of networking abstraction, here it's hosts and
    ports or what result IP addresses and packets destined for ports, those
    multiplexed and reassembled by the TCP/IP protocols' stacks on the usual
    commodity hardware's operating systems, otherwise as with respect to
    network devices, their addresses as in accords with the network topology's
    connection and routing logic, and that otherwise a connection-oriented
    protocol is in terms of listening and ephemeral ports, with respect to the
    connection-oriented protocols, their sockets or Address Family UNIX
    sockets, and, packets and the TCP/IP protocol semantics of the NICs and
    their UARTs, as with regards to usual intrusive middleware like PPP, NAT,
    BGP, and other stuff in the way of IP, IPv4, and IPv6.


    Thus, for implementing a server, is basically the idea then that as simply
    accepting connections, then is to implement for the framework, that it has
    at least enough knowledge of the semantics of TCP/IP, and the origin of
    requests, then as with regards to implementing a sort of "Load Shed" or
    "Load Hold", where Load Shedding is to dump not-want-traffic and Load
    Holding is to feed it very small packets at very infrequent intervals
    within socket timeouts, while dropping immediately anything it sends and
    using absolutely minimal resources otherwise in the TCP/IP stack, to
    basically give unwanted traffic a connection that never completes, as a
    sort of passive-aggressive response to unwanted traffic. "This light never
    changes."
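
    A sketch of Load Shed and Load Hold on top of asyncio streams. The
    interval, tick count, and drip byte are illustrative values, and never
    reading the peer's input is this sketch's stand-in for "dropping
    anything it sends"; a real Hold would also want to bound how many held
    connections exist at once.

```python
import asyncio


async def load_hold(writer, interval=20.0, ticks=30, drip=b" "):
    """Hold an unwanted connection open, dripping one small write per interval.

    `writer` is an asyncio StreamWriter (or anything with write, drain,
    close). Returns the number of bytes dripped before letting go.
    """
    sent = 0
    try:
        for _ in range(ticks):
            writer.write(drip)
            await writer.drain()
            sent += len(drip)
            await asyncio.sleep(interval)   # stay inside the peer's timeout
    except OSError:
        pass                                # the peer gave up first
    finally:
        writer.close()
    return sent


def load_shed(writer):
    """Drop an unwanted connection immediately."""
    writer.close()
```

    With asyncio.start_server the connection callback receives a reader and
    a writer, and would hand the writer to one of these once the origin is
    judged not-want.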


    So, for Linux it's sockets and Windows it's like WSASocket and Java it's
    java.nio.channels.SocketChannel, about that the socket basically has
    responsibilities for happy-case want-traffic, and enemy-case
    not-want-traffic.


    Then, where in my general design for Internet protocol network interfaces,
    what I have filled in here is basically this sort of

    Reader -> Scanner -> Executor -> Printer -> Writer

    where the notions of the "home office equipment" like the multi-function
    device has here that in metaphor it basically considers the throughput as
    like a combination scanner/printer fax-machine, then the idea is that
    there needs to be some sort of protection mostly on the front, basically
    that the "Hopper" then has about the infeed and outfeed Hoppers, or with
    the Stamper at the end, figuring the Hopper does Shed/Hold, or
    Shed/Fold/Hold, while, the Stamper does the encryption and compression,
    about that Encryption and Compression are simply regular concerns what
    result plain Internet protocol text (and, binary) commands in the middle.

    Hopper -> Reader -> Scanner -> Executor -> Printer -> Writer
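
    A toy of that throughput model as chained generators, one per stage.
    The stage bodies are assumptions for illustration: the Hopper only
    sheds oversized input, and the Executor knows just QUIT (205) and
    "unknown command" (500), the NNTP codes from RFC 3977.

```python
def hopper(chunks, max_chunk=4096):
    """Hopper: shed oversized input before it reaches the Reader."""
    for chunk in chunks:
        if len(chunk) > max_chunk:
            return
        yield chunk


def reader(chunks):
    """Reader: reassemble byte chunks into CRLF-terminated lines."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while b"\r\n" in buf:
            line, buf = buf.split(b"\r\n", 1)
            yield line


def scanner(lines):
    """Scanner: turn each line into (keyword, arguments)."""
    for line in lines:
        words = line.decode("utf-8", "replace").split()
        if words:
            yield words[0].upper(), words[1:]


def executor(commands):
    """Executor: map commands to (code, text); only two toy cases here."""
    for keyword, args in commands:
        if keyword == "QUIT":
            yield 205, "closing connection"
            return
        yield 500, "unknown command"


def printer(results):
    """Printer: format responses as protocol text lines."""
    for code, text in results:
        yield f"{code} {text}\r\n"


def writer(lines):
    """Writer: encode response lines to bytes for the socket."""
    for line in lines:
        yield line.encode("utf-8")


def serve(chunks):
    stages = writer(printer(executor(scanner(reader(hopper(chunks))))))
    return b"".join(stages)
```

    Each stage only sees the previous stage's output, so encryption and
    compression could wrap the two ends without touching the middle.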

    Then, for Internet protocols like, SMTP, NNTP, IMAP, HTTP, usual sorts of
    request/response client/server protocols, then I suppose I should wonder
    about multiplexing connections, though, HTTP/2 really is just about
    multiple calls with pretty much the same session, and getting into the
    affinity of sessions, about client/server protocols, logins,
    requests/responses, and sessions, here with the idea of pretty much
    implementing a machine, for implementing protocol, for the half-dozen
    usual messaging and web-service protocols mentioned above, and a
    complement of their usual options, implementing a sort of usual process
    designed to be exposed on its own port, resulting a sort of shatter-proof
    protocol implementation, figuring the Internet is an ugly place and
    the Hopper is regularly clearing the shit out of the front.

    So anyways, then about how to go about implementing a want-traffic feed
    is basically the white-list approach, from the notion that there is want
    and not want, but not to be racist, basically a want-list approach, and a
    drop-list. The idea is that you expect to get email from people you've
    sent email, or their domain, and then, sometimes when you plan to expect
    an email, then the idea is to just maintain a window and put in terms what
    you expect to get or expect to have recently gotten, then to fish those
    out from all the trash, basically over time to put in the matches for the
    account, that messages to the account, given matches surface the messages,
    otherwise pretty much just maintaining a rotating queue of junk that dumps
    off the junk when it rotates, while basically having a copy of the
    incoming junk, for as necessary looking through the junk for the valuable
    message.
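
    A sketch of the want-list with a rotating junk queue. The Mailbox class,
    matching on full address or domain, and the queue size are assumptions;
    the time window for expectations is left out to keep it short.

```python
from collections import deque


class Mailbox:
    """Want-list plus a bounded, rotating junk queue."""

    def __init__(self, junk_size=1000):
        self.want = set()                    # senders or domains expected
        self.inbox = []
        self.junk = deque(maxlen=junk_size)  # oldest junk falls off when full

    @staticmethod
    def _keys(sender):
        sender = sender.lower()
        return {sender, sender.rpartition("@")[2]}   # address and its domain

    def deliver(self, sender, message):
        """Everything is kept; only matches surface in the inbox."""
        if self._keys(sender) & self.want:
            self.inbox.append((sender, message))
        else:
            self.junk.append((sender, message))

    def expect(self, sender_or_domain):
        """Add a match and fish earlier arrivals back out of the junk."""
        self.want.add(sender_or_domain.lower())
        kept = deque(maxlen=self.junk.maxlen)
        for sender, message in self.junk:
            if self._keys(sender) & self.want:
                self.inbox.append((sender, message))
            else:
                kept.append((sender, message))
        self.junk = kept
```

    Nothing is rejected at the door; the valuable message that arrived
    before its match was entered is still there until the queue rotates.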


    The Internet protocols then for what they are the messaging level or
    user land, of the user-agents, have a great affinity and common
    implementation.

    SMTP -> POP|IMAP

    IMAP -> NNTP

    NNTP
    HTTP -> NNTP
    HTTP -> IMAP -> NNTP

    SMTP -> NNTP
    NNTP -> SMTP


    I'm really quite old-fashioned, and sort of rely on natural written
    language, while, still, there's the idea that messages are arbitrarily
    large and of arbitrary format and of arbitrary volume over an arbitrary
    amount of time, or 'unbounded' if '-trary' sounds too much like
    'betrayedly', with the notion that there's basically for small storage
    and large storage, and small buffers and large buffers, and bounds, called
    quota or limits, so to result that usual functional message passing
    systems among small groups of people using modest amounts of resources can
    distance themselves from absolute buffoons HDTV'ing themselves picking
    their nose.

    So, back to the Hopper, or Bouncer, then the idea is that everything
    gets in an input queue, because, spam-walls can't necessarily be depended
    on to let in the want-traffic. Then the want-list (guest-list) is used to
    bring those in to sort of again what results this, "NOOBNB", layout, so it
    sort of results again a common sort of "NOOBNB BFF/SFF", layout, that it
    results the layout can be serialized and torn down and set back up and
    commenced same, serialized.

    Then, this sort of "yes/no/maybe" (sure/no/yes, "wildmat"), has the idea
    of that still there can be consulted any sorts of accepters/rejectors, and
    it builds a sort of easy way to make for the implementation, that it can
    result an infeed and conformant agent, on the network, while both
    employing opt-in sort of spam-wall baggage, or, just winging it and
    picking ham deliberately.

    In this manner NOOBNB is sort of settling into the idea of the physical
    layout, then for the idea of this Load: Roll/Fold/Shed/Hold, is for sorts
    of policies of "expect happy case", "expect usual case", "forget about
    it", and "let them think about it".

    The idea here is sort of to design modes of the implementation of the
    protocols, in simple and easy-to-remember terms like "NOOBNB", "BFF/SFF",
    "Roll/Fold/Shed/Hold", what results pragmatic and usual happy-case
    Internet protocols, on an Internet full of fat-cats spam-walling each
    other, getting in the way of the ham. (That "want" is ham, and "not-want"
    is spam.) "Ham is not spam, spam is spiced canned ham."


    Then, after the Internet protocols sitting behind a port on a host with
    an address, and that the address is static or dynamic in the usual sense,
    but that every host has one, vis-a-vis networks and routing, then the next
    thing to figure out is DNS, the name of the host, with respect to the
    overall infrastructure of the implementation of agents, in the protocols,
    on the network, in the world.

    Then, I don't know too much about DNS, as with respect to that in the
    old days it was sort of easy to register in DNS, that these days becoming
    a registrar is pretty involved, so after hiring some CPU+RAM+DISK+NET
    sitting on a single port (then for its ephemeral connections as up above
    that, but ports entirely in the protocol), with an address, is how to get
    traffic pointed at the address, by surfacing its address in DNS, or, just
    making an intermediary service for the discovery of addresses and ports
    and configuring one's own DNS resolver, but here of course to keep things
    simple for publicly-facing services that are good actors on the network
    and in Internet protocols.

    So I don't know too much about DNS, and it deserves some more study.
    Basically the DNS resolver algorithm makes lookups into a file called
    "the DNS file" and thusly a DNS resolver results addresses or lookup hosts
    for addresses and sorts of DNS records, like the "Mail Exchanger" record,
    or "the A record", "the CNAME", "various text attributes", "various
    special purpose attributes", then that DNS resolvers will mostly look
    those up to point their proxies they insert to it, then present those as
    addresses at the DNS resolver. (Like I said, the Internet protocols
    are pretty simple.)

    So, for service discovery pretty much, it looks like the DNS
    "authoritative name server", basically is to be designed for the idea that
    there are two user-agents that want to connect, over the Internet, and
    they're happy, then anything else that connects, is usual, so there's
    basically the idea that the authoritative name server, is to work itself
    up in the DNS protocols, so it results that anybody using the addresses of
    its names will have found itself with some reverse lookups or something
    like that, helping meet in the middle.

    https://en.wikipedia.org/wiki/Domain_Name_System

    RR Resource Records
    SOA Start of Authority
    A, AAAA IP addresses
    MX, Mail Exchanger
    NS, Name Server
    PTR, Reverse DNS Lookups
    CNAME, domain name aliases

    RP Responsible Person
    DNSSEC
    TXT ...

    ("Unsolicited email"? You mean lawyers and whores won't even touch them?)

    So, DNS runs over both UDP and TCP, so, there's for making that the
    Name Server, is basically that anybody who comes looking for a domain,
    it should result that then there's the high-availability Name Server,
    special-purpose for managing address resolution, and as within the
    context of name cache-ing, with regards to personal Internet services
    designed to run reliably and correctly in a more-or-less very modest
    and ad-hoc fashion. (Of primary importance of any Internet protocol
    implementation is to remain a good actor on the network, of course
    among other important things like protecting the users the agents
    their persons.)

    https://en.wikipedia.org/wiki/BIND

    "BIND 9 is intended to be fully compliant with the IETF DNS standards
    and draft standards."

    https://datatracker.ietf.org/wg/dnsop/documents/

    Here the point seems to be to make it mostly so that responses fit in
    a single user datagram or packet, with regards to UDP implementation,
    while TCP implementation is according to this sort of "HRSEPW"
    throughput model.

    I.e. mostly the role here is for personal Internet services, not
    surfacing a vended layer of a copy of the Internet for a wide proxy
    all snuffling the host. (Though, that has also its role, for example
    creating wide and deep traffic sniffing, and for example
    buddy-checking equivalent views of the network, twisting up TLS
    exercises and such. If you've read the manuals, ....)


    Lots of the DNS standards these days are designed to aid the giants,
    from clobbering each other, here the goal mostly is effective
    industrious ants, effective industrious and idealistic ants,
    dedicated to their gents.


    So, "dnsop" is way too many specifications to worry about, instead
    just reading through those to arrive at what's functionally correct,
    and peels away to be correct backwards.

    https://datatracker.ietf.org/doc/draft-ietf-dnsop-rfc8499bis/

    "The Domain Name System (DNS) is defined in literally dozens of
    different RFCs."

    Wow, imagine the reading, ....

    "This document updates RFC 2308 by clarifying the definitions of
    "forwarder" and "QNAME"."


    "In this document, the words "byte" and "octet" are used interchangeably."

    "Any path of a directed acyclic graph can be
    represented by a domain name consisting of the labels of its
    nodes, ordered by decreasing distance from the root(s) (which is
    the normal convention within the DNS, including this document)."

    The goal seems implementation of a Name Server with quite correct
    cache-ing and currency semantics, TTLs, and with regards to
    particularly the Mail Exchanger, reflecting on a usual case of mostly
    receiving in a spam-filled spam-walled world, while occasionally
    sending or posting in a modest and personal fashion, while in accords
    with what protocols, result well-received ham.

    "The header of a DNS message is its first 12 octets."

    "There is no formal definition of "DNS server", but RFCs generally
    assume that it is an Internet server that listens for queries and
    sends responses using the DNS protocol defined in [RFC1035] and its
    successors."
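
    A minimal sketch, in Python and only for illustration, of the
    12-octet header quoted above and of a single-question query that
    easily fits one UDP datagram. The names DNS_HEADER, build_query and
    parse_header are made up for this sketch; the field layout is the one
    from RFC 1035. The TC flag is the bit a server sets when a response
    did not fit the datagram and the client is expected to retry over TCP.

```python
import struct

# RFC 1035 header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT,
# six 16-bit big-endian fields, 12 octets in all.
DNS_HEADER = struct.Struct(">6H")


def build_query(name, qtype=1, query_id=0x1234):
    """Build a one-question query (QTYPE 1 = A, QCLASS 1 = IN)."""
    flags = 0x0100  # only RD (recursion desired) is set
    header = DNS_HEADER.pack(query_id, flags, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels ending in a zero octet.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)


def parse_header(message):
    """Decode the fixed 12-octet header of a DNS message."""
    ident, flags, qd, an, ns, ar = DNS_HEADER.unpack_from(message, 0)
    return {
        "id": ident,
        "qr": flags >> 15,        # 0 = query, 1 = response
        "tc": (flags >> 9) & 1,   # truncated, retry over TCP
        "rd": (flags >> 8) & 1,
        "rcode": flags & 0xF,
        "qdcount": qd,
        "ancount": an,
        "nscount": ns,
        "arcount": ar,
    }


query = build_query("example.org")
print(DNS_HEADER.size, len(query), parse_header(query))
```

    Sending the bytes to a resolver and parsing the answer section are
    left out; the point is only how small the fixed part is.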

    So, it seems that for these sorts of personal Internet services, then
    the idea is that a DNS Name Server is the sort of long-running and
    highly-available thing to provision, with regards to it being
    exceedingly small and fast, and brief in implementation, then as with
    regards to it tenanting the lookups for the various and varying,
    running on-demand or under-expectations. (Eg, with the sentinel
    pattern or accepting a very small amount of traffic while starting up
    a larger dedicated handler, or making for the sort of
    sentinel-to-wakeup or wakeup-on-service pattern.)

    https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization
    https://en.wikipedia.org/wiki/Incident_Object_Description_Exchange_Format


    Then it looks like I'm supposed to implement Session Initiation
    Protocol, and have it do service discovery and relation or Dynamic
    DNS, but I sort of despise Session Initiation Protocol as it's so
    abused and twisted, yet, there's some idea to make a localhost server
    that fronts personal Internet agents that could drive off either SIP
    or DDNS, vis-a-vis starting up the agents on demand, as with respect
    to running the agents essentially locally and making peer-to-peer.

    https://en.wikipedia.org/wiki/Zero-configuration_networking#DNS-based_service_discovery


    But, it's simplest to just have a static IP and then run the agents
    as an MTA, here given that the resources are so cheap that personal
    Internet agents is economical, or as where anything resolves to a
    host and a well-known port, to virtualize that to well known ports at
    an address.

    PIA: in the interests of PII.

    [2024/02/08]

    So, if you know all about old-fashioned
    Internet protocols like DNS, then NNTP,
    IMAP, SMTP, HTTP, and so on, then where
    it's at is figuring out these various sorts
    conventions then to result a sort-of, the
    sensible, fungible, and tractable, conventions
    of the data structures and algorithms, in
    the protocols, what result keeping things
    simple and standing up a usual Internet
    messaging agentry.


    BFF: backing-file formats, "Best friends forever"

    Message files
    Group files

    Thread link files
    Date link files

    SFF: search-file formats, "partially digested metadata"



    NOOBNB: Noob Nota Bene: Cur/Pur/Raw

    Load Roll/Fold/Shed/Hold: throughput/offput



    Then, the idea is to make it so that by constructing
    the files or a logical/physical sort of distinction,
    that then results a neat tape archive then that
    those can just be laid down together and result
    a corpus, or filtered on down and result a corpus,
    where the existing standard is sort of called "mailbox"
    or "mbox" format, with the idea muchly of
    "converting mbox to BFF".


    Then, for enabling search, basically the idea or a
    design principle of the FF is that they're concatenable
    or just overlaid and all write-once-read-many, then
    with regards to things like merges, which also should
    result as some sort of algorithm in tools, what results,
    that of course usual sorts tools like textutils, working
    on these files, would make it so that usual extant tools,
    are native on the files.
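
    For the "converting mbox to BFF" step, a minimal sketch using the
    mailbox module from the Python standard library: it walks an mbox
    file and yields the Message-ID with the head and body as bytes, which
    is about what a BFF writer would need. iter_mbox, SAMPLE and demo are
    hypothetical names, and splitting head from body at the first blank
    line is the simplest possible assumption.

```python
import mailbox
import os
import tempfile


def iter_mbox(path):
    """Yield (message_id, head_bytes, body_bytes) for each message."""
    box = mailbox.mbox(path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # message without the "From " line
            head, _, body = raw.partition(b"\n\n")
            message_id = (box.get_message(key).get("Message-ID") or "").strip()
            yield message_id, head, body
    finally:
        box.close()


# Two made-up messages in mbox form, only for the demonstration.
SAMPLE = (
    b"From alice@example.org Thu Feb  8 00:00:00 2024\n"
    b"Message-ID: <a1@example.org>\n"
    b"Subject: first\n"
    b"\n"
    b"hello\n"
    b"\n"
    b"From bob@example.org Thu Feb  8 00:01:00 2024\n"
    b"Message-ID: <b2@example.org>\n"
    b"Subject: second\n"
    b"\n"
    b"world\n"
)


def demo():
    """Write SAMPLE to a temporary mbox and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "wb") as f:
            f.write(SAMPLE)
        return list(iter_mbox(path))


records = demo()
print([message_id for message_id, _, _ in records])
```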

    So for metadata, the idea is that there are standard
    metadata attributes like the closed categories of
    headers and so on, where the primary attributes sort
    of look like

    message-id
    author

    delivery-path
    delivery-metadata (account, GUID, ...)

    destinations

    subject
    size
    content

    hash-raw-id <- after message-id
    hash-invariant-id <- after removing inconstants
    hash-uncoded-id <- after uncoding out to full

    Because messages are supposed to be unique,
    there's an idea to sort of detect differences.
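
    A minimal sketch of two of those hashes, assuming SHA-256 and
    assuming that the "inconstants" are the headers relays rewrite in
    transit (here Path, Xref and Received, which is a guess and not a
    settled list). hash_raw_id and hash_invariant_id are illustrative
    names; hash-uncoded-id would need the transfer encodings decoded
    first and is not shown.

```python
import hashlib

# Lower-cased header prefixes treated as varying between copies (assumed).
INCONSTANT = (b"path:", b"xref:", b"received:")


def hash_raw_id(raw):
    """Hash of the message exactly as it arrived."""
    return hashlib.sha256(raw).hexdigest()


def hash_invariant_id(raw):
    """Hash of the message with the inconstant headers removed."""
    head, _, body = raw.partition(b"\r\n\r\n")
    kept = []
    skipping = False
    for line in head.split(b"\r\n"):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line, it follows its header's fate.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.lower().startswith(INCONSTANT)
        if not skipping:
            kept.append(line)
    return hashlib.sha256(b"\r\n".join(kept) + b"\r\n\r\n" + body).hexdigest()


copy_a = (b"Path: news.a.example!not-for-mail\r\n"
          b"Message-ID: <a1@example.org>\r\n"
          b"Subject: test\r\n\r\nbody\r\n")
copy_b = copy_a.replace(b"news.a.example", b"news.b.example!news.a.example")

print(hash_raw_id(copy_a) == hash_raw_id(copy_b))              # False
print(hash_invariant_id(copy_a) == hash_invariant_id(copy_b))  # True
```

    Two copies that differ only in delivery path then agree on the
    invariant hash, while a same-ID message with a different body does
    not.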


    The idea is to sort of implement NNTP's OVERVIEW
    and WILDMAT, then there's IMAP, figuring that the
    first goal of SFF is to implement the normative
    commands, then with regards to implementations,
    basically working up for HTTP SEARCH, a sort of
    normative representation of messages, groups,
    threads, and so on, sort of what results a neat sort
    standard system for all sorts purposes these, "posts".


    Anybody know any "normative RFC email's in HTTP"?
    Here the idea is basically that a naive server
    simply gets pointed at BFF files for message-id
    and loads any message there as an HTTP representation,
    with regards to HTTP, HTML, and so on, about these
    sorts "sensible, fungible, tractable" conventions.


    It's been a while since I studied the standards,
    so I'm looking to get back tapping at the C10K server
    here, basically with hi-po full throughput then with
    regards to the sentinel/doorman bit (Load R/F/S/H).

    So, I'll be looking for "partially digested and
    composable search metadata formats" and "informative
    and normative standards-based message and content".

    They already have one of those, it's called "Internet".


    [2024/02/09]

    Reading up on anti-spam, it seems that Usenet messages have
    a pretty simple format, then with regards to all of Internet
    messages, or Email and MIME and so on, gets into basically
    the nitty-gritty of the Internet Protocols like SMTP, IMAP, NNTP,
    and HTTP, about figuring out what's the needful then for things
    like Netnews messages, Email messages, HTTP messages,
    and these kinds of things, basically for message multi-part.

    https://en.wikipedia.org/wiki/MIME

    (DANE, DKIM, DMARC, ....)

    It's kind of complicated to implement correctly the parsing
    of Internet messages, so, it should be done up right.

    The compeering would involve the conventions of INND.
    The INND software is very usual, vis-a-vis Tornado or some
    commercial cousins, these days.

    The idea seems to be "run INND with cleanfeed", in terms
    of control and junk and the blood/brain barrier or here
    the text/binaries barrier, I'm only interested in setting up
    for text and then maybe some "richer text" or as with
    regards to Internet protocols for messaging and messages.

    Then the idea is to implement this "clean-room", so it results
    a sort of plain description of data structures logical/physical
    then a reference implementation.

    The groups then accepted/rejected for compeering basically
    follow the WILDMAT format, which is pretty reasonable
    in terms of yes/no/maybe or sure/no/yes sorts of filters.
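
    A minimal sketch of wildmat matching as RFC 3977 describes it: a
    comma-separated list of patterns where "*" matches any run of
    characters, "?" matches one character, a leading "!" negates, and the
    rightmost pattern that matches decides. The function names are made
    up here, and INN's own extensions in newsfeeds are not covered.

```python
import re


def _to_regex(pattern):
    """Translate one wildmat pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildmat(name, spec):
    """Return True if the group name is accepted by the wildmat spec."""
    accepted = False
    for pattern in spec.split(","):
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if _to_regex(pattern).fullmatch(name):
            accepted = not negate  # a later match overrides an earlier one
    return accepted


print(wildmat("sci.math", "sci.*,!sci.electronics.*"))                # True
print(wildmat("sci.electronics.design", "sci.*,!sci.electronics.*"))  # False
```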

    https://www.eyrie.org/~eagle/software/inn/docs-2.6/newsfeeds.html

    https://www.eyrie.org/~eagle/software/inn/docs-2.6/libstorage.html

    https://www.eyrie.org/~eagle/software/inn/docs-2.6/storage.conf.html#S2

    It refers to the INND storageApi token so I'll be curious about
    that and BFF. The tradspool format, here as it partitions under
    groups, is that BFF instead partitions under message-ID, that
    then groups files have pointers into those.

    message-id/

    id <- "id"

    hd <- "head"
    bd <- "body"

    td <- "thread", reference, references
    rd <- "replied to", touchfile

    ad <- "author directory", ... (author id)
    yd <- "year to date" (date)

    xd <- "expired", no-archive, ...
    dd <- "dead", "soft-delete"
    ud <- "undead", ...

    The files here basically indicate by presence then content,
    what's in the message, and what's its state. Then, the idea
    is that some markers basically indicate any "inconsistent" state.

    The idea is that the message-id folder should be exactly on
    the order of the message size, only. I.e. besides head and body,
    the other files are only presence indicators or fixed size.
    And, the presence files should be limited to fit in the range
    of the alphabet, as above it results single-letter named files.

    Then the idea is that the message-id folder is created on the
    side with id,hd,bd then just moved/renamed into its place,
    then by its presence the rest follows. (That it's well-formed.)
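
    A minimal sketch of "created on the side, then renamed into place",
    assuming the folder name is the SHA-256 of the message-id so that it
    is safe as a file name (that naming, and the names folder_name and
    store_message, are assumptions of this sketch, not part of BFF as
    described). A rename within one file system is atomic, so readers
    either see a complete folder or none.

```python
import hashlib
import os
import shutil
import tempfile


def folder_name(message_id):
    """File-system-safe folder name for a message-id (assumed scheme)."""
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()


def store_message(root, message_id, head, body):
    """Stage id/hd/bd in a side folder, then rename it into place.

    Returns False if a folder for this message-id already exists.
    """
    final = os.path.join(root, folder_name(message_id))
    if os.path.isdir(final):
        return False
    staging = tempfile.mkdtemp(prefix=".incoming-", dir=root)
    files = (("id", message_id.encode("utf-8")), ("hd", head), ("bd", body))
    for name, data in files:
        with open(os.path.join(staging, name), "wb") as f:
            f.write(data)
    try:
        os.rename(staging, final)
    except OSError:
        # Another writer got there first; drop the staged copy.
        shutil.rmtree(staging)
        return False
    return True


with tempfile.TemporaryDirectory() as root:
    first = store_message(root, "<a1@example.org>", b"Subject: t\r\n", b"hi\r\n")
    second = store_message(root, "<a1@example.org>", b"Subject: t\r\n", b"hi\r\n")
    listing = sorted(os.listdir(os.path.join(root, folder_name("<a1@example.org>"))))
    print(first, second, listing)
```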

    The idea here again is that the storage is just stored deflated already,
    with the idea that then as the message is served up with threading,
    where to litter the thread links, and whether to only litter the
    referring post's folder with the referenced post's ID, or that otherwise
    there's this idea that it's a poor-man's sort of write-once-read-many
    organization, that's horizontally scalable, then that any assemblage
    of messages can be overlaid together, then groups files can be created
    on demand, then that as far as files go, the natural file-system cache,
    caches access to the files.

    The idea that the message is stored compressed is that many messages
    aren't much read, and most clients support compressed delivery,
    and the common deflate format allows "stitching" together in
    a reference algorithm, what results the header + glue + body.
    This will save much space and not be too complicated to assemble,
    where compression and encryption are a lot of the time,
    in Internet protocols.
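
    One way the header + glue + body stitching can be sketched is with
    gzip framing, since a concatenation of gzip members is itself a valid
    gzip stream that tools like zcat accept. That choice is an assumption
    of this sketch; stitching raw deflate streams without the gzip
    wrapper is more involved and not shown.

```python
import gzip


def deflate_piece(data):
    """Compress one piece on its own, as it would be stored at rest."""
    return gzip.compress(data)


def stitch(*pieces):
    """Concatenate already-compressed pieces without recompressing."""
    return b"".join(pieces)


head = deflate_piece(b"Message-ID: <a1@example.org>\r\nSubject: test\r\n")
glue = deflate_piece(b"\r\n")
body = deflate_piece(b"Hello, sci.math.\r\n")

article = stitch(head, glue, body)
# gzip.decompress reads all members of a multi-member stream.
print(gzip.decompress(article).decode("ascii"))
```

    The cost is a few bytes of framing per piece, against never having to
    inflate and re-deflate on the way out.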

    The message-id is part of the message, so there's some idea that
    it's also related to de-duplication under path, then that otherwise
    when two messages arrive with the same message-id, but otherwise
    different content, that's wrong, about what to do when there are
    conflicts in content.

    All the groups files basically live in one folder, then with regards
    to their overviews, as that it sort of results just a growing file,
    where the idea is that "fixed length records" pretty directly relate
    a simplest sort of addressing, in a world where storage has grown
    to be unbounded, if slow, that it also works well with caches and
    mmap and all the usual facilities of the usual general purpose
    scheduler and such.
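
    A minimal sketch of the fixed-length-record idea for a groups file,
    assuming a 40-byte record of an 8-byte article number followed by the
    32-byte SHA-256 of the message-id (the record layout and the function
    names are illustrative only). Record n then lives at byte offset
    n * 40, so addressing is a multiplication and a seek.

```python
import hashlib
import io
import struct

# Big-endian: unsigned 64-bit article number, then a 32-byte digest.
RECORD = struct.Struct(">Q32s")


def append_record(f, number, message_id):
    """Append one record at the end of the (growing) groups file."""
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    f.seek(0, io.SEEK_END)
    f.write(RECORD.pack(number, digest))


def read_record(f, index):
    """Read the record at a zero-based index by computing its offset."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


def record_count(f):
    """Number of records, from the file length alone."""
    f.seek(0, io.SEEK_END)
    return f.tell() // RECORD.size


groups_file = io.BytesIO()  # stands in for a file opened in "r+b" mode
for n, mid in enumerate(["<a1@example.org>", "<b2@example.org>"], start=1):
    append_record(groups_file, n, mid)

print(RECORD.size, record_count(groups_file), read_record(groups_file, 1)[0])
```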

    Relating that to time-series data then and currency, is a key sort
    of thing, about here that the idea is to make for time-series
    organization that it's usually enough hierarchical YYYYMMDD,
    or for example YYMMDD, if for example this system's epoch
    is Jan 1 2000, with a usual sort of idea then to either have
    a list of message ID's, or, indices that are offsets to the group
    file, or, otherwise as to how to implement access in partition
    to relations of the items, for browsing and searching by date.

    Then it seems for authors there's a sort of "author-id" to get
    sorted, so that basically like threads is for making the
    set-associativity of messages and threads, and groups, to authors,
    then also as with regards to NOOBNB that there are
    New/Old/Off authors and Bot/Non/Bad authors,
    keeping things simple.

    Here the idea is that authors, who reply to other authors,
    are related variously, people they reply to and people who
    reply to them, and also the opposite, people who they
    don't reply to and people who don't reply to them.
    The idea is that common interest is reflected in replies,
    and that can be read off the messages, then also as
    for "direct" and "indirect" replies, either down the chain
    or on the same thread, or same group.
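
    A minimal sketch of reading those relations off the References
    header, assuming each message is a small dict with "id", "author" and
    "references" (oldest first), which is a made-up shape for
    illustration. The last reference is taken as the direct reply, and
    earlier ones up the chain as indirect.

```python
def reply_relations(messages):
    """Return (direct, indirect) maps of author -> set of authors."""
    messages = list(messages)
    author_of = {m["id"]: m["author"] for m in messages}
    direct = {}
    indirect = {}
    for m in messages:
        me = m["author"]
        # Authors of the referenced messages that are actually on hand.
        known = [author_of[r] for r in m["references"] if r in author_of]
        if not known:
            continue
        if known[-1] != me:
            direct.setdefault(me, set()).add(known[-1])
        for other in known[:-1]:
            if other != me:
                indirect.setdefault(me, set()).add(other)
    # Someone replied to directly is not also counted as indirect.
    for me in list(indirect):
        indirect[me] -= direct.get(me, set())
        if not indirect[me]:
            del indirect[me]
    return direct, indirect


thread = [
    {"id": "a", "author": "alice", "references": []},
    {"id": "b", "author": "bob", "references": ["a"]},
    {"id": "c", "author": "carol", "references": ["a", "b"]},
    {"id": "d", "author": "alice", "references": ["a", "b", "c"]},
]
direct, indirect = reply_relations(thread)
print(direct)
print(indirect)
```

    The opposite relation, who does not reply to whom, falls out as the
    complement over the authors of a group.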

    (Cliques after Kudos and "Frenemies" after "Jabber",
    are about same, in "tendered response" and "tendered reserve",
    in groups, their threads, then into the domain of context.)

    So, the first part of SFF seems to be making OVERVIEW,
    which is usual key attributes, then relating authorships,
    then as about content. As well for supporting NNTP and IMAP,
    is for some default SFF supporting summary and retrieval.

    groups/group-id/

    ms <- messages

    <- overview ?
    <- thread heads/tails ?
    <- authors ?
    <- date ranges ?

    It's a usual idea that BFF, the backing file-format, and
    SFF, the search file-format, have that they're distinct
    and that SFF is just derived from BFF, and on-demand,
    so that it works out that search algorithms are implemented
    on BFF files, naively, then as with regards to those making
    their own plans and building their own index files as then
    for search and pointing those back to groups, messages,
    threads, authors, and so on.


    The basic idea of expiry or time-to-live is basically
    that there isn't one, yet, it's basically to result that
    the message-id folders get tagged in usual rotations
    over the folders in the arrival and date partitions,
    then marked out or expunged or what, as with regards
    to the write-once-read-many or regenerated groups
    files, and the presence or absence of messages by their ID.
    (And the state of authors, in time and date ranges.)

    [ page break 8 ]

    [2024/02/10]

    About TLS again, encryption, one of the biggest costs
    of serving data in time (CPU time), is encryption, the
    other usually being compression, here with regards
    to what are static assets or already generated and
    sort of digested.

    So, looking at the ciphersuites of TLS, is basically
    that after the handshake and negotiation, and
    as above there's the notion of employing
    renegotiation in 1.2 to share "closer certificates",
    that 1.3 cut out, that after negotiation then is
    the shared secret of the session that along in
    the session the usual sort of symmetric block-cipher
    converts the plain- or compressed-data, to,
    the encrypted and what results the wire data.
    (In TLS the client and server use the same
    "master secret" for the symmetric block/stream
    cipher both ways.)

    So what I'm wondering is about how to make it
    so, that the data is stored first compressed at
    rest, and in pieces, with the goal to make it so
    that usual tools like zcat and zgrep work on
    the files at rest, and for example inflate them
    for use with textutils. Then, I also wonder about
    what usual ciphersuites result, to make it so that
    there's scratch/discardable/ephemeral/ad-hoc/
    opportunistic derived data, that's at least already
    "partially encrypted", so that then serving it for
    the TLS session, results a sort of "block-cipher's
    simpler-finishing encryption".

    Looking at ChaCha algorithm, it employs
    "addition, rotation, and XOR".
    (Most block and streaming ciphers aim to
    have the same size of the output as the input
    with respect to otherwise a usual idea that
    padding output reduces available information.)

    https://en.wikipedia.org/wiki/Block_cipher
    https://en.wikipedia.org/wiki/Stream_cipher

    So, as you can imagine, block-ciphers are
    a very minimal subset of ciphers altogether.

    There's a basic idea that the server just always
    uses the same symmetric keys so that then
    it can just encrypt the data at rest with those,
    and, serve them right up. But, it's a matter of
    the TLS Handshake establishing the "PreMaster
    secret" (or, lack thereof) and its "pseudo-random function",
    what with regards to the server basically making
    for contriving its "random number" earlier in
    the handshake to arrive at some "predetermined
    number".

    Then the idea is for example just to make it
    so for each algorithm that the data's stored
    encrypted then that it kind of goes in and out
    of the block cipher, so that then it sort of results
    that it's already sort of encrypted and takes fewer
    rounds to line up with the session secret.

    https://datatracker.ietf.org/doc/html/rfc8446

    "All the traffic keying material is recomputed
    whenever the underlying Secret changes
    (e.g., when changing from the handshake
    to Application Data keys or upon a key update)."

    TLS 1.3: "The key derivation functions have
    been redesigned. The new design allows
    easier analysis by cryptographers due to
    their improved key separation properties.
    The HMAC-based Extract-and-Expand Key
    Derivation Function (HKDF) is used as an
    underlying primitive."

    https://en.wikipedia.org/wiki/HKDF

    So, the idea is "what goes into HKDF so
    that it results a known value, then
    having the data already encrypted for that."
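
    To see what goes into HKDF, a minimal sketch of the extract and
    expand steps from RFC 5869 with SHA-256, using only hmac and hashlib.
    TLS 1.3 wraps these in its own labelled helpers, which are not shown.
    The sketch only shows which inputs feed the derivation; making the
    output land on a chosen value would amount to a preimage search on
    HMAC-SHA-256, and nothing here does that.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32


def hkdf_extract(salt, ikm):
    """PRK = HMAC-Hash(salt, IKM); an empty salt means HASH_LEN zeros."""
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk, info, length):
    """OKM = T(1) | T(2) | ..., with T(i) = HMAC(PRK, T(i-1) | info | i)."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


prk = hkdf_extract(b"example salt", b"input keying material")
print(hkdf_expand(prk, b"example info", 42).hex())
```

    Every input (salt, keying material, info) passes through HMAC before
    it reaches the key, which is why contriving one of them to reach a
    known key is the hard part.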

    I'm not much interested in actual _strength_
    of encryption, just making it real simple in
    the protocol to have static data ready to
    send right over the wire according to the
    server indicating in the handshake how it will be.

    And that that can change on demand, ....

    "Values are defined in Appendix B.4."

    https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.4

    So, I'm looking at GCM, CCM, and POLY1305,
    with respect to how to compute values that
    it results the HKDF is a given value.

    https://en.wikipedia.org/wiki/Cipher_suite

    Then also there's for basically TLS 1.2, just
    enough backward and forward that the server
    can indicate the ciphersuite, and the input to
    the key derivation function, for which its data is
    already ready.

    It's not the world's hardest problem to arrive
    at what inputs will make for a given hash
    algorithm that it will arrive at a given hash,
    but it's pretty tough. Here though it would
    allow this weak encryption (and caching of them)
    the static assets, then serving them in protocol,
    figuring that man-in-the-middle is already broken
    anyways, with regards to the usual 100's of
    "root CAs" bundled with usual User-Agentry.

    I.e., the idea here is just to conform with TLS,
    while, having the least cost to serve it, while, using
    standard algorithms, and not just plain-text,
    then, being effectively weak, and, not really
    expecting any forward privacy, but, saving
    the environment by using less watts.

    Then what it seems results is that the server just
    indicates ciphersuites that have that the resulting
    computed key can be made so for its hash,
    putting the cost on the handshake, then
    that the actual block cipher is a no-op.


    You like ...?

    [2024/02/11]

    So I'm looking at my hi-po C10K low-load/constant-load
    Internet text protocol server, then with respect to
    encryption and compression as usual, then I'm looking
    to make that in the framework, to have those basically
    be out-of-band, with respect to things like
    encryption and compression, or things like
    transport and HTTP or "upgrade".

    I.e., the idea here is to implement the servers first
    in "TLS-terminated" or un-encrypted, then as with
    respect to having enough awareness in the protocol,
    to make for adapting to encrypting and compressing
    and upgrading front-ends, with regards to the
    publicly-facing endpoints and the internally-facing
    endpoints, which you would know about if you're
    usually enough familiar with client-server frameworks
    and server-oriented architecture and these kinds of
    things.

    The idea then is to offload the TLS-termination
    to a sort of dedicated layer, then as with regards
    to a generic sort of "out-of-band" state machine
    the establishment and maintenance of the connections,
    where still I'm mostly interested in "stateful" protocols
    or "connection-oriented" vis-a-vis the "datagram"
    protocols, or about endpoints and sockets vis-a-vis
    endpoints and datagrams, those usually enough sharing
    an address family while variously their transport (packets).

    Then there's sort of whether to host TLS-termination
    inside the runtime as usually, or next to it as sort of
    either in-process or out-of-process, similarly with
    compression, and including for example concepts
    of cache-ing, and upgrade, and these sorts things,
    while keeping it so that the "protocol module" is
    all self-contained and behaves according to protocol,
    for the great facility of the standardization and deployment
    of Internet protocols in a friendly sort of environment,
    vis-a-vis the DMZ to the wider Internet, as basically with
    the idea of only surfacing one well-known port and otherwise
    abstracting away the rest of the box altogether,
    to reduce the attack surface its vectors, for
    a usual goal of threat-modeling, reducing it.


    So people would usually enough just launch a proxy,
    but I'm mostly interested only in supporting TLS and
    perhaps compression in the protocol as only altogether
    a pass-through layer, then as with regards to connecting
    that in-process as possible, so passing I/O handles,
    otherwise with a usual notion of domain sockets
    or just plain Address Family UNIX sockets.
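
    A minimal sketch of passing an I/O handle over an Address Family UNIX
    socket, here in Python (3.9 or later, Unix only) rather than the Java
    of the server itself. Both ends live in one process just to keep it
    short; hand_off is a made-up name, and a real front-end would pass
    the accepted connection's descriptor to a separate protocol module.

```python
import os
import socket


def hand_off(fd):
    """Send fd across a UNIX socket pair and read from the received copy."""
    front, back = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # The descriptor travels as ancillary data next to a small message.
        socket.send_fds(front, [b"conn"], [fd])
        _msg, fds, _flags, _addr = socket.recv_fds(back, 16, 1)
        received = fds[0]  # a new descriptor for the same open file
        try:
            return os.read(received, 64)
        finally:
            os.close(received)
    finally:
        front.close()
        back.close()


read_end, write_end = os.pipe()
os.write(write_end, b"200 server ready\r\n")
print(hand_off(read_end))
os.close(read_end)
os.close(write_end)
```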

    There's basically whether the publicly-facing actually
    just serves on the usual un-encrypted port, for the
    insensitive types of things, and the usual encrypted
    port, or whether it's mostly in the protocol that
    STARTTLS or "upgrade" occurs, "in-band" or "out-of-band",
    and with respect to usually there's no notion at all
    of STREAMS or "out-of-band" in STREAMS, sockets,
    Address Family UNIX.


    The usual notion here is making it like so:

    NNTP
    IMAP -> NNTP
    HTTP -> IMAP -> NNTP

    for a Usenet service, then as with respect to
    that there's such high affinity of SMTP, then
    as with regards to HTTP more generally as
    the most usual fungible de facto client-server
    protocol, is connecting those locally after
    TLS-termination, while still having TLS-layer
    between the Internet and the server.

    So in this high-performance implementation it
    sort of relies directly on the commonly implemented
    and ubiquitously available non-blocking I/O of
    the runtime, here as about keeping it altogether
    simple, with respect to the process model,
    and the runtime according to the OS/virt/scheduler's
    login and quota and bindings, and back-end,
    that in some runtimes like an app-container,
    that's supposed to live all in-process, while with
    respect to off-loading load to right-sized resources,
    it's sort of general.

    Then I've written this mostly in Java and plan to
    keep it this way, where the Direct Memory for
    the service of non-blocking I/O, is pretty well
    understood, vis-a-vis actually just writing this
    closer to the user-space libraries, here as with
    regards to usual notions of cross-compiling and
    so on. Here it's kind of simplified because this
    entire stack has no dependencies outside the
    usual Virtual Machine, it compiles and runs
    without a dependency manager at all, then
    though that it gets involved the parsing the content,
    while simply the framework of ingesting, storing,
    and moving blobs is just damn fast, and
    very well-behaved in the resources of the runtime.

    So, setting up TLS termination for these sorts
    protocols where the protocol either does or
    doesn't have an explicit STARTTLS up front
    or always just opens with the handshake,
    basically has where I'm looking at how to
    instrument and connect that for the Hopper
    as above and how besides passing native
    file and I/O handles and buffers, what least
    needful results a useful approach for TLS on/off.

    So, this is a sort of approach, figuring for
    "nesting the protocols", where similarly is
    the goal of having the fronting of the backings,
    sort of like so, ...

    NNTP
    IMAP -> NNTP
    HTTP -> NNTP
    HTTP -> IMAP -> NNTP

    with the front being in the protocol, then
    that HTTP has a sort of normative protocol
    for IMAP and NNTP protocols, and IMAP
    has as for NNTP protocols, treating groups
    like mailboxes, and commands as under usual
    sorts HTTP verbs and resources.

    Similarly the same server can just serve each
    the relevant protocols on each the relevant ports.

    If you know these things, ....

    [2024/02/12]

    Looking at how Usenet moderated groups operate,
    well first there's PGP and control messages then
    later it seems there's this sort Stump/Webstump
    setup, or as with regards to moderators.isc.org,
    what is usual with regards to control messages
    and usual notions of control and cancel messages
    and as with regards to newsgroups that actually
    want to employ Usenet moderation sort of standardly.

    (Usenet trust is mostly based on PGP, or
    'Philip Zimmermann's Pretty Good Privacy',
    though there are variations and over time.)

    http://tools.ietf.org/html/rfc5537

    http://wiki.killfile.org/projects/usenet/faqs/nam/


    Reading into RFC5537 gets into some detail like
    limits in the headers field with respect to References
    or Threads:

    https://datatracker.ietf.org/doc/html/rfc5537#section-3.4.4

    https://datatracker.ietf.org/doc/html/rfc5537#section-3.5.1

    So, the agents are described as

    Posting
    Injecting
    Relaying
    Serving
    Reading

    Moderator
    Gateway

    then with respect to these sorts separations duties,
    the usual notions of Internet protocols their agents
    and behavior in the protocol, old IETF MUST/SHOULD/MAY
    and so on.

    So, the goal here seems to be to define a
    profile of "connected core services" of sorts
    of Internet protocol messaging, then this
    "common central storage" of this BFF/SFF
    and then reference implementations then
    for reference editions, these sorts things.

    Of course there already is one, it's called
    "Internet mail and news".

    [ page break 9 ]


    [2024/02/14]

    So one thing I want here is to make it so that data can
    be encrypted very weakly at rest, then, that, the SSL
    or TLS for TLS 1.2 or TLS 1.3, results that the symmetric
    key bits for the records is always the same as this what
    is the very-weak key.

    This way pretty much the entire CPU load of TLS is
    eliminated, while still the data is encrypted very-weakly
    which at least naively is entirely inscrutable.

    The idea is that in TLS 1.2 there's this

    client random cr ->
    <- server random sr
    client premaster cpm ->

    these going into PRF (cpm, 'blah', cr + sr, [48]), then
    whether renegotiation keeps the same client random
    and client premaster, then that the server can compute
    the server random to make it so derived the very-weakly
    key, or for example any of what results least-effort.

    Maybe not, sort of depends.

    Then the TLS 1.3 has this HKDF, HMAC-based Key Derivation Function,
    it can again provide a salt or server random, then as with
    regards to that filling out in the algorithm to result the
    very-weakly key, for a least-effort block cipher that's also
    zero-effort and being a pass-through no-op, so the block cipher
    stays out of the way of the data already concatenably-compressed
    and very-weakly encrypted at rest.


    Then it looks like I'd be trying to make hash collisions which
    is practically intractable, about what goes into the seeds
    whether it can result things like "the server random is
    zero minus the client random, their sum is zero" and
    this kind of thing.


    I suppose it would be demonstrative to setup a usual
    sort of "TLS man-in-the-middle" Mitm just to demonstrate
    that given the client trusts any of Mitm's CAs and the
    server trusts any of Mitm's CAs that Mitm sits in the middle
    and can intercept all traffic.

    So, the TLS 1.2, PRF or pseudo-random function, is as of
    "a secret, a seed, and an identifying label". It's all SHA-256
    in TLS 1.2. Then it's iterative over the seed, that the
    secret is hashed with the seed-hashed secret so many times,
    each round of that concatenated ++ until there's enough bytes
    to result the key material. Then in TLS the seed is defined
    as 'blah' ++ seed, so, to figure out how to figure to make it
    so that 'blah' ++ (client random + server random) makes it
    possible to make a spigot of the hash algorithm, of zeros,
    or an initial segment long enough for all key sizes,
    to split out of that the server write MAC and encryption keys,
    then to very-weakly encrypt the data at rest with that.
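
    A minimal sketch of the TLS 1.2 PRF as RFC 5246 defines it, with
    P_SHA256 built from HMAC-SHA-256. In the RFC the "+" is byte
    concatenation: the label is its ASCII bytes (b"master secret" is 13
    bytes) followed by the two 32-byte randoms. The function names are
    illustrative, and the sketch shows only the derivation, not any way
    to steer it to a chosen output.

```python
import hashlib
import hmac


def p_sha256(secret, seed, length):
    """P_hash: A(0) = seed, A(i) = HMAC(secret, A(i-1)),
    output = HMAC(secret, A(1) + seed) + HMAC(secret, A(2) + seed) + ..."""
    out = b""
    a = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret, label, seed, length):
    """PRF(secret, label, seed) = P_SHA256(secret, label + seed)."""
    return p_sha256(secret, label + seed, length)


def master_secret(premaster, client_random, server_random):
    """48 bytes: two HMAC rounds of 32 bytes, the last 16 dropped."""
    return prf(premaster, b"master secret",
               client_random + server_random, 48)


def key_block(master, client_random, server_random, length):
    """Key material; note the randoms are in the opposite order here."""
    return prf(master, b"key expansion",
               server_random + client_random, length)


cr = bytes(range(32))
sr = bytes(range(32, 64))
ms = master_secret(b"\x03\x03" + b"\x00" * 46, cr, sr)
print(len(ms), key_block(ms, cr, sr, 72).hex()[:32])
```

    Since the server random is only ever an input to HMAC keyed by a
    secret the server does not choose, changing it changes the whole
    output rather than any controllable part of it.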

    Then the client would still be sending up with the client
    MAC and encryption keys, about whether it's possible
    to setup part of the master key or the whole thing.
    Whether a client could fabricate the premaster secret
    so that the data resulted very-weakly encrypted on its
    own terms, doesn't seem feasible as the client random
    is sent first, but cooperating could help make it so,
    with regards to the client otherwise picking a weak
    random secret overall.

    (Figuring TLS interception is all based on Mitm,
    not "cryptanalysis and the enigma cipher", and
    even the very-weakly just look like 0's and 1's.)

    So, P_SHA256 is being used to generate 48 bytes,
    so that's two rounds, where the first round is
    32 bytes then second 32 bytes half those dropped,
    then if the client/server MAC/encrypt
    are split up into those, ..., or rather only the first
    32 bytes, then only the first SHA 256 round occurs,
    if the Initialization Vector IV's are un-used, ...,
    results whether it's possible to figure out
    whether "master secret" ++ (client random + server random),
    makes for any way for such a round of SHA-256,
    given an arbitrary input to result a contrived value.
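
    As a rough check of the round count, here is a sketch of the
    master-secret derivation as RFC 5246 section 8.1 gives it. The
    helper names blocks_needed and master_secret are made up for
    illustration. Note that each "round" here is two HMAC-SHA256
    calls, not a single SHA-256 invocation.

```python
import hashlib
import hmac
import math

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes per HMAC-SHA256 output block


def blocks_needed(length: int) -> int:
    """Number of P_SHA256 output blocks required to cover `length` bytes."""
    return math.ceil(length / HASH_LEN)


def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """PRF(pre_master_secret, "master secret", client_random + server_random)[0..47]."""
    seed = b"master secret" + client_random + server_random
    out, a = b"", seed
    for _ in range(blocks_needed(48)):
        a = hmac.new(pre_master, a, hashlib.sha256).digest()
        out += hmac.new(pre_master, a + seed, hashlib.sha256).digest()
    return out[:48]  # the last 16 bytes of the second block are dropped


print(blocks_needed(48), blocks_needed(32))  # 2 1
```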

    Hm..., reading the Web suggests that "label + seed"
    is the concatenation of the 'blah' label, as ASCII bytes, and
    the raw bytes of client random ++ server random.

    Let's see, a random then looks like so,

    struct {
    uint32 gmt_unix_time;
    opaque random_bytes[28];
    } Random;

    thus that's quite a bit to play with, but I'm
    not sure at all how to make it so that round after
    round of SHA-256, settles on down to a constant,
    given that 28 bytes' worth of seed
    can be contrived, while the first 4 bytes of the
    resulting 32 bytes is a gmt_unix_time, with the
    idea that they may be scrambled, as it's not mentioned
    anywhere else to check the time in the random.

    "Clocks are not required to be set correctly
    by the basic TLS protocol; higher-level or
    application protocols may define additional
    requirements."
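
    To make the sizes concrete, a small sketch that serializes the
    Random structure and builds the master-secret seed from it.
    The helper make_random is hypothetical; the layout (big-endian
    uint32 followed by 28 opaque bytes) and the byte-wise
    concatenation follow RFC 5246.

```python
import os
import struct
import time
from typing import Optional


def make_random(gmt_unix_time: Optional[int] = None,
                random_bytes: Optional[bytes] = None) -> bytes:
    """Serialize a TLS 1.2 Random: uint32 gmt_unix_time, then 28 opaque bytes."""
    if gmt_unix_time is None:
        gmt_unix_time = int(time.time())
    if random_bytes is None:
        random_bytes = os.urandom(28)
    if len(random_bytes) != 28:
        raise ValueError("random_bytes must be exactly 28 bytes")
    # TLS encodes integers in network (big-endian) byte order.
    return struct.pack("!I", gmt_unix_time & 0xFFFFFFFF) + random_bytes


client_random = make_random()
# A fully contrived server random: zero timestamp and zero bytes.
server_random = make_random(gmt_unix_time=0, random_bytes=bytes(28))
seed = b"master secret" + client_random + server_random
print(len(client_random), len(seed))  # 32 77
```

    So with both randoms included the labelled seed is 13 + 64
    bytes, of which one side controls only its own 32.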

    So, the server-random can be contrived,
    what it results the 13 + 32 bytes that are
    the seed for the effectively 1-round SHA-256
    hash of an arbitrary input, that the 32 bytes
    can be contrived, then is for wondering
    about how to make it so that results a
    contrived very-weakly SHA-256 output.

    So the premaster secret is decrypted with
    the server's private key, or as with respect
    to the exponents of DH or what, then that's
    padded to 64 bytes, which is also the SHA-256
    chunk size, then the output of the first round
    the used keys and second the probably un-used
    initialization vectors, ...
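
    For what it's worth, RFC 5246 section 6.3 derives the write
    keys from a second PRF call over the master secret, labelled
    "key expansion", with the seed order swapped to server random
    then client random. A sketch of that split follows. The name
    split_key_block and the default sizes (32-byte MAC keys,
    16-byte cipher keys, no fixed IV) are assumptions picked for
    illustration, roughly a CBC suite with HMAC-SHA256.

```python
import hashlib
import hmac


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF: P_SHA256(secret, label + seed), truncated to length."""
    full_seed = label + seed
    out, a = b"", full_seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + full_seed, hashlib.sha256).digest()
    return out[:length]


def split_key_block(master_secret: bytes, client_random: bytes, server_random: bytes,
                    mac_len: int = 32, key_len: int = 16, iv_len: int = 0) -> dict:
    """Expand the master secret and slice it in the order RFC 5246 lists."""
    total = 2 * (mac_len + key_len + iv_len)
    block = prf(master_secret, b"key expansion", server_random + client_random, total)
    layout = [
        ("client_write_MAC_key", mac_len),
        ("server_write_MAC_key", mac_len),
        ("client_write_key", key_len),
        ("server_write_key", key_len),
        ("client_write_IV", iv_len),
        ("server_write_IV", iv_len),
    ]
    keys, offset = {}, 0
    for name, size in layout:
        keys[name] = block[offset:offset + size]
        offset += size
    return keys
```

    With these assumed sizes the key block is 96 bytes, so three
    P_SHA256 output blocks rather than one, and the IV slices come
    out empty.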

    https://en.wikipedia.org/wiki/SHA-2#Pseudocode


    "The SHA-256 hash algorithm produces hash values
    that are hard to predict from the input."

    John Larkin
    Highland Tech Glen Canyon Design Center
    Lunatic Fringe Electronics
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Ross Finlayson@ross.a.finlayson@gmail.com to sci.electronics.design,sci.math on Thu Apr 23 13:23:36 2026
    From Newsgroup: sci.math

    On 04/23/2026 12:23 PM, john larkin wrote:
    On Thu, 23 Apr 2026 09:39:31 -0700, Ross Finlayson <ross.a.finlayson@gmail.com> wrote:


    a 4000-line post!



    Internet Text Protocols for Internet Messages like Usenet, Email,
    and so on, over various transports, while incredibly ubiquitous,
    arguably are very verbose compared to more compressed binary formats,
    and their specifications, also in text, run into the hundreds of
    kilobytes, or, if a bit less so, kibibytes.

    That post is a bit of a digest over posts over some years.


    And think, for all that, a JPEG picture of a housecat is only in
    the megabytes, or mebibytes, and worth _at least_ a thousand words.


    Please excuse (and ignore) if it's unwanted, then a few different
    ideas in it explore things like "region connection calculus"
    and "pyramidal hierarchical" approaches to sifting through files.


    --- Synchronet 3.21f-Linux NewsLink 1.2