• Bloat of Wiki generated HTML pages

    From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to alt.html on Mon Oct 28 21:41:05 2024
    From Newsgroup: alt.html

    I'm accessing some public Wiki information (~2100 Wiki pages).
    Due to the extreme time and space demands I started to extract
    that information (from the original MD files) to either create
    a huge text file or to generate an HTML file.[*]

    My post's intention is to understand whether the time and space
    bloat that I observed with that Wiki data is typical or just an
    effect of the underlying tool used.[**]

    For example, a typical MD file has
    10 header lines and 27 information lines (including links)
    in 48 (non-empty) lines and requires 2'354 bytes.
    From this MD file they create HTML information with
    56'551 lines(!) requiring 3'744'427 bytes(!)
    and this file also loads 63(!) JS files with another
    4'104'887 bytes(!) of storage requirements.

    So the net storage demand for a *single* HTML page is about 8 MB,
    and there are also the runtime costs of the JS code to consider. All
    that for 48 lines of information! - Is that typical for Wikis? -
    And *every* click on some link adds to those storage/time demands.
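
    As a rough check of the figures quoted above, here is a minimal
    sketch in Python. It only redoes the arithmetic from the post; the
    function name bloat_summary is made up for illustration, and 1 MB
    is assumed to mean 1'000'000 bytes.

```python
# Figures quoted in the post (bytes and line counts).
MD_BYTES = 2_354
HTML_BYTES = 3_744_427
JS_BYTES = 4_104_887
MD_LINES = 48
HTML_LINES = 56_551


def bloat_summary():
    """Return totals and blow-up factors for the single example page."""
    total = HTML_BYTES + JS_BYTES
    return {
        "total_bytes": total,
        # Assumption: 1 MB = 1,000,000 bytes (decimal megabytes).
        "total_mb": round(total / 1_000_000, 2),
        # HTML plus JS compared with the MD source.
        "size_factor": round(total / MD_BYTES),
        # HTML alone compared with the MD source.
        "html_size_factor": round(HTML_BYTES / MD_BYTES),
        # Generated HTML lines per non-empty MD line.
        "line_factor": round(HTML_LINES / MD_LINES),
    }


if __name__ == "__main__":
    for key, value in bloat_summary().items():
        print(f"{key}: {value}")
```

    With these inputs the total comes to 7'849'314 bytes (about
    7.85 MB), i.e. roughly 3'300 times the size of the MD source.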

    (I'm regularly astonished how badly software is written nowadays,
    but these numbers appear to me to be beyond all hope.)

    BTW, I'd also be interested in hints about free tools that do
    such an MD-files -> HTML-file (or -> PDF-file) conversion, so
    that I don't need to (unnecessarily) re-invent the wheel.
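
    The "one huge text file" variant mentioned earlier can be sketched
    with the Python standard library alone. This is only an
    illustration under assumptions: the MD files are taken to be UTF-8
    and to sit in one directory tree, and the function name
    concat_markdown and the "=====" separator lines are invented here,
    not part of any existing tool.

```python
from pathlib import Path


def concat_markdown(src_dir, out_file):
    """Concatenate all *.md files under src_dir into one text file.

    Returns the number of files written. Files are taken in sorted
    path order; each one is preceded by a separator line naming it.
    """
    src = Path(src_dir)
    out = Path(out_file).resolve()
    # Skip the output file itself in case it lives inside src_dir.
    files = sorted(p for p in src.rglob("*.md") if p.resolve() != out)
    with out.open("w", encoding="utf-8") as dst:
        for path in files:
            # Separator line with the page's path relative to src_dir.
            dst.write(f"===== {path.relative_to(src).as_posix()} =====\n")
            # Assumption: UTF-8 input; undecodable bytes are replaced.
            text = path.read_text(encoding="utf-8", errors="replace")
            # Normalize to exactly one blank line between pages.
            dst.write(text.rstrip("\n") + "\n\n")
    return len(files)
```

    Called as concat_markdown("wiki", "all-pages.txt") (both names are
    placeholders), it walks the tree once and never loads more than a
    single page into memory.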

    Janis

    [*] I wish I didn't have to do that, though; I had hoped
    the Wiki authors would have a way to provide some PDF or an
    all-in-one-page HTML with the tool they're using to create the
    HTML structure. Alas, they (or the tool) seem to not be able to
    provide that.

    [**] I have no information about that tool (and I don't intend
    any "tool blaming" with my post anyway, so that information is
    of minor importance to me).