• Re: Simple conversions from HTML to simple markups are disappointing

    From rtr@rtr@nospam.invalid to comp.infosystems.gemini,comp.infosystems.gopher on Sun Jan 23 20:37:50 2022
    From Newsgroup: comp.infosystems.gopher

    On Sun, 23 Jan 2022 13:25:29 +0100
    Luca Saiu <luca@ageinghacker.net> wrote:

    [...]

    Now, it is possible to obtain a better conversion by spending more
    effort: in particular lynx (which of course was never designed for
    this task) is inadequate in preserving markup information. It is
    possible to parse HTML instead, and start from an AST. On the other
    hand some fault lies in the HTML source document as well: The
    document could have used, for example, CSS for icons instead of
    <img> elements when the content was not significant enough to
    deserve translation. However some style information only encoded
    in CSS would be significant for translation: had I used CSS in the
    place of old-style <tt> elements, recognising "code"-type elements
    would have been an issue. My html-to-gemini or html-to-gopher
    conversion would need a lot of the complexity I want to avoid.

    I have come to believe that the only really practical solution is
    translating in the opposite direction: starting from a simple and
    clean markup (I would say Gemini) and from that generating other
    simple markups (Gopher) and the legacy system (HTML). This can and
    should handle relative, intra-server links.

    Interesting. I also think that gemini/gopher -> html is easier to
    deal with than the other way around. When I was first starting to
    get into gemini I also dabbled with the idea of just converting my HTML
    pages to gemtext. I figured that it's just easier to strip everything
    of formatting, start with plaintext, and convert that to gemtext.
    Granted I don't have that many posts to mess with so that probably
    played into my decision making process.
    --
    Give them an inch and they will take a mile.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From bunburya@bunburya@tilde.club to comp.infosystems.gemini,comp.infosystems.gopher on Sun Jan 23 14:03:27 2022
    From Newsgroup: comp.infosystems.gopher

    On 23/01/2022 12:25, Luca Saiu wrote:
    I have come to believe that the only really practical solution is
    translating in the opposite direction: starting from a simple and
    clean markup (I would say Gemini) and from that generating other
    simple markups (Gopher) and the legacy system (HTML). This can and
    should handle relative, intra-server links.

    I believe this is correct, because the features supported by HTML are a
    superset of those supported by gemtext. So going from HTML -> gemtext
    almost always results in a loss of some information, which means a
    choice must be made as to how to handle the loss of information. I
    suspect the optimal solution to the problem will depend heavily on the
    context, so it is hard to create a perfect, generalised HTML -> gemtext
    converter.

    Alternatively, you could consider starting in markdown, which lies
    somewhere between gemtext and HTML in terms of features. Markdown ->
    HTML is easy and commonly done. In principle, markdown -> gemtext
    suffers from the same issues as HTML -> gemtext (loss of information due
    to moving to a format with fewer supported features), but much less
    information is lost as markdown is much closer to gemtext to begin with.
    There are some tools out there already to convert from markdown to
    gemtext, such as https://pypi.org/project/md2gemini/

    I'm guessing the main thing you are missing in gemtext is inline links.
    If you start from markdown, these can be preserved perfectly when
    converting to HTML, and handled sensibly when converting to gemtext
    (there are a few common ways to do this, such as converting to
    footnotes, which the above tool supports).
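
    As a rough illustration of the footnote approach (a minimal sketch, not
    how md2gemini itself is implemented), the function below rewrites inline
    markdown links in one paragraph as numbered markers followed by gemtext
    link lines. The function name and the regular expression are
    assumptions made for this example; nested brackets, link titles and
    images are deliberately not handled.

```python
import re

# Matches simple inline markdown links such as [label](https://example.org/).
# Assumption: no nested brackets, no link titles; image syntax (![..](..)) is skipped.
INLINE_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def paragraph_to_gemtext(paragraph: str, first_index: int = 1) -> list[str]:
    """Return gemtext lines: the paragraph with [n] markers, then one link line per link."""
    links: list[tuple[str, str]] = []

    def replace(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        links.append((label, url))
        number = first_index + len(links) - 1
        # Keep the label in the running text and add a numbered marker after it.
        return f"{label} [{number}]"

    lines = [INLINE_LINK.sub(replace, paragraph)]
    for offset, (label, url) in enumerate(links):
        # Gemtext link line: "=> URL optional label".
        lines.append(f"=> {url} [{first_index + offset}] {label}")
    return lines


if __name__ == "__main__":
    text = "See [the spec](gemini://example.org/spec.gmi) for details."
    print("\n".join(paragraph_to_gemtext(text)))
```

    Repeating the label on each link line keeps the footnote readable on
    its own; whether that is worth the extra length is a matter of taste.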
  • From Luca Saiu@luca@ageinghacker.net to comp.infosystems.gemini,comp.infosystems.gopher on Sun Jan 23 17:58:18 2022
    From Newsgroup: comp.infosystems.gopher

    Hello bunburya.

    On 2022-01-23 at 14:03 +0000, bunburya wrote:

    In principle, markdown -> gemtext suffers from the same issues as HTML
    -> gemtext (loss of information due to moving to a format with fewer
    supported features), but much less information is lost as markdown is
    much closer to gemtext to begin with.

    Agreed.

    I'm guessing the main thing you are missing in gemtext is inline links.

    To me the lack of control on preformatted text is more serious than the
    lack of inline links, possibly because of the technical topics I
    normally write about.

    Not being able to display source code clearly is a fatal flaw for me;
    and notice how Gopher is less flawed than Gemini in this sense, by
    virtue of being less abstract.

    I do not particularly mind non-inline links; in fact they may promote a
    clear style. However numbered footnote-style links obtained by
    conversion, without descriptive labels, are difficult to follow without
    interrupting the flow of reading: see the end of my conversion example,
    which I believe is representative in terms of clutter.

    gemini://ageinghacker.net/test-conversion.gmi

    One has to write, from the beginning, in the new "minimal markup" style.

    Incidentally I am not saying that Gemini is perfect. In fact I miss a
    comment syntax, which I would use to encode my own information
    (examples: list of keywords, tags, priority in a site-wide page map).
    But what I am thinking is a set of semantic extensions, the kind which
    Gemini is designed to prevent; that is fair.

    In the same way I also miss italic and bold -- call them "emphasis" if
    you will -- but here it might be healthy to let them go altogether.
    Part of this entire exercise is detoxing from the overabundance of
    irrelevant information. For decades I have been planning never to use
    smileys again and write text in a dignified style, with meaning conveyed
    through words instead of some flashy semi-literate replacement for them.
    In the end once in a while laziness wins.


    The more I consider the issue the more I lean towards defining my own
    source format from which to machine-generate even simple formats like
    Gemini. In the longer term it would be something powerful and
    extensible, like M4 without the awful quoting mechanism. For the time
    being it can be a trivial system.

    I would keep my public site source tree, with markup files identified by
    a specific extension linking each other, along with data files such as
    images. A script would generate copies of the entire tree, with
    symbolic links where appropriate, with notes translated into Gemini,
    Gopher or HTML.
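
    The tree-copying step could look roughly like the sketch below. It is
    only an outline under stated assumptions: the ".note" extension, the
    function name and the "translate" callback are all hypothetical, the
    target directory is assumed to lie outside the source tree, and the
    platform is assumed to support symbolic links.

```python
import os
from pathlib import Path
from typing import Callable


def mirror_tree(source: Path, target: Path, source_ext: str, target_ext: str,
                translate: Callable[[str], str]) -> None:
    """Copy a site tree, translating markup files and symlinking everything else."""
    for path in sorted(source.rglob("*")):
        destination = target / path.relative_to(source)
        if path.is_dir():
            destination.mkdir(parents=True, exist_ok=True)
        elif path.suffix == source_ext:
            # Markup file: translate its text and store it under the new extension.
            destination = destination.with_suffix(target_ext)
            destination.parent.mkdir(parents=True, exist_ok=True)
            text = path.read_text(encoding="utf-8")
            destination.write_text(translate(text), encoding="utf-8")
        else:
            # Data file (an image, say): link back to the original instead of copying.
            destination.parent.mkdir(parents=True, exist_ok=True)
            if not os.path.lexists(destination):
                destination.symlink_to(path.resolve())
```

    Running it once per output format, with a different "translate"
    function and target directory each time, would give one generated tree
    for Gemini, one for Gopher and one for HTML.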


    If I get to write this tool and make it even vaguely usable by others I
    will announce it here.

    How do you, and other people here, solve the problem? Do you write
    sites accessible only to Gemini or only to Gopher?

    Thanks for the conversation.
    --
    Luca Saiu -- http://ageinghacker.net
    I support everyone's freedom of mocking any opinion or belief, no
    matter how deeply held, with open disrespect and the same unrelented
    enthusiasm of a toddler who has just learned the word "poo".
  • From meff@email@example.com to comp.infosystems.gemini,comp.infosystems.gopher on Sun Jan 23 20:02:52 2022
    From Newsgroup: comp.infosystems.gopher

    On 2022-01-23, Luca Saiu <luca@ageinghacker.net> wrote:
    Disgusted by the web with its anti-features, its enormous gratuitous
    complexity and its essentially proprietary nature (the effort of
    re-implementing a significant component from scratch is unrealistic for
    single developers), I have recently opened Gemini and Gopher services

    Hm, I didn't think Jehovah's Witnesses made it from the Web onto the net
    also...

    After experimenting for one day or two I have to admit that the result
    is disappointing. The conversion is unnatural and I find that at the
    same time some important information is lost (<tt> and <pre>) while some
    which is irrelevant is preserved (icons). Having out-of-line links does
    not help readability when references are numerous.

    Indeed it's hard to move from HTML to more "lean" markup formats when
    you are, at least somewhat, relying on the semantic information that
    HTML is providing.

    I have come to believe that the only really practical solution is
    translating in the opposite direction: starting from a simple and
    clean markup (I would say Gemini) and from that generating other
    simple markups (Gopher) and the legacy system (HTML). This can and
    should handle relative, intra-server links.

    HTML offers semantic information in its markup and browsers can take
    that semantic information and make sense of it. There's obviously a
    lot of conflation between visual and semantic information in HTML, but
    the semantic information present makes it hard to translate to
    non-semantic markup formats. Markdown, Gemtext, etc are mostly just
    visual formats (where the browser is usually fairly "dumb" about how
    to display the format.) It might make sense to format your information
    in a non-semantic way first and then add semantic niceties, like
    <pre>, afterward.
  • From bunburya@bunburya@tilde.club to comp.infosystems.gemini,comp.infosystems.gopher on Mon Jan 24 21:24:14 2022
    From Newsgroup: comp.infosystems.gopher


    On 23/01/2022 16:58, Luca Saiu wrote:
    To me the lack of control on preformatted text is more serious than the
    lack of inline links, possibly because of the technical topics I
    normally write about.

    Not being able to display source code clearly is a fatal flaw for me;
    and notice how Gopher is less flawed than Gemini in this sense, by
    virtue of being less abstract.

    What is the issue you are having with pre-formatted text? Line numbering
    and syntax highlighting are the main things that come to mind - I think
    these could be achieved on the client side, though I'm not aware of any
    client that currently does so. (I know there was some discussion of
    syntax highlighting in pre-formatted text on the mailing list a while
    ago; the majority view seemed to be that the alt text part of the
    pre-formatted text block could indicate the language, though some
    disagreed with using alt text in that way).
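
    To make the alt text idea concrete, here is a small sketch of how a
    client might collect pre-formatted blocks together with whatever follows
    the opening ``` on the toggle line. Treating that text as a language
    name is only the convention discussed above, not something gemtext
    itself requires, and the function name is made up for this example.

```python
def preformatted_blocks(gemtext: str) -> list[tuple[str, list[str]]]:
    """Return (alt_text, lines) for every pre-formatted block in a gemtext document."""
    blocks: list[tuple[str, list[str]]] = []
    current = None  # None while outside a pre-formatted block
    for line in gemtext.splitlines():
        if line.startswith("```"):
            if current is None:
                # Opening toggle line: the rest of the line is the alt text.
                current = (line[3:].strip(), [])
            else:
                # Closing toggle line: anything after the backticks is ignored.
                blocks.append(current)
                current = None
        elif current is not None:
            current[1].append(line)
    if current is not None:
        # Unterminated block: keep what was collected up to the end of the file.
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    sample = "# Example\n```python\nprint('hi')\n```\n"
    for alt, lines in preformatted_blocks(sample):
        print(repr(alt), lines)
```

    A client could then pass the lines to a highlighter chosen by the alt
    text, and fall back to plain display when the alt text is empty or
    unrecognised.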


    How do you, and other people here, solve the problem? Do you write
    sites accessible only to Gemini or only to Gopher?

    Personally I publish only to Gemini; however, I write very little anyway
    (really just my gemlog, which is not updated all that often) so I don't
    claim to be any kind of example to follow.
  • From Luca Saiu@luca@ageinghacker.net to comp.infosystems.gemini,comp.infosystems.gopher on Wed Jan 26 01:10:59 2022
    From Newsgroup: comp.infosystems.gopher

    On 2022-01-24 at 21:24 +0000, bunburya wrote:

    On 23/01/2022 16:58, Luca Saiu wrote:
    To me the lack of control on preformatted text is more serious than the
    lack of inline links, possibly because of the technical topics I
    normally write about.
    Not being able to display source code clearly is a fatal flaw for me;
    and notice how Gopher is less flawed than Gemini in this sense, by
    virtue of being less abstract.

    What is the issue you are having with pre-formatted text?

    On Gemini we have to clearly indicate what is pre-formatted and what is
    not, because the default is that whitespace can be congealed and lines
    broken and moved in order to fill paragraphs; Gopher does not do it, but
    that means that paragraphs may end up displayed too narrow or too wide
    for the client.
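
    The difference can be sketched in a few lines: when producing
    fixed-width text (for Gopher, say) from gemtext, ordinary lines can be
    re-filled to a chosen width while pre-formatted lines pass through
    untouched. This is only an illustration; the function name and the
    default width of 70 columns are assumptions, and link or heading lines
    get no special treatment here.

```python
import textwrap


def gemtext_to_fixed_width(gemtext: str, width: int = 70) -> list[str]:
    """Re-fill ordinary gemtext lines to a fixed width; keep pre-formatted lines as-is."""
    out: list[str] = []
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            # Toggle line: switches pre-formatted mode and is not itself output.
            preformatted = not preformatted
        elif preformatted:
            out.append(line)  # whitespace and line breaks are significant here
        elif not line.strip():
            out.append("")  # blank lines separate paragraphs
        else:
            # Ordinary text: collapse runs of whitespace, then wrap.
            out.extend(textwrap.wrap(" ".join(line.split()), width=width))
    return out


if __name__ == "__main__":
    sample = "A long   paragraph line.\n```\nint  x;\n```"
    print("\n".join(gemtext_to_fixed_width(sample, width=12)))
```

    The width has to be picked on the server side, which is exactly why
    the result may be too narrow or too wide for a given Gopher client.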

    If I convert from HTML my quick hack based on Lynx fails because the
    information on what was pre-formatted is lost. Converting *well* from
    HTML requires analysing CSS as well.
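
    For comparison, a parser-based conversion can at least keep track of
    <pre> elements. The sketch below uses html.parser from the Python
    standard library; the class and function names are made up for this
    example. It handles only the <pre> element itself and knows nothing
    about CSS, so text made pre-formatted through a stylesheet would still
    be lost.

```python
from html.parser import HTMLParser


class PreAwareConverter(HTMLParser):
    """Collect text from HTML, fencing <pre> content as gemtext pre-formatted blocks."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.lines: list[str] = []
        self.buffer: list[str] = []
        self.in_pre = False

    def flush_text(self) -> None:
        # Ordinary text: collapse all whitespace, emit one line per block.
        text = " ".join("".join(self.buffer).split())
        self.buffer = []
        if text:
            self.lines.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.flush_text()
            self.in_pre = True
            self.lines.append("```")
        elif tag in ("p", "div", "br") and not self.in_pre:
            self.flush_text()

    def handle_endtag(self, tag):
        if tag == "pre" and self.in_pre:
            # Pre-formatted text: keep spacing and line breaks exactly.
            pre_text = "".join(self.buffer)
            self.buffer = []
            self.lines.extend(pre_text.strip("\n").split("\n"))
            self.lines.append("```")
            self.in_pre = False
        elif tag in ("p", "div") and not self.in_pre:
            self.flush_text()

    def handle_data(self, data):
        self.buffer.append(data)


def html_to_gemtext(html: str) -> str:
    parser = PreAwareConverter()
    parser.feed(html)
    parser.close()
    parser.flush_text()
    return "\n".join(parser.lines)
```

    Links, headings and lists are ignored here to keep the example short;
    the point is only that working from parsed markup preserves the
    pre-formatted boundary that a text dump discards.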

    For new text, not obtained by conversion, the Gemini solution works
    well.

    Line numbering and syntax highlighting are the main things that come
    to mind - I think these could be achieved on the client side, though
    I'm not aware of any client that currently does so.

    Yes. I am not against these features as long as line numbering does not
    interfere with cut-and-paste.

    (I know there was some discussion of syntax highlighting in
    pre-formatted text on the mailing list a while ago; the majority view
    seemed to be that the alt text part of the pre-formatted text block
    could indicate the language, though some disagreed with using alt text
    in that way).

    The alt text is a good feature by itself (example: this kind of
    colour-coding for different programming languages or abstraction layers:
    http://ageinghacker.net/projects/jitter-tutorial/ ), but in my opinion
    not very philosophically coherent with the rest of Gemini which is
    otherwise so minimalistic. And the alt text, again, lends itself to
    semantic extension.

    How do you, and other people here, solve the problem? Do you write
    sites accessible only to Gemini or only to Gopher?

    Personally I publish only to Gemini; however, I write very little
    anyway (really just my gemlog, which is not updated all that often) so
    I don't claim to be any kind of example to follow.

    I see. Thanks.

    I think I will write some simple translator to generate Gemini, Gopher
    and HTML from the same source.
    --
    Luca Saiu -- http://ageinghacker.net
  • From Andrea Biscuola@a@abiscuola.com to comp.infosystems.gemini,comp.infosystems.gopher on Wed Mar 8 21:05:47 2023
    From Newsgroup: comp.infosystems.gopher

    Hi Luca.

    I have come to believe that the only really practical solution is
    translating in the opposite direction: starting from a simple and
    clean markup (I would say Gemini) and from that generating other
    simple markups (Gopher) and the legacy system (HTML). This can and
    should handle relative, intra-server links.

    This is what I do through gmi2html:

    https://lab.abiscuola.org/gmnxd/dir?ci=tip&name=src/gmi2html

    I write my articles in gemtext and, through a script, convert the pages to
    HTML with that tool (BTW, I wrote it).
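
    For readers who have not seen such a converter, the general shape of a
    gemtext -> HTML translation is sketched below in Python. This is not
    the gmi2html code linked above, just a minimal illustration; the
    function names are invented, and the rule that relative links ending in
    ".gmi" are rewritten to ".html" is an assumption about how the site is
    laid out.

```python
import html


def rewrite_link(url: str) -> str:
    # Assumption: relative links to .gmi pages point at pages that are
    # converted too, so only the extension needs to change.
    if "://" not in url and url.endswith(".gmi"):
        return url[:-4] + ".html"
    return url


def gemtext_to_html(gemtext: str) -> str:
    """Translate gemtext into an HTML fragment, one gemtext line at a time."""
    out: list[str] = []
    in_pre = False
    in_list = False
    for line in gemtext.splitlines():
        if in_list and (in_pre or not line.startswith("* ")):
            out.append("</ul>")
            in_list = False
        if line.startswith("```"):
            out.append("</pre>" if in_pre else "<pre>")
            in_pre = not in_pre
        elif in_pre:
            out.append(html.escape(line))
        elif line.startswith("* "):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(line[2:])}</li>")
        elif line.startswith("=>"):
            parts = line[2:].split(maxsplit=1)
            if parts:
                url = rewrite_link(parts[0])
                label = parts[1] if len(parts) > 1 else parts[0]
                href = html.escape(url, quote=True)
                out.append(f'<p><a href="{href}">{html.escape(label)}</a></p>')
        elif line.startswith("#"):
            # Gemtext has three heading levels: #, ## and ###.
            level = 3 if line.startswith("###") else 2 if line.startswith("##") else 1
            out.append(f"<h{level}>{html.escape(line[level:].strip())}</h{level}>")
        elif line.startswith(">"):
            out.append(f"<blockquote>{html.escape(line[1:].strip())}</blockquote>")
        elif line.strip():
            out.append(f"<p>{html.escape(line)}</p>")
    if in_list:
        out.append("</ul>")
    if in_pre:
        out.append("</pre>")
    return "\n".join(out)
```

    Because every gemtext line type has a direct HTML counterpart, nothing
    is lost in this direction, which is the point made earlier in the
    thread.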

    My capsule:

    gemini://gemini.abiscuola.com

    --
    Andrea