• Question: Do Winston's headers cause charset issues for anyone else?

    From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.microsoft.windows on Wed Mar 11 17:32:58 2026
    From Newsgroup: news.software.readers

    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders handle malformed headers because my home-grown "newsreader" has "problems" when responding to Winston's posts due to the way he formats his "FROM" header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    When I reply to posts from Winston (the ones where his display name
    contains characters like "w!no#nn"), my own outgoing article sometimes gets corrupted on the way out. The corruption seems to happen, I think, because Winston's display name or headers contain raw 8-bit characters that are not valid UTF-8 and not MIME-encoded.

    Usenet (like email) requires that all headers must be pure ASCII unless
    they use MIME encoded-words, which means Winston's headers maybe should be
    From: =?UTF-8?Q?w=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1?= <winstonmvp@gmail.com>
    This is fully legal for Usenet and will not trigger nntp server rewrites.

    I wrote my own newsreader so I manually enforce strict 7-bit ASCII in my outgoing posts by the use of an extensive shortcuts.xml conversion macro.

    However, when Winston's illegal header bytes get copied into my attribution line or reply headers, some NNTP servers rewrite the article to "fix" the mismatch, which ends up mangling my otherwise clean pure-7-bit ASCII text.

    My question of others, since you're using "normal" newsreaders, is:
    Q: Do any of you see charset or encoding issues when replying to Winston's
    posts, or do your newsreaders and servers silently fix the illegal
    bytes so you never notice?

    I am trying to determine whether this is something unique to my strict
    ASCII workflow, or whether other clients also have to deal with it.

    Thanks for any insight.

    Note that I'm implementing the following shortcuts.xml to fix this,
    but nobody else will be using that conversion so it's just an N.B.

    <!-- Remove Unicode garbage from quoted Usenet 'X wrote:' lines -->
    <!-- (e.g., Winston's illegal headers) so my posts stay 7-bit clean -->
    <ReplaceRE Find="[^\x00-\x7F]" Replace="" />
    <ReplaceRE Find="^.*wrote:" Replace="Winston wrote:" />
    <ReplaceRE Find="^(References|In-Reply-To):.*" Replace="" />
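
    A rough Python equivalent of the first rule, offered only as a sketch: it assumes the goal is to strip non-ASCII characters from attribution lines (anything ending in "wrote:") and to leave every other line alone. The name sanitize_attribution is made up for illustration and is not part of any newsreader.

```python
import re

# Matches any character outside the 7-bit ASCII range, like the first ReplaceRE rule.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def sanitize_attribution(line: str) -> str:
    """Strip non-ASCII characters from an 'X wrote:' line; return other lines unchanged."""
    if line.rstrip().endswith("wrote:"):
        return NON_ASCII.sub("", line)
    return line


print(sanitize_attribution("...w\u00a1\u00f1\u00a7\u00b1\u00a4\u00f1 wrote:"))
```

    Unlike the second rule above, this sketch keeps whatever ASCII part of the name survives instead of hard-coding one poster's name.
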
    --
    When you write your own newsreader, you have to do everything yourself.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Carlos E.R.@robin_listas@es.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.microsoft.windows on Thu Mar 12 02:09:02 2026
    From Newsgroup: news.software.readers

    On 2026-03-12 01:32, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    No.

    Asking TB to produce the raw message, it comes as

    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?= <...>

    Which is legal, obviously. (Reasoning: if TB does it, then it is legal)

    Message-ID: <10obf37$3koaa$1@dont-email.me>
    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8; format=flowed
    Content-Transfer-Encoding: 8bit

    User-Agent: Mozilla Thunderbird
    Content-Language: en-US



    Looking at the stored file in my computer:

    00000070 50 4F 53 54  45 44 21 6E  6F 74 2D 66  6F 72 2D 6D  61 69 6C 0A  46 72 6F 6D  3A 20 3D 3F  POSTED!not-for-mail.From: =?
    0000008C 55 54 46 2D  38 3F 42 3F  4C 69 34 75  64 38 4B 68  77 37 48 43  70 38 4B 78  77 71 54 44  UTF-8?B?Li4ud8Khw7HCp8KxwqTD
    000000A8 73 51 3D 3D  3F 3D 20 3C  77 69 6E 73  74 6F 6E 6D  76 70 40 67  6D 61 69 6C  2E 63 6F 6D  sQ==?= <..........@gmail.com
    000000C4 3E 0A 4E 65  77 73 67 72  6F 75 70 73  3A 20 61 6C  74 2E 63 6F  6D 70 2E 6F  73 2E 77 69  >.Newsgroups: alt.comp.os.wi

    Which has been processed by Leafnode, without any problem.
    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;
  • From snipeco.2@snipeco.2@gmail.com (Wader of Doom) to alt.comp.os.windows-10,news.software.readers,alt.comp.microsoft.windows on Thu Mar 12 01:37:18 2026
    From Newsgroup: news.software.readers

    Maria Sophia <mariasophia@comprehension.com> wrote:

    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    Hi, Arlen, how's your User-Agent header?
    --
    Darth Wader [breathe, breathe...]
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Wed Mar 11 19:36:48 2026
    From Newsgroup: news.software.readers

    Carlos E.R. wrote:
    On 2026-03-12 01:32, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?


    No.

    Asking TB to produce the raw message, it comes as

    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?= <...>

    Which is legal, obviously. (Reasoning: if TB does it, then it is legal)

    Message-ID: <10obf37$3koaa$1@dont-email.me>
    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8; format=flowed
    Content-Transfer-Encoding: 8bit

    User-Agent: Mozilla Thunderbird
    Content-Language: en-US



    Looking at the stored file in my computer:

    00000070 50 4F 53 54  45 44 21 6E  6F 74 2D 66  6F 72 2D 6D  61 69 6C 0A  46 72 6F 6D  3A 20 3D 3F  POSTED!not-for-mail.From: =?
    0000008C 55 54 46 2D  38 3F 42 3F  4C 69 34 75  64 38 4B 68  77 37 48 43  70 38 4B 78  77 71 54 44  UTF-8?B?Li4ud8Khw7HCp8KxwqTD
    000000A8 73 51 3D 3D  3F 3D 20 3C  77 69 6E 73  74 6F 6E 6D  76 70 40 67  6D 61 69 6C  2E 63 6F 6D  sQ==?= <..........@gmail.com
    000000C4 3E 0A 4E 65  77 73 67 72  6F 75 70 73  3A 20 61 6C  74 2E 63 6F  6D 70 2E 6F  73 2E 77 69  >.Newsgroups: alt.comp.os.wi

    Which has been processed by Leafnode, without any problem.

    Hi Carlos,

    Thanks for helping out as I'm as aware as anyone how dangerous it is to
    answer any question on Usenet that is out of the norm for most people.

    My ASCII-only home-grown newsreader is likely proper for the original
    RFCs, but it is just as likely outdated when it comes to newer readers.

    AFAICT, Winston's display name appears to me to contain raw 8-bit bytes
    U+00A1 U+00F1 U+00A7 U+00B1 U+00A4 U+00F1
    Which, as far as I'm aware, are illegal in RFC-compliant headers
    unless they're MIME encoded where, apparently, Thunderbird, Leafnode,
    and probably most modern servers automatically rewrite this into:
    =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=
    Which is a valid MIME encoded word.

    But my home-grown newsreader does not rewrite it, of course.
    So it passes the raw bytes through.

    When I quote Winston, the reputedly illegal bytes enter my own articles.
    w¡ñ§±¤ñ wrote:
    Since those bytes don't appear to be ASCII, UTF-8, or MIME-encoded,
    my outgoing article contains a header/body character-set mismatch.

    I'm not sure what NNTP servers are doing, but apparently some NNTP servers
    try to "fix" that apparent mismatch by rewriting my article charsets.

    Apparently, based on your kind response, Thunderbird users never see this because TB seems to rewrite Winston's header before quoting it.

    By doing that, TB always emits valid UTF-8, which means downstream NNTP
    servers see a valid MIME-encoded header and do nothing about it.

    So you end up seeing:
    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=
    Which your NNTP server accepts without modification.

    I think that implies your TB & Leafnode newsreaders, and likely slrn, Pan, Gnus, Outlook Express, etc., automatically sanitize illegal header bytes
    before quoting them.

    Drat.
    My home-grown newsreader is expecting the original Usenet rule to hold:
    All headers must be 7-bit ASCII unless MIME-encoded.
    Whereas, modern newsreader clients seem to be enforcing a more lax rule:
    All headers must be valid UTF-8 or MIME-encoded.
    Winston's posts apparently violate rules, but modern clients fix them.
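
    The two rules above can be told apart mechanically. The following is only a sketch, assuming the raw header line is available as bytes; classify_header_bytes and its three labels are hypothetical names, not anything a real newsreader or server exposes.

```python
def classify_header_bytes(raw: bytes) -> str:
    """Label a raw header line as 'ascii', 'utf-8', or 'raw-8bit'."""
    # Strict rule: every byte is 7-bit (MIME encoded-words also land here).
    if all(b < 0x80 for b in raw):
        return "ascii"
    # Lax rule: 8-bit bytes are tolerated as long as they form valid UTF-8.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "raw-8bit"
    return "utf-8"


print(classify_header_bytes(b"From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?= <user@example.invalid>"))
print(classify_header_bytes("From: ...w\u00a1\u00f1".encode("utf-8")))
print(classify_header_bytes(b"From: ...w\xa1\xf1\xa7\xb1\xa4\xf1"))
```

    A strict ASCII workflow would only accept the first label; the third is the case that needs re-encoding or stripping before quoting.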

    It's unfortunate that my workflow is strict enough to expose the problem.
    As an aside, I'm working on fixing it on my side, but it's not easy.

    <ReplaceRE Find="[^\x00-\x7F]" Replace="" />
    <ReplaceRE Find="^.*wrote:" Replace="Winston wrote:" />

    Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-11,alt.comp.microsoft.windows
    Subject: Re: Tutorial: Notepad++ shortcuts.xml macro converts unicode to the 95-keyboard ASCII characters
    Date: Wed, 11 Mar 2026 17:19:11 -0700
    Message-ID: <10ot0q0$1110$1@nnrp.usenet.blueworldhosting.com>

    In summary, I guess it's only a problem for me, since my home-grown
    newsreader (which is really mostly just telnet, stunnel & gVim) requires a post-processing step of the attribution line to make it 7-bit ASCII.

    Nothing yet in my newsreader chain rewrites illegal header bytes or MIME-encodes display names, or normalizes UTF-8 in order to sanitize attribution lines in order to comply with RFC 2047 encoded-word rules.
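
    For the MIME-encoding step, Python's standard email package can serve as a reference for what a compliant encoded display name looks like. This is a sketch under assumptions: the address below is a placeholder, and the exact encoded-word the library emits (Base64 versus Q, letter case) is left to the library.

```python
from email.header import decode_header
from email.utils import formataddr, parseaddr

display_name = "...w\u00a1\u00f1\u00a7\u00b1\u00a4\u00f1"  # decoded display name
address = "winston@example.invalid"                         # placeholder address

# formataddr() emits an RFC 2047 encoded-word when the name is not pure ASCII.
from_value = formataddr((display_name, address), charset="utf-8")

# Round trip: split the header value, then decode the encoded-word back to text.
encoded_name, parsed_address = parseaddr(from_value)
raw, charset = decode_header(encoded_name)[0]
round_tripped = raw.decode(charset)

print(from_value)
print(from_value.isascii(), round_tripped == display_name)
```

    The resulting header value is pure 7-bit ASCII, so it can be copied into a strict ASCII article unchanged.
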
    --
    When you write your own newsreader, you have to consider the oddest things.
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Wed Mar 11 23:16:43 2026
    From Newsgroup: news.software.readers

    To add further value to what Carlos kindly tested using Thunderbird, apparently, those on Thunderbird see not this (which is what I see):
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>
    Which, is comprised of...
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    But they actually see this instead (according to what Carlos reported):
    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=

    By RFC, Usenet headers must be pure 7-bit ASCII where, if a display name contains any character outside ASCII (for example Winston's inverted exclamation mark, n-tilde, section sign, plus-minus, currency sign, etc.),
    then the header must be encoded using RFC 2047 rules.

    The valid format is:

    =?charset?encoding?encoded-text?=

    Hence, if we break Winston's header down:

    =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=
    | |     | |
    | |     | +-- Base64 text
    | |     +---- Encoding type ("B" = Base64)
    | +---------- Character set (UTF-8)
    +------------ Begin encoded-word

    The Base64 portion is:

    Li4ud8Khw7HCp8KxwqTDsQ==

    Decoding that Base64 string yields the UTF-8 text:

    ...w¡ñ§±¤ñ

    Those characters are exactly the ones Winston uses in his display name.
    They are illegal as raw bytes in a header, but perfectly legal when encoded using this MIME mechanism.
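
    The decoding described above can be reproduced with Python's standard library. A minimal sketch, using only the encoded-word quoted in this thread; the variable names are illustrative.

```python
import base64
from email.header import decode_header

encoded_word = "=?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?="

# Manual route: take the Base64 payload, decode it, then read the bytes as UTF-8.
payload = encoded_word.split("?")[3]
manual = base64.b64decode(payload).decode("utf-8")

# Library route: decode_header() parses the encoded-word into (bytes, charset).
raw, charset = decode_header(encoded_word)[0]
via_library = raw.decode(charset)

print(manual == via_library)
# Code points of the characters that follow the "...w" prefix.
print([f"U+{ord(ch):04X}" for ch in manual[4:]])
```

    Both routes give the same string, and the code points match the list given earlier in the thread (U+00A1, U+00F1, U+00A7, U+00B1, U+00A4, U+00F1).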

    If a newsreader quotes Winston's raw 8-bit header bytes directly, without converting them to a legal encoded-word, the outgoing article ends up with
    a header/body charset mismatch. Some NNTP servers then try to "repair" the message, which can corrupt the post.

    Modern newsreaders (Thunderbird, slrn, Pan, etc.) automatically sanitize
    the header by converting Winston's illegal bytes into a proper MIME encoded-word. That is maybe why most people likely never see the problem.

    In summary, Winston apparently posts raw 8-bit characters in his display
    name (which is not allowed by RFCs). But luckily, most modern clients
    rewrite them into a legal MIME encoded-word, which uses UTF-8 + Base64.

    Decoding the Base64 reveals the original characters.
    However, strict ASCII workflows (like mine) expose the mismatch because
    they do not automatically sanitize or re-encode the header.

    Hope this helps anyone else who wondered what that strange header meant.
  • From =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=@winstonmvp@gmail.com to alt.comp.os.windows-10,news.software.readers,alt.comp.microsoft.windows on Thu Mar 12 00:08:57 2026
    From Newsgroup: news.software.readers

    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders handle malformed headers because my home-grown "newsreader" has "problems" when responding to Winston's posts due to the way he formats his "FROM" header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character codes
    that appear blank - just copy the code and paste into a field to meet
    the '*' required character entry.
    --
    ...w¡ñ§±¤ñ
  • From John Hall@john@jhall.co.uk to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Thu Mar 12 09:25:48 2026
    From Newsgroup: news.software.readers

    On 12/03/2026 06:16, Maria Sophia wrote:
    To add further value to what Carlos kindly tested using Thunderbird, apparently, those on Thunderbird see not this (which is what I see):
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    I'm using Thunderbird and I see exactly what you see. Maybe it's
    something to do with which fonts we have installed or with our Windows settings? (I'm using Windows 11 rather than Windows 10, but I doubt that
    would make any difference.)
    --
    John Hall

    You can divide people into two categories:
    those who divide people into two categories and those who don't
  • From Carlos E.R.@robin_listas@es.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Thu Mar 12 11:53:14 2026
    From Newsgroup: news.software.readers

    On 2026-03-12 07:16, Maria Sophia wrote:
    To add further value to what Carlos kindly tested using Thunderbird, apparently, those on Thunderbird see not this (which is what I see):
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>
    Which, is comprised of...
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    But they actually see this instead (according to what Carlos reported):
    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=

    No, that's what I see when looking at the raw version. What I see in the editor or the message viewer is

    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    and on a follow up is "On 2026-03-12 08:08, ...w¡ñ§±¤ñ wrote:"

    Notice that we are both using thunderbird, so what happens is
    coordinated. It is sent as mime, but displayed as normal utf text.

    That's on the header. The body is plain UTF, no need for any conversion.
    The header needs to be compatible with older software.
    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Thu Mar 12 09:18:38 2026
    From Newsgroup: news.software.readers

    John Hall wrote:
    On 12/03/2026 06:16, Maria Sophia wrote:
    To add further value to what Carlos kindly tested using Thunderbird,
    apparently, those on Thunderbird see not this (which is what I see):
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    I'm using Thunderbird and I see exactly what you see. Maybe it's
    something to do with which fonts we have installed or with our Windows settings? (I'm using Windows 11 rather than Windows 10, but I doubt that would make any difference.)

    Thank you for clarifying what I misunderstood from Carlos' tests, which is
    that you see what I see which Winston has subsequently confirmed are alt
    codes he manually typed in to set his FROM Usenet header long ago using
    ...w = ...w (literal)
    ¡ = Alt 0161 (Windows inserts byte A1 hexadecimal value)
    ñ = Alt 0241 (Windows inserts byte F1 hexadecimal value)
    § = Alt 0167 (Windows inserts byte A7 hexadecimal value)
    ± = Alt 0177 (Windows inserts byte B1 hexadecimal value)
    ¤ = Alt 0164 (Windows inserts byte A4 hexadecimal value)

    Those are all valid Windows Alt-codes, but the important detail is that
    they produce raw 8-bit bytes from the Windows-1252 character set (identical to Latin-1 for these code points).

    I could be wrong as I never really understood this characters stuff, but
    a. They are not UTF-8
    b. They are not ASCII
    c. They are not MIME-encoded
    d. They are raw 8-bit bytes

    They're technically not allowed inside an email/Usenet header but that
    doesn't matter to me as what matters is my character set is consistent.

    Technically, these are valid Windows-1252 characters, but not valid UTF-8.
    a. Usenet headers must be 7-bit ASCII only
    b. Non-ASCII must be encoded as MIME encoded-words (RFC 2047)
    c. Raw 8-bit bytes are not allowed (although TB is permissive)
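
    The claim that these bytes are valid Windows-1252 but not valid UTF-8 can be checked directly. A small sketch, assuming the byte values from the Alt-code list above:

```python
# The six byte values from the Alt-code list (A1 F1 A7 B1 A4 F1).
raw = bytes([0xA1, 0xF1, 0xA7, 0xB1, 0xA4, 0xF1])

# Read as Windows-1252, each byte maps to one character.
as_cp1252 = raw.decode("cp1252")

# Read as UTF-8, the same bytes are rejected (0xA1 cannot start a sequence).
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The same characters in UTF-8 take two bytes each.
utf8_bytes = as_cp1252.encode("utf-8")

print(as_cp1252, valid_utf8, utf8_bytes.hex(" "))
```

    The UTF-8 form (c2 a1 c3 b1 ...) is the byte sequence that appears, Base64-encoded, inside the encoded-word Carlos quoted.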

    But I don't care about any of that technical legality RFC stuff above.

    The reason I care to understand what is going on is that my self-built newsreader needs to deal with it so I needed to understand what others see.

    Thanks for confirming what I see Carlos has also confirmed, which is that
    you see in Thunderbird what I see in my newsreader which is "...w¡ñ§±¤ñ".
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    In my newsreader flow, they're mojibake garbled text scrambled eggs that I
    need to figure out how best to deal with so that everyone sees ASCII text.
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Thu Mar 12 09:42:54 2026
    From Newsgroup: news.software.readers

    Carlos E.R. wrote:
    But they actually see this instead (according to what Carlos reported):
    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=

    No, that's what I see when looking at the raw version. What I see in the editor or the message viewer is

    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    and on a follow up is "On 2026-03-12 08:08, ...w¡ñ§±¤ñ wrote:"

    Notice that we are both using thunderbird, so what happens is
    coordinated. It is sent as mime, but displayed as normal utf text.

    That's on the header. The body is plain UTF, no need for any conversion.
    The header needs to be compatible with older software.

    Hi Carlos,

    Thanks for correcting my misconception as I never really understood all
    this mojibake character-set interaction but now that Winston explained he
    is typing Windows Alt-codes, and after your clarification, I am scratching
    the surface at beginning to understand what is actually happening.

    It may be that Thunderbird *stores* or *shows* the header in MIME-encoded
    form when you view the raw source, but apparently Thunderbird does not MIME-encode Winston's display name when sending the message.

    I'm not using Thunderbird (and I changed the header to reflect that since
    TB users are on this thread) but it appears that in normal viewing mode, Thunderbird simply displays the raw 8-bit Windows-1252 bytes exactly as
    they appear:

    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    Which matches what I see on my end.

    Apparently Thunderbird is perfectly happy to accept those raw 8-bit bytes
    in the header, even though they are not valid UTF-8 and not legal ASCII.

    My own workflow is strict ASCII, so when those bytes get copied into my attribution line, I think what happens is some NNTP servers try to repair
    the mismatch and end up mangling my outgoing post, which is really the only reason I care (as I don't care to be a Usenet-rules enforcer by any means).

    So, to clarify, I think you & Winston are saying the behavior is:
    1. Winston types Windows-1252 Alt-codes.
    2. Thunderbird displays them as-is in the UI.
    3. Thunderbird shows a MIME-encoded version only when viewing
    the raw message source.
    4. My ASCII-only workflow exposes the illegal bytes,
    which sometimes apparently triggers server rewrites (AFAICT)
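
    One common way mojibake arises is when one side writes UTF-8 and the other side reads the same bytes as Windows-1252. The sketch below only illustrates that mechanism; it is an assumption that this is what happens on any particular server in this thread.

```python
name = "\u00a1\u00f1\u00a7\u00b1\u00a4\u00f1"  # the six decorated characters

# Writer encodes as UTF-8, reader wrongly decodes as Windows-1252:
# every character turns into two characters.
mojibake = name.encode("utf-8").decode("cp1252")

# The damage is reversible as long as no byte was dropped on the way.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(len(name), len(mojibake), repaired == name)
```

    Stripping non-ASCII characters before posting sidesteps the issue entirely, at the cost of losing the decorated name.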

    Thanks again for checking this from the Thunderbird side, as knowing how
    you see Winston's messages helps me figure out how to handle the mojibake.
  • From =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=@winstonmvp@gmail.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Thu Mar 12 11:15:25 2026
    From Newsgroup: news.software.readers

    On 3/12/2026 9:18 AM, Maria Sophia wrote:

    Thank you for clarifying what I misunderstood from Carlos' tests, which is that you see what I see which Winston has subsequently confirmed are alt codes he manually typed in to set his FROM Usenet header long ago using
    ...w = ...w (literal)
    ¡ = Alt 0161 (Windows inserts byte A1 hexadecimal value)
    ñ = Alt 0241 (Windows inserts byte F1 hexadecimal value)
    § = Alt 0167 (Windows inserts byte A7 hexadecimal value)
    ± = Alt 0177 (Windows inserts byte B1 hexadecimal value)
    ¤ = Alt 0164 (Windows inserts byte A4 hexadecimal value)

    No typing required.
    Character Map: choose a font that has the desired character (for the above,
    Arial works), double-click the character (this places it in the
    'Characters to copy' field), repeat for the balance of the string and, once the string is complete, click Copy. Paste wherever desired (Notepad is a good
    temporary storage point if using it in multiple other apps/programs).

    Thanks for confirming what I see Carlos has also confirmed, which is that
    you see in Thunderbird what I see in my newsreader which is "...w¡ñ§±¤ñ".
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    As noted earlier, this is what I see in Thunderbird's From
    column(Message list)
    <https://i.postimg.cc/BvbXZ8mv/Tbird-From-Column-01.jpg>
    The same naming is also seen in the Message pane's From field.
    - b/c its using the Address book contact form

    If wondering about the ... prefix, its a precedent for sorting on the
    From field(my posts appear at the top of an unthreaded sorted list)
    --
    ...w¡ñ§±¤ñ
  • From MikeS@MikeS@fred.com to alt.comp.os.windows-10,news.software.readers,alt.comp.microsoft.windows on Thu Mar 12 18:24:43 2026
    From Newsgroup: news.software.readers

    On 12/03/2026 07:08, ...w¡ñ§±¤ñ wrote:
    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders
    handle
    malformed headers because my home-grown "newsreader" has "problems" when
    responding to Winston's posts due to the way he formats his "FROM"
    header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character codes that appear blank - just copy the code and paste into a field to meet
    the '*' required character entry.


    I also see your name as ...w¡ñ§±¤ñ (in Betterbird). It doesn't bother me unduly but it has puzzled me for a while. May I ask what you are doing
    and why not simply use winston as in your email address?
  • From Stan Brown@someone@example.com to alt.comp.os.windows-10,news.software.readers on Thu Mar 12 12:42:25 2026
    From Newsgroup: news.software.readers

    On Thu, 12 Mar 2026 02:09:02 +0100, Carlos E.R. wrote:
    Asking TB to produce the raw message, it comes as

    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?= <...>

    Which is legal, obviously. (Reasoning: if TB does it, then it is legal)


    I disagree with that if-then statement. It assumes that all relevant
    standards have been followed accurately and in full, with no bugs.
    I'll leave it as an exercise for the reader to decide how many zeroes
    are needed to express the probability of that.
    --
    "The power of accurate observation is frequently called cynicism by
    those who don't have it." --George Bernard Shaw
  • From =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=@winstonmvp@gmail.com to alt.comp.os.windows-10,news.software.readers,alt.comp.microsoft.windows on Thu Mar 12 22:45:58 2026
    From Newsgroup: news.software.readers

    On 3/12/2026 11:24 AM, MikeS wrote:
    On 12/03/2026 07:08, ...w¡ñ§±¤ñ wrote:
    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders
    handle
    malformed headers because my home-grown "newsreader" has "problems" when
    responding to Winston's posts due to the way he formats his "FROM"
    header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character
    codes that appear blank - just copy the code and paste into a field to
    meet the '*' required character entry.


    I also see your name as ...w¡ñ§±¤ñ (in Betterbird). It doesn't bother me
    unduly but it has puzzled me for a while. May I ask what you are doing
    and why not simply use winston as in your email address?

    Have used that form for nntp and signature since 1998
    Html nntp, Text nntp[1], private nntp groups, private list servers,
    private web groups, blogging...

    [1] text nntp(e.g. Eternal Sept. like servers - no HTML formatting composition) users are the only source where questions, criticism,
    comments occur...but less than 5% of where 'it's' being used.

    <g>Before 1998, the nomenclature was slightly longer
    => W-i|#-o-#-n|#-4|u|f|||#g|<|2
    --
    ...w¡ñ§±¤ñ
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 01:49:02 2026
    From Newsgroup: news.software.readers

    ...w¡ñ§±¤ñ wrote:
    On 3/12/2026 11:24 AM, MikeS wrote:
    On 12/03/2026 07:08, ...w¡ñ§±¤ñ wrote:
    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders
    handle
    malformed headers because my home-grown "newsreader" has "problems" when
    responding to Winston's posts due to the way he formats his "FROM"
    header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character
    codes that appear blank - just copy the code and paste into a field to
    meet the '*' required character entry.


    I also see your name as ...w¡ñ§±¤ñ (in Betterbird). It doesn't bother me
    unduly but it has puzzled me for a while. May I ask what you are doing
    and why not simply use winston as in your email address?

    Have used that form for nntp and signature since 1998
    Html nntp, Text nntp[1], private nntp groups, private list servers,
    private web groups, blogging...

    [1] text nntp(e.g. Eternal Sept. like servers - no HTML formatting composition) users are the only source where questions, criticism,
    comments occur...but less than 5% of where 'it's' being used.

    <g>Before 1998, the nomenclature was slightly longer
    => Wi+o#n+4+#<>gEo


    --
    ...wi+o#n+


    Hi Winston,

    You compressed a lot of data in the sentence about nntp servers
    "text nntp (e.g. Eternal Sept. like servers - no HTML formatting
    composition) users are the only source where questions, criticism,
    comments occur... but less than 5% of where 'it's' being used."

    I think you're trying to say:
    a. You use the decorated name everywhere
    (which was even more decorated in the past)
    b. Only users on text-only NNTP servers seem to raise issues
    c. And those places are only a small percentage of where you post

    Let me be clear I'm not complaining. I'm simply adapting.
    Your CP1252 doesn't bother me. I'm just adapting to deal with it.
    That's the only reason I care.

    People have complained *to me* that my responses have mojibake in them.
    So I'm trying to fix that problem *for them*.

    One way of doing so is that I've been testing different headers.

    I'm not sure why yet, but when I respond to your posts, the mojibake
    added by the nntp server seems to happen more when I have my outgoing
    headers set to
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit

    than when I set the outgoing headers to
    Content-Type: text/plain; charset=US-ASCII
    Content-Transfer-Encoding: 7bit

    The original email and NNTP standards (RFC 822, later 2822, now 5322)
    require headers like From: to contain only ASCII characters. Given that,
    it's likely some newsreaders treat raw 8-bit header bytes as Latin-1,
    some as CP1252, some as UTF-8, and some might choke entirely.

    Before I delve into understanding why I see more mojibake when I reply to
    you with UTF-8 headers and less when I reply with US-ASCII headers, I
    simply wanted to know, in this thread, what other people see (which,
    we've confirmed, is what I see).

    Since your display name is not UTF-8, maybe that's why the UTF-8 headers
    cause more mojibake downstream? Dunno yet. It depends on what the nntp
    server does. Some might try to re-interpret the bytes as ISO-8859-1 or
    CP1252, which is why maybe sometimes my headers get munged by servers.

    Others like Carlos & Andy have noticed the munged headers, even though I
    didn't change them. So I can only assume the server did it.

    I have no good idea why US-ASCII headers behave better after passing
    through nntp servers.

    Maybe ASCII mode prevents the server from trying to be helpful?
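
    To make the ambiguity described above concrete, here is a minimal Python
    sketch. It assumes the decorated part of the display name travels as the
    six raw bytes A1 F1 A7 B1 A4 F1 (the Alt codes quoted in this thread) and
    shows how different charset guesses read them. The name RAW_NAME and the
    list of charsets are illustrative choices, not a description of what any
    particular server does.

```python
# Raw single-byte values for the decorated part of the display name,
# taken from the Alt codes quoted in this thread (0161, 0241, 0167, ...).
RAW_NAME = bytes([0xA1, 0xF1, 0xA7, 0xB1, 0xA4, 0xF1])


def interpretations(raw: bytes) -> dict:
    """Decode the same bytes under several charsets a reader might guess."""
    results = {}
    for charset in ("latin-1", "cp1252", "big5", "utf-8"):
        try:
            results[charset] = raw.decode(charset)
        except UnicodeDecodeError as exc:
            # Strict decoding refuses bytes that are not valid in the charset.
            results[charset] = f"<undecodable: {exc.reason}>"
    return results


if __name__ == "__main__":
    for charset, text in interpretations(RAW_NAME).items():
        print(f"{charset:8} -> {text!r}")
```

    Latin-1 and CP1252 agree on these six bytes, while a strict UTF-8 decoder
    rejects them, which is consistent with the guess that a UTF-8 label
    invites more "repair" attempts than an ASCII one.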
  • From MikeS@MikeS@fred.com to alt.comp.os.windows-10,news.software.readers,alt.comp.microsoft.windows on Fri Mar 13 08:50:26 2026
    From Newsgroup: news.software.readers

    On 13/03/2026 05:45, ...w-i|#-o-#-n|# wrote:
    On 3/12/2026 11:24 AM, MikeS wrote:
    On 12/03/2026 07:08, ...w-i|#-o-#-n|# wrote:
    On 3/11/2026 5:32 PM, Maria Sophia wrote:
    Question:
    Do Winston's headers cause charset issues for anyone else?
    Or just me?

    I am trying to understand something about how different newsreaders
    handle malformed headers because my home-grown "newsreader" has
    "problems" when responding to Winston's posts due to the way he
    formats his "FROM" header.
    From: ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    That line apparently contains non-ASCII characters in the display name:
    ¡ (U+00A1)
    ñ (U+00F1)
    § (U+00A7)
    ± (U+00B1)
    ¤ (U+00A4)
    another ñ (U+00F1)

    w = standard lower case w keystroke
    ¡ = Alt 0161
    ñ = Alt 0241
    § = Alt 0167 or § = Alt 21
    ± = Alt 0177
    ¤ = Alt 0164
    ñ = Alt 0241

    All from one or more fonts available in Character Map.
    - I've come across other folks that use some available character
    codes that appear blank - just copy the code and paste into a field
    to meet the '*' required character entry.


    I also see your name as ...w¡ñ§±¤ñ (in Betterbird). It doesn't bother
    me unduly but it has puzzled me for a while. May I ask what you are
    doing and why not simply use winston as in your email address?

    Have used that form for nntp and signature since 1998
    Html nntp, Text nntp[1], private nntp groups, private list servers,
    private web groups, blogging...

    [1] text nntp(e.g. Eternal Sept. like servers - no HTML formatting composition) users are the only source where questions, criticism,
    comments occur...but less than 5% of where 'it's' being used.

    <g>Before 1998, the nomenclature was slightly longer
    => W¡ñ§±¤ñ¬Ößóògîë


    I guess the answer to my question is that you want to be different. You
    certainly succeeded as I have never seen any other email or usenet user
    emulate you. In fact when other usenet users want to refer to your
    comments in a thread they mostly type "winston". It's easier and makes
    more sense.
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 02:27:57 2026
    From Newsgroup: news.software.readers

    Maria Sophia wrote:
    People have complained *to me* that my responses have mojibake in them.
    So I'm trying to fix that problem *for them*.

    Delving deeper in thought...

    RFC 5322 says headers must be ASCII unless MIME-encoded, yet others have
    pointed out that Big-5 & ISO-8859-1 sometimes get inserted into my headers.

    I don't add that. I can't add them. They're not in my dictionaries.
    So "something else" must be adding them. But what?

    I never really understood character encoding, and I've said so many times.
    But I wonder if what's happening is possibly
    1. The "From:" display name contains raw CP1252 bytes
    2. Which are not valid UTF-8
    3. Where, if my outgoing message declares "charset=UTF-8"
    4. Maybe some NNTP servers might respond by trying to be helpful
    5. One way being by slapping a different charset label on the header
    Given... these CP1252 bytes (0xA1, 0xA7, 0xB1, 0xF1) are
    a. illegal in UTF-8
    b. legal in ISO-8859-1
    c. also legal byte patterns in Big-5
    Maybe that's where some of my responses get ISO-8859-1 or Big-5 headers?

    Maybe... given UTF-8 is not ASCII, but ASCII is valid UTF-8...
    i. Declaring UTF-8 forces some nntp servers to validate all bytes.
    ii. But CP1252 bytes are illegal in UTF-8
    iii. Where UTF-8 replies trigger more server 'helpfulness'

    An interesting related aside is that, taking the bytes one at a time,
    I. 0xA1 is a continuation byte, so it is not a valid UTF-8 start byte
    II. 0xF1 is a valid UTF-8 start byte (for a 4-byte sequence),
    but only if followed by three bytes in 0x80-0xBF. Here 0xA7 0xB1 0xA4
    do follow it, so a decoder can read all four as one unintended character
    III. 0xA7 is illegal as a UTF-8 start byte
    IV. 0xB1 is illegal as a UTF-8 start byte
    V. 0xA4 is illegal as a UTF-8 start byte
    VI. 0xF1 is a valid UTF-8 start byte again,
    but here it is not followed by three continuation bytes

    The RFC-correct solution would be:
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1=C2=AC=C3=96=C3=9F=C3=B3=C3=B2g=C3=AE=C3=AB?= <...>
    But that's ugly.

    Using W¡ñ§±¤ñ¬Ößóògîë would be even more so, given
    VII. 0xAC is illegal as a UTF-8 start byte
    VIII. 0xD6 is a valid start byte only if followed by a continuation byte
    And so on, where the "W" in W¡ñ§±¤ñ and the "g" in ¬Ößóògîë are the only
    bytes in that entire (pre-1998) decorated name that are both ASCII and
    valid UTF-8. Everything else is raw CP1252.

    The UTF-8 version of the whole name would be:
    57 C2 A1 C3 B1 C2 A7 C2 B1 C2 A4 C3 B1 C2 AC C3 96 C3 9F C3 B3 C3 B2 67 C3 AE C3 AB

    But all this is only meaningful if it causes downstream issues,
    where I think simply switching my headers to ASCII solved the
    mojibake that Andy, Carlos and others asked me to try to fix.
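
    The byte-by-byte reasoning above can be checked with a short Python
    sketch. The helper names utf8_role and hex_bytes are made up for this
    illustration; the hex string is copied from the dump in this post, and
    the CP1252 bytes are derived from it rather than taken from any real
    article. Note that the per-byte view hides one detail: in the order
    A1 F1 A7 B1 A4, the lead byte F1 is followed by three continuation bytes.

```python
def utf8_role(byte: int) -> str:
    """Classify a single byte by the role it can play in UTF-8."""
    if byte <= 0x7F:
        return "ascii"
    if 0x80 <= byte <= 0xBF:
        return "continuation"
    if 0xC2 <= byte <= 0xDF:
        return "lead of 2-byte sequence"
    if 0xE0 <= byte <= 0xEF:
        return "lead of 3-byte sequence"
    if 0xF0 <= byte <= 0xF4:
        return "lead of 4-byte sequence"
    return "never valid"  # 0xC0, 0xC1 and 0xF5-0xFF


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# The UTF-8 form of the longer name, copied from the hex dump in the post.
UTF8_NAME = bytes.fromhex(
    "57 C2 A1 C3 B1 C2 A7 C2 B1 C2 A4 C3 B1 C2 AC C3 96 "
    "C3 9F C3 B3 C3 B2 67 C3 AE C3 AB"
)
# Re-encode as CP1252 to get the raw single bytes the post talks about.
CP1252_NAME = UTF8_NAME.decode("utf-8").encode("cp1252")

if __name__ == "__main__":
    print(hex_bytes(CP1252_NAME))
    for b in CP1252_NAME:
        print(f"0x{b:02X}: {utf8_role(b)}")
```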
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 02:38:35 2026
    From Newsgroup: news.software.readers

    THIS IS A TEST. IT'S AN EXACT COPY OF THE PREVIOUS POST.
    THE ONLY DIFFERENCE IS THIS HAS UTF-8 DECLARED IN THE HEADER. NOT ASCII.
    DO YOU SEE THE SAME OUTPUT or DO YOU SEE IT DIFFERENTLY?

    Delving deeper in thought...

    RFC 5322 says headers must be ASCII unless MIME-encoded, yet others have
    pointed out that Big-5 & ISO-8859-1 sometimes get inserted into my headers.

    I don't add that. I can't add them. They're not in my dictionaries.
    So "something else" must be adding them. But what?

    I never really understood character encoding, and I've said so many times.
    But I wonder if what's happening is possibly
    1. The "From:" display name contains raw CP1252 bytes
    2. Which are not valid UTF-8
    3. Where, if my outgoing message declares "charset=UTF-8"
    4. Maybe some NNTP servers might respond by trying to be helpful
    5. One way being by slapping a different charset label on the header
    Given... these CP1252 bytes (0xA1, 0xA7, 0xB1, 0xF1) are
    a. illegal in UTF-8
    b. legal in ISO-8859-1
    c. also legal byte patterns in Big-5
    Maybe that's where some of my responses get ISO-8859-1 or Big-5 headers?

    Maybe... given UTF-8 is not ASCII, but ASCII is valid UTF-8...
    i. Declaring UTF-8 forces some nntp servers to validate all bytes.
    ii. But CP1252 bytes are illegal in UTF-8
    iii. Where UTF-8 replies trigger more server 'helpfulness'

    An interesting related aside is that, taking the bytes one at a time,
    I. 0xA1 is a continuation byte, so it is not a valid UTF-8 start byte
    II. 0xF1 is a valid UTF-8 start byte (for a 4-byte sequence),
    but only if followed by three bytes in 0x80-0xBF. Here 0xA7 0xB1 0xA4
    do follow it, so a decoder can read all four as one unintended character
    III. 0xA7 is illegal as a UTF-8 start byte
    IV. 0xB1 is illegal as a UTF-8 start byte
    V. 0xA4 is illegal as a UTF-8 start byte
    VI. 0xF1 is a valid UTF-8 start byte again,
    but here it is not followed by three continuation bytes

    The RFC-correct solution would be:
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1=C2=AC=C3=96=C3=9F=C3=B3=C3=B2g=C3=AE=C3=AB?=
    <...>
    But that's ugly.

    Using W¡ñ§±¤ñ¬Ößóògîë would be even more so, given
    VII. 0xAC is illegal as a UTF-8 start byte
    VIII. 0xD6 is a valid start byte only if followed by a continuation byte
    And so on, where the "W" in W¡ñ§±¤ñ and the "g" in ¬Ößóògîë are the only
    bytes in that entire (pre-1998) decorated name that are both ASCII and
    valid UTF-8. Everything else is raw CP1252.

    The UTF-8 version of the whole name would be:
    57 C2 A1 C3 B1 C2 A7 C2 B1 C2 A4 C3 B1 C2 AC C3 96 C3 9F C3 B3 C3 B2 67 C3
    AE C3 AB

    But all this is only meaningful if it causes downstream issues,
    where I think simply switching my headers to ASCII solved the
    mojibake that Andy, Carlos and others asked me to try to fix.
  • From Dave Royal@dave@dave123royal.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 10:02:54 2026
    From Newsgroup: news.software.readers

    Maria Sophia <mariasophia@comprehension.com> Wrote in message:

    Maria Sophia wrote:
    People have complained *to me* that my responses have mojibake in them.
    So I'm trying to fix that problem *for them*.

    Delving deeper in thought...

    RFC 5322 says headers must be ASCII unless MIME-encoded, yet others have
    pointed out that Big-5 & ISO-8859-1 sometimes get inserted into my headers.

    I don't add that. I can't add them. They're not in my dictionaries.
    So "something else" must be adding them. But what?

    I never really understood character encoding, and I've said so many times. But I wonder if what's happening is possibly
    1. The "From:" display name contains raw CP1252 bytes
    2. Which are not valid UTF-8
    3. Where, if my outgoing message declares "charset=UTF-8"
    4. Maybe some NNTP servers might respond by trying to be helpful
    5. One way being by slapping a different charset label on the header
    Given... these CP1252 bytes (0xA1, 0xA7, 0xB1, 0xF1) are
    a. illegal in UTF-8
    b. legal in ISO-8859-1
    c. also legal byte patterns in Big-5
    Maybe that's where some of my responses get ISO-8859-1 or Big-5 headers?

    Maybe... given UTF-8 is not ASCII, but ASCII is valid UTF-8...
    i. Declaring UTF-8 forces some nntp servers to validate all bytes.
    ii. But CP1252 bytes are illegal in UTF-8
    iii. Where UTF-8 replies trigger more server 'helpfulness'

    An interesting related aside is that, taking the bytes one at a time,
    I. 0xA1 is a continuation byte, so it is not a valid UTF-8 start byte
    II. 0xF1 is a valid UTF-8 start byte (for a 4-byte sequence),
    but only if followed by three bytes in 0x80-0xBF. Here 0xA7 0xB1 0xA4
    do follow it, so a decoder can read all four as one unintended character
    III. 0xA7 is illegal as a UTF-8 start byte
    IV. 0xB1 is illegal as a UTF-8 start byte
    V. 0xA4 is illegal as a UTF-8 start byte
    VI. 0xF1 is a valid UTF-8 start byte again,
    but here it is not followed by three continuation bytes

    The RFC-correct solution would be:
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1=C2=AC=C3=96=C3=9F=C3=B3=C3=B2g=C3=AE=C3=AB?= <...>
    But that's ugly.

    Using W¡ñ§±¤ñ¬Ößóògîë would be even more so, given
    VII. 0xAC is illegal as a UTF-8 start byte
    VIII. 0xD6 is a valid start byte only if followed by a continuation byte
    And so on, where the "W" in W¡ñ§±¤ñ and the "g" in ¬Ößóògîë are the only
    bytes in that entire (pre-1998) decorated name that are both ASCII and
    valid UTF-8. Everything else is raw CP1252.

    The UTF-8 version of the whole name would be:
    57 C2 A1 C3 B1 C2 A7 C2 B1 C2 A4 C3 B1 C2 AC C3 96 C3 9F C3 B3 C3 B2 67 C3 AE C3 AB

    But all this is only meaningful if it causes downstream issues,
    where I think simply switching my headers to ASCII solved the
    mojibake that Andy, Carlos and others asked me to try to fix.


    I wrote a newsreader a few years ago, in Python. Python had a
    module to decode headers encoded as in RFC2047; this one I
    think:
    <https://docs.python.org/3/library/email.header.html> <https://www.ietf.org/rfc/rfc2047>

    I didn't bother to detect /whether/ the headers had encoded words,
    I decoded everything in case it did. (I've seen several encoded
    words in different encodings in a single header field.)

    In your case it sounds like you need an encoder as well as a
    decoder. If there aren't such modules in whatever your system is
    written in, you could write one. Perhaps a sub-process written in
    Python: pass it the raw header and it returns it in unicode. And
    vice versa to encode it.
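
    A minimal sketch of that decoder/encoder pair, using only the standard
    email.header module linked above. The function names decode_field and
    encode_field are invented for this example, and the sample value is the
    encoded-word quoted elsewhere in this thread; treat it as one possible
    shape for such a helper, not as the code from my newsreader.

```python
from email.header import Header, decode_header, make_header


def decode_field(raw_value: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to Unicode."""
    return str(make_header(decode_header(raw_value)))


def encode_field(text: str, charset: str = "utf-8") -> str:
    """Encode a Unicode header value as ASCII-only encoded-words."""
    return Header(text, charset=charset).encode()


if __name__ == "__main__":
    raw = "=?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?="
    name = decode_field(raw)
    print(name)                # the display name as Unicode text
    print(encode_field(name))  # back to a 7-bit-safe encoded-word
```

    Decoding everything, whether or not it contains encoded-words, works
    because decode_header passes plain ASCII values through unchanged.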
    --
    Remove numerics from my email address.
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 03:04:02 2026
    From Newsgroup: news.software.readers

    ...wi+o#n+ wrote:
    On 3/12/2026 9:18 AM, Maria Sophia wrote:

    Thank you for clarifying what I misunderstood from Carlos' tests, which is
    that you see what I see which Winston has subsequently confirmed are alt
    codes he manually typed in to set his FROM Usenet header long ago using
    ...w = ...w (literal)
    ¡ = Alt 0161 (Windows inserts byte A1 hexadecimal value)
    ñ = Alt 0241 (Windows inserts byte F1 hexadecimal value)
    § = Alt 0167 (Windows inserts byte A7 hexadecimal value)
    ± = Alt 0177 (Windows inserts byte B1 hexadecimal value)
    ¤ = Alt 0164 (Windows inserts byte A4 hexadecimal value)

    No typing required.
    Character Map: choose a font that has the desired character (for the
    above, Arial works), double-click the character (this places it in the
    'Characters to copy' field), repeat for the balance of the string and,
    once the string is complete, click Copy. Paste wherever desired (Notepad
    is a good temporary storage point if using it in multiple other
    apps/programs).

    Thanks for confirming what I see Carlos has also confirmed, which is that
    you see in Thunderbird what I see in my newsreader which is "...wi+o#n+".
    From: ...wi+o#n+ <winstonmvp@gmail.com>

    As noted earlier, this is what I see in Thunderbird's From
    column(Message list)
    <https://i.postimg.cc/BvbXZ8mv/Tbird-From-Column-01.jpg>
    The same naming is also seen in the Message pane's From field.
    - b/c its using the Address book contact form

    If wondering about the ... prefix, it's there for sort precedence on the
    From field (my posts appear at the top of an unthreaded sorted list)


    --
    ...wi+o#n+

    Hi Winston,

    I just ran a test in this thread by sending the exact SAME message twice.
    The body contained this font-salad experimental sequence:
    Wi+o#n+4+#<>gEo

    I don't really know how others saw the message, but from my perspective, a server in my PATH seems to have tried to guess the charset and attempted to "repair" the invalid sequences making the whole message resemble a
    typography crime scene.

    When my outgoing headers were:
    Content-Type: text/plain; charset=US-ASCII
    Content-Transfer-Encoding: 7bit
    Your raw CP1252 bytes came back intact through the entire path.

    I've never understood this charset negotiation stuff, but apparently ASCII mode seems to make the servers in my PATH treat the message as "hands off"
    so the 8-bit bytes in the body pass through unchanged.

    At least for my PATH, US-ASCII prevented the downstream server's
    charset-guessing algorithms from having a panic attack.

    I have no idea how the test showed up for others using TB or the like.

    But when I ran the same test but with my outgoing headers being:
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit
    Those same CP1252 bytes were mangled and truncated & the body was
    reformatted into a hieroglyphics sampler platter somewhere along the PATH.

    I never understood this charset therapist stuff, but apparently a confused font engine in a server in my PATH tried to interpret the non-UTF-8 bytes
    as if they were UTF-8, failed validation, and then the server apparently substituted cut-and-paste collage characters from another Unicode block.

    The result was that the whole body ended up looking like a ransom note.
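
    What "interpret as UTF-8, fail validation, substitute" can look like is
    sketched below in Python. This only shows what one lenient UTF-8 decoder
    does with the raw bytes; it is an assumption that anything in my PATH
    behaves this way, and the names CP1252_BYTES, lenient_utf8 and describe
    are invented for the illustration.

```python
# Body fragment as raw CP1252 bytes: "W" plus the six decorated bytes.
CP1252_BYTES = bytes([0x57, 0xA1, 0xF1, 0xA7, 0xB1, 0xA4, 0xF1])


def lenient_utf8(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for anything invalid."""
    return raw.decode("utf-8", errors="replace")


def describe(text: str) -> list:
    """List each resulting character as a U+XXXX code point."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # Seven input bytes collapse into four characters under UTF-8 rules.
    print(describe(lenient_utf8(CP1252_BYTES)))
    # Reading the same bytes as CP1252 keeps all seven characters.
    print(describe(CP1252_BYTES.decode("cp1252")))
```

    The middle four bytes (F1 A7 B1 A4) happen to form one well-formed
    4-byte sequence, so they come out as a single character from a distant
    Unicode plane, while the bytes on either side become U+FFFD. That would
    fit both the "truncated" and the "other Unicode block" symptoms.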
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 03:28:39 2026
    From Newsgroup: news.software.readers

    Dave Royal wrote:
    I wrote a newsreader a few years ago, in Python. Python had a
    module to decode headers encoded as in RFC2047; this one I
    think:
    <https://docs.python.org/3/library/email.header.html> <https://www.ietf.org/rfc/rfc2047>

    I didn't bother to detect /whether/ the headers had encoded words,
    I decoded everything in case it did. (I've seen several encoded
    words in different encodings in a single header field.)

    In your case it sounds like you need an encoder as well as a
    decoder. If there aren't such modules in whatever your system is
    written in, you could write one. Perhaps a sub-process written in
    Python: pass it the raw header and it returns it in unicode. And
    vice versa to encode it.


    Hi Dave,

    Thanks for the explanation. And purposefully helpful encoding advice.
    Some of this stuff only gets learned from experience like you have.

    Me?
    I've never understood character encoding. Everything I know has been
    learned the hard way, usually by staring at mojibake and trying to guess
    what the server thought it saw.

    Your point about needing an encoder as well as a decoder makes total
    sense. If I want the servers in my path to stop playing "guess the
    charset" with my messages, I need to hand them something that does not
    invite creative interpretation.

    I should mention that I already often run my outgoing body through a
    Notepad++ cleanup macro (shortcuts.xml). It is a monster of a file that
    tries to normalize everything to plain ASCII. It is over a thousand
    lines long, and at this point it has more comments than actual code.
    Newsgroups: alt.comp.os.windows-10,alt.comp.os.windows-11,alt.comp.microsoft.windows
    Subject: Tutorial: Notepad++ shortcuts.xml macro converts unicode to the 95-keyboard ASCII characters
    Date: Thu, 11 Dec 2025 09:42:48 -0700
    Message-ID: <10hesa9$2q1m$1@nnrp.usenet.blueworldhosting.com>

    It does a great job normalizing pasted text from a variety of sources,
    but it does not do RFC2047 header encoding. So it cleans the body,
    but the headers still go out wearing whatever 8-bit bytes they inherited.

    In addition, since Chromium browsers screw up Linux/macOS/Windows
    clipboards far more than Mozilla browsers do, and in invisible but
    telling ways, I've benefited from working with the Mozilla cross-platform
    team to identify how to remove those nasty CF_HTML fragments in desktop
    clipboards.
    Newsgroups: alt.comp.software.firefox,comp.sys.mac.system,alt.os.linux
    Subject: PSA: Clipboard differences between Chromium & Firefox across platforms
    Date: Thu, 12 Feb 2026 15:26:32 -0500
    Message-ID: <10mld1o$1910$1@nnrp.usenet.blueworldhosting.com>

    So it has been a long rocky road indeed, where it's nice to know you've
    taken that path less traveled by yourself, and learned from the trek.

    If you can look at my test and let me know what YOU see, that would be
    helpful. I sent the same message twice, with the only difference being
    that I changed the header from declaring ASCII to declaring UTF-8, and
    the second message came back looking like a ransom note.

    If you see what I saw, then that kind of proves the servers in the PATH are reacting to Winston's font-soup character set of "Wi+o#n+4+#<>gEo".

    Luckily, declaring everything as US-ASCII seems to stop the servers
    from trying to "repair" my messages. Declaring UTF-8, so far, has produced
    output that looks like a charset-guessing algorithm having a midlife
    crisis.

    The reason it matters to me is other people have asked me to *fix it*.
    But I can't fix it if I don't understand it.

    And I'm the 1st to openly publicly & meekly claim I don't understand it.

    Hence I appreciate your insight, especially as you've been there done that.
    It helps to hear from someone who has already fought this & survived.
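
    Since the attribution line is the part that skips my cleanup macro, here
    is a hedged Python sketch of how a strictly 7-bit attribution could be
    built from a From: header. The function name ascii_attribution, the
    "wrote:" wording and the fall-back to the address local part when fewer
    than two ASCII characters survive are all assumptions made for this
    example.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def ascii_attribution(from_value: str) -> str:
    """Build a 7-bit ASCII "X wrote:" line from a raw From: header value."""
    # Split first, while the display name is still an ASCII encoded-word.
    raw_name, address = parseaddr(from_value)
    display_name = str(make_header(decode_header(raw_name)))
    # Keep only the ASCII characters of the display name.
    ascii_name = display_name.encode("ascii", errors="ignore").decode("ascii")
    ascii_name = ascii_name.strip(" .")
    if len(ascii_name) < 2:
        # Too little survived; fall back to the local part of the address.
        ascii_name = address.split("@")[0]
    return f"{ascii_name} wrote:"


if __name__ == "__main__":
    header = "=?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?= <winstonmvp@gmail.com>"
    print(ascii_attribution(header))
```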
  • From Carlos E.R.@robin_listas@es.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 13:42:07 2026
    From Newsgroup: news.software.readers

    On 2026-03-12 17:18, Maria Sophia wrote:
    John Hall wrote:
    On 12/03/2026 06:16, Maria Sophia wrote:
    To add further value to what Carlos kindly tested using Thunderbird,
    apparently, those on Thunderbird see not this (which is what I see):
    From: ...w-i|#-o-#-n|#<winstonmvp@gmail.com>

    I'm using Thunderbird and I see exactly what you see. Maybe it's
    something to do with which fonts we have installed or with our Windows
    settings? (I'm using Windows 11 rather than Windows 10, but I doubt that
    would make any difference.)

    Thank you for clarifying what I misunderstood from Carlos' tests, which is
    that you see what I see which Winston has subsequently confirmed are alt
    codes he manually typed in to set his FROM Usenet header long ago using
    ...w = ...w (literal)
    ¡ = Alt 0161 (Windows inserts byte A1 hexadecimal value)
    ñ = Alt 0241 (Windows inserts byte F1 hexadecimal value)
    § = Alt 0167 (Windows inserts byte A7 hexadecimal value)
    ± = Alt 0177 (Windows inserts byte B1 hexadecimal value)
    ¤ = Alt 0164 (Windows inserts byte A4 hexadecimal value)

    Those are all valid Windows Alt-codes, but the important detail is that
    they produce raw 8-bit bytes from the Windows-1252 (Latin-1) character set.

    I could be wrong as I never really understood this characters stuff, but
    a. They are not UTF-8
    b. They are not ASCII
    c. They are not MIME-encoded
    d. They are raw 8-bit bytes

    Huh, no. They were typed as 8-bit bytes from Latin-1 charset at some
    point in time, but today they are UTF-8. UTF in the body, and as MIME in
    the header.

    You said it yourself in another post:

    The valid format is:

    =?charset?encoding?encoded-text?=

    Hence, if we break Winston's header down:

    =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=
    | |     | |
    | |     | +-- Base64 text
    | |     +---- Encoding type ("B" = Base64)
    | +---------- Character set (UTF-8)
    +------------ Begin encoded-word

    The Base64 portion is:

    Li4ud8Khw7HCp8KxwqTDsQ==

    Decoding that Base64 string yields the UTF-8 text:

    ...w¡ñ§±¤ñ
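
    The breakdown above can be reproduced by hand in a few lines of Python.
    This is a sketch for a single encoded-word only, and the function name
    decode_encoded_word is made up here; real headers can mix several
    encoded-words with plain text, which a library decoder handles better.

```python
import base64
import quopri


def decode_encoded_word(word: str) -> str:
    """Decode one RFC 2047 encoded-word: =?charset?encoding?encoded-text?="""
    if not (word.startswith("=?") and word.endswith("?=")):
        raise ValueError("not an encoded-word")
    # Strip the =? ... ?= delimiters, then split into the three fields.
    charset, encoding, payload = word[2:-2].split("?", 2)
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    elif encoding.upper() == "Q":
        # header=True makes "_" decode to a space, as RFC 2047 requires.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    else:
        raise ValueError(f"unknown encoding {encoding!r}")
    return raw.decode(charset)


if __name__ == "__main__":
    print(decode_encoded_word("=?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?="))
```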
    --
    Cheers, Carlos.
    ESEfc-Efc+, EUEfc-Efc|;
  • From Carlos E.R.@robin_listas@es.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 13:47:24 2026
    From Newsgroup: news.software.readers

    On 2026-03-12 17:42, Maria Sophia wrote:
    Carlos E.R. wrote:
    But they actually see this instead (according to what Carlos reported):
    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=

    No, that's what I see when looking at the raw version. What I see in the
    editor or the message viewer is

    ...w-i|#-o-#-n|# <winstonmvp@gmail.com>

    and on a follow up is "On 2026-03-12 08:08, ...w-i|#-o-#-n|# wrote:"

    Notice that we are both using thunderbird, so what happens is
    coordinated. It is sent as mime, but displayed as normal utf text.

    That's on the header. The body is plain UTF, no need for any conversion.
    The header needs to be compatible with older software.

    Hi Carlos,

    Thanks for correcting my misconception as I never really understood all
    this mojibake character-set interaction but now that Winston explained he
    is typing Windows Alt-codes, and after your clarification, I am starting
    to scratch the surface of what is actually happening.

    It may be that Thunderbird *stores* or *shows* the header in MIME-encoded form when you view the raw source, but apparently Thunderbird does not MIME-encode Winston's display name when sending the message.

    I'm not using Thunderbird (and I changed the header to reflect that since
    TB users are on this thread) but it appears that in normal viewing mode, Thunderbird simply displays the raw 8-bit Windows-1252 bytes exactly as
    they appear:

    No, TB displays UTF-8. At least here, all the computer uses UTF-8.


    ...w-i|#-o-#-n|# <winstonmvp@gmail.com>

    Which matches what I see on my end.

    Apparently Thunderbird is perfectly happy to accept those raw 8-bit bytes
    in the header, even though they are not valid UTF-8 and not legal ASCII.

    The header is MIME encoded.


    My own workflow is strict ASCII, so when those bytes get copied into my attribution line, I think what happens is some NNTP servers try to repair
    the mismatch and end up mangling my outgoing post, which is really the only reason I care (as I don't care to be a Usenet-rules enforcer by any means).

    So, to clarify, I think you & Winston are saying the behavior is:
    1. Winston types Windows-1252 Alt-codes.
    2. Thunderbird displays them as-is in the UI.
    3. Thunderbird shows a MIME-encoded version only when viewing
    the raw message source.
    4. My ASCII-only workflow exposes the illegal bytes,
    which sometimes apparently triggers server rewrites (AFAICT)

    Thanks again for checking this from the Thunderbird side, as knowing how
    you see Winston's messages helps me figure out how to handle the mojibake.
    --
    Cheers, Carlos.
    ESEfc-Efc+, EUEfc-Efc|;
  • From Carlos E.R.@robin_listas@es.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 13:57:02 2026
    From Newsgroup: news.software.readers

    On 2026-03-13 10:38, Maria Sophia wrote:
    THIS IS A TEST. IT'S AN EXACT COPY OF THE PREVIOUS POST.
    THE ONLY DIFFERENCE IS THIS HAS UTF-8 DECLARED IN THE HEADER. NOT ASCII.
    DO YOU SEE THE SAME OUTPUT or DO YOU SEE IT DIFFERENTLY?

    ...


    The RFC-correct solution would be:
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1=C2=AC=C3=96=C3=9F=C3=B3=C3=B2g=C3=AE=C3=AB?=
    <...>
    But that's ugly.

    Using WN+++o#nN++N++N++N++N++gN++N++ would be even more so, given
    --------************

    This text arrives corrupted. In the other post they are legible. It is declared as UTF-8, but I guess it is not actually all valid UTF-8.
    --
    Cheers, Carlos.
    ESEfc-Efc+, EUEfc-Efc|;
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 14:11:42 2026
    From Newsgroup: news.software.readers

    Carlos E.R. wrote:
    I could be wrong as I never really understood this characters stuff, but
    a. They are not UTF-8
    b. They are not ASCII
    c. They are not MIME-encoded
    d. They are raw 8-bit bytes

    Huh, no. They were typed as 8-bit bytes from Latin-1 charset at some
    point in time, but today they are UTF-8. UTF in the body, and as MIME in
    the header.

    You said it yourself in another post:

    Hi Carlos,

    I agree. I apologize for the flip flop indecision. I don't know what's
    going on, as I'm only trying to fix the trouble Wi+o#n+4+#<>gEo creates.

    I will endlessly admit I never understood this charset stuff, and I will
    point out that the only reason I even care is you and others asked me to
    fix the problems that sometimes my posts look like a Chinese jigsaw puzzle.

    Since I don't mess with the characters, something else is messing with the characters, where a test in this very thread shows that when I use headers
    Content-Type: text/plain; charset=US-ASCII
    Content-Transfer-Encoding: 7bit
    Then Wi+o#n+4+#<>gEo remains Wi+o#n+4+#<>gEo

    But when I use headers
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit
    Then Wi+o#n+4+#<>gEo turns the entire post into a ransom note.

    Usenet (NNTP) follows email header rules (RFC 5322 + RFC 2047):
    a. The body may be UTF-8, if declared.
    b. Headers cannot contain raw 8-bit bytes.
    c. Hence, non-ASCII characters must be encoded using MIME encoded-words
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1?= <winston@example.com>

    Given Winston's "FROM:" header has those characters, which are not ASCII,
    all I can say is that they're not valid characters for *headers*, unless they're MIME encoded. Are they Mime-encoded? I don't know. I don't see it.

    As you said, I belatedly realized Winston's characters are valid Unicode
    and valid UTF-8 but they appear in a header, apparently without the
    required MIME encoding, and Usenet servers are allowed to mangle or reject
    8-bit header bytes. When I respond, the attribution line contains
    W¡ñ§±¤ñ¬Ößóògîë

    What I'm trying to figure out is why my body gets mangled. The
    attribution line contains raw Latin-1 bytes but my outgoing headers
    declare UTF-8, so I think a server in the path re-encodes the body and
    corrupts it. But I'm not really sure what is causing the mojibake.
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 14:40:48 2026
    From Newsgroup: news.software.readers

    Carlos E.R. wrote:
    This text arrives corrupted. In the other post they are legible. It is declared as UTF-8, but I guess it is not actually all valid UTF-8.

    Hi Carlos,

    Thanks for confirming the test results, as I can't do that by myself.

    We both saw the same thing happen.
    a. ñ in Latin-1 is apparently byte 0xF1
    b. But in UTF-8, ñ apparently must be encoded as two bytes: 0xC3 0xB1
    So the W¡ñ§±¤ñ attribution apparently contains illegal UTF-8 sequences.
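The byte values quoted above can be checked with a short Python sketch. The helper `is_valid_utf8` is a hypothetical name introduced only for illustration.

```python
ch = "\u00f1"  # the character ñ (U+00F1)

latin1 = ch.encode("latin-1")  # one byte
utf8 = ch.encode("utf-8")      # two bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(latin1.hex(), utf8.hex())
print(is_valid_utf8(latin1))  # the lone Latin-1 byte is not valid UTF-8
print(is_valid_utf8(utf8))    # the two-byte form is
```

So a body that carries the single Latin-1 byte while declaring charset=UTF-8 is mislabelled, which is consistent with what Carlos observed.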

    When I used this header with this text in the body: Wi+o#n+4+#<>gEo
    Content-Type: text/plain; charset=US-ASCII
    Content-Transfer-Encoding: 7bit
    Everything looks fine.

    But when I send the exact same message with the exact same body using:
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit
    Then "something" corrupts the body, turning it into mojibake.

    As I said, I never understood this charset stuff.

    But I think I've resolved most of the issues simply by declaring that
    my body is ASCII.

    I generally send "my text" through a conversion utility (shortcuts.xml).
    But I generally do not send the attribution line through that utility.

    Even so, I tried to convert Wi+o#n+ to Winston, but it failed because
    Scintilla (which is the Notepad++ engine) doesn't recognize them yet.

    So, for now, I think the charset=US-ASCII is the best workaround.
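As a rough sketch of the idea behind the workaround (not of what shortcuts.xml actually does), the following Python checks whether an outgoing body really is 7-bit clean before it is labelled US-ASCII, and shows one crude way to force it. Both function names are hypothetical.

```python
def is_7bit_clean(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. plain US-ASCII (hypothetical helper)."""
    return all(b < 0x80 for b in data)


def force_ascii(text: str) -> str:
    """Replace every non-ASCII character with '?' (a crude, lossy fallback)."""
    return text.encode("ascii", errors="replace").decode("ascii")


attribution = "W\u00a1\u00f1 wrote:"  # an attribution line with two non-ASCII characters

print(is_7bit_clean(attribution.encode("latin-1")))
print(force_ascii(attribution))
print(is_7bit_clean(force_ascii(attribution).encode("ascii")))
```

A label of charset=US-ASCII with 7bit transfer encoding is only truthful when a check like `is_7bit_clean` passes on the whole body, attribution line included.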

    Thanks for confirming what you see when I changed the headers.
    Much appreciated as I'm trying to be a good netizen, as you are too.
  • From Carlos E.R.@robin_listas@es.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 23:25:59 2026
    From Newsgroup: news.software.readers

    On 2026-03-13 22:11, Maria Sophia wrote:
    Carlos E.R. wrote:
    I could be wrong as I never really understood this characters stuff, but
    a. They are not UTF-8
    b. They are not ASCII
    c. They are not MIME-encoded
    d. They are raw 8-bit bytes

    Huh, no. They were typed as 8-bit bytes from Latin-1 charset at some
    point in time, but today they are UTF-8. UTF in the body, and as MIME in
    the header.

    You said it yourself in another post:

    Hi Carlos,

    I agree. I apologize for the flip flop indecision. I don't know what's
    going on, as I'm only trying to fix the trouble W-i|#-o-#-n|#-4|u|f|||#g|<|2 creates.

    I will endlessly admit I never understood this charset stuff, and I will point out that the only reason I even care is you and others asked me to
    fix the problems that sometimes my posts look like a Chinese jigsaw puzzle.

    Since I don't mess with the characters, something else is messing with the characters, where a test in this very thread shows that when I use headers
    Content-Type: text/plain; charset=US-ASCII
    Content-Transfer-Encoding: 7bit
    Then W-i|#-o-#-n|#-4|u|f|||#g|<|2 remains W-i|#-o-#-n|#-4|u|f|||#g|<|2

    But when I use headers
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: 8bit
    Then W-i|#-o-#-n|#-4|u|f|||#g|<|2 turns the entire post into a ransom note.

    Possibly because the text is not actually UTF-8



    Usenet (NNTP) follows email header rules (RFC 5322 + RFC 2047):
    a. The body may be UTF-8, if declared.
    b. Headers cannot contain raw 8-bit bytes.
    c. Hence, non-ASCII characters must be encoded using MIME encoded-words
    From: =?UTF-8?Q?W=C2=A1=C3=B1=C2=A7=C2=B1=C2=A4=C3=B1?= <winston@example.com>

    Given Winston's "FROM:" header has those characters, which are not ASCII,
    all I can say is that they're not valid characters for *headers*, unless they're MIME-encoded. Are they MIME-encoded? I don't know. I don't see it.

    Yes, they are MIME encoded. I posted the other day the section in HEX,
    taken directly from the on disk file that Leafnode has written on my
    system, so no translation from Thunderbird.


    As you said, I belatedly realized Winston's characters are valid Unicode
    and valid UTF-8, but they appear in a header, apparently without the
    required MIME encoding, and Usenet servers are allowed to mangle or reject
    8-bit header bytes. When I respond, the attribution line contains W-i|#-o-#-n|#-4|u|f|||#g|<|2

    What I'm trying to figure out is why my body gets mangled. The attribution line contains raw Latin-1 bytes, but my outgoing headers
    declare UTF-8, so I think a server in the path re-encodes the body and corrupts it. But I'm not really sure what is causing the mojibake.
    --
    Cheers, Carlos.
  • From Maria Sophia@mariasophia@comprehension.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Fri Mar 13 21:45:24 2026
    From Newsgroup: news.software.readers

    Carlos E.R. wrote:
    Then Wi+o#n+4+#<>gEo turns the entire post into a ransom note.

    Possibly because the text is not actually UTF-8

    Yeah. In a later post you see I belatedly figured that out for myself.
    Sorry for the flip flop indecision on whether I think it's UTF-8 or not.

    Did I ever mention I never really understood this Usenet charset stuff?

    I'm one of the few people whose ego isn't so huge that they can't admit
    when they don't know something, where I openly and humbly easily admit that
    I seriously lack charset understanding when it comes to Usenet headers.

    Luckily, the two things I'm doing seem to work "most" of the time:
    a. If I copy/paste from a variety of web sources (particularly Chromium),
    I run my body through a text-normalizer to eliminate Unicode chars.
    <shortcuts.xml>
    b. I manually place a US-ASCII header which seems to tell the receiving
    newsreaders not to bother trying to deal with W¡ñ§±¤¬Ößóògîë's
    Windows-1252 / ISO-8859-1 (Latin-1) character set.
    w = 0x57 (ASCII)
    ¡ = 0xA1
    ñ = 0xF1
    § = 0xA7
    ± = 0xB1
    ¤ = 0xA4
    ¬ = 0xAC
    Ö = 0xD6
    ß = 0xDF
    ó = 0xF3
    ò = 0xF2
    g = 0x67 (ASCII)
    î = 0xEE
    ë = 0xEB
    Apart from the two ASCII letters, every one of those bytes is a single-byte Latin-1 / Windows-1252 character. As raw bytes, none of them is valid UTF-8.

    Given Winston's "FROM:" header has those characters, which are not ASCII,
    all I can say is that they're not valid characters for *headers*, unless
    they're MIME-encoded. Are they MIME-encoded? I don't know. I don't see it.

    Yes, they are MIME encoded. I posted the other day the section in HEX,
    taken directly from the on disk file that Leafnode has written on my
    system, so no translation from Thunderbird.

    I may be wrong since I never understood this stuff, so I appreciate your clarifications, and I openly let you know I really don't understand this.

    I think you are describing Thunderbird's behavior, not necessarily
    Winston's behavior, while mostly I'm describing Winston's original bytes,
    not Thunderbird's. (Although it appears that Winston uses TB after all.)

    I think we can all presume Winston originally long ago typed raw
    Windows-1252 bytes using Alt-codes for his display name, but I think it may
    be that TB does not actually send those bytes directly in the header.

    Those are raw 8-bit Latin-1 bytes when he types them.
    However, I think TB does not send those bytes directly.

    When Winston posts using TB, I think TB perhaps converts the Latin-1 bytes to UTF-8, and then MIME-encodes the header using RFC 2047. That may
    be why the raw source on your system shows something like:

    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=
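To see what that "B" (base64) encoded-word actually carries, here is a small Python sketch that unpacks it by hand. It only illustrates the encoding; it says nothing about what Thunderbird does internally.

```python
import base64

encoded_word = "=?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?="

# An encoded-word has the shape =?charset?encoding?payload?=
_, charset, encoding, payload, _ = encoded_word.split("?")

raw = base64.b64decode(payload)

print(charset, encoding)
print(" ".join(f"{b:02x}" for b in raw))  # the UTF-8 bytes inside the header
print(raw.decode(charset))                 # the display name a reader should show
```

The payload holds two-byte UTF-8 sequences (c2 a1, c3 b1, and so on), not the single Latin-1 bytes 0xA1 and 0xF1, which matches what Carlos found in the file Leafnode stored.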

    On your side, TB perhaps then decodes that MIME-encoded header for display, so in the normal UI you see:

    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>

    I'm rather confused, as I don't control anything but my side of the
    equation, and all I'm doing is dealing with Winston's display name,
    but maybe what's possibly happening overall, is this (maybe?):

    1. Winston typed Windows-1252 Alt-codes for his display name long ago.
    ...Wi+o#n+
    2. His Thunderbird converts those Latin-1 bytes to UTF-8 internally.
    3. His Thunderbird MIME-encodes the UTF-8 header before sending it.
    4. Your Thunderbird decodes the MIME header & displays normal UTF-8 text.
    5. My own newsreader client copies the original Latin-1 bytes from the
    attribution line because it does not decode the MIME header.
    6. That mismatch triggers mojibake in my outgoing posts when my headers
    declare "charset=UTF-8" instead of "charset=US-ASCII".
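Steps 5 and 6 can be imitated in a few lines of Python. This is only a sketch of a charset mismatch in general, assuming the display name from the table earlier in the thread; it does not identify which program in the real path does the damage.

```python
name = "W\u00a1\u00f1\u00a7\u00b1\u00a4\u00f1"  # W¡ñ§±¤ñ

# Case 1: Latin-1 bytes sitting in a body that is labelled UTF-8.
latin1_bytes = name.encode("latin-1")
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
# Invalid bytes turn into U+FFFD, and a run of bytes can even be
# swallowed as a single bogus multi-byte character.
print(repr(as_utf8))

# Case 2: UTF-8 bytes read by something that assumes Latin-1.
utf8_bytes = name.encode("utf-8")
as_latin1 = utf8_bytes.decode("latin-1")
print(as_latin1)

# Case 2 is reversible as long as nothing else touched the bytes.
repaired = as_latin1.encode("latin-1").decode("utf-8")
print(repaired == name)
```

Either direction of mismatch produces unreadable text, which is why the declared charset has to match the bytes actually sent.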

    I never understood this stuff, but perhaps that explains why you see
    a valid MIME-encoded UTF-8 header in the raw view, while I see the original Latin-1 bytes in my ASCII world. Thunderbird is doing the right thing on Winston's end, but perhaps my own ASCII-only setup exposes the mismatch.

    Thanks again for helping me sort out what Thunderbird is doing on your
    side, as I used TB years ago for a client and hated how it thought Usenet
    was email. Maybe it's better now as that had to be a decade or so ago.
    --
    Usenet is a means of communication around the world on technical topics.
  • From Michael Bäuerle@michael.baeuerle@gmx.net to news.software.readers on Sat Mar 14 12:40:32 2026
    From Newsgroup: news.software.readers

    Maria Sophia wrote:

    The RFC-correct solution would be:
    [Invalid header field line]

    Please note that RFC 2047 defines a line length limit: <https://datatracker.ietf.org/doc/html/rfc2047#section-2>
    |
    | While there is no limit to the length of a multiple-line header
    | field, each line of a header field that contains one or more
    | 'encoded-word's is limited to 76 characters.

    for this reason:
    |
    | The length restrictions are included both to ease interoperability
    | through internetwork mail gateways, and to impose a limit on the
    | amount of lookahead a header parser must employ (while looking for a
    | final ?= delimiter) before it can decide whether a token is an
    | "encoded-word" or something else.

    But that's ugly.

    Because nobody should get the raw header displayed (except on request)
    I think this should be no problem.
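The 76-character limit quoted above can be illustrated with Python's standard `email.header.Header` class, which folds a long non-ASCII value into several encoded-words. The sample text and the choice of a Subject header are assumptions made up for the demonstration.

```python
from email.header import Header, decode_header, make_header

# A deliberately long non-ASCII value (made-up sample text).
long_value = " ".join(["W\u00a1\u00f1\u00a7\u00b1\u00a4\u00f1"] * 8)

header = Header(long_value, charset="utf-8", header_name="Subject", maxlinelen=76)
folded = header.encode()  # continuation lines are separated by "\n"

for line in folded.split("\n"):
    print(len(line), line)

# A reader that decodes the encoded-words gets the original text back.
round_trip = str(make_header(decode_header(folded)))
print(round_trip == long_value)
```

The raw header is indeed ugly, but as noted above, a reader normally only sees the decoded result.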


    [Xpost reduced]
  • From Carlos E.R.@robin_listas@es.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Sat Mar 14 14:11:54 2026
    From Newsgroup: news.software.readers

    On 2026-03-14 05:45, Maria Sophia wrote:
    Carlos E.R. wrote:
    Then W-i|#-o-#-n|#-4|u|f|||#g|<|2 turns the entire post into a ransom note.
    Possibly because the text is not actually UTF-8

    Yeah. In a later post you see I belatedly figured that out for myself.
    Sorry for the flip flop indecision on whether I think it's UTF-8 or not.

    Did I ever mention I never really understood this Usenet charset stuff?

    I'm one of the few people whose ego isn't so huge that they can't admit
    when they don't know something, where I openly and humbly easily admit that
    I seriously lack charset understanding when it comes to Usenet headers.

    Luckily, the two things I'm doing seem to work "most" of the time:
    a. If I copy/paste from a variety of web sources (particularly Chromium),
    I run my body through a text-normalizer to eliminate Unicode chars.
    <shortcuts.xml>
    b. I manually place a US-ASCII header which seems to tell the receiving
    newsreaders not to bother trying to deal with W¡ñ§±¤¬Ößóògîë's
    Windows-1252 / ISO-8859-1 (Latin-1) character set.
    w = 0x57 (ASCII)
    ¡ = 0xA1
    ñ = 0xF1
    § = 0xA7
    ± = 0xB1
    ¤ = 0xA4
    ¬ = 0xAC
    Ö = 0xD6
    ß = 0xDF
    ó = 0xF3
    ò = 0xF2
    g = 0x67 (ASCII)
    î = 0xEE
    ë = 0xEB
    Apart from the two ASCII letters, every one of those bytes is a single-byte Latin-1 / Windows-1252 character. As raw bytes, none of them is valid UTF-8.

    Given Winston's "FROM:" header has those characters, which are not ASCII,
    all I can say is that they're not valid characters for *headers*, unless
    they're MIME-encoded. Are they MIME-encoded? I don't know. I don't see it.
    Yes, they are MIME encoded. I posted the other day the section in HEX,
    taken directly from the on disk file that Leafnode has written on my
    system, so no translation from Thunderbird.

    I may be wrong since I never understood this stuff, so I appreciate your clarifications, and I openly let you know I really don't understand this.

    I think you are describing Thunderbird's behavior, not necessarily
    Winston's behavior, while mostly I'm describing Winston's original bytes,
    not Thunderbird's. (Although it appears that Winston uses TB after all.)

    He does.

    I have looked at posts from him, in three ways:

    * as TB displays them
    * as TB "view-raw" displays them
    * at the computer file, which is stored by leafnode in my system.


    I think we can all presume Winston originally long ago typed raw
    Windows-1252 bytes using Alt-codes for his display name, but I think it may be that TB does not actually send those bytes directly in the header.

    Certainly not. Currently it sends MIME encoded UTF-8 header.


    Those are raw 8-bit Latin-1 bytes when he types them.
    However, I think TB does not send those bytes directly.

    When Winston posts using TB, I think TB perhaps converts the Latin-1 bytes to UTF-8, and then MIME-encodes the header using RFC 2047. That may
    be why the raw source on your system shows something like:

    From: =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=


    Yes.

    On your side, TB perhaps then decodes that MIME-encoded header for display, so in the normal UI you see:

    ...w¡ñ§±¤ñ <winstonmvp@gmail.com>


    Yes.

    I'm rather confused, as I don't control anything but my side of the
    equation, and all I'm doing is dealing with Winston's display name,
    but maybe what's possibly happening overall, is this (maybe?):

    1. Winston typed Windows-1252 Alt-codes for his display name long ago.
    ...W-i|#-o-#-n|#
    2. His Thunderbird converts those Latin-1 bytes to UTF-8 internally.
    3. His Thunderbird MIME-encodes the UTF-8 header before sending it.
    4. Your Thunderbird decodes the MIME header & displays normal UTF-8 text.

    Yes.

    5. My own newsreader client copies the original Latin-1 bytes from the
    attribution line because it does not decode the MIME header.
    6. That mismatch triggers mojibake in my outgoing posts when my headers
    declare "charset=UTF-8" instead of "charset=US-ASCII".

    I never understood this stuff, but perhaps that explains why you see
    a valid MIME-encoded UTF-8 header in the raw view, while I see the original Latin-1 bytes in my ASCII world. Thunderbird is doing the right thing on Winston's end, but perhaps my own ASCII-only setup exposes the mismatch.

    Presumably your upstream nntp server sends to you the same utf-8 mime
    encoded header that I get, but your system does not interpret it correctly.


    Thanks again for helping me sort out what Thunderbird is doing on your
    side, as I used TB years ago for a client and hated how it thought Usenet
    was email. Maybe it's better now as that had to be a decade or so ago.


    I have no understanding of the RFCs, I simply observe what TB and
    Leafnode seem to do. I also looked in another machine that only uses TB.
    I also have the memory of what I have read over the years.

    Forget latin-1. The servers are sending mime encoded utf-8 in the
    headers. Life is simple that way.
    --
    Cheers, Carlos.
  • From =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=@winstonmvp@gmail.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Sat Mar 14 11:32:22 2026
    From Newsgroup: news.software.readers

    On 3/13/2026 9:45 PM, Maria Sophia wrote:

    I think you are describing Thunderbird's behavior, not necessarily
    Winston's behavior, while mostly I'm describing Winston's original bytes,
    not Thunderbird's. (Although it appears that Winston uses TB after all.)

    Yes to TB. Also SeaMonkey and WLM2012

    I think we can all presume Winston originally long ago typed raw
    Windows-1252 bytes using Alt-codes for his display name, but I think it may be that TB does not actually send those bytes directly in the header.

    As noted earlier..no typing was ever done. The string was created in
    Character map with typing - select, repeat for next character, copy
    string, paste to desired field.
    --
    ...w¡ñ§±¤ñ
  • From =?UTF-8?B?Li4ud8Khw7HCp8KxwqTDsQ==?=@winstonmvp@gmail.com to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Sat Mar 14 11:34:31 2026
    From Newsgroup: news.software.readers

    On 3/14/2026 6:11 AM, Carlos E.R. wrote:

    Forget latin-1. The servers are sending mime encoded utf-8 in the
    headers. Life is simple that way.


    +1
    --
    ...w¡ñ§±¤ñ
  • From David B.@BDonBlockNews@invalid.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Sun Mar 15 11:49:04 2026
    From Newsgroup: news.software.readers

    This post was not found by "Individual.net"
  • From David B.@BDonBlockNews@invalid.invalid to alt.comp.os.windows-10,news.software.readers,alt.comp.software.thunderbird on Sun Mar 15 11:50:24 2026
    From Newsgroup: news.software.readers

    On 14/03/2026 18:32, ...w¡ñ§±¤ñ wrote:
    On 3/13/2026 9:45 PM, Maria Sophia wrote:

    I think you are describing Thunderbird's behavior, not necessarily
    Winston's behavior, while mostly I'm describing Winston's original bytes,
    not Thunderbird's. (Although it appears that Winston uses TB after all.)

    Yes to TB. Also SeaMonkey and WLM2012

    I think we can all presume Winston originally long ago typed raw
    Windows-1252 bytes using Alt-codes for his display name, but I think it may
    be that TB does not actually send those bytes directly in the header.

    As noted earlier..no typing was ever done. The string was created in Character map with typing - select, repeat for next character, copy
    string, paste to desired field.

    Here's the original!
  • From Maria Sophia@mariasophia@comprehension.com to news.software.readers on Mon Mar 16 14:13:44 2026
    From Newsgroup: news.software.readers

    Michael Bäuerle wrote:
    The RFC-correct solution would be:
    [Invalid header field line]

    Please note that RFC 2047 defines a line length limit: <https://datatracker.ietf.org/doc/html/rfc2047#section-2>
    |
    | While there is no limit to the length of a multiple-line header
    | field, each line of a header field that contains one or more
    | 'encoded-word's is limited to 76 characters.

    for this reason:
    |
    | The length restrictions are included both to ease interoperability
    | through internetwork mail gateways, and to impose a limit on the
    | amount of lookahead a header parser must employ (while looking for a
    | final ?= delimiter) before it can decide whether a token is an
    | "encoded-word" or something else.

    But that's ugly.

    Because nobody should get the raw header displayed (except on request)
    I think this should be no problem.

    Thanks for the update. I think we've resolved all (most of) the issues. Carlos' tests were instrumental as they matched my guessed assumptions.

    1. Winston typed Windows-1252 Alt-codes for his display name long ago.
    ...Wi+o#n+
    2. His Thunderbird converts those Latin-1 bytes to UTF-8 internally.
    3. His Thunderbird MIME-encodes the UTF-8 header before sending it.
    4. Your Thunderbird decodes the MIME header & displays normal UTF-8 text.
    5. My own newsreader client copies the original Latin-1 bytes from the
    attribution line because it does not decode the MIME header.
    6. That mismatch triggers mojibake in my outgoing posts when my headers
    declare "charset=UTF-8" instead of "charset=US-ASCII".