• Re: Printing UTF-8 mail to terminal

    From Barry@21:1/5 to All on Sat Nov 2 13:19:40 2024
    On 1 Nov 2024, at 22:57, Left Right <olegsivokon@gmail.com> wrote:

    Does this Windows Terminal support the use
    of programs like tmux?

    I have not tried, but should work.

    Best to install the terminal app from the MS app store.
    Most use I make is to ssh into linux systems and stuff like editors.
    Colour output and cursor movement all work.

    Barry

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Loris Bennett@21:1/5 to Cameron Simpson on Mon Nov 4 11:44:03 2024
    Cameron Simpson <cs@cskk.id.au> writes:

    On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    as expected. The non-UTF-8 text occurs when I do

    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...

    if args.verbose:
        print(mail)

    which is presumably also correct.

    The question is: What conversion is necessary in order to print the
    EmailMessage object to the terminal, such that the quoted-printable
    parts are turned (back) into UTF-8?

    Do you still have access to `body` ? That would be the original
    message text? Otherwise maybe:

    print(mail.get_content())

    The objective is to obtain the message body Unicode text (i.e. a
    regular Python string with the original text, unencoded). And to print
    that.

    With the following:

    ######################################################################

    import email.message

    m = email.message.EmailMessage()

    m['Subject'] = 'Übung'

    m.set_content('Dies ist eine Übung')
    print('== cte: default == \n')
    print(m)

    print('-- full mail ---')
    print(m)
    print('-- just content--')
    print(m.get_content())

    m.set_content('Dies ist eine Übung', cte='quoted-printable')
    print('== cte: quoted-printable ==\n')
    print('-- full mail --')
    print(m)
    print('-- just content --')
    print(m.get_content())

    ######################################################################

    I get the following output:

    ######################################################################

    == cte: default ==

    Subject: Übung
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    MIME-Version: 1.0

    RGllcyBpc3QgZWluZSDDnGJ1bmcK

    -- full mail ---
    Subject: Übung
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    MIME-Version: 1.0

    RGllcyBpc3QgZWluZSDDnGJ1bmcK

    -- just content--
    Dies ist eine Übung

    == cte: quoted-printable ==

    -- full mail --
    Subject: Übung
    MIME-Version: 1.0
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: quoted-printable

    Dies ist eine =C3=9Cbung

    -- just content --
    Dies ist eine Übung

    ######################################################################

    So in both cases the subject is fine, but it is unclear to me how to
    print the body. Or rather, I know how to print the body OK, but I don't
    know how to print the headers separately - there seems to be nothing
    like 'get_headers()'. I can use get('Subject') etc. and reconstruct the
    headers, but that seems a little clunky.

    Cheers,

    Loris

    --
    This signature is currently under constuction.

  • From Loris Bennett@21:1/5 to Inada Naoki on Mon Nov 4 11:48:15 2024
    Inada Naoki <songofacandy@gmail.com> writes:

    2024年11月2日(土) 0:36 Loris Bennett via Python-list <python-list@python.org>:

    Left Right <olegsivokon@gmail.com> writes:

    There's quite a lot of misuse of terminology around terminal / console
    / shell. Please, correct me if I'm wrong, but it looks like you are
    printing that on MS Windows, right? MS Windows doesn't have or use
    terminals (that's more of a Unix-related concept). And, by "terminal"
    I mean terminal emulator (i.e. a program that emulates the behavior of
    a physical terminal). You can, of course, find some terminal programs
    for windows (eg. mintty), but I doubt that that's what you are dealing
    with.

    What MS Windows users usually end up using is the console. If you
    run, eg. cmd.exe, it will create a process that displays a graphical
    console. The console uses an encoding scheme to represent the text
    output. I believe that the default on MS Windows is to use some
    single-byte encoding. This answer from SE family site tells you how to
    set the console encoding to UTF-8 permanently:

    https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
    , which, I believe, will solve your problem with how the text is
    displayed.

    I'm not using MS Windows. I am using a Gnome terminal on Debian 12
    locally and connecting via SSH to a AlmaLinux 8 server, where I start a
    tmux session.

    On Thu, Oct 31, 2024 at 5:19 PM Loris Bennett via Python-list
    <python-list@python.org> wrote:

    Hi,

    I have a command-line program which creates an email containing German
    umlauts. On receiving the mail, my mail client displays the subject and
    body correctly:

    Subject: Übung

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine Übung.

    So far, so good. However, when I use the --verbose option to print
    the mail to the terminal via

    if args.verbose:
        print(mail)

    I get:

    Subject: Übungsbetreff

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine =C3=9Cbung.

    What do I need to do to prevent the body from getting mangled?

    I seem to remember that I had issues in the past with a Perl version of
    a similar program. As far as I recall there was an issue with the fact
    that the greeting is generated by querying a server, whereas the body is
    being read from a file, which led to oddities when the two bits were
    concatenated. But that might just have been a Perl thing.


    Try the PYTHONUTF8=1 environment variable.


    This does not seem to affect the way the email body is printed.

    Cheers,

    Loris

    --
    This signature is currently under constuction.

  • From Loris Bennett@21:1/5 to Loris Bennett on Mon Nov 4 11:57:37 2024
    "Loris Bennett" <loris.bennett@fu-berlin.de> writes:

    Cameron Simpson <cs@cskk.id.au> writes:

    On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    as expected. The non-UTF-8 text occurs when I do

    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...

    if args.verbose:
        print(mail)

    which is presumably also correct.

    The question is: What conversion is necessary in order to print the
    EmailMessage object to the terminal, such that the quoted-printable
    parts are turned (back) into UTF-8?

    Do you still have access to `body` ? That would be the original
    message text? Otherwise maybe:

    print(mail.get_content())

    The objective is to obtain the message body Unicode text (i.e. a
    regular Python string with the original text, unencoded). And to print
    that.

    With the following:

    ######################################################################

    import email.message

    m = email.message.EmailMessage()

    m['Subject'] = 'Übung'

    m.set_content('Dies ist eine Übung')
    print('== cte: default == \n')
    print(m)

    print('-- full mail ---')
    print(m)
    print('-- just content--')
    print(m.get_content())

    m.set_content('Dies ist eine Übung', cte='quoted-printable')
    print('== cte: quoted-printable ==\n')
    print('-- full mail --')
    print(m)
    print('-- just content --')
    print(m.get_content())

    ######################################################################

    I get the following output:

    ######################################################################

    == cte: default ==

    Subject: Übung
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    MIME-Version: 1.0

    RGllcyBpc3QgZWluZSDDnGJ1bmcK

    -- full mail ---
    Subject: Übung
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    MIME-Version: 1.0

    RGllcyBpc3QgZWluZSDDnGJ1bmcK

    -- just content--
    Dies ist eine Übung

    == cte: quoted-printable ==

    -- full mail --
    Subject: Übung
    MIME-Version: 1.0
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: quoted-printable

    Dies ist eine =C3=9Cbung

    -- just content --
    Dies ist eine Übung

    ######################################################################

    So in both cases the subject is fine, but it is unclear to me how to
    print the body. Or rather, I know how to print the body OK, but I don't
    know how to print the headers separately - there seems to be nothing
    like 'get_headers()'. I can use get('Subject') etc. and reconstruct the
    headers, but that seems a little clunky.

    Sorry, I am confusing the terminology here. The 'body' seems to be the
    headers plus the 'content'. So I can print the *content* without the
    headers OK, but I can't easily print all the headers separately. If I
    just print the body, i.e. headers plus content, the umlauts in the
    content are not resolved.

    --
    This signature is currently under constuction.

  • From Loris Bennett@21:1/5 to Loris Bennett on Mon Nov 4 13:02:21 2024
    "Loris Bennett" <loris.bennett@fu-berlin.de> writes:

    "Loris Bennett" <loris.bennett@fu-berlin.de> writes:

    Cameron Simpson <cs@cskk.id.au> writes:

    On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    as expected. The non-UTF-8 text occurs when I do

    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...

    if args.verbose:
        print(mail)

    which is presumably also correct.

    The question is: What conversion is necessary in order to print the
    EmailMessage object to the terminal, such that the quoted-printable
    parts are turned (back) into UTF-8?

    Do you still have access to `body` ? That would be the original
    message text? Otherwise maybe:

    print(mail.get_content())

    The objective is to obtain the message body Unicode text (i.e. a
    regular Python string with the original text, unencoded). And to print
    that.

    With the following:

    ######################################################################

    import email.message

    m = email.message.EmailMessage()

    m['Subject'] = 'Übung'

    m.set_content('Dies ist eine Übung')
    print('== cte: default == \n')
    print(m)

    print('-- full mail ---')
    print(m)
    print('-- just content--')
    print(m.get_content())

    m.set_content('Dies ist eine Übung', cte='quoted-printable')
    print('== cte: quoted-printable ==\n')
    print('-- full mail --')
    print(m)
    print('-- just content --')
    print(m.get_content())

    ######################################################################

    I get the following output:

    ######################################################################

    == cte: default ==

    Subject: Übung
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    MIME-Version: 1.0

    RGllcyBpc3QgZWluZSDDnGJ1bmcK

    -- full mail ---
    Subject: Übung
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    MIME-Version: 1.0

    RGllcyBpc3QgZWluZSDDnGJ1bmcK

    -- just content--
    Dies ist eine Übung

    == cte: quoted-printable ==

    -- full mail --
    Subject: Übung
    MIME-Version: 1.0
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: quoted-printable

    Dies ist eine =C3=9Cbung

    -- just content --
    Dies ist eine Übung

    ######################################################################

    So in both cases the subject is fine, but it is unclear to me how to
    print the body. Or rather, I know how to print the body OK, but I don't
    know how to print the headers separately - there seems to be nothing
    like 'get_headers()'. I can use get('Subject') etc. and reconstruct the
    headers, but that seems a little clunky.

    Sorry, I am confusing the terminology here. The 'body' seems to be the
    headers plus the 'content'. So I can print the *content* without the
    headers OK, but I can't easily print all the headers separately. If I
    just print the body, i.e. headers plus content, the umlauts in the
    content are not resolved.

    OK, so I can do:

    ######################################################################
    if args.verbose:
        for k in mail.keys():
            print(f"{k}: {mail.get(k)}")
        print('')
        print(mail.get_content())
    ######################################################################

    prints what I want and is not wildly clunky, but I am a little surprised
    that I can't get a string representation of the whole email in one go.
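
The loop above can also be wrapped in a small helper. This is only a sketch: the name `readable_message` is made up for illustration, and it assumes a simple, single-part text message rather than anything multipart.

```python
from email.message import EmailMessage


def readable_message(mail: EmailMessage) -> str:
    """Return decoded headers plus decoded text content as one string.

    Hypothetical helper; assumes a single-part text message.
    """
    # items() yields (name, value) pairs; with EmailMessage the values
    # come back as decoded Unicode strings.
    headers = "\n".join(f"{name}: {value}" for name, value in mail.items())
    # get_content() undoes the content transfer encoding (here
    # quoted-printable) and decodes the charset.
    return f"{headers}\n\n{mail.get_content()}"


mail = EmailMessage()
mail["Subject"] = "Übung"
mail.set_content("Dies ist eine Übung", cte="quoted-printable")
text = readable_message(mail)
print(text)
```

The result is meant for human eyes only; it is not a valid message to hand to a mail server.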

    Cheers,

    Loris


    --
    Dr. Loris Bennett (Herr/Mr)
    FUB-IT, Freie Universität Berlin

  • From Peter J. Holzer@21:1/5 to Loris Bennett via Python-list on Tue Nov 5 21:39:32 2024
    On 2024-11-04 13:02:21 +0100, Loris Bennett via Python-list wrote:
    "Loris Bennett" <loris.bennett@fu-berlin.de> writes:
    "Loris Bennett" <loris.bennett@fu-berlin.de> writes:
    Cameron Simpson <cs@cskk.id.au> writes:
    On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    as expected. The non-UTF-8 text occurs when I do

    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...

    if args.verbose:
        print(mail)

    which is presumably also correct.

    The question is: What conversion is necessary in order to print the
    EmailMessage object to the terminal, such that the quoted-printable
    parts are turned (back) into UTF-8?
    [...]
    OK, so I can do:

    ######################################################################
    if args.verbose:
        for k in mail.keys():
            print(f"{k}: {mail.get(k)}")
        print('')
        print(mail.get_content())
    ######################################################################

    prints what I want and is not wildly clunky, but I am a little surprised
    that I can't get a string representation of the whole email in one go.

    Mails can contain lots of stuff, so there is in general no suitable
    human readable string representation of a whole email. You have to go
    through it part by part and decide what you want to do with each. For
    example, if you have a multipart/alternative with a text/plain and a
    text/html part what should the "string representation" be? For some uses
    the text/plain part might be sufficient. For some you might want the
    HTML part or some rendering of it. Or what would you do with an image?
    Omit it completely? Just use the filename (if any)? Try to convert it to
    ASCII-Art? Use an AI to describe it?
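
To make the multipart/alternative case concrete, here is a minimal sketch using the standard `email` package. The message contents are invented for illustration, and preferring the plain-text part is just one possible choice for terminal output.

```python
from email.message import EmailMessage

mail = EmailMessage()
mail["Subject"] = "Übung"
mail.set_content("Dies ist eine Übung")  # becomes the text/plain part
mail.add_alternative("<p>Dies ist eine <b>Übung</b></p>", subtype="html")

# The message is now multipart/alternative; walk() visits the container
# first and then each part.
content_types = [part.get_content_type() for part in mail.walk()]
print(content_types)

# Decide which part to show on a terminal: here, prefer plain text.
plain = mail.get_body(preferencelist=("plain",))
if plain is not None:
    print(plain.get_content())
```

The caller has to pick a part explicitly; the container itself has no single text to return.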

    hp

    --
    _ | Peter J. Holzer | Story must make more sense than reality.
    |_|_) | |
    | | | hjp@hjp.at | -- Charles Stross, "Creative writing
    __/ | http://www.hjp.at/ | challenge!"

  • From Cameron Simpson@21:1/5 to Loris Bennett on Wed Nov 6 08:20:44 2024
    On 04Nov2024 13:02, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    OK, so I can do:

    ######################################################################
    if args.verbose:
        for k in mail.keys():
            print(f"{k}: {mail.get(k)}")
        print('')
        print(mail.get_content())
    ######################################################################

    prints what I want and is not wildly clunky, but I am a little surprised
    that I can't get a string representation of the whole email in one go.

    A string representation of the whole message needs to be correctly
    encoded so that its components can be identified mechanically. So it
    needs to be a syntactically valid RFC5322 message. Thus the encoding.

    As an example (slightly contrived) of why this is important, multipart
    messages are delimited with distinct lines, and their content may not
    present such a line (even if it's in the "raw" original data).

    So printing a whole message transcribes it in the encoded form so that
    it can be decoded mechanically. And conservatively, this is usually an
    ASCII-compatible encoding so that it can traverse various systems
    undamaged. This means the text requiring UTF-8 encoding gets further
    encoded as quoted-printable to avoid ambiguity about the meaning of
    bytes/octets which have their high bit set.

    BTW, doesn't this:

    for k in mail.keys():
        print(f"{k}: {mail.get(k)}")

    print the quoted printable (i.e. not decoded) form of subject lines?
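
A quick way to check this, sketched under the assumption that the message is an `email.message.EmailMessage` with its default policy (as elsewhere in this thread): header access returns decoded text, while the serialized form carries the encoded subject. A legacy `Message` using the compat32 policy may behave differently.

```python
from email.message import EmailMessage

mail = EmailMessage()
mail["Subject"] = "Übung"
mail.set_content("Dies ist eine Übung", cte="quoted-printable")

# Header access goes through the policy, which hands back decoded text.
subject = mail.get("Subject")
print(subject)

# The serialized form is what must survive transport; with the default
# policy it is pure ASCII, with the subject as an RFC 2047 encoded word.
wire = mail.as_bytes()
print(wire.decode("ascii"))
```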

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

  • From Loris Bennett@21:1/5 to All on Thu Oct 31 16:33:41 2024
    Hi,

    I have a command-line program which creates an email containing German
    umlauts. On receiving the mail, my mail client displays the subject and
    body correctly:

    Subject: Übung

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine Übung.

    So far, so good. However, when I use the --verbose option to print
    the mail to the terminal via

    if args.verbose:
        print(mail)

    I get:

    Subject: Übungsbetreff

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine =C3=9Cbung.

    What do I need to do to prevent the body from getting mangled?

    I seem to remember that I had issues in the past with a Perl version of
    a similar program. As far as I recall there was an issue with the fact
    that the greeting is generated by querying a server, whereas the body is
    being read from a file, which led to oddities when the two bits were
    concatenated. But that might just have been a Perl thing.

    Cheers,

    Loris

    --
    This signature is currently under constuction.

  • From Cameron Simpson@21:1/5 to Loris Bennett on Fri Nov 1 07:50:56 2024
    On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    I have a command-line program which creates an email containing German
    umlauts. On receiving the mail, my mail client displays the subject and
    body correctly:
    [...]
    So far, so good. However, when I use the --verbose option to print
    the mail to the terminal via

    if args.verbose:
        print(mail)

    I get:

    Subject: Übungsbetreff

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine =C3=9Cbung.

    What do I need to do to prevent the body from getting mangled?

    That looks to me like quoted-printable. This is an encoding for binary
    transport of text to make it robust against transports that are not
    8-bit clean. So your Unicode text is encoded as UTF-8, and then that is
    encoded in quoted-printable for transport through the email system.

    Your terminal probably accepts UTF-8 - I imagine other German text
    renders correctly?

    You need to get the text and undo the quoted-printable encoding.

    If you're using the Python email module to parse (or construct) the
    message as a `Message` object I'd expect that to happen automatically.

    If you're just dealing with this directly, use the `quopri` stdlib
    module: https://docs.python.org/3/library/quopri.html
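
For the "dealing with this directly" case, a minimal sketch with `quopri`. The sample bytes are the body line from the question; the UTF-8 charset is an assumption that would normally be read from the Content-Type header.

```python
import quopri

encoded = b"Dies ist eine =C3=9Cbung."

# decodestring() works on bytes: it undoes the =XX escapes and returns
# the raw (here UTF-8) bytes, which still need decoding to str.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)  # Dies ist eine Übung.
```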

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

  • From Loris Bennett@21:1/5 to Cameron Simpson on Fri Nov 1 08:11:30 2024
    Cameron Simpson <cs@cskk.id.au> writes:

    On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    I have a command-line program which creates an email containing German
    umlauts. On receiving the mail, my mail client displays the subject and
    body correctly:
    [...]
    So far, so good. However, when I use the --verbose option to print
    the mail to the terminal via

    if args.verbose:
        print(mail)

    I get:

    Subject: Übungsbetreff

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine =C3=9Cbung.

    What do I need to do to prevent the body from getting mangled?

    That looks to me like quoted-printable. This is an encoding for binary
    transport of text to make it robust against transports that are not
    8-bit clean. So your Unicode text is encoded as UTF-8, and then that
    is encoded in quoted-printable for transport through the email system.

    As I mentioned, I think the problem is to do with the way the salutation
    text provided by the "salutation server" and the mail body from a file
    are encoded. This seems to be different.

    Your terminal probably accepts UTF-8 - I imagine other German text
    renders correctly?

    Yes, it does.

    You need to get the text and undo the quoted-printable encoding.

    If you're using the Python email module to parse (or construct) the
    message as a `Message` object I'd expect that to happen automatically.

    I am using

    email.message.EmailMessage

    as, from the Python documentation

    https://docs.python.org/3/library/email.examples.html

    I gathered that that is the standard approach.

    And you are right that the encoding for the actual mail which is received
    is automatically sorted out. If I display the raw email in my client I get
    the following:

    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: quoted-printable
    ...
    Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
    ...
    Dies ist eine =C3=9Cbung.

    I would interpret that as meaning that the subject and body are encoded
    in the same way.

    The problem just occurs with the unsent string representation printed to
    the terminal.

    Cheers,

    Loris

    --
    This signature is currently under constuction.

  • From Loris Bennett@21:1/5 to Left Right on Fri Nov 1 07:52:32 2024
    Left Right <olegsivokon@gmail.com> writes:

    There's quite a lot of misuse of terminology around terminal / console
    / shell. Please, correct me if I'm wrong, but it looks like you are
    printing that on MS Windows, right? MS Windows doesn't have or use
    terminals (that's more of a Unix-related concept). And, by "terminal"
    I mean terminal emulator (i.e. a program that emulates the behavior of
    a physical terminal). You can, of course, find some terminal programs
    for windows (eg. mintty), but I doubt that that's what you are dealing
    with.

    What MS Windows users usually end up using is the console. If you
    run, eg. cmd.exe, it will create a process that displays a graphical
    console. The console uses an encoding scheme to represent the text
    output. I believe that the default on MS Windows is to use some
    single-byte encoding. This answer from SE family site tells you how to
    set the console encoding to UTF-8 permanently:
    https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
    , which, I believe, will solve your problem with how the text is
    displayed.

    I'm not using MS Windows. I am using a Gnome terminal on Debian 12
    locally and connecting via SSH to a AlmaLinux 8 server, where I start a
    tmux session.

    On Thu, Oct 31, 2024 at 5:19 PM Loris Bennett via Python-list
    <python-list@python.org> wrote:

    Hi,

    I have a command-line program which creates an email containing German
    umlauts. On receiving the mail, my mail client displays the subject and
    body correctly:

    Subject: Übung

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine Übung.

    So far, so good. However, when I use the --verbose option to print
    the mail to the terminal via

    if args.verbose:
        print(mail)

    I get:

    Subject: Übungsbetreff

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine =C3=9Cbung.

    What do I need to do to prevent the body from getting mangled?

    I seem to remember that I had issues in the past with a Perl version of
    a similar program. As far as I recall there was an issue with the fact
    that the greeting is generated by querying a server, whereas the body is
    being read from a file, which led to oddities when the two bits were
    concatenated. But that might just have been a Perl thing.

    Cheers,

    Loris

    --
    This signature is currently under constuction.
    --
    https://mail.python.org/mailman/listinfo/python-list
    --
    Dr. Loris Bennett (Herr/Mr)
    FUB-IT, Freie Universität Berlin

  • From Loris Bennett@21:1/5 to Loris Bennett on Fri Nov 1 10:10:03 2024
    "Loris Bennett" <loris.bennett@fu-berlin.de> writes:

    Cameron Simpson <cs@cskk.id.au> writes:

    On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    I have a command-line program which creates an email containing German
    umlauts. On receiving the mail, my mail client displays the subject and
    body correctly:
    [...]
    So far, so good. However, when I use the --verbose option to print
    the mail to the terminal via

    if args.verbose:
        print(mail)

    I get:

    Subject: Übungsbetreff

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine =C3=9Cbung.

    What do I need to do to prevent the body from getting mangled?

    That looks to me like quoted-printable. This is an encoding for binary
    transport of text to make it robust against transports that are not
    8-bit clean. So your Unicode text is encoded as UTF-8, and then that
    is encoded in quoted-printable for transport through the email system.

    As I mentioned, I think the problem is to do with the way the salutation
    text provided by the "salutation server" and the mail body from a file
    are encoded. This seems to be different.

    Your terminal probably accepts UTF-8 - I imagine other German text
    renders correctly?

    Yes, it does.

    You need to get the text and undo the quoted-printable encoding.

    If you're using the Python email module to parse (or construct) the
    message as a `Message` object I'd expect that to happen automatically.

    I am using

    email.message.EmailMessage

    as, from the Python documentation

    https://docs.python.org/3/library/email.examples.html

    I gathered that that is the standard approach.

    And you are right that the encoding for the actual mail which is received
    is automatically sorted out. If I display the raw email in my client I get
    the following:

    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: quoted-printable
    ...
    Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
    ...
    Dies ist eine =C3=9Cbung.

    I would interpret that as meaning that the subject and body are encoded
    in the same way.

    The problem just occurs with the unsent string representation printed to
    the terminal.

    If I log the body like this

    body = f"{salutation},\n\n{text}\n{signature}"
    logger.debug("body: " + body)

    and look at the log file in my terminal I see

    2024-11-01 09:59:12,318 - DEBUG - mailer:create_body - body: Sehr geehrter Herr Dr. Bennett,

    Dies ist eine Übung.

    ...

    as expected. The non-UTF-8 text occurs when I do

    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...

    if args.verbose:
        print(mail)

    which is presumably also correct.

    The question is: What conversion is necessary in order to print the
    EmailMessage object to the terminal, such that the quoted-printable
    parts are turned (back) into UTF-8?

    Cheers,

    Loris

    --
    This signature is currently under constuction.

  • From dieter.maurer@online.de@21:1/5 to Loris Bennett on Fri Nov 1 17:38:01 2024
    Loris Bennett wrote at 2024-11-1 10:10 +0100:
    ...
    mail.set_content(body, cte="quoted-printable")

    In the line above, you request the content to use
    the "cte" (= "Content-Transfer-Encoding") "quoted-printable"
    and consequently, the content is encoded with `quoted-printable`.
    Maybe, you do not need to pass `cte`?
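
A small sketch of what the `cte` argument changes and what it does not. The sample text is taken from the thread, and nothing here is asserted about which encoding the library picks when `cte` is left out. The serialized form differs per transfer encoding, but `get_content()` returns the same decoded text in every case.

```python
from email.message import EmailMessage

text = "Dies ist eine Übung"

decoded = {}
serialized = {}
for cte in (None, "quoted-printable", "base64"):
    mail = EmailMessage()
    # cte=None lets the library choose a transfer encoding itself.
    mail.set_content(text, cte=cte)
    # get_content() reverses whatever transfer encoding was applied.
    decoded[cte] = mail.get_content()
    # str(mail) is the serialized message, body still transfer-encoded.
    serialized[cte] = str(mail)

for cte, body in decoded.items():
    print(f"cte={cte!r}: get_content() -> {body!r}")
print(serialized["quoted-printable"])
```

So dropping `cte` changes how the body is serialized, but `print(mail)` still shows a serialized message; only `get_content()` gives back the readable text.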

  • From Gilmeh Serda@21:1/5 to Alan Gauld on Fri Nov 1 20:18:16 2024
    On Thu, 31 Oct 2024 21:53:31 +0000, Alan Gauld wrote:

    never noticed that module before!

    """
    $ python
    Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    help('modules')

    Please wait a moment while I gather a list of all available modules...

    AssemblyApp apparmor io pyzipper
    AssemblyGui appdirs ipaddress qrtools
    CAMSimulator application_utility isodate queue
    Cheetah apprise isort quopri
    [...]
    """

    Put it in a list, unmangle it, sort it and you should have an alphabetical
    list of all modules on your system.
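
One possible way to get such a list without parsing the `help('modules')` output is sketched below. It swaps in `pkgutil.iter_modules()`, which is not what `help()` was shown doing above, and it only sees top-level modules on `sys.path` plus the interpreter's built-ins, so the result may differ from the `help()` listing.

```python
import pkgutil
import sys

# Importable top-level modules found on sys.path, plus the modules
# compiled into the interpreter itself.
names = {module.name for module in pkgutil.iter_modules()}
names.update(sys.builtin_module_names)

# Case-insensitive sort, so "Cheetah" lands next to "calendar".
modules = sorted(names, key=str.lower)
print(len(modules), "modules found")
print([name for name in modules if name.startswith("quo")])
```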

    --
    Gilmeh

    What the world *really* needs is a good Automatic Bicycle Sharpener.

  • From Jon Ribbens@21:1/5 to Eli the Bearded on Fri Nov 1 21:05:54 2024
    On 2024-11-01, Eli the Bearded <*@eli.users.panix.com> wrote:
    In comp.lang.python, Gilmeh Serda <gilmeh.serda@nothing.here.invalid> wrote:
    Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    help('modules')

    Please wait a moment while I gather a list of all available modules...

    AssemblyApp apparmor io pyzipper
    AssemblyGui appdirs ipaddress qrtools
    CAMSimulator application_utility isodate queue
    Cheetah apprise isort quopri
    [...]
    """

    Put it in a list, unmangle it, sort it and you should have an alphabetical
    list of all modules on your system.

    As someone who has done a lot of work with email in other languages,
    "quopri" is not a name I'd expect or look for first pass for dealing
    with MIME quoted-printable encoding. (Me, being me, I'd probably just
    write it for myself if I didn't quickly find it while working with
    email.)

    Python went through a period of time where lots of things just got stuck
    in the standard library without any particular taxonomy. Hence ending up
    with base64, binascii, binhex, quopri, and uu all being separate
    top-level modules, only some of which got tidied up in Python 3.

  • From Eli the Bearded@21:1/5 to gilmeh.serda@nothing.here.invalid on Fri Nov 1 20:55:20 2024
    In comp.lang.python, Gilmeh Serda <gilmeh.serda@nothing.here.invalid> wrote:
    Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    help('modules')

    Please wait a moment while I gather a list of all available modules...

    AssemblyApp apparmor io pyzipper
    AssemblyGui appdirs ipaddress qrtools
    CAMSimulator application_utility isodate queue
    Cheetah apprise isort quopri
    [...]
    """

    Put it in a list, unmangle it, sort it and you should have an alphabetical
    list of all modules on your system.

    As someone who has done a lot of work with email in other languages,
    "quopri" is not a name I'd expect or look for first pass for dealing
    with MIME quoted-printable encoding. (Me, being me, I'd probably just
    write it for myself if I didn't quickly find it while working with
    email.)

    Elijah
    ------
    MIME: multipurpose Internet mail extensions

  • From Cameron Simpson@21:1/5 to Loris Bennett on Sat Nov 2 08:47:39 2024
    On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    as expected. The non-UTF-8 text occurs when I do

    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...

    if args.verbose:
    print(mail)

    which is presumably also correct.

    The question is: What conversion is necessary in order to print the
    EmailMessage object to the terminal, such that the quoted-printable
    parts are turned (back) into UTF-8?

    Do you still have access to `body` ? That would be the original message
    text? Otherwise maybe:

    print(mail.get_content())

    The objective is to obtain the message body Unicode text (i.e. a regular
    Python string with the original text, unencoded). And to print that.
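
    A minimal sketch of that suggestion, assuming the message is built along the lines of the snippet quoted above. The sample subject and body are taken from this thread, and `serialised` and `decoded` are illustrative names.

```python
from email.message import EmailMessage

body = "Dies ist eine Übung."

mail = EmailMessage()
mail["Subject"] = "Übungsbetreff"
mail.set_content(body, cte="quoted-printable")

# str(mail), and therefore print(mail), gives the serialised message,
# in which the body is still quoted-printable.
serialised = str(mail)

# get_content() undoes the transfer encoding and the charset encoding and
# returns an ordinary str (it may come back with a trailing newline).
decoded = mail.get_content()

print(decoded, end="")
```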

  • From Cameron Simpson@21:1/5 to Loris Bennett on Sat Nov 2 08:44:18 2024
    On 01Nov2024 08:11, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
    Cameron Simpson <cs@cskk.id.au> writes:
    If you're using the Python email module to parse (or construct) the
    message as a `Message` object I'd expect that to happen automatically.

    I am using
    email.message.EmailMessage

    Noted. That seems like the correct approach to me.

    And you are right that encoding for the actual mail which is received
    is automatically sorted out. If I display the raw email in my client I
    get the following:

    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: quoted-printable
    ...
    Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
    ...
    Dies ist eine =C3=9Cbung.

    Right. Quoted-printable encoding for the transport.

    I would interpret that as meaning that the subject and body are encoded
    in the same way.

    Yes.

    The problem just occurs with the unsent string representation printed to
    the terminal.

    Yes, and I was thinking about this yesterday. I suspect that
    `print(some_message_object)` is intended to transcribe it for transport.
    For example, one could write to an mbox file and just print() the
    message into it and get correct transport/storage formatting, which
    includes the qp encoding.
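
    The transport-form idea can be sketched as a round trip: serialise the message to bytes, as it would be stored or sent, then parse those bytes and read the text back. This assumes the default `EmailMessage` policy; the sample subject and body come from the raw mail quoted above, and `raw` and `parsed` are illustrative names.

```python
import email
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Übungsbetreff"
msg.set_content("Dies ist eine Übung.", cte="quoted-printable")

# Serialised form, as it would be written to a file or handed to a mail server.
raw = msg.as_bytes()
print(raw.decode("ascii"))

# Parsing the transport form again recovers the Unicode text.
parsed = email.message_from_bytes(raw, policy=policy.default)
print(parsed["Subject"])
print(parsed.get_content(), end="")
```

    The serialised bytes are plain ASCII, which is the point of the transfer encoding; the decoded text only reappears once the message is parsed or `get_content()` is called.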

    Can you show the code (or example code) which leads to the qp output?
    I suspect there's a straightforward way to get the decoded Unicode, but
    I'd need to see how what you've got was obtained.

  • From Cameron Simpson@21:1/5 to alan.gauld@yahoo.co.uk on Sat Nov 2 08:52:22 2024
    On 31Oct2024 21:53, alan.gauld@yahoo.co.uk <alan.gauld@yahoo.co.uk> wrote:
    On 31/10/2024 20:50, Cameron Simpson via Python-list wrote:
    If you're just dealing with this directly, use the `quopri` stdlib
    module: https://docs.python.org/3/library/quopri.html

    One of the things I love about this list are these little features
    that I didn't know existed. Despite having used Python for over 25
    years, I've never noticed that module before! :-)

    Heh. And James Parrott caused me to discover the
    `subprocess.run(executable_path)` mode of `run()/Popen()`: a string with
    `shell=False` (the default) is an executable name.
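
    A small sketch of that mode, using `sys.executable` (the running interpreter) as the executable path so that the example does not depend on any particular program being installed. Feeding the script on stdin is just a device to make the child process do something visible.

```python
import subprocess
import sys

# A bare string with the default shell=False is taken as the program to run,
# with no arguments; it is not parsed as a shell command line.
result = subprocess.run(
    sys.executable,
    input=b"print('hello')",  # the child interpreter reads its script from stdin
    capture_output=True,
)

print(result.returncode)
print(result.stdout)
```

    On POSIX systems a string containing spaces, such as "echo hello", would be looked up as a single program name in this mode, so arguments need the list form or `shell=True`.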

  • From Barry@21:1/5 to All on Fri Nov 1 22:10:43 2024
    On 31 Oct 2024, at 16:42, Left Right via Python-list <python-list@python.org> wrote:

    MS Windows doesn't have or use
    terminals (that's more of a Unix-related concept).

    Windows does now. They implemented this feature over the last few years.
    Indeed, they took inspiration from how Linux does this.

    You might find https://devblogs.microsoft.com/commandline/ has interesting articles about this.

    They also have implemented utf-8 as code page 65001.
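
    On the Python side, the code page name can be checked directly. This is a sketch that assumes Python 3.8 or later, where "cp65001" is accepted as an alias of UTF-8; the sample text is borrowed from earlier in the thread.

```python
import codecs
import sys

text = "Dies ist eine Übung."

# Assumes Python 3.8+, where "cp65001" resolves to the UTF-8 codec.
same_bytes = text.encode("cp65001") == text.encode("utf-8")
print(same_bytes)
print(codecs.lookup("cp65001").name)

# The encoding Python will use when printing to this terminal or pipe.
print(sys.stdout.encoding)
```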

    Barry
