On 1 Nov 2024, at 22:57, Left Right <olegsivokon@gmail.com> wrote:
Does this Windows Terminal support the use
of programs like tmux?
On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
as expected. The non-UTF-8 text occurs when I do
    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...
    if args.verbose:
        print(mail)
which is presumably also correct.
The question is: What conversion is necessary in order to print the
EmailMessage object to the terminal, such that the quoted-printable
parts are turned (back) into UTF-8?
Do you still have access to `body` ? That would be the original
message text? Otherwise maybe:
    print(mail.get_content())
The objective is to obtain the message body Unicode text (i.e. a
regular Python string with the original text, unencoded). And to print
that.
On Sat, 2 Nov 2024 at 0:36, Loris Bennett via Python-list <python-list@python.org> wrote:
Left Right <olegsivokon@gmail.com> writes:
There's quite a lot of misuse of terminology around terminal / console
/ shell. Please, correct me if I'm wrong, but it looks like you are
printing that on MS Windows, right? MS Windows doesn't have or use
terminals (that's more of a Unix-related concept). And, by "terminal"
I mean terminal emulator (i.e. a program that emulates the behavior of
a physical terminal). You can, of course, find some terminal programs
for windows (eg. mintty), but I doubt that that's what you are dealing
with.
What MS Windows users usually end up using is the console. If you
run, eg. cmd.exe, it will create a process that displays a graphical
console. The console uses an encoding scheme to represent the text
output. I believe that the default on MS Windows is to use some
single-byte encoding. This answer from SE family site tells you how to
set the console encoding to UTF-8 permanently:
https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
, which, I believe, will solve your problem with how the text is
displayed.
I'm not using MS Windows. I am using a Gnome terminal on Debian 12
locally and connecting via SSH to a AlmaLinux 8 server, where I start a
tmux session.
On Thu, Oct 31, 2024 at 5:19 PM Loris Bennett via Python-list
<python-list@python.org> wrote:
Hi,
I have a command-line program which creates an email containing German
umlauts. On receiving the mail, my mail client displays the subject and
body correctly:
Subject: Übung
Sehr geehrter Herr Dr. Bennett,
Dies ist eine Übung.
So far, so good. However, when I use the --verbose option to print
the mail to the terminal via
    if args.verbose:
        print(mail)
I get:
Subject: Übungsbetreff
Sehr geehrter Herr Dr. Bennett,
Dies ist eine =C3=9Cbung.
What do I need to do to prevent the body from getting mangled?
I seem to remember that I had issues in the past with a Perl version of
a similar program. As far as I recall there was an issue with the fact
that the greeting is generated by querying a server, whereas the body is
being read from a file, which led to oddities when the two bits were
concatenated. But that might just have been a Perl thing.
Try the PYTHONUTF8=1 environment variable.
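For anyone wanting to rule the terminal in or out first, here is a minimal sketch that reports which encoding Python will use for output and whether UTF-8 mode (what PYTHONUTF8=1 switches on) is active. It assumes Python 3.7 or later; the helper name `describe_output_encoding` is made up for illustration.

```python
import locale
import sys


def describe_output_encoding():
    """Collect the settings that decide how print() encodes text."""
    return {
        # Encoding used for this process's standard output (None if the
        # stream has been replaced by something without an encoding).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale (LANG / LC_ALL on Unix).
        "locale": locale.getpreferredencoding(False),
        # 1 when UTF-8 mode is on (PYTHONUTF8=1 or -X utf8), otherwise 0.
        "utf8_mode": sys.flags.utf8_mode,
    }


for name, value in describe_output_encoding().items():
    print(f"{name}: {value}")
```

If stdout already reports a UTF-8 encoding, a literal "=C3=9C" in the output is probably not a terminal problem, since those are plain ASCII characters that any encoding displays unchanged.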
Cameron Simpson <cs@cskk.id.au> writes:
On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
as expected. The non-UTF-8 text occurs when I do
    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...
    if args.verbose:
        print(mail)
which is presumably also correct.
The question is: What conversion is necessary in order to print the
EmailMessage object to the terminal, such that the quoted-printable
parts are turned (back) into UTF-8?
Do you still have access to `body` ? That would be the original
message text? Otherwise maybe:
    print(mail.get_content())
The objective is to obtain the message body Unicode text (i.e. a
regular Python string with the original text, unencoded). And to print
that.
With the following:
######################################################################
import email.message
m = email.message.EmailMessage()
m['Subject'] = 'Übung'
m.set_content('Dies ist eine Übung')
print('== cte: default == \n')
print(m)
print('-- full mail ---')
print(m)
print('-- just content--')
print(m.get_content())
m.set_content('Dies ist eine Übung', cte='quoted-printable')
print('== cte: quoted-printable ==\n')
print('-- full mail --')
print(m)
print('-- just content --')
print(m.get_content())
######################################################################
I get the following output:
######################################################################
== cte: default ==
Subject: Übung
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
RGllcyBpc3QgZWluZSDDnGJ1bmcK
-- full mail ---
Subject: Übung
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
RGllcyBpc3QgZWluZSDDnGJ1bmcK
-- just content--
Dies ist eine Übung
== cte: quoted-printable ==
-- full mail --
Subject: Übung
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Dies ist eine =C3=9Cbung
-- just content --
Dies ist eine Übung
######################################################################
So in both cases the subject is fine, but it is unclear to me how to
print the body. Or rather, I know how to print the body OK, but I don't
know how to print the headers separately - there seems to be nothing
like 'get_headers()'. I can use get('Subject') etc. and reconstruct the
headers, but that seems a little clunky.
"Loris Bennett" <loris.bennett@fu-berlin.de> writes:
Sorry, I am confusing the terminology here. The 'body' seems to be the
headers plus the 'content'. So I can print the *content* without the
headers OK, but I can't easily print all the headers separately. If I
just print the body, i.e. headers plus content, the umlauts in the
content are not resolved.
"Loris Bennett" <loris.bennett@fu-berlin.de> writes:[...]
"Loris Bennett" <loris.bennett@fu-berlin.de> writes:
Cameron Simpson <cs@cskk.id.au> writes:
On 01Nov2024 10:10, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
as expected. The non-UTF-8 text occurs when I do
    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...
    if args.verbose:
        print(mail)
which is presumably also correct.
The question is: What conversion is necessary in order to print the
EmailMessage object to the terminal, such that the quoted-printable
parts are turned (back) into UTF-8?
OK, so I can do:
######################################################################
if args.verbose:
    for k in mail.keys():
        print(f"{k}: {mail.get(k)}")
    print('')
    print(mail.get_content())
######################################################################
which prints what I want and is not wildly clunky, but I am a little
surprised that I can't get a string representation of the whole email
in one go.
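The same idea can be wrapped in a small function so that the decoded view comes back as one string. This is only a sketch for display purposes: the name `readable` is invented here, it assumes a single-part text message like the one in this thread, and the result is not a valid message to send.

```python
from email.message import EmailMessage


def readable(mail):
    """Return headers plus decoded text content as one string.

    Assumes a single-part text message; for display only.
    """
    # items() yields (name, value) pairs; with the default policy the
    # values turn into ordinary, already decoded strings when formatted.
    headers = [f"{name}: {value}" for name, value in mail.items()]
    # get_content() undoes the transfer encoding and the charset.
    return "\n".join(headers) + "\n\n" + mail.get_content()


mail = EmailMessage()
mail["Subject"] = "Übung"
mail.set_content("Dies ist eine Übung.", cte="quoted-printable")

text = readable(mail)
print(text)
```

Printing `mail` itself still shows the transfer-encoded form, because that string is the message as it would go over the wire.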
I have a command-line program which creates an email containing German
umlauts. On receiving the mail, my mail client displays the subject and
body correctly:
[...]
So far, so good. However, when I use the --verbose option to print
the mail to the terminal via
    if args.verbose:
        print(mail)
I get:
Subject: Übungsbetreff
Sehr geehrter Herr Dr. Bennett,
Dies ist eine =C3=9Cbung.
What do I need to do to prevent the body from getting mangled?
On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
I have a command-line program which creates an email containing German
umlauts. On receiving the mail, my mail client displays the subject and
body correctly:
[...]
So far, so good. However, when I use the --verbose option to print
the mail to the terminal via
    if args.verbose:
        print(mail)
I get:
Subject: Übungsbetreff
Sehr geehrter Herr Dr. Bennett,
Dies ist eine =C3=9Cbung.
What do I need to do to prevent the body from getting mangled?
That looks to me like quoted-printable. This is an encoding for binary
transport of text to make it robust against transports that are not
8-bit clean. So your Unicode text is encoded as UTF-8, and then that
is encoded in quoted-printable for transport through the email system.
Your terminal probably accepts UTF-8 - I imagine other German text
renders correctly?
You need to get the text and undo the quoted-printable encoding.
If you're using the Python email module to parse (or construct) the
message as a `Message` object I'd expect that to happen automatically.
There's quite a lot of misuse of terminology around terminal / console
/ shell. Please, correct me if I'm wrong, but it looks like you are
printing that on MS Windows, right? MS Windows doesn't have or use
terminals (that's more of a Unix-related concept). And, by "terminal"
I mean terminal emulator (i.e. a program that emulates the behavior of
a physical terminal). You can, of course, find some terminal programs
for windows (eg. mintty), but I doubt that that's what you are dealing
with.
What MS Windows users usually end up using is the console. If you
run, eg. cmd.exe, it will create a process that displays a graphical
console. The console uses an encoding scheme to represent the text
output. I believe that the default on MS Windows is to use some
single-byte encoding. This answer from SE family site tells you how to
set the console encoding to UTF-8 permanently: https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
, which, I believe, will solve your problem with how the text is
displayed.
On Thu, Oct 31, 2024 at 5:19 PM Loris Bennett via Python-list <python-list@python.org> wrote:
Hi,
I have a command-line program which creates an email containing German
umlauts. On receiving the mail, my mail client displays the subject and
body correctly:
Subject: Übung
Sehr geehrter Herr Dr. Bennett,
Dies ist eine Übung.
So far, so good. However, when I use the --verbose option to print
the mail to the terminal via
    if args.verbose:
        print(mail)
I get:
Subject: Übungsbetreff
Sehr geehrter Herr Dr. Bennett,
Dies ist eine =C3=9Cbung.
What do I need to do to prevent the body from getting mangled?
I seem to remember that I had issues in the past with a Perl version of
a similar program. As far as I recall there was an issue with the fact
that the greeting is generated by querying a server, whereas the body is
being read from a file, which led to oddities when the two bits were
concatenated. But that might just have been a Perl thing.
Cheers,
Loris
--
This signature is currently under constuction.
--
https://mail.python.org/mailman/listinfo/python-list
Cameron Simpson <cs@cskk.id.au> writes:
On 31Oct2024 16:33, Loris Bennett <loris.bennett@fu-berlin.de> wrote:
I have a command-line program which creates an email containing German
umlauts. On receiving the mail, my mail client displays the subject and
body correctly:
[...]
So far, so good. However, when I use the --verbose option to print
the mail to the terminal via
    if args.verbose:
        print(mail)
I get:
Subject: Übungsbetreff
Sehr geehrter Herr Dr. Bennett,
Dies ist eine =C3=9Cbung.
What do I need to do to prevent the body from getting mangled?
That looks to me like quoted-printable. This is an encoding for binary
transport of text to make it robust against transports that are not
8-bit clean. So your Unicode text is encoded as UTF-8, and then that
is encoded in quoted-printable for transport through the email system.
As I mentioned, I think the problem is to do with the way the salutation
text provided by the "salutation server" and the mail body from a file
are encoded. This seems to be different.
Your terminal probably accepts UTF-8 - I imagine other German text
renders correctly?
Yes, it does.
You need to get the text and undo the quoted-printable encoding.
If you're using the Python email module to parse (or construct) the
message as a `Message` object I'd expect that to happen automatically.
I am using
email.message.EmailMessage
as, from the Python documentation
https://docs.python.org/3/library/email.examples.html
I gathered that that is the standard approach.
And you are right that encoding for the actual mail which is received is automatically sorted out. If I display the raw email in my client I get
the following:
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
...
Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
...
Dies ist eine =C3=9Cbung.
I would interpret that as meaning that the subject and body are encoded
in the same way.
The problem just occurs with the unsent string representation printed to
the terminal.
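A note on the raw message quoted above: the Subject line uses an RFC 2047 encoded-word while the body uses the quoted-printable transfer encoding. The two look alike but are separate mechanisms. A minimal sketch of parsing such raw text back with the default policy, which decodes both; the string `raw` is reconstructed from the lines shown in this thread, with the elided headers left out:

```python
from email import message_from_string
from email.policy import default

# Raw message text, reduced to the lines quoted in the thread.
raw = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "Subject: =?utf-8?q?=C3=9Cbungsbetreff?=\n"
    "\n"
    "Dies ist eine =C3=9Cbung.\n"
)

# policy=default gives an EmailMessage instead of the legacy Message.
msg = message_from_string(raw, policy=default)

subject = str(msg["Subject"])  # encoded-word decoded by the header parser
body = msg.get_content()       # quoted-printable and charset undone

print(subject)
print(body)
```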
...
mail.set_content(body, cte="quoted-printable")
never noticed that module before!
help('modules')
In comp.lang.python, Gilmeh Serda <gilmeh.serda@nothing.here.invalid> wrote:
Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
help('modules')
Please wait a moment while I gather a list of all available modules...
AssemblyApp apparmor io pyzipper
AssemblyGui appdirs ipaddress qrtools
CAMSimulator application_utility isodate queue
Cheetah apprise isort quopri
[...]
"""
Put it in a list, unmangle it, sort it and you should have an alphabetical
list of all modules on your system.
As someone who has done a lot of work with email in other languages,
"quopri" is not a name I'd expect or look for first pass for dealing
with MIME quoted-printable encoding. (Me, being me, I'd probably just
write it for myself if I didn't quickly find it while working with
email.)
Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
help('modules')
Please wait a moment while I gather a list of all available modules...
AssemblyApp apparmor io pyzipper
AssemblyGui appdirs ipaddress qrtools
CAMSimulator application_utility isodate queue
Cheetah apprise isort quopri
[...]
"""
Put it in a list, unmangle it, sort it and you should have an alphabetical list of all modules on your system.
as expected. The non-UTF-8 text occurs when I do
    mail = EmailMessage()
    mail.set_content(body, cte="quoted-printable")
    ...
    if args.verbose:
        print(mail)
which is presumably also correct.
The question is: What conversion is necessary in order to print the
EmailMessage object to the terminal, such that the quoted-printable
parts are turned (back) into UTF-8?
Cameron Simpson <cs@cskk.id.au> writes:
If you're using the Python email module to parse (or construct) the
message as a `Message` object I'd expect that to happen automatically.
I am using
email.message.EmailMessage
And you are right that encoding for the actual mail which is received is
automatically sorted out. If I display the raw email in my client I get
the following:
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
...
Subject: =?utf-8?q?=C3=9Cbungsbetreff?=
...
Dies ist eine =C3=9Cbung.
I would interpret that as meaning that the subject and body are encoded
in the same way.
The problem just occurs with the unsent string representation printed to
the terminal.
On 31/10/2024 20:50, Cameron Simpson via Python-list wrote:
If you're just dealing with this directly, use the `quopri` stdlib
module: https://docs.python.org/3/library/quopri.html
One of the things I love about this list are these little features
that I didn't know existed. Despite having used Python for over 25
years, I've never noticed that module before! :-)
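For the curious, a short sketch of what the `quopri` module does with the text from this thread. Quoted-printable operates on bytes, so the UTF-8 step is done explicitly on either side; the variable names are only illustrative.

```python
import quopri

original = "Dies ist eine Übung."

# Encode: text to UTF-8 bytes, then bytes to quoted-printable.
encoded = quopri.encodestring(original.encode("utf-8"))
print(encoded)

# Decode: quoted-printable back to raw bytes, then UTF-8 back to str.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)
```

This is the manual route; when the text lives in an EmailMessage, `get_content()` performs the same two steps.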
On 31 Oct 2024, at 16:42, Left Right via Python-list <python-list@python.org> wrote:
MS Windows doesn't have or use
terminals (that's more of a Unix-related concept).