On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote:
Is there some utility function out there that can be called to show what the regular expression you typed in will look like by the time it is ready to be used?
I assume that by "ready to be used" you mean the compiled form?
No, there doesn't seem to be a way to dump that. You can
p = re.compile("\\\\sout{")
print(p.pattern)
but that just prints the input string, which you could do without
compiling it first.
But - without having looked at the implementation - it's far from clear
that the compiled form would be useful to the user. It's probably some
kind of state machine, and a large table of state transitions isn't very readable.
There are a number of websites which visualize regular expressions.
Those are probably better for debugging a regular expression than
anything the re module could reasonably produce (although with the
caveat that such a web site would use a different implementation and therefore might produce different results).
hp
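A side note on the compiled form: the re module documents an re.DEBUG flag, which prints debug information about an expression while it is being compiled. The sketch below captures that output for the pattern from this thread. The helper name dump_regex is made up for illustration, and the exact layout of the dump is an implementation detail that may differ between Python versions.

```python
import contextlib
import io
import re


def dump_regex(pattern):
    """Return whatever re.DEBUG prints while compiling `pattern` (hypothetical helper)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Patterns compiled with re.DEBUG are printed each time they are compiled.
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


# The Python literal "\\\\sout{" is the regex \\sout{ (a literal backslash, then "sout{").
dump = dump_regex("\\\\sout{")
print(dump)
```

In the dump, each literal character is listed by its code point, so the backslash shows up as 92 and the opening brace as 123. That is closer to "what the regex engine saw" than p.pattern, though still not exactly easy reading.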
On 2024-10-11 22:13, AVI GROSS via Python-list wrote:
Is there some utility function out there that can be called to show what the
regular expression you typed in will look like by the time it is ready to be
used?
Obviously, life is not that simple as it can go through multiple layers with
each dealing with a layer of backslashes.
But for simple cases, ...
Yes. It's called 'print'. :-)
>>> import re
>>> re_string = '\\w+\\\\sub'
>>> re_pattern = re.compile(re_string)
>>> # Should look as if we had used r'\w+\\sub'
>>> print(re_pattern.pattern)
\w+\\sub
-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Gilmeh Serda via Python-list
Sent: Friday, October 11, 2024 10:44 AM
To: python-list@python.org
Subject: Re: Correct syntax for pathological re.search()
On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
I'm trying to discard lines that include the string "\sout{" (which is
TeX, for those who are curious). I have tried:
if not re.search("\sout{", line):
if not re.search("\sout\{", line):
if not re.search("\\sout{", line):
if not re.search("\\sout\{", line):
But the lines with that string keep coming through. What is the right
syntax to properly escape the backslash and the left curly bracket?
$ python
Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = r"testing \sout{WHADDEVVA}"
>>> re.search(r"\\sout{", s)
<re.Match object; span=(8, 14), match='\\sout{'>
You want a literal backslash, hence, you need to escape everything.
It is not enough to escape the "\s" as "\\s", because that only takes care
of Python's demands for escaping "\". You also need to escape the "\" for
the RegEx as well, or it will read it like it means "\s", which is the
RegEx for a whitespace character, and therefore your search doesn't match,
because it reads it like you want to search for " out{".
Therefore, you need to escape it either as per my example, or by using
four "\" and no "r" in front of the first quote, which also works:
>>> re.search("\\\\sout{", s)
<re.Match object; span=(8, 14), match='\\sout{'>
You don't need to escape the curly braces. We call them "seagull wings"
where I live.
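The two layers of escaping described above can be checked directly. This is only a small sketch, and the sample strings are invented for illustration.

```python
import re

line = r"some text \sout{struck out} more text"   # contains a literal backslash

# One layer of escaping: the Python literal "\\sout{" is the six characters \sout{,
# which the regex engine reads as \s (any whitespace) followed by "out{".
one_layer = re.search("\\sout{", line)

# Two layers: the regex engine receives \\sout{, i.e. an escaped (literal) backslash.
two_layers = re.search("\\\\sout{", line)
raw_string = re.search(r"\\sout{", line)

print(one_layer)     # None: no whitespace sits directly before "out{" in this line
print(two_layers)    # a match on the literal text \sout{
print(raw_string)    # the same match, written with a raw string

# The one-layer pattern does match text where whitespace precedes "out{".
whitespace_hit = re.search("\\sout{", "cut out{")
print(whitespace_hit)
```

The raw-string form and the four-backslash form hand the same pattern to the regex engine, so which one to use is a matter of readability.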
You don't need to escape the curly braces.
if not re.search("\\sout\{", line):
For now, I'll use the "r" in a cargo-cult fashion, until I decide which syntax I prefer. (Is there any reason that one or the other is preferable?)
"Michael F. Stemper" <michael.stemper@gmail.com> wrote or quoted:
For now, I'll use the "r" in a cargo-cult fashion, until I decide which syntax I prefer. (Is there any reason that one or the other is preferable?)
I'd totally go with the r-style notation!
It's got one bummer though - you can't end such a string literal with
a backslash. But hey, no biggie, you could use one of those notations:
main.py
path = r'C:\Windows\example' + '\\'
print( path )
path = r'''
C:\Windows\example\
'''.strip()
print( path )
stdout
C:\Windows\example\
C:\Windows\example\
Peter J. Holzer wrote:
As a trivial example, the regular expressions r"\\sout{" and r"\\sout\{" are equivalent (the \ before the { is redundant). Yet
re.compile(s).pattern preserves the difference between the two strings.
Allow me to be fussy: r"\\sout{" and r"\\sout\{" are similar but not equivalent.
If you omit the backslash, the parser will have to determine whether the
brace is part of a {n,m} quantifier, and that will take more time.
On 2024-10-19 00:15:23 +0200, jak via Python-list wrote:
Allow me to be fussy: r"\\sout{" and r"\\sout\{" are similar but not equivalent.
Yes, that's the parser. But the result of parsing will be the same:
The string will end in a literal backslash.
"Michael F. Stemper" <michael.stemper@gmail.com> wrote or quoted:
path = r'C:\Windows\example' + '\\'
I'm trying to discard lines that include the string "\sout{" (which is TeX, for
those who are curious). I have tried:
if not re.search("\sout{", line):
if not re.search("\sout\{", line):
if not re.search("\\sout{", line):
if not re.search("\\sout\{", line):
But the lines with that string keep coming through. What is the right syntax to
properly escape the backslash and the left curly bracket?
On Mon, Oct 07, 2024 at 08:35:32AM -0500, Michael F. Stemper via Python-list wrote:
I'm trying to discard lines that include the string "\sout{" (which is TeX, for
those who are curious). I have tried:
if not re.search("\sout{", line):
if not re.search("\sout\{", line):
if not re.search("\\sout{", line):
if not re.search("\\sout\{", line):
unwanted_tex = '\sout{'
if unwanted_tex not in line: do_something_with_libreoffice()
I'm trying to discard lines that include the string "\sout{" (which is TeX, for
those who are curious). I have tried:
if not re.search("\sout{", line):
if not re.search("\sout\{", line):
if not re.search("\\sout{", line):
if not re.search("\\sout\{", line):
"\\\\chardef \\\\\\\\ = '\\\\\\\\".
However, regex also uses backslash as an escape character.
unwanted_tex = '\sout{'
if unwanted_tex not in line: do_something_with_libreoffice()
That should be:
unwanted_tex = r'\sout{'
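As a rough sketch of the no-regex approach suggested here: with a raw string for the unwanted text, a plain substring test is enough to filter lines. The helper keep_lines and the sample lines are invented for illustration and are not part of the original poster's program.

```python
unwanted_tex = r"\sout{"   # raw string: a backslash followed by "sout{"


def keep_lines(lines):
    """Return the lines that do not contain the unwanted TeX command (hypothetical helper)."""
    return [line for line in lines if unwanted_tex not in line]


sample = [
    r"This line stays.",
    r"This one has \sout{struck text} and is dropped.",
    r"Plain words like 'sout' are not affected.",
]
kept = keep_lines(sample)
print(kept)
```

Since the goal is to find a fixed piece of text, the substring test sidesteps the regex layer of escaping entirely; only the Python string-literal layer remains.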
>>> tex = '\sout{'
>>> tex
'\\sout{'
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> tex = '\sout{'
>>> tex
'\\sout{'
>>>
Am I missing something ?
You're missing the warning it generates:
> python -E -Wonce
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> tex = '\sout{'
<stdin>:1: DeprecationWarning: invalid escape sequence '\s'
>>>
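The warning can also be surfaced from inside a program, without special interpreter flags. The sketch below compiles the offending assignment from a string and records any warnings. It assumes a Python version in which an unrecognized escape such as '\s' is still only a warning (DeprecationWarning in older releases, SyntaxWarning in newer ones); if a future version turns it into an error, compile() would raise SyntaxError instead.

```python
import warnings

# The program text being compiled is:  tex = '\sout{'
source = "tex = '\\sout{'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")           # do not let default filters hide it
    code = compile(source, "<demo>", "exec")  # the warning is issued at compile time

namespace = {}
exec(code, namespace)

for warning in caught:
    print(warning.category.__name__, warning.message)

# The backslash is kept in the resulting string, which is why the bug is easy to miss.
print(repr(namespace["tex"]))
```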
Is there some utility function out there that can be called to show what the regular expression you typed in will look like by the time it is ready to be used?
Obviously, life is not that simple as it can go through multiple layers with each dealing with a layer of backslashes.
But for simple cases, ...
-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Gilmeh Serda via Python-list
Sent: Friday, October 11, 2024 10:44 AM
To: python-list@python.org
Subject: Re: Correct syntax for pathological re.search()
On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote:
I'm trying to discard lines that include the string "\sout{" (which is
TeX, for those who are curious). I have tried:
if not re.search("\sout{", line):
if not re.search("\sout\{", line):
if not re.search("\\sout{", line):
if not re.search("\\sout\{", line):
But the lines with that string keep coming through. What is the right
syntax to properly escape the backslash and the left curly bracket?
$ python
Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = r"testing \sout{WHADDEVVA}"
>>> re.search(r"\\sout{", s)
<re.Match object; span=(8, 14), match='\\sout{'>
You want a literal backslash, hence, you need to escape everything.
It is not enough to escape the "\s" as "\\s", because that only takes care
of Python's demands for escaping "\". You also need to escape the "\" for
the RegEx as well, or it will read it like it means "\s", which is the
RegEx for a whitespace character, and therefore your search doesn't match,
because it reads it like you want to search for " out{".
Therefore, you need to escape it either as per my example, or by using
four "\" and no "r" in front of the first quote, which also works:
>>> re.search("\\\\sout{", s)
<re.Match object; span=(8, 14), match='\\sout{'>
You don't need to escape the curly braces. We call them "seagull wings"
where I live.
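When the text to search for is a fixed string, the regex-level escaping does not have to be written by hand: re.escape produces it. A minimal sketch, reusing the sample string from the session above (the variable names are invented for illustration):

```python
import re

unwanted = r"\sout{"             # the literal text to look for
pattern = re.escape(unwanted)    # re.escape adds the regex-level escaping
print(pattern)                   # shows the pattern the regex engine will receive

line = r"testing \sout{WHADDEVVA}"
match = re.search(pattern, line)
print(match)
```

The raw string handles Python's layer of backslashes and re.escape handles the regex layer, so each layer is dealt with exactly once.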
On 10/12/2024 6:59 AM, Peter J. Holzer via Python-list wrote:
On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote:
Is there some utility function out there that can be called to show what the
regular expression you typed in will look like by the time it is ready to be
used?
I assume that by "ready to be used" you mean the compiled form?
No, there doesn't seem to be a way to dump that. You can
p = re.compile("\\\\sout{")
print(p.pattern)
but that just prints the input string, which you could do without
compiling it first.
It prints the string as the regex engine receives it, after Python has
processed the string-literal escapes, so you can see whether you escaped
the string as you intended. In this case, the print will display \\sout{.
As a trivial example, the regular expressions r"\\sout{" and r"\\sout\{"
are equivalent (the \ before the { is redundant). Yet
re.compile(s).pattern preserves the difference between the two strings.
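The point about equivalence can be illustrated with a quick check: both spellings find the same matches, while .pattern still reports the two different input strings. This is only a sketch on a few invented sample strings, not a proof that the patterns are equivalent; the helper span_or_none is made up for illustration.

```python
import re

plain = re.compile(r"\\sout{")
escaped = re.compile(r"\\sout\{")


def span_or_none(compiled, text):
    """Return the span of the first match, or None if there is none (hypothetical helper)."""
    match = compiled.search(text)
    return match.span() if match else None


samples = [
    r"testing \sout{WHADDEVVA}",
    r"no match here",
    r"\sout{",
    r"x\sout without a brace",
]

same_results = all(span_or_none(plain, s) == span_or_none(escaped, s) for s in samples)
print(same_results)                       # both patterns agree on every sample
print(plain.pattern == escaped.pattern)   # but the stored source strings differ
```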