In GNU Awk there are currently three types of regular expressions: in
addition to the standard regexp constants (/regex/) and the dynamic
regexps ("regex", or variables containing "regex"), newer versions
also support first-class regexp objects (@/regex/, "Strongly Typed
Regexp Constants").
One principal advantage of regexp constants is that the regexp can be
compiled into its matching engine in advance, when the awk program is
parsed. A dynamic regexp, by contrast, may be constructed at run time
(from strings) and needs an explicit run-time step to build that engine
before matching can be done.
I had assumed that @/regex-const/ would behave like /regex-const/ in
that respect, until I found this text in the GNU Awk manual:
|
| Thus, if you have something like this:
|
| re = @/don't panic/
| sub(/don't/, "do", re)
| print typeof(re), re
|
| then re retains its type, but now attempts to match the string ‘do
| panic’. This provides a (very indirect) way to create regexp-typed
| variables at runtime.
|
(I'm astonished that first-class regexp objects can be changed
dynamically. But that is not my point here; I'm interested in whether
regexp constants are pre-compiled...)
This would imply that first-class regexp constants can be changed
just like dynamic regexps, and that no regexp pre-compilation is involved.
And dynamic regexps and first-class regexps that got changed (e.g.,
by code like
sub(/don't/, "do[", re)
in the sample snippet above) would both produce runtime errors, e.g. [...]
On 2024-11-28, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
[...]
> And dynamic regexps and first-class regexps that got changed (e.g.,
> by code like
> sub(/don't/, "do[", re)
> in the sample snippet above) would both produce runtime errors, e.g.
Have you tried this?
Do you get an error at sub() time, or when you later try to use re?
[...]
It could also (in combination with this) be lazy. [...]
Someone will undoubtedly chime in confirming or refuting these
hypotheses.
It would be pretty silly if these regex objects didn't cache a compiled
regex across multiple uses.
[...]
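Kaz's question can be checked directly: does the error appear at sub() time, or only when re is used later? This sketch again assumes gawk 4.2 or later and uses a hypothetical function name. It prints a marker before sub(), after sub(), and after the first use of re, flushing after each one. Whichever markers appear before gawk aborts show where the failure happens.

```shell
#!/bin/sh
# Sketch: at which step does an ill-formed typed regexp fail? (assumes gawk 4.2+)
when_does_it_fail() {
  prog=$(cat <<'EOF'
BEGIN {
    re = @/don't panic/
    print "before sub"; fflush()
    sub(/don't/, "do[", re)      # new text "do[ panic" is not a valid regexp
    print "after sub"; fflush()
    if ("do[ panic" ~ re)        # first use of re after the change
        print "matched"
    print "after use"; fflush()
}
EOF
)
  gawk "$prog"
}

# gawk is expected to stop with an error; report its exit status instead
# of letting the failure end this script.
when_does_it_fail || echo "gawk exited with status $?"
```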
> [...]
> This would also raise suspicion that the "normal" regexp constants are
> probably not precomputed either.
> So constant regexps (both forms) have (only?) the advantage that the
> regexp syntax can be checked up front, during awk parsing, e.g.,
> re = @/don't panic[/
> ^ unterminated regexp
> And dynamic regexps and first-class regexps that got changed (e.g.,
> by code like
> sub(/don't/, "do[", re)
> in the sample snippet above) would both produce runtime errors, e.g.
> error: Unmatched [, [^, [:, [., or [=: /do[ panic/
> fatal: could not make typed regex
> (as all ill-formed regexp types will produce a runtime error).
Hi. Mack The Knife pointed me at this question.
This kind of query should go to the bug list (where I'll see it).
[ explanations snipped ]
In short, I jump through a lot of hoops in order to avoid recompiling
regexps if it's not necessary.
Hope this helps,
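Janis's post distinguishes two cases. A malformed regexp constant is rejected while the program is parsed. A malformed regexp produced at run time fails only during execution. The first case can be seen with a sketch like the following, which assumes gawk and uses a hypothetical function name. Because gawk parses the whole program before running BEGIN, the marker line is never printed.

```shell
#!/bin/sh
# Sketch: a malformed regexp constant is rejected before anything runs.
parse_time_check() {
  prog=$(cat <<'EOF'
BEGIN {
    print "program started"
    re = @/don't panic[/
}
EOF
)
  gawk "$prog"
}

# The "[" opens a bracket expression, so gawk reports a syntax error
# ("unterminated regexp" in the post above) and never executes BEGIN.
parse_time_check || echo "gawk exited with status $?"
```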
> This kind of query should go to the bug list (where I'll see it).
Oh, I hadn't considered (or suspected) what I wrote to be a bug, so
it didn't occur to me to use the bug mailing list.
> And of course, you can always look at the source code.
This isn't meant as a statement about the quality of the software
design or the presence of useful comments in GNU Awk. It's just that
the last time I looked into the sources (intending to add new syntax
and semantics for a feature I'd have liked), I wasn't able to work out
how to do it without harming the code; I lack familiarity with this
source code. Of course I could have looked into the source code instead
of posting, but that experience led me not to take that path.
Re "(where I'll see it)": My post wasn't meant to address or bother
you personally; all the more do I appreciate your reply! In this
newsgroup there are also some folks who have some expertise and might
answer such questions.
And I'm not a "client" of the mailing list.
In article <vijlu6$35un4$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
[...]
You can always ask me directly.
> Re "(where I'll see it)": My post wasn't meant to address or bother
> you personally; all the more do I appreciate your reply! In this
> newsgroup there are also some folks who have some expertise and might
> answer such questions.
True, but ultimately I'm authoritative. :-)
> And I'm not a "client" of the mailing list.
You don't have to be subscribed to the bug list to send messages
there.