On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
It's not that simple, I'm afraid, since comments can be commented out.
Umm, no.
eg:
// int i; /*
This /* sequence is inside a // comment, and so the machinery that
recognizes /* as the start of a comment would never see it.
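A minimal sketch of that point, written as a small state machine rather than a single regex. The function name strip_comments is hypothetical, and the sketch is deliberately simplified: it assumes no string or character literals and no backslash line splicing in the input.

```c
#include <stddef.h>

/* Copy src to dst with comments removed.
   dst must have room for strlen(src) + 1 bytes.
   Simplification (an assumption of this sketch): string literals,
   character literals and backslash-newline splicing are not handled. */
void strip_comments(const char *src, char *dst)
{
    enum { CODE, LINE_COMMENT, BLOCK_COMMENT } state = CODE;

    while (*src != '\0') {
        switch (state) {
        case CODE:
            if (src[0] == '/' && src[1] == '/') {
                state = LINE_COMMENT;
                src += 2;
            } else if (src[0] == '/' && src[1] == '*') {
                state = BLOCK_COMMENT;
                src += 2;
            } else {
                *dst++ = *src++;
            }
            break;
        case LINE_COMMENT:
            /* A slash-star seen here is plain comment text;
               only a newline ends a line comment. */
            if (*src == '\n')
                state = CODE;   /* do not advance: CODE copies the newline */
            else
                ++src;
            break;
        case BLOCK_COMMENT:
            if (src[0] == '*' && src[1] == '/') {
                state = CODE;
                *dst++ = ' ';   /* a comment is replaced by one space */
                src += 2;
            } else {
                ++src;
            }
            break;
        }
    }
    *dst = '\0';
}
```

Given a first line of "// int i; /*" followed by a line "int j;", the second line is copied through unchanged, because the block-comment opener is consumed while the scanner is still in the LINE_COMMENT state.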
A C99 or C++ compiler would see "int j" and compile it; a regex would
simply remove everything from the first /* to */.
No, it won't, because that's not how regexes are used in a lexical
analyzer.
Also the same probably applies to #ifdef's.
Lexically analyzing C requires implementing the translation phases
as described in the standard. There are preprocessor phases which
delimit the input into preprocessor tokens (pp-tokens). Comments
are stripped in preprocessing. But logical lines (backslash
continuations) are recognized below comments; i.e. this is one
comment:
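A minimal sketch of the splicing rule, assuming a C99 or later compiler. The function name spliced_comment_demo is hypothetical. Because the backslash-newline is removed before comments are recognized, the assignment on the following line is part of the // comment. GCC and Clang can warn about this construct (-Wcomment).

```c
/* Returns 1, not 2: the line after the comment is spliced into it. */
int spliced_comment_demo(void)
{
    int x = 1;
    // this line comment ends with a backslash, so it continues \
    x = 2;
    return x;
}
```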
On 20.11.2024 12:46, Ed Morton wrote:
Definitely. The most relevant statement about regexps is this:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
(Worth scribbling on a WC wall.)
Obviously regexps are very useful and commonplace but if you find you
have to use some online site or other tools to help you write/understand
one or just generally need more than a couple of minutes to
write/understand it then it's time to back off and figure out a better
way to write your code for the sake of whoever has to read it 6 months
later (and usually for robustness too as it's hard to be sure all rainy
day cases are handled correctly in a lengthy and/or complicated regexp).
Regexps are not for newbies.
The inherently nice thing about Regexps is that you can incrementally
compose them[*].[**]
It seems you haven't found a sensible way to work with them?
(And I'm really astonished about that since I know you worked with
Regexps for years if not decades.)
In those cases where Regexps *are* the tool for a specific task -
I don't expect you to use them where they are inappropriate?! -
what would be the better solution[***] then?
Janis
[*] Like the corresponding FSMs.
[**] And you can also decompose them if they are merged in a huge
expression, too large for you to grasp. (BTW, I'm doing such
decompositions also with other expressions in program code that
are too bulky.)
[***] Can you answer the question that another poster failed to answer?
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]
Personally I think that writing bulky procedural stuff for
something like [0-9]+ can only be much worse, and that further
abbreviations like \d+ are the better direction to go if targeting
a good interface. YMMV.
Assuming that p is a pointer to the current position in a string, e
is a pointer to the end of it (i.e., it points just past the last byte)
and - that's important - both are pointers to unsigned quantities,
the 'bulky' C equivalent of [0-9]+ is
while (p < e && (unsigned)(*p - '0') < 10) ++p;
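A self-contained sketch of that loop, assuming the "unsigned quantities" are unsigned char bytes. The function name digit_run is hypothetical. With unsigned char, *p is promoted to int before the subtraction, so the sketch casts the difference to unsigned; without the cast, a byte below '0' would give a negative value that compares less than 10.

```c
#include <stddef.h>

/* Return the length of the leading run of ASCII digits in [p, e). */
size_t digit_run(const unsigned char *p, const unsigned char *e)
{
    const unsigned char *start = p;

    /* (unsigned) turns a negative difference into a huge value,
       so one comparison covers both ends of the range '0'..'9'. */
    while (p < e && (unsigned)(*p - '0') < 10)
        ++p;
    return (size_t)(p - start);
}
```

For the six bytes of "123abc" this returns 3, and for " 12" it returns 0, since the leading space is below '0'.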
Here is an example: using a regex match to capture a C comment /* ... */
in Lex compared to just recognizing the start sequence /* and handling
the discarding of the comment in the action.
Without non-greedy repetition matching, the regex for a C comment is
quite obtuse. The procedural handling is straightforward: read
characters until you see a * immediately followed by a /.
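The two approaches can be put side by side in a rough sketch. This uses POSIX regcomp/regexec instead of Lex, so it only illustrates the shape of the pattern, not a Lex rule. The function names are hypothetical, and the pattern is the commonly cited greedy-only formulation, anchored to the start of the string.

```c
#include <regex.h>
#include <stddef.h>

/* Greedy-only POSIX ERE for one C block comment at the start of the input. */
static const char COMMENT_ERE[] = "^/\\*([^*]|\\*+[^*/])*\\*+/";

/* Regex approach: length of the comment at the start of s, or 0 if none. */
size_t comment_len_regex(const char *s)
{
    regex_t re;
    regmatch_t m;
    size_t len = 0;

    if (regcomp(&re, COMMENT_ERE, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, s, 1, &m, 0) == 0)
        len = (size_t)m.rm_eo;
    regfree(&re);
    return len;
}

/* Procedural approach: recognize the opener, then read characters
   until a star is immediately followed by a slash. */
size_t comment_len_scan(const char *s)
{
    const char *p;

    if (s[0] != '/' || s[1] != '*')
        return 0;
    for (p = s + 2; *p != '\0'; ++p)
        if (p[0] == '*' && p[1] == '/')
            return (size_t)(p + 2 - s);
    return 0;   /* unterminated comment */
}
```

Both functions return 7 for the input "/* a */ b /* c */", i.e. they stop at the first closing sequence. The scanning version also has an obvious place to report an unterminated comment, which the pattern alone does not offer.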