Forum: Too Lazy BBS

Re: Command Languages Versus Programming Languages

From Muttley@Muttley@dastardlyhq.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Nov 23 11:40:37 2024

From Newsgroup: comp.lang.misc

On Fri, 22 Nov 2024 18:18:04 -0000 (UTC)
Kaz Kylheku <643-408-1753@kylheku.com> gabbled:

On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:

It's not that simple, I'm afraid, since comments can be commented out.

Umm, no.

Umm, yes, they can.

eg:

// int i; /*

This /* sequence is inside a // comment, and so the machinery that
recognizes /* as the start of a comment would never see it.

Yes, that's kind of the point. You seem to be arguing against yourself.

A C99 or C++ compiler would see "int j" and compile it; a regex would
simply remove everything from the first /* to the */.

No, it won't, because that's not how regexes are used in a lexical

Yes, it will.

Also the same probably applies to #ifdef's.

Lexically analyzing C requires implementing the translation phases
as described in the standard. There are preprocessor phases which
delimit the input into preprocessor tokens (pp-tokens). Comments
are stripped in preprocessing. But logical lines (backslash
continuations) are recognized below comments; i.e. this is one
comment:

Not sure what your point is. A regex cannot be used to parse C comments
because it doesn't know C/C++ grammar.
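To make the disagreement concrete, here is a minimal sketch (not taken from any post in the thread) of a scanner that tracks context, so a /* inside a // comment or inside a string literal never starts a block comment. The name strip_comments and the caller-supplied output buffer are hypothetical. It deliberately skips the backslash-newline splicing described in the translation phases, as well as trigraphs and raw string literals, so it is an illustration rather than a conforming lexer.

```c
/* Hypothetical helper: copy src to dst with comments removed.
 * dst must have room for strlen(src) + 1 bytes.
 * Simplifications: no backslash-newline splicing, no trigraphs,
 * no raw string literals. */
void strip_comments(const char *src, char *dst)
{
    enum { CODE, LINE, BLOCK, STR, CHR } state = CODE;

    while (*src) {
        char c = *src, n = src[1];

        switch (state) {
        case CODE:
            if (c == '/' && n == '/') { state = LINE;  src += 2; continue; }
            if (c == '/' && n == '*') { state = BLOCK; src += 2; continue; }
            if (c == '"')       state = STR;
            else if (c == '\'') state = CHR;
            *dst++ = c;
            break;
        case LINE:              /* inside a // comment: a slash-star here is ignored */
            if (c == '\n') { state = CODE; *dst++ = c; }
            break;
        case BLOCK:             /* inside a block comment: only star-slash ends it */
            if (c == '*' && n == '/') {
                state = CODE;
                *dst++ = ' ';   /* a comment is replaced by one space */
                src += 2;
                continue;
            }
            break;
        case STR:
        case CHR:               /* literals: copy verbatim, honour escapes */
            if (c == '\\' && n) { *dst++ = c; *dst++ = n; src += 2; continue; }
            if ((state == STR && c == '"') || (state == CHR && c == '\''))
                state = CODE;
            *dst++ = c;
            break;
        }
        ++src;
    }
    *dst = '\0';
}
```

Given the input "// int i; /*", a newline, "int j;", a newline and then a real block comment, the declaration of j survives because the first /* is consumed while the scanner is in the LINE state.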

--- Synchronet 3.21a-Linux NewsLink 1.2

From Ed Morton@mortonspam@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Nov 23 18:17:41 2024

From Newsgroup: comp.lang.misc

On 11/20/2024 9:53 AM, Janis Papanagnou wrote:

On 20.11.2024 12:46, Ed Morton wrote:

Definitely. The most relevant statement about regexps is this:

Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.

(Worth a scribbling on a WC wall.)

Obviously regexps are very useful and commonplace but if you find you
have to use some online site or other tools to help you write/understand
one or just generally need more than a couple of minutes to
write/understand it then it's time to back off and figure out a better
way to write your code for the sake of whoever has to read it 6 months
later (and usually for robustness too as it's hard to be sure all rainy
day cases are handled correctly in a lengthy and/or complicated regexp).

Regexps are nothing for newbies.

The inherent fine thing with Regexps is that you can incrementally
compose them[*].[**]

It seems you haven't found a sensible way to work with them?
(And I'm really astonished about that since I know you worked with
Regexps for years if not decades.)

I have no problem working with regexps; I just don't write lengthy or
complicated ones, only brief, simple BREs or EREs, and I don't
restrict myself to trying to solve problems with a single regexp.

In those cases where Regexps *are* the tool for a specific task -
I don't expect you to use them where they are inappropriate?! -

Right, I don't, but I see many people using them for tasks that could be
done more clearly and robustly if not done with a single regexp.

what would be the better solution[***] then?

It all depends on the problem. For example, if you need to match an
input string that must contain each of a, b, and c in any order then you
could do that in awk with this regexp or similar:

awk '/(a.*(b.*c|c.*b))|(b.*(a.*c|c.*a))|(c.*(a.*b|b.*a))/'

or you could do it with this condition composed of regexp segments:

awk '/a/ && /b/ && /c/'

I would prefer the second solution as it's more concise and easier to
enhance (try adding "and d" to both).
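The same contrast can be sketched in C, assuming a POSIX system with regex.h. The function names below are hypothetical and not from the post. One function compiles the single ERE that enumerates every ordering; the other makes three independent tests, mirroring /a/ && /b/ && /c/.

```c
#include <regex.h>
#include <string.h>

/* One ERE covering every ordering of a, b and c (the first awk variant). */
static const char *ALL_ORDERS =
    "(a.*(b.*c|c.*b))|(b.*(a.*c|c.*a))|(c.*(a.*b|b.*a))";

/* Returns 1 if s matches the single ERE, 0 if it does not,
 * -1 if the pattern fails to compile. */
int has_abc_single_regex(const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, ALL_ORDERS, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Three independent tests, as in /a/ && /b/ && /c/. */
int has_abc_separate_tests(const char *s)
{
    return strchr(s, 'a') && strchr(s, 'b') && strchr(s, 'c');
}
```

Both functions agree on inputs such as "cab" and "ab". Adding "and d" to the second costs one more strchr call, whereas the single ERE would grow from 6 orderings to 24.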

As another example, someone on StackOverflow recently said they had
written the following regexp to isolate the last string before a set of
parens in a line that contains multiple such strings, some of them
nested, and they said it works in python:

^(?:^[^(]+\([^)]+\) \(([^(]+)\([^)]+\)\))|[^(]+\(([^(]+)\([^)]+\),\s([^\(]+)\([^)]+\)\s\([^\)]+\)\)|(?:(?:.*?)\((.*?)\(.*?\)\))|(?:[^(]+\(([^)]+)\))$

I personally wouldn't consider anything remotely as lengthy or
complicated as that regexp, despite their assurances that it works. I'd
use this script, which runs in any awk, or something similar instead:

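{
    rec = $0
    # Find the leftmost group that contains no parens of its own.
    while ( match(rec, /\([^()]*\)/) ) {
        # Remember the text inside the group just matched.
        tgt = substr($0,RSTART+1,RLENGTH-2)
        # Swap its parens for RS (one character by default, and never
        # present inside a record) so offsets in rec stay aligned with $0.
        rec = substr(rec,1,RSTART-1) RS substr(rec,RSTART+1,RLENGTH-2) RS substr(rec,RSTART+RLENGTH)
    }
    # Drop any nested groups from the final match.
    gsub(/ *\([^()]*\) */, "", tgt)
    print tgt
}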

It's a bit more code but, unlike that regexp, anyone assigned to
maintain this code in future can tell what it does with just a little
thought (and maybe adding a debugging print in the loop if they aren't
very familiar with awk), can then be sure it does what is required and
nothing else, and could easily maintain/enhance it if necessary.

Ed.

Janis

[*] Like the corresponding FSMs.

[**] And you can also decompose them if they are merged in a huge
expression, too large for you to grasp it. (BTW, I'm doing such
decompositions also with other expressions in program code that
are too bulky.)

[***] Can you answer the question that another poster failed to do?

--- Synchronet 3.21a-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Sun Nov 24 06:42:59 2024

From Newsgroup: comp.lang.misc

Rainer Weikusat <rweikusat@talktalk.net> writes:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

[...]

Personally I think that writing bulky procedural stuff for
something like [0-9]+ can only be much worse, and that further
abbreviations like \d+ are the better direction to go if targeting
a good interface. YMMV.

Assuming that p is a pointer to the current position in a string, e
is a pointer to the end of it (i.e., pointing just past the last byte)
and - that's important - both are pointers to unsigned quantities,
the 'bulky' C equivalent of [0-9]+ is

while (p < e && *p - '0' < 10) ++p;

To force the comparison to be done as unsigned:

while (p < e && *p - '0' < 10u) ++p;
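Why the u suffix matters can be shown with a small sketch. The wrapper name digit_run is hypothetical, and the explanation assumes the usual case where int is wider than unsigned char, so *p is promoted to int before the subtraction. For a byte below '0', such as '/', the difference is then negative and would pass a signed "< 10" test; comparing against 10u converts it to a very large unsigned value, so the test fails as intended.

```c
#include <stddef.h>

/* Hypothetical helper: length of the run of ASCII digits at the start
 * of the buffer [s, s + len). */
size_t digit_run(const unsigned char *s, size_t len)
{
    const unsigned char *p = s, *e = s + len;

    /* *p - '0' is computed as int and may be negative; comparing it
     * with 10u converts it to unsigned, so every byte below '0'
     * becomes a huge value and ends the loop. */
    while (p < e && *p - '0' < 10u)
        ++p;
    return (size_t)(p - s);
}
```

With this version, a buffer holding "123abc" gives a run of 3, while one starting with '/' or ':' gives 0.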
--- Synchronet 3.21a-Linux NewsLink 1.2

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Sun Nov 24 20:08:24 2024

From Newsgroup: comp.lang.misc

Kaz Kylheku <643-408-1753@kylheku.com> writes:

Here is an example: using a regex match to capture a C comment /* ... */
in Lex compared to just recognizing the start sequence /* and handling
the discarding of the comment in the action.

Without non-greedy repetition matching, the regex for a C comment is
quite obtuse. The procedural handling is straightforward: read
characters until you see a * immediately followed by a /.

Regular expressions are neither greedy nor non-greedy. One of the
key points of regular expressions is that they are declarative
rather than procedural. Any procedural change of behavior overlaid
on a regular expression is a property of the tool, not the regular
expression. It's easy to write a regular expression that exactly
matches a /* ... */ comment and that isn't hard to understand.
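The post does not give the expression itself. One commonly cited POSIX ERE for a block comment is sketched below, assuming regex.h is available; the helper name find_block_comment is hypothetical. The pattern never relies on greedy or non-greedy behaviour: once inside the comment, a run of stars must be followed either by something other than * or /, or by the closing slash, so no match can extend past the first */.

```c
#include <regex.h>
#include <stddef.h>

/* POSIX ERE for one C block comment. A commonly cited pattern,
 * not quoted from the thread. */
static const char *C_COMMENT = "/\\*([^*]|\\*+[^*/])*\\*+/";

/* Hypothetical helper: locate the first block comment in s.
 * Returns 1 and fills *start and *len on a match, 0 if there is
 * no match, -1 if the pattern fails to compile. */
int find_block_comment(const char *s, size_t *start, size_t *len)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, C_COMMENT, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, s, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (size_t)m.rm_so;
    *len = (size_t)(m.rm_eo - m.rm_so);
    return 1;
}
```

On "x /* a * b **/ y /* c */" the match starts at offset 2 and is 12 bytes long, stopping at the first */ even though POSIX matching prefers the longest match. The expression only describes the comment text; it knows nothing about whether the /* sits inside a string literal or a // comment.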
--- Synchronet 3.21a-Linux NewsLink 1.2
