• Re: Command Languages Versus Programming Languages

    From Muttley@Muttley@dastardlyhq.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Nov 23 11:40:37 2024
    From Newsgroup: comp.lang.misc

    On Fri, 22 Nov 2024 18:18:04 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> gabbled:
    On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
    It's not that simple, I'm afraid, since comments can be commented out.

    Umm, no.

    Umm, yes, they can.

    eg:

    // int i; /*

    This /* sequence is inside a // comment, and so the machinery that
    recognizes /* as the start of a comment would never see it.

    Yes, that's kind of the point. You seem to be arguing against yourself.

    A C99 or C++ compiler would see "int j" and compile it; a regex would
    simply remove everything from the first /* to */.

    No, it won't, because that's not how regexes are used in a lexical

    Yes, it will.

    Also the same probably applies to #ifdef's.

    Lexically analyzing C requires implementing the translation phases
    as described in the standard. There are preprocessor phases which
    delimit the input into preprocessor tokens (pp-tokens). Comments
    are stripped in preprocessing. But logical lines (backslash
    continuations) are recognized below comments; i.e. this is one
    comment:

    Not sure what your point is. A regex cannot be used to parse C comments
    because it doesn't know the C/C++ grammar.
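
    A minimal C sketch of the behaviour being argued about: a scanner that
    tracks whether it is inside a // comment, a /* */ comment, or a literal
    never treats the /* in "// int i; /*" as a comment opener. The name
    strip_comments is hypothetical, and the sketch deliberately ignores
    backslash-newline splicing, trigraphs and raw string literals, so it is
    not a full implementation of the translation phases.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of src with comments removed, or NULL if
 * allocation fails. Each block comment becomes one space; the newline that
 * ends a line comment is kept. Simplified: no line splicing, no trigraphs. */
char *strip_comments(const char *src)
{
    enum { CODE, LINE, BLOCK, STR, CHR } st = CODE;
    size_t i = 0, o = 0;
    char *dst;

    assert(src != NULL);
    dst = malloc(strlen(src) + 1);   /* output is never longer than input */
    if (dst == NULL)
        return NULL;

    while (src[i] != '\0') {
        char c = src[i], n = src[i + 1];
        switch (st) {
        case CODE:
            if (c == '/' && n == '/') { st = LINE;  i += 2; continue; }
            if (c == '/' && n == '*') { st = BLOCK; i += 2; continue; }
            if (c == '"')  st = STR;
            if (c == '\'') st = CHR;
            dst[o++] = c;
            break;
        case LINE:                   /* a slash-star in here is just text */
            if (c == '\n') { st = CODE; dst[o++] = c; }
            break;
        case BLOCK:
            if (c == '*' && n == '/') {
                st = CODE;
                dst[o++] = ' ';
                i += 2;
                continue;
            }
            break;
        case STR:
        case CHR:
            dst[o++] = c;
            if (c == '\\' && n != '\0') {   /* copy escaped character */
                dst[o++] = n;
                i += 2;
                continue;
            }
            if ((st == STR && c == '"') || (st == CHR && c == '\''))
                st = CODE;
            break;
        }
        i++;
    }
    dst[o] = '\0';
    return dst;
}
```

    Given a hypothetical input whose first line is "// int i; /*", the
    following "int j;" line survives, because the scanner is in line-comment
    state when it passes the /* and never opens a block comment there.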

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Ed Morton@mortonspam@gmail.com to comp.unix.shell,comp.unix.programmer,comp.lang.misc on Sat Nov 23 18:17:41 2024
    From Newsgroup: comp.lang.misc

    On 11/20/2024 9:53 AM, Janis Papanagnou wrote:
    On 20.11.2024 12:46, Ed Morton wrote:

    Definitely. The most relevant statement about regexps is this:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems.

    (Worth scribbling on a WC wall.)


    Obviously regexps are very useful and commonplace but if you find you
    have to use some online site or other tools to help you write/understand
    one or just generally need more than a couple of minutes to
    write/understand it then it's time to back off and figure out a better
    way to write your code for the sake of whoever has to read it 6 months
    later (and usually for robustness too as it's hard to be sure all rainy
    day cases are handled correctly in a lengthy and/or complicated regexp).

    Regexps are not for newbies.

    The inherent fine thing with Regexps is that you can incrementally
    compose them[*].[**]

    It seems you haven't found a sensible way to work with them?
    (And I'm really astonished about that since I know you worked with
    Regexps for years if not decades.)

    I have no problem working with regexps; I just don't write lengthy or
    complicated ones, only brief, simple BREs or EREs, and I don't
    restrict myself to trying to solve problems with a single regexp.

    In those cases where Regexps *are* the tool for a specific task -
    I don't expect you to use them where they are inappropriate?! -

    Right, I don't, but I see many people using them for tasks that could be
    done more clearly and robustly if not done with a single regexp.

    what would be the better solution[***] then?

    It all depends on the problem. For example, if you need to match an
    input string that must contain each of a, b, and c in any order then you
    could do that in awk with this regexp or similar:

    awk '/(a.*(b.*c|c.*b))|(b.*(a.*c|c.*a))|(c.*(a.*b|b.*a))/'

    or you could do it with this condition comprised of regexp segments:

    awk '/a/ && /b/ && /c/'

    I would prefer the second solution as it's more concise and easier to
    enhance (try adding "and d" to both).
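
    The same "one simple test per requirement" idea carries over to other
    languages. A minimal C sketch, assuming the patterns are single
    characters as in the a/b/c example (the name contains_all is
    hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when line contains every character listed in required, in any
 * order. Mirrors awk '/a/ && /b/ && /c/' for one-character patterns. */
bool contains_all(const char *line, const char *required)
{
    assert(line != NULL && required != NULL);
    for (const char *r = required; *r != '\0'; r++) {
        if (strchr(line, *r) == NULL)
            return false;            /* one requirement is missing */
    }
    return true;
}
```

    Adding "and d" is then a change of the argument from "abc" to "abcd"
    rather than a rewrite of a six-way alternation.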

    As another example, someone on StackOverflow recently said they had
    written the following regexp to isolate the last string before a set of
    parens in a line that contains multiple such strings, some of them
    nested, and they said it works in python:

    ^(?:^[^(]+\([^)]+\) \(([^(]+)\([^)]+\)\))|[^(]+\(([^(]+)\([^)]+\),\s([^\(]+)\([^)]+\)\s\([^\)]+\)\)|(?:(?:.*?)\((.*?)\(.*?\)\))|(?:[^(]+\(([^)]+)\))$

    I personally wouldn't consider anything remotely as lengthy or
    complicated as that regexp, despite their assurances that it works. I'd
    use this any-awk script or similar instead:

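    {
        rec = $0
        # Find the innermost "(...)" each time. The unescaped ")" is literal
        # in an ERE here because there is no matching "(" before it.
        while ( match(rec, /\([^()]*)/) ) {
            # Take the text between the parens from the original line.
            tgt = substr($0,RSTART+1,RLENGTH-2)
            # Mask this paren pair with RS so offsets into $0 stay valid.
            rec = substr(rec,1,RSTART-1) RS substr(rec,RSTART+1,RLENGTH-2) RS substr(rec,RSTART+RLENGTH)
        }
        # Drop any nested "(...)" groups left inside the last match.
        gsub(/ *\([^()]*) */, "", tgt)
        print tgt
    }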

    It's a bit more code but, unlike that regexp, anyone assigned to
    maintain this code in future can tell what it does with just a little
    thought (and maybe adding a debugging print in the loop if they aren't
    very familiar with awk), can then be sure it does what is required and
    nothing else, and could easily maintain/enhance it if necessary.

    Ed.


    Janis

    [*] Like the corresponding FSMs.

    [**] And you can also decompose them if they are merged in a huge
    expression that is too large for you to grasp. (BTW, I also do such
    decompositions with other expressions in program code that are too
    bulky.)

    [***] Can you answer the question that another poster failed to answer?


  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Sun Nov 24 06:42:59 2024
    From Newsgroup: comp.lang.misc

    Rainer Weikusat <rweikusat@talktalk.net> writes:

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    [...]

    Personally I think that writing bulky procedural stuff for
    something like [0-9]+ can only be much worse, and that further
    abbreviations like \d+ are the better direction to go if targeting
    a good interface. YMMV.

    Assuming that p is a pointer to the current position in a string, e
    is a pointer to the end of it (i.e., it points just past the last byte)
    and - that's important - both are pointers to unsigned quantities,
    the 'bulky' C equivalent of [0-9]+ is

    while (p < e && *p - '0' < 10) ++p;

    To force the comparison to be done as unsigned:

    while (p < e && *p - '0' < 10u) ++p;
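
    To make the difference between the two loops concrete, here is a small
    C sketch. It assumes p and e are unsigned char pointers, an ASCII
    character set, and a platform where unsigned char promotes to int (the
    usual case). The names span_signed and span_unsigned are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the leading bytes of [p, e) accepted by the plain-int comparison. */
size_t span_signed(const unsigned char *p, const unsigned char *e)
{
    const unsigned char *start = p;
    assert(p != NULL && e != NULL && p <= e);
    /* *p promotes to int, so a byte below '0' yields a negative value,
     * and any negative value is less than 10. */
    while (p < e && *p - '0' < 10) ++p;
    return (size_t)(p - start);
}

/* Count the leading bytes of [p, e) accepted by the unsigned comparison. */
size_t span_unsigned(const unsigned char *p, const unsigned char *e)
{
    const unsigned char *start = p;
    assert(p != NULL && e != NULL && p <= e);
    /* 10u converts the left operand to unsigned, so a negative difference
     * wraps to a very large value and fails the test. */
    while (p < e && *p - '0' < 10u) ++p;
    return (size_t)(p - start);
}
```

    For the three bytes " 12", span_signed returns 3, because ' ' - '0' is
    -16 as an int, while span_unsigned returns 0. For "123abc" both return
    3, since bytes above '9' fail either comparison.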
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.misc on Sun Nov 24 20:08:24 2024
    From Newsgroup: comp.lang.misc

    Kaz Kylheku <643-408-1753@kylheku.com> writes:

    Here is an example: using a regex match to capture a C comment /* ... */
    in Lex compared to just recognizing the start sequence /* and handling
    the discarding of the comment in the action.

    Without non-greedy repetition matching, the regex for a C comment is
    quite obtuse. The procedural handling is straightforward: read
    characters until you see a * immediately followed by a /.

    Regular expressions are neither greedy nor non-greedy. One of the
    key points of regular expressions is that they are declarative
    rather than procedural. Any procedural change of behavior overlaid
    on a regular expression is a property of the tool, not the regular
    expression. It's easy to write a regular expression that exactly
    matches a /* ... */ comment and that isn't hard to understand.
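
    One commonly cited formulation of such a regular expression, sketched
    here with POSIX regcomp/regexec in extended mode. The pattern is not
    taken from this thread, and find_block_comment is a hypothetical helper.
    It only describes the shape of a single /* ... */ comment; whether a
    given /* really opens a comment (rather than sitting inside a // comment
    or a string literal) is still for the surrounding lexer to decide.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Locate the first slash-star comment in text. On success store its byte
 * offset and length and return 1; return 0 if none is found or the
 * pattern fails to compile. */
int find_block_comment(const char *text, size_t *start, size_t *len)
{
    /* Opener, then any run of non-stars or of stars followed by something
     * other than star or slash, then one or more stars and a slash. */
    static const char pattern[] = "/\\*([^*]|\\*+[^*/])*\\*+/";
    regex_t re;
    regmatch_t m[1];
    int found = 0;

    assert(text != NULL && start != NULL && len != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, text, 1, m, 0) == 0) {
        *start = (size_t)m[0].rm_so;
        *len = (size_t)(m[0].rm_eo - m[0].rm_so);
        found = 1;
    }
    regfree(&re);
    return found;
}
```

    For the text "a /* x */ b /* y */" this reports offset 2 and length 7.
    The match stops at the first "*/" without any non-greedy operator,
    because the body of the pattern cannot contain a star followed by a
    slash.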