• Re: Prior art for operator parsing trick

    From Kaz Kylheku@643-408-1753@kylheku.com to comp.compilers on Sun Aug 3 22:16:40 2025
    From Newsgroup: comp.compilers

    On 2025-04-05, Kaz Kylheku <643-408-1753@kylheku.com> wrote:
    Hi all,

    I recently came up with neat trick in order to obtain particular
    (to me) desirable results out of an infix parser (which more
    or less follows Shunting Yard).

    This "precedence demotion" algorithm turns out to have some
    undesirable properties, leading to unintuitive parses, such as,
    oh, a + sqrt b * sqrt c parsing as (a + sqrt b) * (sqrt c).

    This led me to search for another way to capture the good stuff while
    avoiding the bad.

    When the loop is just about to process an infix operator O, we perform
    these steps before anything else:

    1. Determine whether the infix operator O is immediately followed by a
    sequence of one or more consecutive prefix operators P1, P2.

    We keep this step.

    2. If such a sequence is identified, we determine which of the Pi's
    has lowest precedence, and we subtract one from that precedence.

    We also keep this, but don't subtract one from the prededence.

    3. Then we compare the precedence of infix operator O with the
    precedence calculated in (3). If O has higher precedence, we
    *demote* it to the calculated precedence. (Only the currently
    occurring instance of that operator.)

    We replace this step entirely. No dynamic precedence adjustment takes
    place. Instead:

    3. We search the operator stack to find an prefix operator,
    which has a precdedence at least as high as the one determined in 2.
    If such an operator exists, we perform reductions until we remove it.
    If no such operator exists in the stack, we do nothing.

    4. We then continue with the usual algorithm, processing the operator
    stack based on comparing precedence and building nodes in the
    output tree and so on.

    Some live examples:

    1> (parse-infix '(sin x + x + cos y + y))
    (+ (sin (+ x x))
    (cos (+ y y)))

    [...]

    2> (parse-infix '(- x * x * - y * y))
    (* (- (* x x))
    (- (* y y)))

    [...]

    3> (parse-infix '(- x + x + - y + y)) ;; ... x + - y ...
    (+ (+ (+ (- x) x)
    (- y))
    y)

    [...]

    4> (parse-infix '(sin x + x + - cos y + y))
    (+ (sin (+ x x))
    (- (cos (+ y y))))

    I get exactly the same output on these, and all my existing test cases pass.

    Effectively we are still demoting the precedence of the infix operator,
    but in a surgical way: we are giving it lower precedence than
    specifically identified unparsed material to the left in the parsing
    stack, by doing reductions until we clear that material.

    E.g. in the case of

    (sin x + x + - cos y + y))

    At the second + , we see a sequence of prefix operators to
    the right, (- cos). The lowest precedence among them is that of cos.
    We the search the parsing stack until we find a prefix operator
    at least as low, and so sin is identified.

    What is going in is a kind of bracketing. We see some prefix operators to the right, and so we scan the left context to find a parallel prefix operator whose parse is not yet complete. If we find it, we then close off that operator, reducing it to a syntax node.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    --- Synchronet 3.21a-Linux NewsLink 1.2