• Simple to implement and to use

    From Christopher F Clark@christopher.f.clark@compiler-resources.com to comp.compilers on Sun Dec 11 19:41:50 2022
    From Newsgroup: comp.compilers

    Trying to bring this back to compilers (and their implementation).
    Over the years, I have noticed a couple of things in the various
    languages I have used.
    These are just my opinions and observations.

    ----------

    LL(1) parsing is good for statements. A good rule of thumb is that
    every statement (except perhaps one) should start with a keyword.
    The one exception is often the assignment statement, where you
    have lhs-expression assign-op rhs-expression. But once you have that,
    no other statement should start with an expression (without a
    keyword). You can make the keywords reserved in that context without
    undue burden to the user. PL/I's "declare" statement and Pascal's
    "var", "function", "procedure", etc. statements are good examples of
    this.
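
    A minimal sketch of the rule of thumb above, in Python. The keyword
    set and the toy token lists are assumptions made up for illustration;
    the point is only that one token of lookahead picks the statement
    rule, with assignment as the single keyword-less case.

```python
# Hypothetical toy language: every statement starts with a reserved keyword,
# except assignment, which starts with an identifier.
STATEMENT_KEYWORDS = {"var", "if", "while", "return"}  # assumed keyword set


def classify_statement(tokens):
    """Pick the statement rule by looking at the first token only (LL(1))."""
    if not tokens:
        raise SyntaxError("empty statement")
    first = tokens[0]
    if first in STATEMENT_KEYWORDS:
        return first + "-statement"
    if first.isidentifier():
        # The one keyword-less case: lhs-expression assign-op rhs-expression.
        return "assignment"
    raise SyntaxError(f"statement cannot start with {first!r}")


print(classify_statement(["var", "x", ":", "int", ";"]))    # var-statement
print(classify_statement(["x", ":=", "x", "+", "1", ";"]))  # assignment
```

    Because the keywords are checked first, a word such as "var" can never
    be mistaken for the start of an assignment in statement position.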

    Curiously, if you want a series of keywords to begin a statement, you
    should make the "reserved" keyword the last in the list, or have
    something else that separates the list of keywords from the normal
    identifiers. In Yacc++ we have a variety of declarations that define
    tokens that are keywords. The reserved word for those declarations is
    "keyword", but we have a bunch of other words that aren't reserved that
    can modify "keyword". Those words must all appear before "keyword" in
    the declaration. That way you can distinguish them from their usage as
    identifiers. Doing that is easier with an LR grammar.

    e.g.

    case sensitive substring keyword
        keyword /* that keyword is an identifier */,
        case /* so is case */,
        substring /* and substring */;

    The first four words in the above declaration are all keywords, but
    after the special keyword "keyword" those same words simply become
    identifiers, and the LR grammar has no issue telling them apart.
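
    The separation can be sketched in a few lines of Python. This is a
    hand-written left-to-right scan, not an LR parser and not how Yacc++
    is implemented; the MODIFIERS set and the pre-split token list are
    assumptions for illustration.

```python
MODIFIERS = {"case", "sensitive", "substring"}  # assumed modifier words


def parse_keyword_decl(tokens):
    """Split a declaration into (modifiers, declared names).

    Words before the reserved word "keyword" are modifiers; everything after
    it, up to ";", is a comma-separated list of plain identifiers.
    """
    pos = 0
    modifiers = []
    while pos < len(tokens) and tokens[pos] != "keyword":
        if tokens[pos] not in MODIFIERS:
            raise SyntaxError(f"unknown modifier {tokens[pos]!r}")
        modifiers.append(tokens[pos])
        pos += 1
    if pos == len(tokens):
        raise SyntaxError('missing reserved word "keyword"')
    pos += 1  # skip the reserved word itself
    names = []
    while True:
        if pos >= len(tokens) or tokens[pos] in {",", ";"}:
            raise SyntaxError("identifier expected")
        names.append(tokens[pos])  # any word is an identifier here
        pos += 1
        if pos >= len(tokens):
            raise SyntaxError('missing ";"')
        if tokens[pos] == ";":
            return modifiers, names
        if tokens[pos] != ",":
            raise SyntaxError('"," or ";" expected')
        pos += 1


decl = "case sensitive substring keyword keyword , case , substring ;".split()
print(parse_keyword_decl(decl))
# (['case', 'sensitive', 'substring'], ['keyword', 'case', 'substring'])
```

    The first occurrence of "keyword" is the only place where a word is
    treated specially; after it, "keyword", "case", and "substring" are
    accepted as ordinary names.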

    An alternative formulation might look like this:

    keyword keyword, case, sensitive : case sensitive substring;

    The colon (a reserved token) separates the modifying keywords from the
    list of identifiers.

    Note that if I were doing a language like Pascal, I might do it like:

    ("var"|"const") identifier (("," identifier)* (":" type-expr)?
        ("=" init-expr)? ("@" location-expr)?)+ ";"

    Then in a type-expr, keywords like "int" and "float" become reserved,
    but not elsewhere. And after the "@", words like "static", "heap", or
    "stack" would be reserved.
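
    One way to picture context-dependent reservation is a classifier
    that changes its reserved set as it passes ":" and "@". The sketch
    below is only an illustration under assumptions: the keyword sets
    come from the examples above, while the rule that "=", ",", and ";"
    end a context is invented for this toy.

```python
# Assumed contextual keyword sets, taken from the examples in the text.
CONTEXT_KEYWORDS = {
    ":": {"int", "float"},             # type-expr context
    "@": {"static", "heap", "stack"},  # location-expr context
}
CONTEXT_ENDERS = {"=", ",", ";"}  # assumed: these tokens leave the context
START = {"var", "const"}          # reserved only at the start of a statement


def classify(tokens):
    """Tag each token as 'keyword', 'identifier', or 'punct' by context."""
    reserved = START
    tagged = []
    for tok in tokens:
        if tok in CONTEXT_KEYWORDS:
            reserved = CONTEXT_KEYWORDS[tok]
            tagged.append((tok, "punct"))
        elif tok in CONTEXT_ENDERS:
            reserved = START if tok == ";" else set()
            tagged.append((tok, "punct"))
        else:
            kind = "keyword" if tok in reserved else "identifier"
            tagged.append((tok, kind))
            if reserved is START:
                reserved = set()  # past the first word of the statement
    return tagged


for tok, kind in classify("var int , heap : int @ heap ;".split()):
    print(tok, kind)
```

    In the sample, the first "int" and "heap" come out as identifiers,
    while the "int" after ":" and the "heap" after "@" come out as
    keywords.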

    ---------

    Languages with balanced "parenthesizing" keywords are generally less
    ambiguous. For example,

    if expr then stmt (else stmt)? fi

    where the if and fi match, gets rid of dangling else problems, and
    variations like

    if expr then stmt (else if expr then stmt)* (else stmt)? fi

    still don't have an issue. Note that in this case, you probably want
    "then" and "else" to be reserved words in your grammar, or do
    something like

    if "(" expr ")" stmt (";" stmt)? fi // where ";" is a clear reserved token

    or

    if "(" expr ")" "{" stmt "}" ("{" stmt "}")? fi

    where the parens and braces balance, which also works.
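
    A small recursive-descent sketch in Python shows why the closing
    "fi" removes the dangling else. The grammar is a toy assumed for
    illustration (conditions and simple statements are single names), and
    the helper names are hypothetical.

```python
RESERVED = {"if", "then", "else", "fi"}


def expect(tokens, pos, word):
    """Require a specific reserved word at pos; return the next position."""
    if pos >= len(tokens) or tokens[pos] != word:
        raise SyntaxError(f"expected {word!r} at token {pos}")
    return pos + 1


def name(tokens, pos):
    """Parse a single non-reserved word (stands in for expr or simple stmt)."""
    if pos >= len(tokens) or tokens[pos] in RESERVED:
        raise SyntaxError(f"name expected at token {pos}")
    return tokens[pos], pos + 1


def parse_stmt(tokens, pos=0):
    """stmt ::= "if" NAME "then" stmt ("else" stmt)? "fi" | NAME"""
    if pos < len(tokens) and tokens[pos] == "if":
        cond, pos = name(tokens, pos + 1)
        pos = expect(tokens, pos, "then")
        then_branch, pos = parse_stmt(tokens, pos)
        else_branch = None
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
        pos = expect(tokens, pos, "fi")  # the closing keyword ends this "if"
        return ("if", cond, then_branch, else_branch), pos
    return name(tokens, pos)


inner, _ = parse_stmt("if a then if b then s1 else s2 fi fi".split())
outer, _ = parse_stmt("if a then if b then s1 fi else s2 fi".split())
print(inner)  # ('if', 'a', ('if', 'b', 's1', 's2'), None)
print(outer)  # ('if', 'a', ('if', 'b', 's1', None), 's2')
```

    The position of "fi" alone decides which "if" owns the "else", so
    the parser never has to guess.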

    Curiously, from C I learned that single-character parentheses have
    their advantages. Thus () [] {}, but not really << >> or even
    "begin" and "end". However, the convention of ''' (3 of the relevant
    quote/paren) for multi-line bracketed items does seem to work well.
    And 3 for that is better than 2. Backslash conventions may be a
    necessary evil, but they are not very friendly. Quoted strings where
    the same quote starts and ends the string also tend to be error prone,
    but they are so much a part of the heritage that they are another
    necessary evil.

    In fact, the worst part of error detection and recovery, in my
    experience, is "single character" errors that radically change the
    program. It is too easy for a single character to get inserted and
    break the program in a way that is easy to overlook.

    -------

    Another thing which works poorly is having both prefix and suffix
    operators. If you have them, they should not be at the same level of
    precedence; that almost always results in ambiguity.
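
    The ambiguity can be made concrete by enumerating parse trees. The
    Python sketch below assumes a toy grammar, E ::= "-" E | E "!" | NAME,
    in which the prefix "-" and the suffix "!" sit at the same level; the
    operators and tree labels are made up for illustration.

```python
def parses(tokens):
    """Return every parse tree for E ::= "-" E | E "!" | NAME (toy grammar)."""
    tokens = tuple(tokens)
    trees = []
    if len(tokens) == 1 and tokens[0] not in ("-", "!"):
        trees.append(tokens[0])  # E ::= NAME
    if len(tokens) > 1 and tokens[0] == "-":
        # E ::= "-" E : the prefix operator applies to everything after it
        trees += [("neg", t) for t in parses(tokens[1:])]
    if len(tokens) > 1 and tokens[-1] == "!":
        # E ::= E "!" : the suffix operator applies to everything before it
        trees += [("fact", t) for t in parses(tokens[:-1])]
    return trees


for tree in parses(["-", "x", "!"]):
    print(tree)
# ('neg', ('fact', 'x'))
# ('fact', ('neg', 'x'))
```

    Two trees for one input means the grammar is ambiguous. Putting the
    suffix operator at a higher precedence level than the prefix one
    would leave only the first tree.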

    -------

    --
    ******************************************************************************
    Chris Clark                  email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
    23 Bailey Rd                 voice: (508) 435-5016
    Berlin, MA 01503 USA         twitter: @intel_chris
    ------------------------------------------------------------------------------