• What attributes of a programming language simplify its use?

    From gah4@gah4@u.washington.edu to comp.compilers on Thu Dec 1 14:20:50 2022
    From Newsgroup: comp.compilers

    We had the "What attributes of a programming language simplify its implementation?" discussion.

    It seems, though, that languages are implemented a small number of
    times, and used many times. So, designing for ease of use, instead of
    ease of implementation makes more sense.

    (Especially if you want a lot of people to want to use it.)

    One feature that I find makes them easier to use, and harder to implement, is no reserved words.

    For almost 50 years now, my favorite name for an otherwise unnamed
    program is "this". (That is, where many people seem to use "foo", and
    years before I knew about "foo".)

    That worked fine, until Java came along with reserved word "this".
    (Second choice, "that", fortunately isn't reserved in Java.)
    [I take your point, but in PL/I you can say:

    IF THEN = ELSE THEN BEGIN = IF; ELSE END = IF;

    COBOL famously has too many reserved words but PL/I overreacted. -John]
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From gah4@gah4@u.washington.edu to comp.compilers on Fri Dec 2 02:09:52 2022
    From Newsgroup: comp.compilers

    On Thursday, December 1, 2022 at 4:18:24 PM UTC-8, gah4 wrote:

    (snip)

    > One feature that I find makes them easier to use, and harder to
    > implement, is no reserved words.

    (snip)

    > [I take your point, but in PL/I you can say:
    >
    > IF THEN = ELSE THEN BEGIN = IF; ELSE END = IF;
    >
    > COBOL famously has too many reserved words but PL/I overreacted. -John]

    COBOL used up many useful English words.

    As far as I know, the idea behind PL/I was that you shouldn't need to
    know about the parts of the language that you weren't using, or even
    know their keywords.

    But also, PL/I more than other languages has simple rules, instead
    of arbitrary restrictions. You can use any expression in any place where
    an expression is allowed.

    Fortran has always had, and still has, unobvious restrictions on how
    expressions can be used. Some, it seems, are there intentionally to make
    things harder for the programmer; that is, to discourage practices that
    some don't like.

    My favorite complaint is about using REAL variables in DO loops.
    Yes there are good reasons not to do it, but it isn't up to the language
    to decide that.

    The actual reason I follow this one is that many (many!) years ago
    I would sometimes translate BASIC programs to Fortran, and in BASIC
    all variables are, in effect, Fortran REAL.


    PL/I allowed array expressions from the beginning; Fortran added
    them much later. PL/I has a simple rule: the subscripts have the
    same value for all arrays in the expression.

    Fortran has complicated rules, where some arrays index from one,
    even if they are declared with a different lower bound. You have to
    be extremely careful about which one it is using. And it gets even
    worse when you pass arrays to subroutines.

    But back to the reserved words. Just because it allows them,
    doesn't mean that you should use them. In all cases, the person
    writing the program should consider readability. And using IF
    related words in an IF statement is bound to be confusing.
  • From Thomas Koenig@tkoenig@netcologne.de to comp.compilers on Sat Dec 3 10:25:40 2022
    From Newsgroup: comp.compilers

    gah4 <gah4@u.washington.edu> wrote:
    > We had the "What attributes of a programming language simplify its
    > implementation?" discussion.
    >
    > It seems, though, that languages are implemented a small number of
    > times, and used many times. So, designing for ease of use, instead of
    > ease of implementation makes more sense.

    Very much so.

    > (Especially if you want a lot of people to want to use it.)
    >
    > One feature that I find makes them easier to use, and harder to
    > implement, is no reserved words.

    I think this is more a matter of extensibility than of ease of use,
    but both are somewhat intertwined.

    Adding a new reserved word is a breaking change, especially if that
    word is often used. See "new" in C++, which was something reasonable
    to use in C, and is reserved in C++.

    The life cycle of a programming language will have many revisions (if
    it is successful, that is), and not having reserved keywords certainly
    helps in two aspects: Existing user programs will continue to work,
    and new features can be added in a way that is easier to read than
    having to add special characters, so a new feature looks like a cat
    walked over the keyboard, with capslock on.
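
    To illustrate with a language outside this thread: Python made "async"
    and "await" reserved words in 3.7, which broke code that used them as
    names, but introduced "match" in 3.10 as a context-sensitive ("soft")
    keyword, so programs with a variable called match kept working. A
    minimal sketch follows; the helper name can_be_assigned is made up for
    this example and simply asks the running interpreter whether a word is
    still usable as a variable name.

```python
import keyword


def can_be_assigned(name):
    """Return True if `name = 1` compiles, i.e. `name` works as a variable."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
    except SyntaxError:
        return False
    return True


# "class" is reserved; "match" is only special in a match statement;
# "this" has no special meaning in Python at all.
for word in ("class", "match", "this"):
    print(word, keyword.iskeyword(word), can_be_assigned(word))
```

    The reserved word is rejected as a name everywhere, while the
    context-sensitive one costs existing programs nothing.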

    Yes, this is a bit more pain for compiler writers, but far less than,
    let's say, having to deal with SIMD.
    [There's also the perl approach where you put a "use" at the top of the
    program file saying which version of the language you want. -John]
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Sat Dec 3 22:16:44 2022
    From Newsgroup: comp.compilers

    On 12/3/22 11:25 AM, Thomas Koenig wrote:
    > gah4 <gah4@u.washington.edu> wrote:
    >
    >> One feature that I find makes them easier to use, and harder to
    >> implement, is no reserved words.
    >
    > I think this is more a matter of extensibility than of ease of use,
    > but both are somewhat intertwined.
    >
    > Adding a new reserved word is a breaking change, especially if that
    > word is often used. See "new" in C++, which was something reasonable
    > to use in C, and is reserved in C++.

    IMO C basic syntax is a bad base. As long as declarations and
    expressions can be distinguished only by the type of an identifier (type
    name or variable name) it's not a good idea to add new keywords that can
    be confused with variable or type names. Instead weird constructs like
    "long long" for int64_t have been introduced, while "int int" stays
    equivalent to "int".

    DoDi

  • From Christopher F Clark@christopher.f.clark@compiler-resources.com to comp.compilers on Sat Dec 3 23:33:07 2022
    From Newsgroup: comp.compilers

    The discussion on reserved words versus keywords reminds me of
    decisions we made while building Yacc++. It is worth noting that we
    (both of its developers) worked at Pr1me computer where PL/I dialects
    were the key programming languages used to build both the OS and the
    compilers, so we were likely highly influenced by that.

    As a result, Yacc++ has very few reserved words; I'm pretty sure the
    number is 3 or fewer. There is only one that I can think of, "yy_eof",
    which is reserved because it is used in the library in a place where
    we have hand-written code that we don't want to modify. (*) And we
    specifically reserve all yy_ words for use in the library, although
    most can be used in grammars (and code) without any ill effect. And in
    doing so, we feel we haven't taken away any common words from the
    users' vocabulary, and we have done so in a way that when the words
    have special meaning, it is generally the same meaning as traditional
    lex/flex/yacc/bison variants.

    However, we do have plenty of context sensitive keywords. But we
    structured their usage (as keywords) such that they are easily
    disambiguated. Thus, left, right, nonassoc don't have special
    characters in them, as opposed to yacc where they are %left et al.
    Now, %prec we couldn't make unambiguous, so it retains the required %
    spelling.

    Still, it is worth noting that, to make that possible, we had to require
    that all productions have a terminating semicolon (;), rather than
    depending upon name colon (:) to identify the start of the rule. That
    also gets rid of the lexical hack required to make the grammar LALR(1)
    rather than LALR(2). We could have handled LALR(2) grammars, but in our
    opinion, it made the error recovery and messages less obvious. Sometimes
    simplicity of implementation makes for a simpler and more regular
    language.

    But to continue this part of the explanation, words like fast, small,
    readable are keywords that describe different ways we lay out the
    tables, and in specific contexts they have those meanings. But in those
    contexts, normal identifiers cannot appear. And in any context where
    a normal identifier can appear, they are simply identifiers and don't
    carry any significance; in the library code where we need them to
    have special meaning we use yy_fast et al. So, in the declarations
    within a grammar where we need them to have special meanings, we don't
    need them to be spelled some "special" way (i.e. you don't say
    yy_fast yy_tables, you say "fast tables" and it is perfectly clear),
    but you can also use "fast" and "tables" in your grammar as tokens or
    non-terminals without worrying that you are using a "reserved" word.
    In fact, we do so to describe the grammar of Yacc++ grammars.

    Thus, we feel like we have mostly achieved a similar level of balance as
    PL/I had, without creating a write-only language. Yes, you can
    probably use Yacc++ to write an extensible language that diverges into
    a bunch of unique and incomprehensible variants where no two
    programmers are using the same language. We haven't made that
    impossible. However, the freedom we have allowed does not inherently
    contribute to that nor encourage it. It simply lets people write
    things slightly more naturally without a lot of "line noise".

    ------

    Now, given that, I want to dispel the illusion that it makes parsing
    harder (beyond a very trivial amount). The "trick" (hack) that lets
    the grammar deal with keywords that are not reserved is quite simple
    (and we document how to do it for users who are designing their own
    languages in our manual) and it should apply to most parser generators
    and doesn't rely on any special feature of Yacc++, although we have
    some features that make doing so easier.

    So, for example, you have a list of keywords (tokens) that you want
    treated like identifiers, say "if" "then" "else" à la PL/I or "left"
    "right" "token" à la Yacc++, and you have a token identifier that you
    want to use for other identifiers, as in:

    token identifier, if, then, else, left, right, token;
    identifier: "a" .. "z" ("a" .. "z" | "0" .. "9")*;
    if: "if";
    then: "then";
    else: "else";
    left: "left";
    right: "right";
    token: "token";

    To get the desired property, you simply define a non-terminal (we'll
    call it "ident") that you use to represent identifiers in contexts
    where you want the keywords to be allowed, as in:

    ident: identifier | if | then | else | left | right | token;

    Now, simply use ident where you would have used identifier previously,

    rule: ident ":" ident* ";" ;

    And, use the keywords where they have their special meaning:

    if_stmt: if expression then stmt (else stmt)?;
    left_decl: left ident ("," ident)* ";" ;

    As long as the uses are unambiguous (and the generator uses the
    prefer-shift method of resolving shift-reduce conflicts, or you can
    force it to with "precedence" declarations), the grammar will
    work as expected.

    If it is ambiguous and you want to disallow certain keywords, simply
    introduce other non-terminals, such as

    ident_not_if: identifier | then | else | left | right | token;

    assignment : ident_not_if "=" expression; // if keyword not allowed before =

    -----

    Now, as I said we have features in Yacc++ that make this easier, but
    the principle doesn't require our tool. And, there is much more you
    can do. This is just one of the relevant grammar hacks.
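
    The same idea carries over to a hand-written parser. What follows is
    only a rough sketch in Python, not Yacc++ output: the toy statement
    language, the Parser class and the tuple-shaped tree are all invented
    for illustration. "if", "then" and "else" act as keywords only where
    the grammar expects them, and everywhere else they go through the
    ident rule like any other word.

```python
import re


def tokenize(text):
    # Words, "=" and ";" are the only tokens in this toy language.
    return re.findall(r"[a-z_][a-z_0-9]*|[=;]", text.lower())


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expect(self, wanted):
        token = self.take()
        if token != wanted:
            raise SyntaxError(f"expected {wanted!r}, got {token!r}")

    def ident(self):
        # Plays the role of the "ident" non-terminal: any word is accepted,
        # whether or not it is spelled like a keyword.
        token = self.take()
        if token in ("=", ";"):
            raise SyntaxError(f"identifier expected, got {token!r}")
        return token

    def expression(self):
        left = self.ident()
        if self.peek() == "=":
            self.take()
            return ("eq", left, self.ident())
        return ("var", left)

    def statement(self):
        if self.peek() == "if":        # keyword only in statement position
            self.take()
            condition = self.expression()
            self.expect("then")
            then_part = self.statement()
            else_part = None
            if self.peek() == "else":  # prefer shift: else binds to this if
                self.take()
                else_part = self.statement()
            return ("if", condition, then_part, else_part)
        # Reaching here means the first word is not "if", which is the
        # "ident_not_if" restriction on assignment targets.
        target = self.ident()
        self.expect("=")
        value = self.expression()
        self.expect(";")
        return ("assign", target, value)


def parse(text):
    parser = Parser(text)
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at {parser.peek()!r}")
    return tree


tree = parse("IF THEN = ELSE THEN BEGIN = IF; ELSE END = IF;")
print(tree)
```

    The PL/I example from the start of the thread parses as an if
    statement whose condition compares the variables then and else, while
    "if = x;" is rejected because an assignment target may not be spelled
    "if".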

    *) yy_error might also be a reserved word used as part of error recovery.

    --
    Chris Clark                email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.   Web Site: http://world.std.com/~compres
    23 Bailey Rd               voice: (508) 435-5016
    Berlin, MA 01503 USA       twitter: @intel_chris
  • From gah4@gah4@u.washington.edu to comp.compilers on Sat Dec 3 17:15:55 2022
    From Newsgroup: comp.compilers

    On Saturday, December 3, 2022 at 4:27:51 PM UTC-8, christoph...@compiler-resources.com wrote:
    > The discussion on reserved words versus keywords reminds me of
    > decisions we made while building Yacc++. It is worth noting that we
    > (both of its developers) worked at Pr1me computer where PL/I dialects
    > were the key programming languages used to build both the OS and the
    > compilers, so we were likely highly influenced by that.

    This is reminding me of some cases in TeX where optional keywords can
    arise in unexpected places. There is TeX glue that allows:

    \hskip 1cm

    or

    \hskip 1cm plus 1cm

    In normal use, you mix TeX commands and text to be formatted. If a macro expands to

    \hskip 1cm

    and is followed by text starting with

    plus

    you get a surprising error message.

    I believe that plus is only a "reserved word" in that specific context.

    And a project I was working on some years ago, just happened to run
    into that case, however unlikely that might be.
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.compilers on Tue Dec 6 09:56:17 2022
    From Newsgroup: comp.compilers

    Hans-Peter Diettrich <DrDiettrich1@netscape.net> writes:
    > IMO C basic syntax is a bad base. As long as declarations and
    > expressions can be distinguished only by the type of an identifier (type
    > name or variable name) it's not a good idea to add new keywords that can
    > be confused with variable or type names. Instead weird constructs like
    > "long long" for int64_t have been introduced, while "int int" stays
    > equivalent to "int".

    long long and int64_t are not the same (though int64_t may be the same
    type as long long in a given implementation). long long is *at least* 64
    bits. int64_t is *exactly* 64 bits, and must have a 2's-complement
    representation and no padding bits. "int int" is a syntax error.

    (I'm not arguing that C's integer type system isn't overly complicated.)
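
    A quick way to see what a given platform does, sketched with Python's
    ctypes module, which reports the sizes used by the platform's C ABI.
    The function name c_type_sizes is made up for this example, and the
    bit counts it prints assume 8-bit bytes.

```python
import ctypes


def c_type_sizes():
    """Sizes in bytes of a few C integer types under the platform's C ABI."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "int64_t": ctypes.sizeof(ctypes.c_int64),
    }


sizes = c_type_sizes()
for name, size in sizes.items():
    print(f"{name}: {size * 8} bits")
```

    int64_t is 8 bytes wherever it exists; the size reported for long is
    the one that differs between platforms.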

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for XCOM Labs
    void Void(void) { Void(); } /* The recursive call of the void */
  • From gah4@gah4@u.washington.edu to comp.compilers on Tue Dec 6 12:43:31 2022
    From Newsgroup: comp.compilers

    On Tuesday, December 6, 2022 at 10:28:44 AM UTC-8, Keith Thompson wrote:

    (snip)

    > long long and int64_t are not the same (though int64_t may be the same
    > type as long long in a given implementation). long long is *at least* 64
    > bits. int64_t is *exactly* 64 bits, and must have a 2's-complement
    > representation and no padding bits. "int int" is a syntax error.
    >
    > (I'm not arguing that C's integer type system isn't overly complicated.)

    It seems that many Fortran programmers now assume that KIND=8
    (for REAL) is a 64 bit IEEE floating point value, and I suspect for
    INTEGER that it is a 64 bit integer.

    The Fortran standard makes no guarantee about what the numerical KIND
    values mean.

    It doesn't seem too surprising, then, that some would miss the
    distinction between int64_t and long long.

    In the early days of 64 bit computing, which I mostly remember from
    the DEC Alpha, C compilers made long the 64 bit type.

    That, then, broke too much software assuming long was 32 bits.

    Much of IP networking evolved when C int was either 16 or 32 bits, and
    you didn't really know which, while short was reliably 16 bits and
    long was reliably 32 bits.

    So, we have things like htonl() and ntohl() for converting 32 bit
    values to/from network byte order. (The l stands for long.)

    Since networking code, especially cross platform, depends more on
    exact lengths than many others, that was one that had to get done
    right pretty early. (Cross platform file formats, too.)
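
    For what it's worth, Python exposes the same calls, which makes it easy
    to see what htonl() actually does. This is a small sketch using the
    standard socket and struct modules; the sample value 0x01020304 is
    arbitrary.

```python
import socket
import struct

value = 0x01020304

# Explicit network (big-endian) encoding of a 32-bit unsigned value.
wire = struct.pack("!I", value)

# htonl() returns an integer whose native in-memory bytes are in network
# order, so packing it in native byte order ("=I") gives the same bytes.
via_htonl = struct.pack("=I", socket.htonl(value))

print(wire.hex(), via_htonl.hex())
print(hex(socket.ntohl(socket.htonl(value))))
```

    Both encodings give the bytes 01 02 03 04 whatever the host byte order
    is, and ntohl() undoes htonl(). The 32 bits are fixed by the function,
    not by whatever size long happens to be.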

    So then we got long long as the (close enough to) reliable 64 bit
    type.

    Maybe in a few years, we will have the long long long 128 bit type.

    But C syntax has been confusing due to the reserved words and need for additions in more than just data types.

    There are stories that I don't remember on the different uses of the
    word "static" in C.

    Though maybe not quite as many as Fortran uses for *.
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Wed Dec 7 10:14:44 2022
    From Newsgroup: comp.compilers

    gah4 <gah4@u.washington.edu> writes:
    > In the early days of 64 bit computing, which I mostly remember from
    > the DEC Alpha, C compilers made long the 64 bit type.

    The early days of 64-bit computing are on the CDC Star and Cray-1, but
    C was a minor language for them.

    Yes, we got the first mainstream 64-bit Unix with Digital OSF/1 on the
    Alpha, and 64-bit APIs and ABIs on Unix had 64-bit long.

    > That, then, broke too much software assuming long was 32 bits.

    Obviously not, or other Unix vendors would not have also made longs
    64-bit in their interfaces.

    > So then we got long long as the (close enough to) reliable 64 bit
    > type.

    GCC introduced long long independently of any 64-bit port; this is easy
    to see because the original GCC documentation specified that long long
    int is twice as long as long int. Later, when the Alpha port (and
    later 64-bit ports) came, the porter decided to make long long 64-bit,
    i.e., the same size as long; I don't know if the Alpha API/ABI had a
    requirement on the size of long long, or if the people responsible for
    the Alpha port deviated from the documentation for some other
    reason. When we reported this as a bug, the fix was to change the
    documentation to say that long long is twice as long as int.

    Concerning IL32P64, i.e., 32-bit longs with 64-bit pointers, that
    seems to be a specialty of 64-bit Windows. Fortunately, I don't have
    to deal with this API (64-bit Cygwin supports the Unix API, i.e.,
    64-bit long).

    > Maybe in a few years, we will have the long long long 128 bit type.

    GCC has supported 128-bit integers for a while; originally we wrote,
    e.g.:

    typedef int int128_t __attribute__((__mode__(TI)));

    (makes me wonder how the compiler sees the "TI"; it's not a keyword,
    and it's not a defined name in any of the name spaces; gcc tends to
    pass such things as literal strings (cf. extended asm), but here it
    does not).

    Nowadays it seems to (also) have __int128_t as an
    implementation-specific keyword. I see no movement in the direction of
    long long long (and, looking at history, it would only be 64 bits
    long :-).

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Wed Dec 7 12:13:54 2022
    From Newsgroup: comp.compilers

    On 12/6/22 6:56 PM, Keith Thompson wrote:
    > Hans-Peter Diettrich <DrDiettrich1@netscape.net> writes:
    >> IMO C basic syntax is a bad base. As long as declarations and
    >> expressions can be distinguished only by the type of an identifier (type
    >> name or variable name) it's not a good idea to add new keywords that can
    >> be confused with variable or type names.

    Nobody seems to disagree with my opinion?


    >> Instead weird constructs like
    >> "long long" for int64_t have been introduced, while "int int" stays
    >> equivalent to "int".
    >
    > long long and int64_t are not the same (though int64_t may be the same
    > type as long long in a given implementation). long long is *at least* 64
    > bits. int64_t is *exactly* 64 bits, and must have a 2's-complement
    > representation and no padding bits.

    You are right, my sloppy wording was not appropriate in this NG :-(

    > "int int" is a syntax error.

    I could not find in the (older) C++ grammar why "int int" should be a
    *syntax* error. Aren't both "int" and "long" simple-type-specifiers,
    which can occur multiple times in a decl-specifier-seq?

    It looks to me like additional rules apply which decide that
    "long int"
    "long long int"
    "long int long" //what's that?
    are all valid while
    "long int long int"
    throws a "two or more data types..." error.

    In former times it was much easier to decide with a single basic type id (int...) and type modifiers (long...).

    DoDi
  • From Hans-Peter Diettrich@DrDiettrich1@netscape.net to comp.compilers on Thu Dec 8 21:42:26 2022
    From Newsgroup: comp.compilers

    On 12/8/22 2:53 AM, Keith Thompson wrote:
    > Hans-Peter Diettrich <DrDiettrich1@netscape.net> writes:
    >> On 12/6/22 6:56 PM, Keith Thompson wrote:
    > [...]
    >>> "int int" is a syntax error.
    >>
    >> I could not find in the (older) C++ grammar why "int int" should be a
    >> *syntax* error. Aren't both "int" and "long" simple-type-specifier's
    >> which can occur multiple times in a decl-specifier-seq?
    >
    > No, there are specific rules that specify the way they can be used.
    > In the 2011 ISO C standard (I use the draft from
    > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), the valid
    > type specifiers are listed in section 6.7.2.

    Thanks for the link :-)

    > At least one type specifier shall be given in the declaration
    > specifiers in each declaration, and in the specifier-qualifier
    > list in each struct declaration and type name. Each list of type
    > specifiers shall be one of the following multisets (delimited
    > by commas, when there is more than one multiset per item);
    > the type specifiers may occur in any order, possibly intermixed
    > with the other declaration specifiers.

    So let me repeat my questions:

    - Why is "int int" a syntax error? "At least one..." allows for more
    than one type-specifier in declaration-specifiers (6.7).

    - What's "long int long"? My current (Arduino) C++ compiler doesn't flag
    it as an error.

    DoDi
    [This is getting close to comp.lang.c but I'm OK with a little more
    discussion of the design decisions in C's very messy declarations. -John]
  • From gah4@gah4@u.washington.edu to comp.compilers on Thu Dec 8 14:44:08 2022
    From Newsgroup: comp.compilers

    On Thursday, December 8, 2022 at 12:45:04 PM UTC-8, Hans-Peter Diettrich wrote:

    (snip)

    > So let me repeat my questions:
    >
    > - Why is "int int" a syntax error? "At least one..." allows for more
    > than one type-specifier in declaration-specifiers (6.7).
    >
    > - What's "long int long"? My current (Arduino) C++ compiler doesn't flag
    > it as an error.
    >
    > DoDi
    > [This is getting close to comp.lang.c but I'm OK with a little more
    > discussion of the design decisions in C's very messy declarations. -John]

    Now we can get closer to compilers.

    I suspect that it isn't a syntax error, though it will depend on how the compiler is written.

    The compiler (parser) can accept any combination of the specifiers,
    and even more than one of them, and then later the compiler decides
    that the ones given are not valid.

    There was a story many years ago, about a compiler with only one error
    message: "SYNTAX ERROR". (Likely in the days of upper case only.)

    In any case, it is often easier to write the parser to accept more than
    the actual language, and then flag the invalid cases later.
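
    Here is a rough sketch of that two-phase approach, in Python for
    brevity. The function names are invented for this example, and the
    table is a hand-copied subset of the multisets in section 6.7.2 of the
    N1570 draft quoted above (no _Bool, _Complex, struct or typedef names),
    so treat it as an approximation rather than the full rule.

```python
SPECIFIER_WORDS = {
    "void", "char", "short", "int", "long",
    "signed", "unsigned", "float", "double",
}

# Each entry is one permitted combination; order does not matter, so the
# entries are stored as sorted tuples and compared as multisets.
_PERMITTED = [
    "void", "char", "signed char", "unsigned char",
    "short", "signed short", "short int", "signed short int",
    "unsigned short", "unsigned short int",
    "int", "signed", "signed int",
    "unsigned", "unsigned int",
    "long", "signed long", "long int", "signed long int",
    "unsigned long", "unsigned long int",
    "long long", "signed long long", "long long int",
    "signed long long int",
    "unsigned long long", "unsigned long long int",
    "float", "double", "long double",
]
PERMITTED_MULTISETS = {tuple(sorted(entry.split())) for entry in _PERMITTED}


def parse_specifiers(text):
    """Phase 1 (syntax): accept any non-empty run of specifier words."""
    words = text.split()
    if not words or any(word not in SPECIFIER_WORDS for word in words):
        raise SyntaxError(f"not a type specifier list: {text!r}")
    return words


def check_specifiers(words):
    """Phase 2 (constraint): is this multiset one of the permitted ones?"""
    return tuple(sorted(words)) in PERMITTED_MULTISETS


for text in ("long int", "long int long", "int int", "long int long int"):
    print(text, "->", check_specifiers(parse_specifiers(text)))
```

    With this split, "long int long" gets through both phases (it is the
    same multiset as "long long int"), while "int int" and
    "long int long int" parse fine and are rejected only by the second
    check. Whether a compiler reports that as a "syntax error" is then
    mostly a matter of wording.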

    But also, the same can be done for the language standard.

    As far as I know, in early C variables defaulted to int. Later, it
    was required that they be declared, but the default type was still
    int. You could declare:

    auto i;

    which declares i as automatic, and (by default) int.

    It gets more interesting in Fortran, where you can give variables
    attributes in separate statements:

    INTEGER I
    DIMENSION I(10,10)
    PUBLIC I
    ALLOCATABLE I
    ASYNCHRONOUS I
    CONTIGUOUS I
    INTENT(IN) I
    OPTIONAL I
    POINTER I
    PROTECTED I
    SAVE I
    TARGET I
    VOLATILE I

    All might be legal syntax separately, but not legal in all combinations.
  • From gah4@gah4@u.washington.edu to comp.compilers on Mon Dec 12 00:00:56 2022
    From Newsgroup: comp.compilers

    On Tuesday, December 6, 2022 at 10:28:44 AM UTC-8, Keith Thompson wrote:

    (big snip)

    > (I'm not arguing that C's integer type system isn't overly complicated.)

    One reason for that, as noted above, is reserved words.

    Adding new reserved words risks invalidating existing programs.

    I do notice that Java has a reserved word "goto" without a defined use.
    Someone was planning ahead.

    C could have reserved some words for future use, if someone thought about it.

    So adding new types is complicated.
    [I think this is where you use #pragma to say which new keywords you're
    using. Yes, it's a kludge. -John]