Forum: Too Lazy BBS

What attributes of a programming language simplify its use?

From gah4 <gah4@u.washington.edu> to comp.compilers on Thu Dec 1 14:20:50 2022

From Newsgroup: comp.compilers

We had the "What attributes of a programming language simplify its implementation?" discussion.

It seems, though, that languages are implemented a small number of
times, and used many times. So, designing for ease of use, instead of
ease of implementation makes more sense.

(Especially if you want a lot of people to want to use it.)

One feature that I find makes them easier to use, and harder to implement, is no reserved words.

For almost 50 years now, my favorite name for an otherwise unnamed
program is "this". (That is, where many people seem to use "foo", and
years before I knew about "foo".)

That worked fine, until Java came along with reserved word "this".
(Second choice, "that", fortunately isn't reserved in Java.)
[I take your point, but in PL/I you can say:

IF THEN = ELSE THEN BEGIN = IF; ELSE END = IF;

COBOL famously has too many reserved words but PL/I overreacted. -John]
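The PL/I line parses because a keyword is recognized only by its position. A minimal C sketch of the key decision, under simplifying assumptions: statements are whitespace-separated words, and a statement whose first word is followed by "=" is an assignment, however that word is spelled. The names `classify_statement` and `next_token` are made up for illustration, and a real PL/I parser would also have to handle subscripted targets such as `IF(1) = 2;`, which need more lookahead.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical statement kinds for this sketch. */
enum stmt_kind { STMT_ASSIGNMENT, STMT_IF, STMT_OTHER };

/* Copy the next word (letters and digits) or single punctuation character
   into buf. Returns a pointer just past it, or NULL at end of input. */
static const char *next_token(const char *p, char *buf, size_t size)
{
    size_t n = 0;
    assert(size > 1);
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return NULL;
    if (isalnum((unsigned char)*p)) {
        while (isalnum((unsigned char)*p)) {
            if (n + 1 < size)
                buf[n++] = *p;      /* overlong words are truncated */
            p++;
        }
    } else {
        buf[n++] = *p++;
    }
    buf[n] = '\0';
    return p;
}

/* A statement whose first word is followed by "=" is an assignment, even
   when that word is spelled IF or THEN. Only otherwise is a leading IF
   treated as the keyword. Nothing is reserved; position decides. */
enum stmt_kind classify_statement(const char *stmt)
{
    char first[32], second[32];
    const char *p = next_token(stmt, first, sizeof first);
    if (p == NULL)
        return STMT_OTHER;
    if (next_token(p, second, sizeof second) != NULL && strcmp(second, "=") == 0)
        return STMT_ASSIGNMENT;
    if (strcmp(first, "IF") == 0)
        return STMT_IF;
    return STMT_OTHER;
}
```

With this rule, John's example starts an IF statement, while its clauses `BEGIN = IF;` and `END = IF;` are ordinary assignments.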
--- Synchronet 3.21b-Linux NewsLink 1.2

From gah4 <gah4@u.washington.edu> to comp.compilers on Fri Dec 2 02:09:52 2022

On Thursday, December 1, 2022 at 4:18:24 PM UTC-8, gah4 wrote:

(snip)

> One feature that I find makes them easier to use, and harder to implement, is no reserved words.

(snip)

> [I take your point, but in PL/I you can say:
>
> IF THEN = ELSE THEN BEGIN = IF; ELSE END = IF;
>
> COBOL famously has too many reserved words but PL/I overreacted. -John]

COBOL used up many useful English words as reserved words.

As far as I know, the idea for PL/I was that you shouldn't need to know about the parts of the language that you weren't using, not even their keywords.

But also, PL/I more than other languages has simple rules, instead
of arbitrary restrictions. You can use any expression in any place where
an expression is allowed.

Fortran has always had, and still has, unobvious restrictions on how expressions can be used. Some seem intended to make things harder
for the programmer, that is, to discourage practices that some people don't like.

My favorite complaint is about using REAL variables in DO loops.
Yes there are good reasons not to do it, but it isn't up to the language
to decide that.
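One of those good reasons, shown as a small C sketch: 0.1 has no exact binary representation, so a loop that steps a floating-point variable accumulates rounding error and can run a different number of times than its bounds suggest. This assumes IEEE 754 double arithmetic without extended intermediate precision. It models the rounding only, not Fortran's exact DO semantics, where the iteration count is computed once from the bounds before the loop starts.

```c
#include <assert.h>

/* Step a double from 0.0 towards 1.0 by 0.1 and count the trips.
   After ten additions the variable holds a value just below 1.0
   (0.9999999999999999), so the test x < 1.0 passes one extra time. */
int trips_with_double_counter(void)
{
    int trips = 0;
    for (double x = 0.0; x < 1.0; x += 0.1)
        trips++;
    return trips;
}

/* The usual fix: count with an integer and derive the real value
   from the counter on each trip, so no error accumulates. */
int trips_with_integer_counter(void)
{
    int trips = 0;
    for (int i = 0; i < 10; i++) {
        double x = i * 0.1;   /* computed fresh from the counter */
        assert(x < 1.0);
        trips++;
    }
    return trips;
}
```

On such a system the first function returns 11 and the second returns 10.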

The actual reason I care about this one is that many (many!) years ago
I would sometimes translate BASIC programs to Fortran, so that all the
variables became Fortran REAL.

PL/I allowed array expressions from the beginning; Fortran added
them much later (in Fortran 90). PL/I has a simple rule: the subscripts
have the same value for all arrays in the expression.

Fortran has complicated rules, where some arrays index from one
even if they are declared with a different lower bound. You have to
be extremely careful about which one it is using. And it gets even worse
when you pass arrays to subroutines.

But back to the reserved words. Just because a language allows keywords
to be used as names doesn't mean that you should do it. In all cases, the person
writing the program should consider readability, and using IF-related
words as names in an IF statement is bound to be confusing.

From Thomas Koenig <tkoenig@netcologne.de> to comp.compilers on Sat Dec 3 10:25:40 2022

gah4 <gah4@u.washington.edu> wrote:

> We had the "What attributes of a programming language simplify its implementation?" discussion.
>
> It seems, though, that languages are implemented a small number of
> times, and used many times. So, designing for ease of use, instead of
> ease of implementation makes more sense.

Very much so.

> (Especially if you want a lot of people to want to use it.)
>
> One feature that I find makes them easier to use, and harder to
> implement, is no reserved words.

I think this is more a matter of extensibility than of ease of use,
but both are somewhat intertwined.

Adding a new reserved word is a breaking change, especially if that
word is often used. See "new" in C++, which was something reasonable
to use in C, and is reserved in C++.
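A small illustration of that breakage, assuming nothing beyond standard C: the following is valid C, because `new`, `class`, and `this` are ordinary identifiers there, but a C++ compiler rejects it because all three are C++ keywords. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Valid C: new, class, and this are ordinary identifiers in C.
   Compiling this file as C++ fails, because all three are C++ keywords. */
struct node {
    int value;
    struct node *next;
};

int push_value(struct node *this, struct node *new, int class)
{
    assert(this != NULL && new != NULL);
    new->value = class;
    new->next = this;      /* link the new node in front of this one */
    return new->next->value + new->value;
}
```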

The life cycle of a programming language will have many revisions (if
it is successful, that is), and not having reserved keywords certainly
helps in two ways: existing user programs will continue to work,
and new features can be added in a way that is easier to read than
adding special characters, which make a new feature look like a cat
walked over the keyboard with caps lock on.

Yes, this is a bit more pain for compiler writers, but far less than,
let's say, having to deal with SIMD.
[There's also the perl approach where you put a "use" at the top of the
program file saying which version of the language you want. -John]

From Hans-Peter Diettrich <DrDiettrich1@netscape.net> to comp.compilers on Sat Dec 3 22:16:44 2022

On 12/3/22 11:25 AM, Thomas Koenig wrote:

> gah4 <gah4@u.washington.edu> wrote:
>
>> One feature that I find makes them easier to use, and harder to
>> implement, is no reserved words.
>
> I think this is more a matter of extensibility than of ease of use,
> but both are somewhat intertwined.
>
> Adding a new reserved word is a breaking change, especially if that
> word is often used. See "new" in C++, which was something reasonable
> to use in C, and is reserved in C++.

IMO C basic syntax is a bad base. As long as declarations and
expressions can be distinguished only by the type of an identifier (type
name or variable name) it's not a good idea to add new keywords that can
be confused with variable or type names. Instead weird constructs like
"long long" for int64_t have been introduced, while "int int" stays
equivalent to "int".
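The problem described here is usually illustrated with `T * x;`: if `T` names a type, the statement declares `x` as a pointer; if `T` names a variable, it multiplies `T` by `x` and throws the result away. The parser cannot tell which without consulting the symbol table (the technique is often called the "lexer hack"). A small C sketch; `T`, `x`, and both function names are purely illustrative.

```c
#include <assert.h>

typedef int T;          /* at file scope, T names a type */

int as_declaration(void)
{
    int value = 7;
    T * x;              /* declaration: x is a pointer to int */
    x = &value;
    assert(x != 0);
    return *x;
}

int as_expression(void)
{
    int T = 3, x = 4;   /* this T hides the typedef inside this block */
    T * x;              /* same tokens, now an expression statement */
    return T * x;
}
```

With warnings enabled, compilers typically complain that the value of the second `T * x;` is unused, which is itself a sign that it was parsed as an expression.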

DoDi

From Christopher F Clark <christopher.f.clark@compiler-resources.com> to comp.compilers on Sat Dec 3 23:33:07 2022

The discussion on reserved words versus keywords reminds me of
decisions we made while building Yacc++. It is worth noting that we
(both of its developers) worked at Pr1me Computer, where PL/I dialects
were the key programming languages used to build both the OS and the
compilers, so we were likely highly influenced by that.

As a result, Yacc++ has very few reserved words; I'm pretty sure the
number is 3 or fewer. The only one I can think of is "yy_eof",
which is reserved because it is used in the library in a place where
we have hand-written code that we don't want to modify.(*) We do
specifically reserve all yy_ words for use in the library, although
most can be used in grammars (and code) without any ill effect. In
doing so, we feel we haven't taken away any common words from the
users' vocabulary, and where the words do have special meaning, it is
generally the same meaning as in the traditional lex/flex/yacc/bison variants.

However, we do have plenty of context-sensitive keywords, but we
structured their usage (as keywords) so that they are easily
disambiguated. Thus, left, right, and nonassoc don't have special
characters in them, as opposed to yacc, where they are %left et al.
We couldn't make %prec unambiguous, however, so it retains the required %
spelling.

Still, it is worth noting that to make this possible, we had to require that
all productions have a terminating semicolon (;), rather than
depending on a name followed by a colon (:) to identify the start of the next rule. That
also gets rid of the lexical hack required to make the grammar LALR(1)
rather than LALR(2). We could have handled LALR(2) grammars, but in our opinion, that
made error recovery and error messages less obvious. Sometimes
simplicity of implementation makes for a simpler and more regular
language.

But to continue this part of the explanation, words like fast, small, and
readable are keywords that describe different ways we lay out the
tables, and they have those meanings only in specific contexts. In those
contexts, normal identifiers cannot appear. And in any context where
a normal identifier can appear, they are simply identifiers and don't
carry any significance; in the library code, where we need them to
have special meaning, we use yy_fast et al. So, in the declarations
within a grammar where they need special meanings, they don't
need to be spelled some "special" way (i.e., you don't say yy_fast yy_tables, you say "fast tables" and it is perfectly clear), but you
can also use "fast" and "tables" in your grammar as tokens or
non-terminals without worrying that you are using a "reserved" word.
In fact, we do so to describe the grammar of Yacc++ grammars.

Thus, we feel we have mostly achieved a level of balance similar to what
PL/I had, without creating a write-only language. Yes, you can
probably use Yacc++ to write an extensible language that diverges into
a bunch of unique and incomprehensible variants where no two
programmers are using the same language. We haven't made that
impossible. However, the freedom we have allowed does not inherently
contribute to that or encourage it. It simply lets people write
things slightly more naturally without a lot of "line noise".

------

Now, given that, I want to dispel the illusion that it makes parsing
harder (beyond a very trivial amount). The "trick" (hack) that lets
the grammar deal with keywords that are not reserved is quite simple
(we document how to do it in our manual, for users who are designing their own
languages), and it should apply to most parser generators. It
doesn't rely on any special feature of Yacc++, although we have
some features that make doing so easier.

So, for example, say you have a list of keywords (tokens) that you want
treated like identifiers, such as "if" "then" "else" à la PL/I or "left"
"right" "token" à la Yacc++, and you have a token identifier that
matches all the other identifiers, as in:

token identifier, if, then, else, left, right, token;
identifier: "a" .. "z" ("a" .. "z" | "0" .. "9")*;
if: "if";
then: "then";
else: "else";
left: "left";
right: "right";
token: "token";

To get the desired property, you simply define a non-terminal (we'll
call it "ident") that you use to represent identifiers in contexts
where you want the keywords to be allowed, as in:

ident: identifier | if | then | else | left | right | token;

Now, simply use ident where you would have used identifier previously:

rule: ident ":" ident* ";" ;

And, use the keywords where they have their special meaning:

if_stmt: if expression then stmt (else stmt)?;
left_decl: left ident ("," ident)* ";" ;

As long as the uses are unambiguous (and the generator uses the
prefer-shift method of resolving shift-reduce conflicts, or you
force it to with "precedence" declarations), the grammar will
work as expected.

If it is ambiguous and you want to disallow certain keywords, simply
introduce other non-terminals, such as

ident_not_if: identifier | then | else | left | right | token;

assignment : ident_not_if "=" expression; // if keyword not allowed before =
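The same idea works in a hand-written recursive-descent parser, where one token of lookahead takes the place of the LALR shift-reduce resolution described above. The C sketch below handles just two forms from the example grammar, `left_decl` and `rule`. Keywords still come out of the scanner as their own token kinds, and the helper `is_ident` plays the role of the `ident` non-terminal. The tokenizer, the token names, and `parse_item` are assumptions for illustration, not part of Yacc++.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token kinds: keywords get their own kinds, as in the token list above. */
enum tok { T_EOF, T_IDENT, T_LEFT, T_RIGHT, T_TOKEN,
           T_COLON, T_COMMA, T_SEMI, T_ERROR };
enum item { ITEM_LEFT_DECL, ITEM_RULE, ITEM_ERROR };

static const char *src;   /* current input position */

static enum tok scan(void)
{
    const char *start;
    size_t len;
    while (isspace((unsigned char)*src))
        src++;
    if (*src == '\0') return T_EOF;
    if (*src == ':') { src++; return T_COLON; }
    if (*src == ',') { src++; return T_COMMA; }
    if (*src == ';') { src++; return T_SEMI; }
    if (!islower((unsigned char)*src)) return T_ERROR;
    start = src;
    while (islower((unsigned char)*src) || isdigit((unsigned char)*src))
        src++;
    len = (size_t)(src - start);
    if (len == 4 && strncmp(start, "left", 4) == 0) return T_LEFT;
    if (len == 5 && strncmp(start, "right", 5) == 0) return T_RIGHT;
    if (len == 5 && strncmp(start, "token", 5) == 0) return T_TOKEN;
    return T_IDENT;
}

/* The "ident" non-terminal: a plain identifier or any of the keywords.
   An "ident_not_if"-style variant would just leave one keyword out. */
static int is_ident(enum tok t)
{
    return t == T_IDENT || t == T_LEFT || t == T_RIGHT || t == T_TOKEN;
}

/* item:      left_decl | rule
   left_decl: left ident ("," ident)* ";"
   rule:      ident ":" ident* ";"
   The token after the first word picks the reading. */
enum item parse_item(const char *text)
{
    enum tok t, next;
    assert(text != NULL);
    src = text;
    t = scan();
    if (!is_ident(t))
        return ITEM_ERROR;
    next = scan();
    if (next == T_COLON) {                   /* first word is a rule name */
        while (is_ident(t = scan()))
            ;                                /* right-hand-side symbols */
        return (t == T_SEMI && scan() == T_EOF) ? ITEM_RULE : ITEM_ERROR;
    }
    if (t == T_LEFT && is_ident(next)) {     /* "left" used as a keyword */
        while ((t = scan()) == T_COMMA) {
            if (!is_ident(scan()))
                return ITEM_ERROR;
        }
        return (t == T_SEMI && scan() == T_EOF) ? ITEM_LEFT_DECL : ITEM_ERROR;
    }
    return ITEM_ERROR;
}
```

So `left a, b;` is a declaration, while `left : right token ;` is a rule whose name and symbols are all keywords.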

-----

Now, as I said, we have features in Yacc++ that make this easier, but
the principle doesn't require our tool. And there is much more you
can do; this is just one of the relevant grammar hacks.

(*) yy_error might also be a reserved word, used as part of error recovery.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris
------------------------------------------------------------------------------

From gah4 <gah4@u.washington.edu> to comp.compilers on Sat Dec 3 17:15:55 2022

On Saturday, December 3, 2022 at 4:27:51 PM UTC-8, christoph...@compiler-resources.com wrote:

> The discussion on reserved words versus keywords reminds me of
> decisions we made while building Yacc++. It is worth noting that we
> (both of its developers) worked at Pr1me Computer, where PL/I dialects
> were the key programming languages used to build both the OS and the compilers, so we were likely highly influenced by that.

This reminds me of some cases in TeX where optional keywords can
appear in unexpected places. TeX glue can be written as:

\hskip 1cm

or

\hskip 1cm plus 1cm

In normal use, you mix TeX commands and text to be formatted. If a macro expands to

\hskip 1cm

and is followed by text starting with

plus

you get a surprising error message.

I believe that plus is only a "reserved word" in that specific context.
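The trap comes from TeX looking ahead for the optional keyword after a glue dimension: once it sees `plus`, it commits to reading a stretch dimension. A toy C sketch of that decision, assuming a drastically simplified glue syntax (a number with an optional two-letter unit, no `minus`, no other units); `scan_glue` and its result codes are hypothetical, not TeX's internals. The usual defence in macros is to end the glue with `\relax`, which stops the scan.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum glue_result { GLUE_OK, GLUE_MISSING_NUMBER };

/* Read a simplified dimension: digits (and '.') followed by an optional
   two-letter unit such as cm or pt. Returns the position after it, or
   NULL when no number is present (TeX would report a missing number). */
static const char *scan_dimen(const char *p)
{
    const char *start;
    while (*p == ' ')
        p++;
    start = p;
    while (isdigit((unsigned char)*p) || *p == '.')
        p++;
    if (p == start)
        return NULL;
    if (isalpha((unsigned char)p[0]) && isalpha((unsigned char)p[1]))
        p += 2;
    return p;
}

/* Scan the glue after \hskip: a dimension, then, if the next word is
   "plus", a stretch dimension. On success *rest points at the text that
   follows, which would be typeset normally. */
enum glue_result scan_glue(const char *p, const char **rest)
{
    const char *q;
    assert(rest != NULL);
    p = scan_dimen(p);
    if (p == NULL)
        return GLUE_MISSING_NUMBER;
    q = p;
    while (*q == ' ')
        q++;
    if (strncmp(q, "plus", 4) == 0) {   /* optional keyword: commit to it */
        p = scan_dimen(q + 4);
        if (p == NULL)
            return GLUE_MISSING_NUMBER; /* the surprise: "plus" was text */
    }
    *rest = p;
    return GLUE_OK;
}
```

Text that merely starts with another word passes through untouched, but text starting with "plus" is swallowed by the glue scan and fails.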

And a project I was working on some years ago just happened to run
into that case, however unlikely that might be.

From Keith Thompson <Keith.S.Thompson+u@gmail.com> to comp.compilers on Tue Dec 6 09:56:17 2022

Hans-Peter Diettrich <DrDiettrich1@netscape.net> writes:

> IMO C basic syntax is a bad base. As long as declarations and
> expressions can be distinguished only by the type of an identifier (type
> name or variable name) it's not a good idea to add new keywords that can
> be confused with variable or type names. Instead weird constructs like
> "long long" for int64_t have been introduced, while "int int" stays equivalent to "int".

long long and int64_t are not the same (though int64_t may be the same
type as long long in a given implementation). long long is *at least* 64
bits. int64_t is *exactly* 64 bits, and must have a 2's-complement representation and no padding bits. "int int" is a syntax error.
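This distinction can be checked at compile time. A C11 sketch using only macros from `<limits.h>` and `<stdint.h>`: `long long` is guaranteed a range (LLONG_MAX is at least 2^63 - 1) but may be wider, while `int64_t` is optional and, when it exists, is exactly 64 bits. The helper `long_long_bits` is illustrative only; it counts storage bits, which equal value bits on mainstream platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* long long: only a minimum range is guaranteed; it may be wider. */
_Static_assert(LLONG_MAX >= 9223372036854775807LL,
               "long long covers at least the 64-bit range");

#ifdef INT64_MAX
/* int64_t: optional, but if it exists it is exactly 64 bits,
   two's complement, with no padding bits. */
_Static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is exactly 64 bits");
_Static_assert(INT64_MAX == 9223372036854775807LL, "exact maximum");
#endif

/* Bits in the object representation of long long: 64 on mainstream
   platforms, but the standard only promises at least 64. */
int long_long_bits(void)
{
    int bits = (int)(sizeof(long long) * CHAR_BIT);
    assert(bits >= 64);
    return bits;
}
```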

(I'm not arguing that C's integer type system isn't overly complicated.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

From gah4 <gah4@u.washington.edu> to comp.compilers on Tue Dec 6 12:43:31 2022

On Tuesday, December 6, 2022 at 10:28:44 AM UTC-8, Keith Thompson wrote:

(snip)

> long long and int64_t are not the same (though int64_t may be the same
> type as long long in a given implementation). long long is *at least* 64 bits. int64_t is *exactly* 64 bits, and must have a 2's-complement representation and no padding bits. "int int" is a syntax error.
>
> (I'm not arguing that C's integer type system isn't overly complicated.)

It seems that many Fortran programmers now assume that KIND=8
(for REAL) is a 64-bit IEEE floating-point value and, I suspect, that
KIND=8 for INTEGER is a 64-bit integer.

Fortran makes no claim about the numerical values of KINDs.

It doesn't seem too surprising, then, that some would miss the
distinction between int64_t and long long.

In the early days of 64 bit computing, which I mostly remember from
the DEC Alpha, C compilers made long the 64 bit type.

That, then, broke too much software assuming long was 32 bits.

Much of IP networking evolved when C int was either 16 or 32 bits and
you didn't really know which, while short was reliably 16 bits and long was
reliably 32 bits.

So, we have things like htonl() and ntohl() for converting 32 bit
values to/from network byte order. (The l stands for long.)
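For what it's worth, POSIX now declares these in `<arpa/inet.h>` as `uint32_t htonl(uint32_t)` and `uint32_t ntohl(uint32_t)`, so the "long" survives only in the name. A sketch, assuming a POSIX system, that compares `htonl` with a portable shift-based encoder; `put_be32` and `htonl_matches` are made-up names.

```c
#include <arpa/inet.h>   /* POSIX: htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write v into buf in network (big-endian) byte order without relying on
   the host's endianness or on the size of long. */
void put_be32(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Check that htonl() lays out the same bytes in memory as put_be32(). */
int htonl_matches(uint32_t v)
{
    unsigned char portable[4], via_htonl[4];
    uint32_t net = htonl(v);
    put_be32(portable, v);
    memcpy(via_htonl, &net, sizeof net);
    assert(ntohl(net) == v);               /* round trip back to host order */
    return memcmp(portable, via_htonl, 4) == 0;
}
```

Because the shifts operate on values rather than on memory layout, `put_be32` produces the same bytes on big- and little-endian hosts, which is the property cross-platform code needs.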

Since networking code, especially cross-platform code, depends on
exact sizes more than most other code does, that was one thing that had to be done
right pretty early. (Cross-platform file formats, too.)

So then we got long long as the (close enough to) reliable 64 bit
type.

Maybe in a few years, we will have the long long long 128 bit type.

But C syntax has become confusing because of reserved words and the need for additions in more than just data types.

There are stories, whose details I don't remember, about the different uses of the
word "static" in C.

Though maybe not quite as many as Fortran uses for *.

From anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.compilers on Wed Dec 7 10:14:44 2022

gah4 <gah4@u.washington.edu> writes:

> In the early days of 64 bit computing, which I mostly remember from
> the DEC Alpha, C compilers made long the 64 bit type.

The early days of 64-bit computing were on the CDC Star and the Cray-1, but
C was a minor language for them.

Yes, we got the first mainstream 64-bit Unix with Digital OSF/1 on the
Alpha, and 64-bit APIs and ABIs on Unix had 64-bit long.

> That, then, broke too much software assuming long was 32 bits.

Obviously not, or other Unix vendors would not have also made longs
64-bit in their interfaces.

> So then we got long long as the (close enough to) reliable 64 bit
> type.

GCC introduced long long independently of any 64-bit port; this is easy
to see because the original GCC documentation specified that long long
int is twice as long as long int. Later, when the Alpha port (and
later 64-bit ports) came, the porter decided to make long long 64-bit,
i.e., the same size as long; I don't know if the Alpha API/ABI had a requirement on the size of long long, or if the people responsible for
the Alpha port deviated from the documentation for some other
reason. When we reported this as a bug, the fix was to change the documentation to say that long long is twice as long as int.

Concerning IL32P64, i.e., 32-bit longs with 64-bit pointers, that
seems to be a specialty of 64-bit Windows. Fortunately, I don't have
to deal with this API (64-bit Cygwin supports the Unix API, i.e.,
64-bit long).

> Maybe in a few years, we will have the long long long 128 bit type.

GCC has supported 128-bit integers for a while; originally we wrote,
e.g.:

typedef int int128_t __attribute__((__mode__(TI)));

(makes me wonder how the compiler sees the "TI"; it's not a keyword,
and it's not a defined name in any of the name spaces; gcc tends to
pass such things as literal strings (cf. extended asm), but here it
does not).

Nowadays it seems to (also) have __int128_t as an
implementation-specific keyword. I see no movement in the direction of
long long long (and, looking at history, it would probably only be 64 bits long :-).
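For comparison, a sketch of the current spelling, assuming GCC or Clang: both define `__SIZEOF_INT128__` on targets where the `__int128` extension is available (typically 64-bit ones). The example computes the full 128-bit product of two 64-bit values and falls back to 32-bit halves elsewhere; `mul_64x64` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full 64 x 64 -> 128-bit unsigned product, returned as high and low halves. */
void mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    assert(hi != NULL && lo != NULL);
#ifdef __SIZEOF_INT128__
    unsigned __int128 p = (unsigned __int128)a * b;   /* GCC/Clang extension */
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#else
    /* Portable fallback: schoolbook multiplication on 32-bit halves. */
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo, p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo, p3 = a_hi * b_hi;
    /* Sum of the middle 32-bit columns; cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);
    *lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
#endif
}
```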

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

From Hans-Peter Diettrich <DrDiettrich1@netscape.net> to comp.compilers on Wed Dec 7 12:13:54 2022

On 12/6/22 6:56 PM, Keith Thompson wrote:

> Hans-Peter Diettrich <DrDiettrich1@netscape.net> writes:
>
>> IMO C basic syntax is a bad base. As long as declarations and
>> expressions can be distinguished only by the type of an identifier (type
>> name or variable name) it's not a good idea to add new keywords that can
>> be confused with variable or type names.

Nobody seems to disagree with my opinion?

>> Instead weird constructs like
>> "long long" for int64_t have been introduced, while "int int" stays
>> equivalent to "int".
>
> long long and int64_t are not the same (though int64_t may be the same
> type as long long in a given implementation). long long is *at least* 64 bits. int64_t is *exactly* 64 bits, and must have a 2's-complement representation and no padding bits.

You are right, my sloppy wording was not appropriate in this NG :-(

> "int int" is a syntax error.

I could not find in the (older) C++ grammar why "int int" should be a
*syntax* error. Aren't both "int" and "long" simple-type-specifiers,
which can occur multiple times in a decl-specifier-seq?

It looks to me like additional rules apply which decide that
"long int"
"long long int"
"long int long" //what's that?
are all valid while
"long int long int"
throws a "two or more data types..." error.

In former times it was much easier to decide with a single basic type id (int...) and type modifiers (long...).

DoDi

From Hans-Peter Diettrich <DrDiettrich1@netscape.net> to comp.compilers on Thu Dec 8 21:42:26 2022

On 12/8/22 2:53 AM, Keith Thompson wrote:

> Hans-Peter Diettrich <DrDiettrich1@netscape.net> writes:
>
>> On 12/6/22 6:56 PM, Keith Thompson wrote:
>>
>> [...]
>>
>>> "int int" is a syntax error.
>>
>> I could not find in the (older) C++ grammar why "int int" should be a
>> *syntax* error. Aren't both "int" and "long" simple-type-specifiers,
>> which can occur multiple times in a decl-specifier-seq?
>
> No, there are specific rules that specify the way they can be used.
> In the 2011 ISO C standard (I use the draft from https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), the valid
> type specifiers are listed in section 6.7.2.

Thanks for the link :-)

>     At least one type specifier shall be given in the declaration
>     specifiers in each declaration, and in the specifier-qualifier
>     list in each struct declaration and type name. Each list of type
>     specifiers shall be one of the following multisets (delimited
>     by commas, when there is more than one multiset per item);
>     the type specifiers may occur in any order, possibly intermixed
>     with the other declaration specifiers.

So let me repeat my questions:

- Why is "int int" a syntax error? "At least one..." allows for more
than one type-specifier in declaration-specifiers (6.7).

- What's "long int long"? My current (Arduino) C++ compiler doesn't flag
it as an error.

DoDi
[This is getting close to comp.lang.c but I'm OK with a little more
discussion of the design decisions in C's very messy declarations. -John]

From gah4 <gah4@u.washington.edu> to comp.compilers on Thu Dec 8 14:44:08 2022

On Thursday, December 8, 2022 at 12:45:04 PM UTC-8, Hans-Peter Diettrich wrote:

(snip)

> So let me repeat my questions:
>
> - Why is "int int" a syntax error? "At least one..." allows for more
> than one type-specifier in declaration-specifiers (6.7).
>
> - What's "long int long"? My current (Arduino) C++ compiler doesn't flag
> it as an error.
>
> DoDi
> [This is getting close to comp.lang.c but I'm OK with a little more discussion of the design decisions in C's very messy declarations. -John]

Now we can get closer to compilers.

I suspect that it isn't a syntax error, though it will depend on how the compiler is written.

The compiler (parser) can accept any combination of the specifiers,
even more than one of each, and then the compiler decides later
that the given ones are not valid.

There was a story many years ago, about a compiler with only one error
message: "SYNTAX ERROR". (Likely in the days of upper case only.)

In any case, it is often easier to write the parser more general than
the actual language, and then flag them later.
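Applied to DoDi's questions, that split might look like this: the grammar lets type specifiers repeat in any order, and a later check compares the resulting multiset with the list in 6.7.2. That list sits under Constraints, so "int int" requires a diagnostic without being a grammar-level syntax error, and "long int long" is simply the multiset {long, long, int}, i.e. `long long int`, which is valid. A simplified C sketch covering only int, short, long, signed, and unsigned; `resolve_int_type` is a hypothetical name, and real compilers track many more specifiers and qualifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Resolve a list of integer type specifiers, in any order, to a canonical
   spelling. Returns NULL if the multiset is not one C allows (such as
   "int int") or if it contains a word this sketch does not model. */
const char *resolve_int_type(const char *const words[], size_t n)
{
    int n_int = 0, n_short = 0, n_long = 0, n_signed = 0, n_unsigned = 0;
    assert(words != NULL || n == 0);
    if (n == 0)
        return NULL;                 /* at least one type specifier required */
    /* Phase 1, as permissive as the grammar: count, ignore order. */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(words[i], "int") == 0) n_int++;
        else if (strcmp(words[i], "short") == 0) n_short++;
        else if (strcmp(words[i], "long") == 0) n_long++;
        else if (strcmp(words[i], "signed") == 0) n_signed++;
        else if (strcmp(words[i], "unsigned") == 0) n_unsigned++;
        else return NULL;
    }
    /* Phase 2, the constraint check: reject impossible combinations. */
    if (n_int > 1 || n_short > 1 || n_long > 2 || n_signed + n_unsigned > 1)
        return NULL;
    if (n_short > 0 && n_long > 0)
        return NULL;
    if (n_short > 0) return n_unsigned ? "unsigned short int" : "short int";
    if (n_long == 2) return n_unsigned ? "unsigned long long int" : "long long int";
    if (n_long == 1) return n_unsigned ? "unsigned long int" : "long int";
    return n_unsigned ? "unsigned int" : "int";
}
```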

But also, the same can be done for the language standard.

As far as I know, in early C variables defaulted to int. Later, it
was required that they be declared, but the default type was still
int. You could declare:

auto i;

which declares i as automatic, and (by default) int.

It gets more interesting in Fortran, where you can give variables
attributes in separate statements:

INTEGER I
DIMENSION I(10,10)
PUBLIC I
ALLOCATABLE I
ASYNCHRONOUS I
CONTIGUOUS I
INTENT(IN) I
OPTIONAL I
POINTER I
PROTECTED I
SAVE I
TARGET I
VOLATILE I

All might be legal syntax separately, but not legal in all combinations.

From gah4 <gah4@u.washington.edu> to comp.compilers on Mon Dec 12 00:00:56 2022

On Tuesday, December 6, 2022 at 10:28:44 AM UTC-8, Keith Thompson wrote:

(big snip)

> (I'm not arguing that C's integer type system isn't overly complicated.)

One reason for that, as noted above, is reserved words.

Adding new reserved words risks invalidating existing programs.

I do notice that Java has a reserved word "goto" without a defined use.
Someone was planning ahead.

C could have reserved some words for future use, if someone thought about it.

So adding new types is complicated.
[I think this is where you use #pragma to say which new keywords you're
using. Yes, it's a kludge. -John]
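C has used a variant of that opt-in idea: identifiers starting with an underscore and an uppercase letter have been reserved for the implementation since C89, so C99 could add the keyword `_Bool` without breaking existing code, and a program asks for the friendlier name `bool` by including `<stdbool.h>`. C11 repeated the pattern with `_Alignas` and `<stdalign.h>`, and C23 finally made `bool` a keyword. A minimal sketch; `normalize_flag` is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>   /* opt in to the spelling "bool" (a keyword only since C23) */

/* _Bool needs no header: it lives in the reserved _Uppercase namespace,
   so adding it in C99 could not collide with existing user identifiers. */
int normalize_flag(int raw)
{
    _Bool flag = raw;      /* any nonzero value converts to 1 */
    bool same = flag;      /* bool and _Bool name the same type */
    assert(same == flag);
    return same;
}
```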