Hello Compiler Experts!
I am reading the book, "Programming Languages, Application and Interpretation"
by Shriram Krishnamurthi.
The book says that Lisp and Scheme have a primitive called "read".
The book says, "The read primitive is a crown jewel of Lisp and Scheme."
Some of my notes from reading the book:
- Read does tokenizing and reading.
- Read returns a value known as an s-expression.
- The s-expression is an intermediate representation.
- The output of read is either a number or a list. That's it!
I've read several compiler books and none of them talked about this. They talk
about creating a lexer to generate a stream of tokens and a parser that receives the tokens and arranges them into a tree data structure. Why no mention of the "crown jewel" of tokenizing/parsing? Why no mention of "one of the great ideas of computer science"?
I have done some work with Flex and Bison and recently I've done some work with building parsers using read. My experience is the latter is much easier. Why isn't read more widely discussed and used in the compiler community?
Surely the concept that read embodies is not specific to Lisp and Scheme, right?
/Roger
[Yes, it's specific to Lisp and Scheme. They have an extremely simple
syntax called S-expressions: nested parenthesized lists of space-separated
tokens with some quoting. The original plan was that Lisp 2
would have M-expressions that looked more like a normal language, but
it's over 50 years later and they still haven't gotten around to it. -John]
[[...] The original plan was that Lisp 2
would have M expressions that looked more like a normal language but
it's over 50 years later and they still haven't gotten around to it.
-John]
I have done some work with Flex and Bison and recently I've done some work with building parsers using read. My experience is the latter is much easier. Why isn't read more widely discussed and used in the compiler community? Surely the concept that read embodies is not specific to Lisp and Scheme, right?
It is specific to when you have a very simple and uniform syntax, and
experience suggests that this isn't to most people's taste. Whether it is
a result of "mental wiring" or of tradition (including mathematical
tradition) to which one gets exposed from a young age, I don't know.
What I mean by this is that most people seem to find it easier to read
a + b * c
as opposed to
(+ a (* b c))
and I don't know if this is just the result of early exposure or an inherent part of how most humans' brains function.
[[...] The original plan was that Lisp 2
would have M expressions that looked more like a normal language but
it's over 50 years later and they still haven't gotten around to it.
-John]
Actually they have. Some HOPL paper (or several of them) discusses
this: there were several attempts at an Algol-like syntax, but Lisp
programmers found that they preferred programming in S-expressions
over the Algol-like syntax, whether it was M-expressions, Dylan syntax,
or several other attempts.
Example of tokenizing/parsing using read:
(+ 3 4) --> read --> (list '+ 3 4) --> parse --> (add (num 3) (num 4))
The first expression (+ 3 4) is the concrete syntax.
The middle expression (list '+ 3 4) is an s-expression. It is an intermediate representation.
The last expression (add (num 3) (num 4)) is the abstract syntax.
The book says: read is one of the great ideas of computer science. It helps decompose a fundamentally difficult process - generalized parsing of the input
stream - into two simple processes:
(1) reading the input stream into an intermediate representation
(2) parsing that intermediate representation
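The two-step decomposition above can be sketched in a few lines of Python. This is a toy stand-in for Scheme's read, not the real primitive; the helper names (tokenize, read_sexp, parse) are made up for illustration:

```python
def tokenize(src):
    # Split "(+ 3 4)" into ["(", "+", "3", "4", ")"].
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read_sexp(tokens):
    # Step 1: turn the token stream into an s-expression
    # (nested Python lists standing in for Lisp lists).
    tok = tokens.pop(0)
    if tok == "(":
        sexp = []
        while tokens[0] != ")":
            sexp.append(read_sexp(tokens))
        tokens.pop(0)  # drop the closing ")"
        return sexp
    try:
        return int(tok)   # numbers become numbers...
    except ValueError:
        return tok        # ...everything else is a symbol

def parse(sexp):
    # Step 2: turn the s-expression into abstract syntax.
    if isinstance(sexp, int):
        return ("num", sexp)
    if isinstance(sexp, list) and sexp and sexp[0] == "+":
        return ("add", parse(sexp[1]), parse(sexp[2]))
    raise SyntaxError(sexp)

ast = parse(read_sexp(tokenize("(+ 3 4)")))
# ("add", ("num", 3), ("num", 4))
```

Note that the reader knows nothing about which operators exist; all language-specific knowledge lives in parse, which only ever sees a clean tree.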
I've read several compiler books and none of them talked about this. They talk
about creating a lexer to generate a stream of tokens and a parser that receives the tokens and arranges them into a tree data structure. Why no mention of the "crown jewel" of tokenizing/parsing? Why no mention of "one of the great ideas of computer science"?
I have done some work with Flex and Bison and recently I've done some work with building parsers using read. My experience is the latter is much easier. Why isn't read more widely discussed and used in the compiler community? Surely the concept that read embodies is not specific to Lisp and Scheme, right?
Roger L Costello <costello@mitre.org> writes:
I have done some work with Flex and Bison and recently I've done some work with building parsers using read. My experience is the latter is much easier.
Why isn't read more widely discussed and used in the compiler community?
Surely the concept that read embodies is not specific to Lisp and Scheme,
right?
Apart from the already mentioned problem that it forces you into a
syntax that a lot of people don't like, there's also the problem that
you have to deal with hostile input. Where you expect "(+ 2 3)" someone
will enter "(+ 2 3 #.(progn (launch-the-nukes) 4))". A lot of security
problems in real-world settings come from not correctly validating
inputs, and by the time you have worked around all these problems, read
isn't all that easy anymore. C, for example, has a somewhat similar
facility, scanf, which tries to pattern-match input and is also considered unsafe.
A good rule of thumb for production ready software is to define
a grammar for valid input and provide a validating parser.
If I recall correctly, at one time, PCCTS and its related tool
Sorcerer used an S-expression-like representation for ASTs, including an equivalent to CAR and CDR, whose names I don't recall, but they were something like "down" and "right".
Not every Lisp dialect has hash-dot read-time evaluation; that's
a feature of Common Lisp, disabled by setting/binding *read-eval*
to nil. I don't seem to recall that Scheme has it. I deliberately
kept it out of TXR Lisp.
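A loose Python analogy for that distinction (illustrative only; Python's ast.literal_eval is not Lisp's read, but like read with *read-eval* bound to nil, it builds data without evaluating embedded code):

```python
import ast

# Input that smuggles a computation into what should be plain data,
# in the spirit of the "#." example above.
hostile = "[1, 2, __import__('os').system('echo pwned')]"

# ast.literal_eval builds data structures but refuses to run code:
try:
    ast.literal_eval(hostile)
except ValueError:
    print("rejected")          # the hostile input is rejected, not executed

print(ast.literal_eval("[1, 2, 3]"))  # plain data is fine: [1, 2, 3]
```

Using eval() on the same string would run the embedded call, which is exactly the hazard of leaving read-time evaluation enabled on untrusted input.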
A good rule of thumb for production ready software is to define
a grammar for valid input and provide a validating parser.
Sure, if you want to waste your time defining grammars and
writing validating parsers.
This is no longer done that much outside of the Lisp world either: people use XML, JSON, YAML, ..., whose grammars they definitely didn't design or
implement, and validate the content/shape of the object that comes out.
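That read-then-validate pattern might be sketched like this, with the stock JSON parser playing the role of read and a hypothetical parse_expr doing the shape check (assumptions for illustration, not anyone's actual code):

```python
import json

def parse_expr(d):
    # Validate the shape of the generic tree before trusting it.
    if not (isinstance(d, dict) and d.get("op") == "add"):
        raise ValueError("expected an add node")
    args = d.get("args")
    if not (isinstance(args, list) and all(isinstance(a, int) for a in args)):
        raise ValueError("args must be a list of integers")
    return ("add", *args)

# "read": json.loads produces a generic tree with no application knowledge.
doc = json.loads('{"op": "add", "args": [3, 4]}')
print(parse_expr(doc))  # ('add', 3, 4)
```

The grammar of JSON itself was designed and implemented by someone else; the only code written here is the validation of the object that comes out.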