• PS Level 1 grammar

    From luser droog@luser.droog@gmail.com to comp.lang.postscript on Sun Nov 7 16:40:42 2021
    From Newsgroup: comp.lang.postscript

    Here's a rough draft of the grammar for a PS tokenizer using my new functions. This is almost exactly the same code as the previous version, pc9token.ps,
    with just a few function names changed and no handlers yet to transform
    the data. My previous code didn't do the recursion for procedures, but
    this one does (or should, assuming it works).

    I'm building recursive parsers by starting with a "forwarding" proc
    /myparser {-777 exec} def
    which can be composed with other parsers and filled
    in later by doing
    //myparser 0 //composed-parser put
    This is the simplest way I've found so far after struggling with more complicated ways.
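    The forwarding trick can be tried on its own, outside the parser library. Below is a minimal self-contained sketch; `fact` and its factorial body are purely illustrative and not from pc11a.ps. The placeholder lets the body refer to `fact` before the body exists. It works because a procedure stored inside another procedure is pushed, not run, when encountered, so the `exec` after slot 0 is what actually runs it.

```postscript
% Hypothetical demo of the "forwarding proc" trick; fact is illustrative only.
/fact {-777 exec} def            % placeholder: slot 0 gets overwritten below

% //fact is resolved when this is scanned, so the body holds the very same
% array object as /fact and can call it recursively before it is filled in.
//fact 0 {
  dup 1 le
  { pop 1 }                      % 0! = 1! = 1
  { dup 1 sub //fact exec mul }  % n * (n-1)!
  ifelse
} put                            % fact is now { {body} exec }

5 fact =                         % prints 120
```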

    It's missing a few things, like e notation, hex strings, and ASCII85.
    pc11atoken.ps:

    (pc11a.ps)run

    /delimiters ( \t\n()/%[]<>{}) def
    /delimiter delimiters anyof def
    /octal (0)(7) range def
    /digit (0)(9) range def
    /alpha (a)(z) range (A)(Z) range alt def
    /regular delimiters noneof def

    /number //digit some def
    /opt-number //digit many def
    /rad-digits //digit //alpha plus some def
    /rad-integer //digit //digit maybe then (#) char then //rad-digits then def
    /integer (+-) anyof maybe //number then def
    /real (+-) anyof maybe
    //number (.) char then //opt-number then
    (.) char //number then alt then def
    /name //regular some def

    /ps-char {-777 exec} def
    /escape (\\) char
    (\\) char
    (\() char alt
    (\)) char alt
    (n) char alt
    (r) char alt
    (t) char alt
    (b) char alt
    (f) char alt
    //octal //octal maybe then //octal maybe then alt
    then def
    /substring (\() char //ps-char many then (\)) char then def
    //ps-char 0 //escape
    //substring alt
    (()) noneof alt put
    /ps-string (\() char //ps-char many then (\)) char then def

    /spaces ( \t\n) anyof many def
    /object {-777 exec} def
    /ps-token //spaces //object xthen def
    //object 0 //rad-integer
    //real alt
    //integer alt
    //name alt
    (/) char //name then alt
    (/) char (/) char then //name then alt
    //ps-string alt
    ({) char //ps-token many then //spaces (}) char xthen then alt
    //delimiter alt put
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From luser droog@luser.droog@gmail.com to comp.lang.postscript on Mon Nov 8 09:49:29 2021

    On Sunday, November 7, 2021 at 6:40:43 PM UTC-6, luser droog wrote:

    (pc11a.ps)run

    /delimiters ( \t\n()/%[]<>{}) def
    /delimiter delimiters anyof def
    /octal (0)(7) range def
    /digit (0)(9) range def
    /alpha (a)(z) range (A)(Z) range alt def
    /regular delimiters noneof def

    /number //digit some def
    /opt-number //digit many def
    /rad-digits //digit //alpha plus some def
    /rad-integer //digit //digit maybe then (#) char then //rad-digits then def
    /integer (+-) anyof maybe //number then def
    /real (+-) anyof maybe
    //number (.) char then //opt-number then
    (.) char //number then alt then def
    /name //regular some def

    /ps-char {-777 exec} def
    /escape (\\) char
    (\\) char
    (\() char alt
    (\)) char alt
    (n) char alt
    (r) char alt
    (t) char alt
    (b) char alt
    (f) char alt
    //octal //octal maybe then //octal maybe then alt
    then def
    /substring (\() char //ps-char many then (\)) char then def
    //ps-char 0 //escape
    //substring alt
    (()) noneof alt put
    /ps-string (\() char //ps-char many then (\)) char then def


    /hex-char //digit (a)(f) range (A)(F) range alt alt def
    /non-hex-char //hex-char none def
    /hex-string (<) char //non-hex-char many //hex-char xthen many then (>) char then def


    /spaces ( \t\n) anyof many def
    /object {-777 exec} def
    /ps-token //spaces //object xthen def
    //object 0 //rad-integer
    //real alt
    //integer alt
    //name alt
    (/) char //name then alt
    (/) char (/) char then //name then alt
    //ps-string alt

    //hex-string alt


    ({) char //ps-token many then //spaces (}) char xthen then alt
    //delimiter alt put

    Adding hex strings needed a new combinator `none` that I'd been able to avoid until now. In earlier versions it had been a factor of `noneof`, which matches any character outside a given set.

    pc9.ps:
    noneof { anyof none }
    none {p} { { dup /p exec [] ne { zero }{ item } ifelse exec } ll } @func

    But I found a simpler way to write `anyof` and `noneof`, since this version builds everything on top of `pred satisfy`. So they can both use a factor, `within`, that checks a character against a string.

    pc11.ps:
    anyof { {within} curry satisfy }
    noneof { {within not} curry satisfy }
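    For readers without pc11.ps at hand, here is one plausible shape for `within`, offered only as a sketch. It assumes the character arrives as an integer code and the set as a string, which may not match the library's actual representation.

```postscript
% Hypothetical `within`: int string -> bool
% True if the character code appears anywhere in the string.
/within {
  false exch                      % c false str
  { 2 index eq                    % c false bool  (compare each code with c)
    { pop true exit } if          % found: replace false with true, stop
  } forall
  exch pop                        % drop c, leaving the boolean
} def

48 (0123456789) within =          % '0' is a digit: prints true
65 (0123456789) within =          % 'A' is not: prints false
```

    With a factor like this, `anyof` and `noneof` differ only in the trailing `not`, which is the symmetry the two one-liners above rely on.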

    But to take the inverse of a parser built out of 3 ranges, I really need the
    more general `none` now.

    So this function takes a parser as a named parameter, then constructs
    a new procedure with this parameter substituted inside (like a primitive 'lambda'), and yields this procedure as its result.

    none{ p }{
    { dup /p exec +is-ok { pop [ /p ( succeeded) ] fail }{ pop item } ifelse exec } ll
    } @func

    It also just includes the parameter parser as part of the error message,
    which could result in a very unhelpful message. But I think it's the best
    that can be done here with the information available. It might be nicer
    if `none` had access to a higher-level description of its parameter.
    But I'm not sure how to orchestrate that right now.
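    To make "substitute a parser into a new procedure" concrete without the `ll`/`@func` machinery, here is a sketch under a much simpler, hypothetical convention than pc11a.ps uses: a parser takes a string and leaves either `result remainder true` or just `false`. Instead of a lambda, this `none` copies a template procedure and `put`s the parser into a placeholder slot, the same trick as the forwarding procs. Failure here is a plain `false` rather than an error message, and `any-char`, `digit`, and `none-template` are illustrative names.

```postscript
% Hypothetical parser convention: string -> result remainder true | false

% any-char: consume one character, returning it as a 1-char string.
/any-char {
  dup length 0 gt
  { dup 0 1 getinterval exch            % first input
    1 1 index length 1 sub getinterval  % first rest
    true }
  { pop false } ifelse
} def

% digit: succeed only if the next character is 0..9.
/digit {
  dup length 0 gt { dup 0 get dup 48 ge exch 57 le and } { false } ifelse
  { any-char } { pop false } ifelse
} def

% Template for `none`; slot 1 (the null) is replaced by the parser p.
/none-template {
  dup null exec                         % input [result rest true | false]
  { pop pop pop false }                 % p matched here, so `none p` fails
  { any-char }                          % p failed, so consume one character
  ifelse
} def

/none {                                 % p -> new parser
  //none-template dup length array copy cvx  % fresh copy per call
  dup 1 4 -1 roll put                   % substitute p into slot 1
} def

/non-digit //digit none def

(a1) non-digit pop = =                  % prints 1 (remainder), then a (result)
```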

  • From luser droog@luser.droog@gmail.com to comp.lang.postscript on Tue Nov 9 08:56:45 2021

    On Monday, November 8, 2021 at 11:49:30 AM UTC-6, luser droog wrote:
    On Sunday, November 7, 2021 at 6:40:43 PM UTC-6, luser droog wrote:
    Here's a rough draft of the grammar for a PS tokenizer using my new functions.

    A little more fleshed out, formatted, and slightly tested. I've been having
    to futz around with the innards of several parsers like `then` and `many`
    to get `xthen` and `thenx` to work reliably. I had been using `append` to combine the results of two sequential parsers, and `append` works like
    a Lisp list append; i.e., it scans to the end of the cdr chain and then replaces the last null with the new element. That all works for the most part.

    It fails when you try to do fancy stuff like `xthen` and `thenx`. These
    are sequencing combinators like `then`, which runs one parser and
    then runs the other on the remainder left by the first. But `xthen` has the
    extra trick of discarding the result of the first parser, and `thenx`
    discards the result of the second parser.

    These are great for discarding stuff during the parse. For example, when
    processing escape codes, some simple ones like (\\) (\() (\)) are
    completely handled by simply discarding the leading backslash. And all
    the escape handling is simplified by just doing that wholesale
    in all cases.
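    Here is a sketch of the three sequencing variants under a hypothetical convention (a parser takes a string and leaves `result remainder true` or `false`). For brevity, these `run-...` helpers apply two parsers directly to an input instead of returning a new combined parser the way pc11a.ps's `then`, `xthen`, and `thenx` do. `run-then` pairs its two results in a 2-element array (cons-style) rather than flattening them. All names here are illustrative.

```postscript
% Hypothetical parser convention: string -> result remainder true | false

/any-char {                              % consume one character
  dup length 0 gt
  { dup 0 1 getinterval exch 1 1 index length 1 sub getinterval true }
  { pop false } ifelse
} def

/backslash {                             % consume "\" only
  dup length 0 gt { dup 0 get 92 eq } { false } ifelse
  { any-char } { pop false } ifelse
} def

% seq: run p on input, then q on p's remainder.
/seq {                                   % input p q -> r1 r2 rest true | false
  3 1 roll exec                          % q r1 rest true | q false
  { 3 -1 roll exec                       % r1 r2 rest true | r1 false
    { true } { pop false } ifelse }
  { pop false } ifelse
} def

/run-then  { seq { 3 1 roll 2 array astore exch true } { false } ifelse } def
/run-xthen { seq { 3 -1 roll pop true } { false } ifelse } def  % keep r2
/run-thenx { seq { exch pop true } { false } ifelse } def      % keep r1

% The escape \) decodes to ")" just by discarding the backslash:
(\\\)rest) //backslash //any-char run-xthen pop = =   % prints rest, then )
```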

    But if you're appending results into one long list, then you've lost
    the <first> vs. <second> structure! The obvious solution for that
    was to replace the calls to `append` with calls to `cons`, which
    just groups the two parts into a 2-element array: easy to grab
    the two pieces out later.
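    The difference is easy to see with Lisp-style lists modeled as 2-element `[car cdr]` arrays ending in `null`. This is a hypothetical model for illustration, not necessarily the representation pc11a.ps uses, and `append!`, `last-cell`, and `len` are illustrative names. `append!` walks the cdr chain and overwrites the final `null`, so the boundary between the two results disappears. Pairing with `cons` keeps both halves addressable.

```postscript
% Hypothetical Lisp-style lists: each cell is [car cdr]; the last cdr is null.
/cons { 2 array astore } def             % car cdr -> [car cdr]

/last-cell {                             % nonempty list -> its final cell
  { dup 1 get null eq { exit } if 1 get } loop
} def

/append! {                               % list1 list2 -> list1, extended in place
  1 index last-cell 1 3 -1 roll put      % last cdr of list1 := list2
} def

/len {                                   % list -> number of cells
  0 exch { dup null eq { pop exit } if exch 1 add exch 1 get } loop
} def

% Two hypothetical sequential results: a list of (a), and a list of (b) (c).
(a) null cons  (b) (c) null cons cons  append! len =    % prints 3: one flat list
(a) null cons  (b) (c) null cons cons  cons 1 get len = % prints 2: second part intact
```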

    Doing that caused a bug that took a while to track down. It showed up
    in the handlers, i.e., all the procedures composed into the
    parsers with `using`. Every one of them called a function named
    `flatten` that only knew how to deal with 1-D Lisp lists, so it went
    wild on the weird non-list cons structure.

    So now it all works by using a more powerful function called
    `unwrap`, which can tease apart whatever weird cons tangle
    is thrown at it. But you can't see any of this here; it's all inside
    the `fix` function from pc11a.ps.
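    A recursive flattener that copes with any nesting of such pairs can be quite short. This is only a sketch of the general idea. It uses the hypothetical name `unwrap-sketch` so as not to suggest it is the `unwrap` from pc11a.ps. It treats every array as structure, drops `null` terminators, and keeps everything else as a leaf.

```postscript
% Hypothetical flattener for arbitrary cons tangles (nested arrays).
/unwrap-into {                  % obj -> leaves pushed onto the operand stack
  dup null eq
  { pop }                       % end-of-list marker: contributes nothing
  { dup type /arraytype eq
    { { unwrap-into } forall }  % recurse into every element
    if }
  ifelse
} def

/unwrap-sketch { [ exch unwrap-into ] } def   % obj -> flat array of leaves

% A pair of lists, one of which is itself nested:
[ [(a) [(b) null]] [(c) null] ] unwrap-sketch ==   % prints [(a) (b) (c)]
```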

    With this fix, I only just now got hex strings to appear to work,
    discarding any non-hex characters they contain. I still need to interpret
    the hex characters and do some handling for procedures.
    And e notation.
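    Interpreting the collected hex digits is mostly arithmetic. Per the PostScript Language Reference Manual, each pair of digits gives one byte, and an odd final digit is treated as if it were followed by 0. The sketch below assumes the digits arrive as an array of 1-character strings, like the `[(a) (b) (c) (d) (e) (f)]` result in the test run. It also assumes every entry is a valid hex digit. `hexval` and `hex-decode` are hypothetical names.

```postscript
% hexval: character code of 0-9, a-f or A-F -> its value 0..15
/hexval { dup 57 le { 48 sub } { 32 or 87 sub } ifelse } def

% hex-decode: array of 1-char digit strings -> string of bytes
/hex-decode {
  6 dict begin
    /digits exch def
    /n digits length def
    /out n 1 add 2 idiv string def       % odd count rounds up
    0 1 out length 1 sub {
      /i exch def
      /hi digits i 2 mul get 0 get hexval def
      /lo i 2 mul 1 add n lt              % missing final digit counts as 0
          { digits i 2 mul 1 add get 0 get hexval } { 0 } ifelse def
      out i hi 16 mul lo add put
    } for
    out
  end
} def

[(4) (1) (4)] hex-decode =                % bytes 16#41 16#40: prints A@
```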

    Then there's a further challenge if I actually want to emulate the
    `token` operator: I'll need to reliably recreate the remainder substring.
    That string may or may not still be tucked into the lazy
    remainder list in string form, so some fiddly business may
    be needed to reconstruct it.
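    For reference, the built-in `token` applied to a string leaves `post any true` on success (or just `false`), where `post` is a substring of the original. If a parser kept a count of consumed characters, the remainder could be rebuilt with `getinterval`, which likewise returns a substring sharing the original string's storage. The `remainder` helper and the consumed count are assumptions for illustration; pc11a.ps's lazy remainder list may not expose such a count.

```postscript
% Built-in behaviour to match: string token -> post any true | false
(8#117) token                  % leaves () 79 true: post, object, success flag
pop = length =                 % prints 79, then 0 (post is empty here)

% Hypothetical helper: rebuild the post string from a consumed-character count.
/remainder {                   % string count -> post (shares the string's storage)
  1 index length 1 index sub getinterval
} def

(8#117 foo) 5 remainder =      % prints " foo" (leading space included)
```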

    %errordict/typecheck{pq}put
    (pc11a.ps)run
    <<
    /interpret-octal { 0 exch { first 48 sub exch 8 mul add } forall }  % octal digits -> integer
    /to-char { 1 string dup 0 4 3 roll put }                             % char code -> 1-char string
    >> begin

    /delimiters ( \t\n()/%[]<>{}) def
    /delimiter delimiters anyof def
    /octal (0)(7) range def
    /digit (0)(9) range def
    /alpha (a)(z) range (A)(Z) range alt def
    /regular delimiters noneof def

    /rad-digit //digit //alpha alt def
    /rad-integer //digit //digit maybe then (#) char then //rad-digit some then def
    /number //digit some def
    /opt-number //digit many def
    /integer (+-) anyof maybe //number then def
    /real (+-) anyof maybe
    //number (.) char then //opt-number then
    (.) char //number then alt then def

    /name //regular some def

    /ps-char {-777 exec} def
    /escape (\\) char
    (\\) char
    (\() char alt
    (\)) char alt
    (n) char { pop (\n) one } using alt
    (r) char { pop (\r) one } using alt
    (t) char { pop (\t) one } using alt
    (b) char { pop (\b) one } using alt
    (f) char { pop (\f) one } using alt
    //octal //octal maybe then //octal maybe then
    { fix interpret-octal to-char one } using alt
    xthen def
    /ps-string (\() char //ps-char executeonly many then (\)) char then def
    //ps-char 0 //escape
    //ps-string alt
    (()) noneof alt put

    /hex-char //digit (a)(f) range (A)(F) range alt alt def
    /non-hex-char //hex-char (>) char alt none def
    /hex-string (<) char
    //non-hex-char many //hex-char xthen many then //non-hex-char many thenx
    (>) char then def

    /spaces ( \t\n) anyof many def
    /object {-777 exec} def
    /ps-token //spaces //object executeonly xthen def

    //object 0 //rad-integer { fix to-string cvi } using
    //real { fix to-string cvr } using alt
    //integer { fix to-string cvi } using alt
    //name { fix to-string cvn cvx } using alt
    (/) char //name then { fix to-string rest cvn cvlit } using alt
    (/) char (/) char then //name then { fix to-string rest rest cvn load } using alt
    //ps-string { fix to-string 1 1 index length 2 sub getinterval } using alt
    //hex-string { fix 1 1 index length 2 sub getinterval } using alt
    ({) char //ps-token many then //spaces (}) char xthen then alt
    //delimiter { fix to-string cvn cvx } using alt
    put

    /mytoken {
    dup length 0 gt {
    0 0 3 2 roll string-input //ps-token exec
    }{ pop false } ifelse
    } def

    {
    0 0 (47) string-input //integer exec pc
    0 0 (47) string-input //number exec pc
    0 0 (8#117) string-input
    //digit //digit maybe then (#) char then //rad-digit some then exec pc
    %quit
    0 0 (8#117) string-input //rad-integer exec pc
    0 0 (1.17) string-input //real exec pc
    } pop

    (8#117) mytoken pc
    (47) mytoken pc
    (string) mytoken pc
    ([stuff) mytoken pc
    (/litname) mytoken pc
    (42.42) mytoken pc
    ((a\\117 \\\\string\\n)) mytoken ps second first print clear
    /thing 12 def
    (//thing) mytoken pc
    (<abc defg>) mytoken pc

    quit

    $ gsnd -dNOSAFER pc11atoken.ps
    GPL Ghostscript 9.52 (2020-03-19)
    Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
    see the file COPYING for details.
    stack:
    [/OK [79 []]]
    :stack
    stack:
    [/OK [47 {0 2 () string-input}]]
    :stack
    stack:
    [/OK [string []]]
    :stack
    stack:
    [/OK [[ {0 1 (stuff) string-input}]]
    :stack
    stack:
    [/OK [/litname []]]
    :stack
    stack:
    [/OK [42.42 []]]
    :stack
    stack:
    [/OK [(aO \\string\n) {0 18 () string-input}]]
    :stack
    aO \string
    stack:
    [/OK [12 []]]
    :stack
    stack:
    [/OK [[(a) (b) (c) (d) (e) (f)] {0 10 () string-input}]]
    :stack