• bison parser : retrieving values from recursive pattern

    From Archana Deshmukh@desharchana19@gmail.com to comp.compilers on Thu Jul 6 02:12:38 2023
    From Newsgroup: comp.compilers

    Hello,

    I have the following rule

    num :
    | integer comma num
    | integer closeroundbkt
    | integer closesquarebkt


    I need to parse data like
    efg @main(%data: r[(1, 2, 4, 4), float32], %param_1: or[(2, 1, 5, 5), float32], %param_2: or[(20), float32], %param_3: or[(5, 2, 5, 5), float32], %param_4: or[(50), float32], %param_5: or[(50, 80), float32], %param_6: Tensor[(50), float32], %param_7: or[(10, 50), float32], %param_8: or[(20), float32]

    I also need to retrieve these values and store them in a list.

    Retrieving and storing values for patterns like
    | integer closeroundbkt
    | integer closesquarebkt

    is simple.

    However, I am not able to find a way to retrieve and store recursive numbers from pattern

    | integer comma num

    Sometimes there are 2 numbers, as in (50, 80), and sometimes there are 4 numbers, as in (1, 2, 4, 4). How should I handle this?

    Any suggestions are welcome.

    Best Regards,
    Archana Deshmukh
    [For a list of numbers in parens I would do something like this:

    parennumlist: '(' numlist ')' ;

    numlist: integer
    | numlist ',' integer ;

    For the bracketed lists:

    bracketlist: '[' parennumlist ',' datatype ']' ;

    datatype: FLOAT32 | ... whatever other types there are ... ;

    The usual way to do a variable length list is to write a recursive rule with one
    alternative for a single item and another alternative that adds an item to the list.
    Any book about compiler design should give advice on writing grammar rules, and my
    "flex & bison" has example grammars that include lists. -John]
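To connect the list grammar above to the original question about retrieving the values: one common approach is to have each alternative of the left-recursive rule return the list built so far. The following is a minimal C sketch, not taken from either reply. `struct int_list`, `int_list_new` and `int_list_append` are hypothetical helper names, and the grammar fragment in the trailing comment assumes a `%union` with an `int` member and a list-pointer member.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of integers (not part of bison itself). */
struct int_list {
    int *vals;
    size_t len;
    size_t cap;
};

/* Create a one-element list; intended for the "numlist: integer" alternative. */
struct int_list *int_list_new(int first)
{
    struct int_list *l = malloc(sizeof *l);
    if (!l)
        abort();
    l->cap = 4;
    l->len = 1;
    l->vals = malloc(l->cap * sizeof *l->vals);
    if (!l->vals)
        abort();
    l->vals[0] = first;
    return l;
}

/* Append one value and return the same list; intended for the
 * "numlist: numlist ',' integer" alternative. */
struct int_list *int_list_append(struct int_list *l, int v)
{
    assert(l != NULL);
    if (l->len == l->cap) {
        size_t ncap = l->cap * 2;
        int *nv = realloc(l->vals, ncap * sizeof *nv);
        if (!nv)
            abort();
        l->vals = nv;
        l->cap = ncap;
    }
    l->vals[l->len++] = v;
    return l;
}

/*
 * How the actions might call these helpers, assuming
 *   %union { int ival; struct int_list *list; }
 * with "integer" typed as ival and the list rules typed as list:
 *
 *   numlist: integer              { $$ = int_list_new($1); }
 *          | numlist ',' integer  { $$ = int_list_append($1, $3); }
 *          ;
 *   parennumlist: '(' numlist ')' { $$ = $2; }
 *               ;
 */
```

Because the rule is left-recursive, the values arrive in source order, so (1, 2, 4, 4) yields a list of length 4 and (50, 80) a list of length 2 with no special cases.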
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.compilers on Fri Jul 7 01:14:04 2023
    From Newsgroup: comp.compilers

    On 2023-07-06, Archana Deshmukh <desharchana19@gmail.com> wrote:
    > Hello,
    >
    > I have a following rule
    >
    > num :
    > | integer comma num
    > | integer closeroundbkt
    > | integer closesquarebkt


    Recognizing close brackets in a different rule from the open ones is
    not absolutely off the table, but it's a code smell.

    Consider a nice grammar like

    list : '(' items ')'
    | '(' ')'
    | '[' items ']'
    | '[' ']'

    items : items ',' item
    | item
    ;

    item : list | num | type | decl

    decl : keyword ':' oper list

    keyword : KW_main | KW_data | KW_param_1

    type : TYPE_float32 | ...

    oper : OPER_r | OPER_or

    I'd make all the symbols just one token type SYMBOL, and deal with it
    all semantically later in the pipeline.

    I.e. the over-generated grammar would allow nonsense like

    @data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)

    This would be checked for validity semantically; that the right
    kinds of symbols are in the right positions in the shape.


    list : '(' items ')'
    | '(' ')'
    | '[' items ']'
    | '[' ']'

    items : items ',' item
    | item
    ;

    item : list | num | SYMBOL | decl

    decl : SYMBOL ':' SYMBOL list

    Lisp teaches us that reserved keywords are largely inflexible
    and counterproductive.

    Make your SYMBOL objects interned, and give them a type like
    "struct symbol *". Interned means that when the same symbol
    occurs more than once, the parser returns the same pointer:

    SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */

    The first time intern("foo") is called, it creates and returns
    a symbol sym such that sym->name is foo (a strdup-ed copy).
    The second time intern("foo") is called, it returns exactly
    the same object!
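The behaviour described above can be sketched as a small chained hash table keyed by the symbol's name. This is only one possible implementation under stated assumptions: the bucket count, the `hash_str` helper and the `next` field are hypothetical, and the name is copied with `malloc` and `memcpy` rather than `strdup` so that the sketch relies only on standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct symbol {
    char *name;           /* private copy of the lexeme */
    struct symbol *next;  /* next symbol in the same bucket (assumed field) */
};

#define NBUCKETS 256      /* arbitrary size chosen for the sketch */
static struct symbol *buckets[NBUCKETS];

/* Simple multiplicative string hash; used only when interning. */
static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the unique symbol for name, creating it on first use. */
struct symbol *intern(const char *name)
{
    assert(name != NULL);
    unsigned long b = hash_str(name) % NBUCKETS;
    struct symbol *sym;

    /* Already interned: hand back the existing object. */
    for (sym = buckets[b]; sym != NULL; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;

    /* First occurrence: copy the name and link a new symbol into the bucket. */
    sym = malloc(sizeof *sym);
    if (!sym)
        abort();
    size_t n = strlen(name) + 1;
    sym->name = malloc(n);
    if (!sym->name)
        abort();
    memcpy(sym->name, name, n);
    sym->next = buckets[b];
    buckets[b] = sym;
    return sym;
}
```

String comparison happens only inside `intern`; every later identity check on symbols is a pointer comparison.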

    In your program you can have initialization like this:

    struct symbol *float32_s;

    void global_init(void)
    {
        float32_s = intern("float32");

        ...
    }

    Then when the parser sees float32, it will produce
    the same pointer.

    The upshot is that you never have to compare strings.
    If you want to check whether x is the float32 symbol, you just use
    the == operator:

    void foo(struct symbol *x)
    {
        if (x == float32_s) {
            // we are looking at the float32 symbol
        }
    }

    Because symbols are just pointers, they are also fast to hash.
    A hash table which maps symbols to other things just has
    to hash the 4 or 8 byte pointer, not the string. This can
    be done in a few bit operations.
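As an illustration of hashing the pointer rather than the string, here is a minimal sketch. The function name `hash_ptr` and the particular shift amounts are assumptions made for this example, not a recommended or standard mixing function; it also assumes the table size is a power of two and that `uintptr_t` is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a pointer to a bucket index in [0, nbuckets).
 * nbuckets must be a nonzero power of two. */
size_t hash_ptr(const void *p, size_t nbuckets)
{
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    uintptr_t x = (uintptr_t)p;
    /* Heap pointers are usually aligned, so their low bits carry little
     * information; fold some higher bits down before masking. */
    x ^= x >> 4;
    x ^= x >> 12;
    return (size_t)(x & (nbuckets - 1));
}
```

The cost is a couple of shifts, XORs and a mask regardless of how long the symbol's name is.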

    Important global properties about symbols can be stored
    in the struct symbol itself. For instance float32 is
    a type, so there can be a sym->is_type property,
    which is true for float32. Then you can easily check
    whether some list has a type symbol in a certain position.
    First check that there is a symbol in that position and, if so,
    that its is_type property is true.
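A check of that kind might look like the sketch below. The `struct item` representation of parsed list elements, the `item_kind` enum and the `is_type_at` helper are all hypothetical, since the thread does not say how the parsed lists are stored; only the `is_type` flag on `struct symbol` comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct symbol {
    const char *name;
    bool is_type;          /* true for symbols such as float32 */
};

/* Assumed representation of one parsed list element. */
enum item_kind { ITEM_NUM, ITEM_SYMBOL, ITEM_LIST };

struct item {
    enum item_kind kind;
    long num;              /* valid when kind == ITEM_NUM */
    struct symbol *sym;    /* valid when kind == ITEM_SYMBOL */
    /* nested-list payload omitted from this sketch */
};

/* True if position pos of the n-element array holds a type symbol. */
bool is_type_at(const struct item *items, size_t n, size_t pos)
{
    assert(items != NULL);
    if (pos >= n)
        return false;                       /* nothing at that position */
    if (items[pos].kind != ITEM_SYMBOL)
        return false;                       /* a number or a nested list */
    return items[pos].sym != NULL && items[pos].sym->is_type;
}
```

For a shape such as [(1, 2, 4, 4), float32], a semantic pass could call `is_type_at(items, 2, 1)` to confirm that the second element is a type symbol.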