Forum: Too Lazy BBS

bison parser : retrieving values from recursive pattern

From Archana Deshmukh@desharchana19@gmail.com to comp.compilers on Thu Jul 6 02:12:38 2023

From Newsgroup: comp.compilers

Hello,

I have the following rule:

num :
| integer comma num
| integer closeroundbkt
| integer closesquarebkt

I need to parse data like
efg @main(%data: r[(1, 2, 4, 4), float32], %param_1: or[(2, 1, 5, 5), float32], %param_2: or[(20), float32], %param_3: or[(5, 2, 5, 5), float32], %param_4: or[(50), float32], %param_5: or[(50, 80), float32], %param_6: Tensor[(50), float32], %param_7: or[(10, 50), float32], %param_8: or[(20), float32]

I also need to retrieve these values and store them in a list.

Retrieving and storing values for patterns like
| integer closeroundbkt
| integer closesquarebkt

is simple.

However, I cannot find a way to retrieve and store the numbers matched by the recursive pattern

| integer comma num

Sometimes there are 2 numbers, as in (50, 80), and sometimes 4, as in (1, 2, 4, 4). How should I handle this?

Any suggestions are welcome.

Best Regards,
Archana Deshmukh
[For a list of numbers in parens I would do something like this:

parennumlist: '(' numlist ')' ;

numlist: integer
| numlist ',' integer ;

For the bracketed lists:

bracketlist: '[' parennumlist ',' datatype ']' ;

datatype: FLOAT32 | ... whatever other types there are ... ;

The usual way to do a variable length list is to write a recursive rule with one
alternative that matches a single item and another alternative that adds an item to
an existing list. Any book about compiler design should give advice on writing
grammar rules, or my "flex & bison" has example grammars that include lists. -John]
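
To get the values out of a left-recursive list rule like numlist above, each action can append to a growing container and pass it up as the rule's value. Here is a minimal sketch in C of such a container. The names intlist, intlist_new, intlist_push and intlist_free are hypothetical helpers, not part of bison, and the grammar fragment in the trailing comment assumes a %union with an int member and a list pointer member.

```c
#include <assert.h>
#include <stdlib.h>

/* Growable array of ints; a hypothetical helper, not part of bison. */
struct intlist {
    int *vals;
    size_t len;
    size_t cap;
};

/* Allocate an empty list; returns NULL if memory runs out. */
struct intlist *intlist_new(void)
{
    return calloc(1, sizeof(struct intlist));
}

/* Append v, growing the buffer when it is full. Returns the list so that
   an action can assign the result straight to $$. */
struct intlist *intlist_push(struct intlist *l, int v)
{
    assert(l != NULL);
    if (l->len == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 4;
        int *nv = realloc(l->vals, ncap * sizeof *nv);
        if (nv == NULL)
            abort();
        l->vals = nv;
        l->cap = ncap;
    }
    l->vals[l->len++] = v;
    return l;
}

/* Release the list and its buffer. */
void intlist_free(struct intlist *l)
{
    if (l != NULL) {
        free(l->vals);
        free(l);
    }
}

/*
 * How the actions might call these, assuming
 *   %union { int ival; struct intlist *list; }
 * with integer typed <ival> and numlist typed <list>:
 *
 *   numlist : integer             { $$ = intlist_push(intlist_new(), $1); }
 *           | numlist ',' integer { $$ = intlist_push($1, $3); }
 *           ;
 */
```

Because the rule is left recursive, the numbers arrive in source order, so (1, 2, 4, 4) ends up as a list of length 4 and (50, 80) as a list of length 2 with no special cases.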
--- Synchronet 3.21b-Linux NewsLink 1.2

From Kaz Kylheku@864-117-4973@kylheku.com to comp.compilers on Fri Jul 7 01:14:04 2023

From Newsgroup: comp.compilers

On 2023-07-06, Archana Deshmukh <desharchana19@gmail.com> wrote:

> Hello,
>
> I have the following rule:
>
> num :
> | integer comma num
> | integer closeroundbkt
> | integer closesquarebkt

Recognizing close brackets in a different rule from the open ones is
not absolutely off the table, but it's a code smell.

Consider a nice grammar like

list : '(' items ')'
| '(' ')'
| '[' items ']'
| '[' ']'

items : items ',' item
| item
;

item : list | num | type | decl

decl : keyword ':' oper list

keyword : KW_main | KW_data | KW_param_1

type : TYPE_float32 | ...

oper : OPER_r | OPER_or

I'd make all the symbols just one token type SYMBOL, and deal with it
all semantically later in the pipeline.

I.e. the over-generated grammar would allow nonsense like

@data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)

This would be checked for validity semantically; that the right
kinds of symbols are in the right positions in the shape.

list : '(' items ')'
| '(' ')'
| '[' items ']'
| '[' ']'

items : items ',' item
| item
;

item : list | num | SYMBOL | decl

decl : SYMBOL ':' SYMBOL list

Lisp teaches us that reserved keywords are largely inflexible
and counterproductive.

Make your SYMBOL objects interned, and give them a type like
"struct symbol *". Interned means that when the same symbol
occurs more than once, the parser returns the same pointer:

SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */

The first time intern("foo") is called it creates and returns
a symbol sym such that sym->name is foo (a strdup-ed copy).
The second time intern("foo") is called, it returns exactly
the same object!
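
A minimal sketch of what intern might look like, to make the idea concrete. It is an illustration under assumptions, not a definitive implementation: the symbol table here is a plain linked list searched with strcmp, which keeps the example short (a real program would more likely use a hash table keyed on the name), and the copy is made with malloc and memcpy rather than strdup so that it builds with a strict ISO C compiler. The next field and the symtab variable are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct symbol {
    char *name;            /* private copy of the lexeme */
    struct symbol *next;   /* next entry in the global symbol table */
};

/* Head of the list of every symbol interned so far. */
static struct symbol *symtab;

/* Return the unique symbol for name, creating it on first use. */
struct symbol *intern(const char *name)
{
    struct symbol *s;
    size_t n;

    assert(name != NULL);

    /* Strings are compared only here, once per lexeme. */
    for (s = symtab; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;

    s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    n = strlen(name) + 1;
    s->name = malloc(n);
    if (s->name == NULL)
        abort();
    memcpy(s->name, name, n);

    s->next = symtab;
    symtab = s;
    return s;
}
```

The string comparison happens inside intern and nowhere else; everything downstream of the parser works with the returned pointers.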

In your program you can have initialization like this:

struct symbol *float32_s;

void global_init(void)
{
    float32_s = intern("float32");

    ...
}

Then when the parser sees float32, it will produce
the same pointer.

The upshot is that you never have to compare strings.
If you want to check, is x the float32 symbol, you just use
the == operator:

void foo(struct symbol *x)
{
    if (x == float32_s) {
        // we are looking at the float32 symbol
    }
}

Because symbols are just pointers, they are also fast to hash.
A hash table which maps symbols to other things just has
to hash the 4 or 8 byte pointer, not the string. This can
be done in a few bit operations.
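
As a rough illustration of hashing the pointer rather than the string, here is a small sketch. The function name ptr_hash, the shift amounts and the power-of-two bucket count are all assumptions chosen for the example, not a tuned or recommended hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a pointer to a bucket index. nbuckets must be a power of two. */
size_t ptr_hash(const void *p, size_t nbuckets)
{
    uintptr_t x = (uintptr_t)p;

    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);

    /* The low bits of heap pointers are mostly zero because of alignment,
       so fold some higher bits down before masking. */
    x ^= x >> 4;
    x ^= x >> 9;

    return (size_t)x & (nbuckets - 1);
}
```

The cost is two shifts, two XORs and a mask, independent of how long the symbol's name is.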

Important global properties about symbols can be stored
in the struct symbol itself. For instance float32 is
a type, so there can be a sym->is_type property,
which is true for float32. Then you can easily check
whether some list has a type symbol in a certain position.
First check there is a symbol and if so, that it is
one with the is_type property true.
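
A sketch of that check, assuming a parsed list is stored as an array of tagged items. The item and item_list structures and the has_type_at function are hypothetical; only the is_type flag comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct symbol {
    const char *name;
    bool is_type;          /* true for type names such as float32 */
};

struct item_list;          /* defined below */

enum item_kind { ITEM_NUM, ITEM_SYMBOL, ITEM_LIST };

/* One element of a parsed list: a number, a symbol or a nested list. */
struct item {
    enum item_kind kind;
    union {
        int num;
        struct symbol *sym;
        struct item_list *list;
    } u;
};

struct item_list {
    struct item *items;
    size_t len;
};

/* True if position pos of list holds a symbol that is flagged as a type. */
bool has_type_at(const struct item_list *list, size_t pos)
{
    const struct item *it;

    assert(list != NULL);
    if (pos >= list->len)
        return false;

    it = &list->items[pos];
    /* First check that it is a symbol at all, then look at the property. */
    return it->kind == ITEM_SYMBOL
        && it->u.sym != NULL
        && it->u.sym->is_type;
}
```

For a shape like [(1, 2, 4, 4), float32], a validity pass could call has_type_at(list, 1) to confirm that the second element is a type symbol.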
