• Code generation for DOES> in Gforth

    From Anton Ertl@21:1/5 to All on Sat Sep 21 17:25:51 2024
    I recently noticed that Gforth still used the following COMPILE,
    implementation for words defined with CREATE...SET-DOES> (and
    consequently also for words defined with CREATE...DOES>):

    : does, ( xt -- ) does-check ['] does-xt peephole-compile, , ;

    Ignore DOES-CHECK (it has to do with stack-depth checking, still
    incomplete). The rest means that it compiles the primitive DOES-XT
    with the xt of the COMPILE,d word as immediate argument. DOES-XT
    pushes the body of the word and then EXECUTEs the xt that SET-DOES>
    has registered for this word. In most cases this is a colon
    definition (always if DOES> is used), so the next thing that happens
    is DOCOL, and then the code for the colon definition is run.

    I have now replaced this with

    : does, ( xt -- ) does-check dup >body lit, >extra @ compile, ;

    What this does is to compile the body as a literal, and then it
    COMPILE,s the xt that DOES-XT would EXECUTE. In the common case of a
    colon definition this compiles a call to the colon definition. This
    saves the overhead of accessing the doesfield and of dispatching on
    its contents at run-time; all that is now done during compilation.

    Let us first look at the generated code. Consider the example:

    : myconst create , does> @ ;
    5 myconst five
    : foo five ;

    SIMPLE-SEE FOO shows:

    old                            new
    $7F6F5CAE6BC8 does-xt 1->1     $7F46A7EA92B8 lit 1->1
    $7F6F5CAE6BD0 five             $7F46A7EA92C0 five
    $7F6F5CAE6BD8 ;s 1->1 ok       $7F46A7EA92C8 call 1->1
                                   $7F46A7EA92D0 $7F46A7C0A168
                                   $7F46A7EA92D8 ;s 1->1

    For the following microbenchmark:

    : d1 ( "name" -- )
        create 0 ,
      does> ( -- addr )
        ; \ yes, an empty DOES> exists in an application program
    d1 z1

    : bench-z1-comp ( -- )
        iterations 0 ?do
            1 z1 +!
        loop ;

    I see the following results per iteration (startup overhead included)
    on a Rocket Lake:

     old   new
     8.2   7.5  cycles:u
    34.0  29.0  instructions:u
     5.2   4.2  branches:u

    So five fewer instructions (including one fewer branch), resulting in
    a small speedup for this microbenchmark.
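
    For anyone who wants to try the microbenchmark, the following is a
    self-contained sketch for Gforth. The value of ITERATIONS and the
    helper TIME-US are assumptions, not taken from the post, and the
    helper measures wall-clock time with UTIME rather than the hardware
    counters reported above.

    ```forth
    \ Assumed iteration count; the post does not state the value it used.
    100000000 constant iterations

    : d1 ( "name" -- )
        create 0 ,
      does> ( -- addr )
        ;
    d1 z1

    : bench-z1-comp ( -- )
        iterations 0 ?do
            1 z1 +!
        loop ;

    \ Hypothetical helper: run xt and print the elapsed microseconds.
    : time-us ( xt -- )
        utime 2>r  execute  utime 2r> d-  d. ." us" cr ;

    ' bench-z1-comp time-us
    z1 @ .   \ prints 100000000 (each iteration added 1)
    ```

    Running it under a Gforth from before and after the change should
    show the difference, though wall-clock numbers will be noisier than
    per-iteration cycle counts.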

    The Gforth image contained 129 occurrences of does-xt and after the
    change it contains 12 (a part of the image is created with the
    cross-compiler, which still compiles to DOES-XT). As a result, the
    image size and gforth-fast (AMD64) native-code size in bytes are as
    follows:

        old     new
    2189364 2193264 image
     448291  448659 native-code

    The larger image is no surprise. For the 117 replaced does-xts, the
    threaded code grows by 2 cells each, and the meta-data grows
    correspondingly.
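
    As a quick check of these figures, the arithmetic can be typed into
    any Forth. This assumes 8-byte cells, as on AMD64; how the remaining
    growth splits between meta-data and anything else is not stated.

    ```forth
    \ Assumption: one cell is 8 bytes (AMD64).
    129 12 - .             \ prints 117  (replaced does-xt occurrences)
    129 12 - 2 * 8 * .     \ prints 1872 (bytes of extra threaded code)
    2193264 2189364 - .    \ prints 3900 (total image growth in bytes)
    ```

    So the threaded code itself accounts for a little under half of the
    image growth.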

    For the native code, the growth is less expected. Let's see how
    the code looks:

    does-xt                lit call
    add rbx,$10            mov $00[r13],r8
    mov $00[r13],r8        sub r13,$08
    mov r8,-$08[rbx]       mov r8,$08[rbx]
    sub r13,$08            mov rax,$18[rbx]
    sub rbx,$08            sub r14,$08
    mov rax,-$08[r8]       add rbx,$20
    mov rdx,$18[rax]       mov [r14],rbx
    mov rax,-$10[rdx]      mov rbx,rax
    jmp eax                mov rax,[rbx]
                           jmp eax

    34 bytes               35 bytes

    Ok, it's larger, but that explains only 117 extra bytes. Maybe the
    interaction with other optimizations explains the rest.
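
    The same kind of check for the native code, using only the numbers
    given above:

    ```forth
    448659 448291 - .        \ prints 368 (native-code growth in bytes)
    129 12 - 35 34 - * .     \ prints 117 (117 sites, 1 extra byte each)
    368 117 - .              \ prints 251 (bytes not explained by this)
    ```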

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2024: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Anton Ertl on Thu Oct 3 10:59:26 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    >I recently noticed that Gforth still used the following COMPILE,
    >implementation for words defined with CREATE...SET-DOES> (and
    >consequently also for words defined with CREATE...DOES>):
    >
    >: does, ( xt -- ) does-check ['] does-xt peephole-compile, , ;
    >
    >Ignore DOES-CHECK (it has to do with stack-depth checking, still
    >incomplete). The rest means that it compiles the primitive DOES-XT
    >with the xt of the COMPILE,d word as immediate argument. DOES-XT
    >pushes the body of the word and then EXECUTEs the xt that SET-DOES>
    >has registered for this word. In most cases this is a colon
    >definition (always if DOES> is used), so the next thing that happens
    >is DOCOL, and then the code for the colon definition is run.
    >
    >I have now replaced this with
    >
    >: does, ( xt -- ) does-check dup >body lit, >extra @ compile, ;
    >
    >What this does is to compile the body as a literal, and then it
    >COMPILE,s the xt that DOES-XT would EXECUTE. In the common case of a
    >colon definition this compiles a call to the colon definition. This
    >saves the overhead of accessing the doesfield and of dispatching on
    >its contents at run-time; all that is now done during compilation.

    Another benefit: Gforth used to implement special COMPILE,
    implementations for 2VALUE and FVALUE. Here's the old implementation
    of FVALUE:

    : opt-fval ( xt -- ) >body postpone Literal postpone f@ ;

    create dummy-fvalue
    ' f@ set-does>
    ' fvalue-to set-to
    ' opt-fval set-optimizer

    : fvalue ( r "name" -- ) \ floating-ext f-value
        \g Define @i{name} @code{( -- r1 )} where @i{r1} initially is
        \g @i{r}; this value can be changed with @code{to @i{name}} or
        \g @code{->@i{name}}.
        ['] dummy-fvalue create-from reveal f, ;

    The new DOES, generates exactly the same code for FVALUEs as OPT-FVAL
    does, so we no longer need OPT-FVAL and the use of SET-OPTIMIZER here.
    Likewise for 2VALUE. This simplification reduces the image size by
    927 bytes and the native-code size by 176 bytes.

    The code for compiling an FVALUE looks as follows (before and after
    the change):

    5e fvalue x ok
    : bla x ; ok
    see-code bla
    $7F1341F2D5A8 lit 1->2
    $7F1341F2D5B0 x
      7F1341A4EA63: mov r15,$08[rbx]
    $7F1341F2D5B8 f@ 2->1
      7F1341A4EA67: movsd [r12],xmm15
      7F1341A4EA6D: movsd xmm15,[r15]
      7F1341A4EA72: sub r12,$08
    $7F1341F2D5C0 ;s 1->1
      7F1341A4EA76: mov rbx,[r14]
      7F1341A4EA79: add r14,$08
      7F1341A4EA7D: mov rax,[rbx]
      7F1341A4EA80: jmp eax
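
    The lit/f@ sequence above is what one would get by writing the access
    out by hand. A small sketch for Gforth follows. BLA-BY-HAND is a
    hypothetical name, and the snippet relies on >BODY of an FVALUE
    pointing at the stored float, as OPT-FVAL above implies.

    ```forth
    5e fvalue x
    : bla ( -- r )  x ;

    \ Hand-written form of what OPT-FVAL compiled: body address as a
    \ literal, followed by F@.
    : bla-by-hand ( -- r )  [ ' x >body ] literal f@ ;

    bla bla-by-hand f- f0= .   \ prints -1 (true): both return the same value
    ```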

    - anton