• Locals revisited

    From albert@spenarnc.xs4all.nl@21:1/5 to All on Tue Mar 25 15:01:37 2025
    In hindsight my locals definition is not convincing,
    because carnal knowledge about the behaviour of
    the return stack is required.
    "
    : local R> SWAP DUP >R @ >R >R CO R> R> ! ;

    VARIABLE A
    VARIABLE B

    : divide
    A local
    B local
    B ! A ! A @ B @ /
    . CR
    ;

    15 3 divide
    "

    Imagine a RISC-V processor. A low-level word there doesn't use
    the return stack, but a link register, so the above
    probably doesn't work on RISC-V.

    To abstract from this you need an extra stack, for example
    the "system stack" from Marcel Hendrix.
    In the above it is necessary to temporarily store the return
    information:

    : local
    R> \ Return address
    SWAP DUP >R @ >R \ Abuse return stack for extra storage
    >R \ Restore return address
    CO
    R> R> ! \ Further abuse
    ;


    With the "system stack" it becomes:
    "
    : local DUP >S @ >S CO S> S> ! ;

    VARIABLE A
    VARIABLE B

    : divide
    A local
    B local
    B ! A ! A @ B @ /
    . CR
    ;

    15 3 divide
    "
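
    Neither CO nor the system-stack words are standard Forth, so the
    version above does not load as posted on most systems. A minimal
    sketch of the missing pieces, assuming a classic threaded Forth in
    which the return address of a colon definition is a single cell on
    the return stack (the very assumption that may fail on a
    link-register design). SSTACK, SSP and the 32-cell size are made up
    for illustration; this is not Marcel Hendrix's implementation.

    ```forth
    \ CO: swap the two topmost return addresses, so the caller continues
    \ now and the rest of the word containing CO runs when the caller exits.
    \ ASSUMPTION: return addresses are single cells on the return stack.
    : CO ( -- )  2R> SWAP 2>R ;

    \ Hypothetical system stack: 32 cells, growing upward, no bounds checks.
    CREATE SSTACK  32 CELLS ALLOT
    VARIABLE SSP   SSTACK SSP !
    : >S ( x -- )  SSP @ !  1 CELLS SSP +! ;
    : S> ( -- x )  -1 CELLS SSP +!  SSP @ @ ;

    \ Save addr and its old contents, let the caller run, then restore.
    : local ( addr -- )  DUP >S @ >S  CO  S> S> ! ;

    VARIABLE A   VARIABLE B
    : divide ( n1 n2 -- )  A local  B local  B ! A !  A @ B @ / . CR ;

    1 A !  2 B !
    15 3 divide     \ prints 5
    A @ . B @ .     \ prints 1 2 : the old values are back
    ```

    Only CO still touches the return stack here; local itself needs no
    knowledge of what a return address looks like.
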
    Don't ask me these questions:

    If you have an extra stack, why not use a locals stack?

    >R R> R@ 2R> etc. are a poor man's tool. Isn't it time to
    replace them with >S S> S@ 2S> etc. and end all worries that
    these facilities interfere with other stuff?

    (I get 30 registers in RISCV that can serve as a stack pointer.)

    Groetjes Albert.
    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to albert@spenarnc.xs4all.nl on Wed Mar 26 12:14:19 2025
    albert@spenarnc.xs4all.nl writes:
    >In hindsight my locals definition is not convincing, because carnal
    >knowledge about the behaviour of the return stack is required.

    It's ok if it's for a specific implementation. But what I'm having
    trouble seeing is how the locals get popped in case of an exception. Do
    you not need to implement something like (LOCAL) ?
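
    For comparison, the conventional way to make such a save and restore
    survive an exception is to run the body under CATCH and re-THROW
    afterwards. A minimal sketch, assuming a system with the standard
    exception word set; with-saved and clobber are hypothetical names,
    not words from this thread.

    ```forth
    \ Run xt; restore the cell at addr afterwards, even if xt throws.
    \ with-saved is a hypothetical helper, not a word from this thread.
    : with-saved ( i*x xt addr -- j*x )
      DUP @ >R >R        \ R: old-value addr
      CATCH              \ run xt; leaves 0 or the throw code
      R> R> SWAP !       \ restore the old value on both paths
      THROW ;            \ re-raise a non-zero code, drop a zero

    VARIABLE A   7 A !
    : clobber ( -- )  99 A !  -1 THROW ;   \ hypothetical failing body

    ' clobber A ' with-saved CATCH .   \ prints -1
    2DROP                              \ discard the two undefined cells CATCH left
    A @ .                              \ prints 7
    ```

    A CO-based local has no such hook: on a THROW the return stack is
    cut back and the code after CO never runs.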

    >(I get 30 registers in RISCV that can serve as a stack pointer.)

    In some models of the RISCV, only 14, I think. And in almost all
    models, 8 of them are more efficient to address than the rest, because
    of the compressed instruction format.

  • From mhx@21:1/5 to All on Thu Mar 27 06:40:23 2025
    >In some models of the RISCV, only 14, I think. And in almost all
    >models, 8 of them are more efficient to address than the rest, because
    >of the compressed instruction format.

    Surely more efficiently than storing them in memory.
    Anyway, it is hard to come up with code that needs more than 4+1
    stacks in its critical path.

    -marcel

  • From Anton Ertl@21:1/5 to Paul Rubin on Thu Mar 27 07:48:35 2025
    Paul Rubin <no.email@nospam.invalid> writes:
    >albert@spenarnc.xs4all.nl writes:
    >>(I get 30 registers in RISCV that can serve as a stack pointer.)

    All 31 can serve as a stack pointer, but not all at the same time.
    You want at least two registers for temporary values, for implementing
    a word such as "+". And you typically want to keep the top-of-stack
    of many stacks in a register, too. Or several stack items.

    >In some models of the RISCV, only 14, I think.

    There is the E subspecification of the RISC-V specification with 16
    registers. I don't know if anybody has implemented this.

    >And in almost all
    >models, 8 of them are more efficient to address than the rest, because
    >of the compressed instruction format.

    More efficient in code size. In instruction execution, typically the
    same speed.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
    EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

  • From Anton Ertl@21:1/5 to mhx on Thu Mar 27 07:58:47 2025
    mhx@iae.nl (mhx) writes:
    >Anyway, it is hard to come up with code that needs more than 4+1
    >stacks in its critical path.

    What are the 4+1 stacks you have in mind?

    In Gforth, we have data, return, FP, and locals stack.

    One could also imagine a SIMD stack (with, e.g., 512-bit entries on a
    system with AVX-512) and a vector or object stack (probably one stack
    for both purposes); that would make 6. Although, given the
    commonality between SIMD and FP registers in most architectures, one
    could also imagine using one stack for both (with the disadvantage
    that each FP stack item needs as much space as the largest SIMD item).

    In <http://www.euroforth.org/ef22/papers/ertl.pdf>, figure 1 shows:
    data stack, objects stack, control-flow stack (rarely accessed, i.e.,
    does not merit a stack pointer in a register), system r-stack,
    optional data r-stack, optional object r-stack, and an FP stack. If
    you include the optional stacks and add a SIMD stack and a locals
    stack, that would need 8 stack pointers.

    A problem with having many stacks is that you then need additional
    stack-manipulation words, transfer words, and possibly inter-stack
    operation words, so despite having a number of discussions about
    adding more stacks for, e.g., addresses or objects, this usually has
    only happened when the data types are non-overlapping (i.e., the FP
    stack), and for the return stack.

    On the other hand, in
    <http://www.euroforth.org/ef13/papers/ertl-paf.pdf> I point out that
    for a large subset of Forth, only one stack pointer is needed; that
    stack contains locals and items from the various logical stacks (data
    stack, return stack, etc.), when they no longer fit in registers.

    - anton
  • From albert@spenarnc.xs4all.nl@21:1/5 to Anton Ertl on Thu Mar 27 13:14:30 2025
    In article <2025Mar27.084835@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    >Paul Rubin <no.email@nospam.invalid> writes:
    >>albert@spenarnc.xs4all.nl writes:
    >>>(I get 30 registers in RISCV that can serve as a stack pointer.)

    >All 31 can serve as a stack pointer, but not all at the same time.
    >You want at least two registers for temporary values, for implementing
    >a word such as "+". And you typically want to keep the top-of-stack
    >of many stacks in a register, too. Or several stack items.

    >>In some models of the RISCV, only 14, I think.

    >There is the E subspecification of the RISC-V specification with 16
    >registers. I don't know if anybody has implemented this.

    >>And in almost all
    >>models, 8 of them are more efficient to address than the rest, because
    >>of the compressed instruction format.

    >More efficient in code size. In instruction execution, typically the
    >same speed.

    The Orange Pi RV2 is a RISC-V board with 8 Gbyte of RAM at 64 euro,
    with NVMe slots, WiFi, SD card, gigabit Ethernet, USB, HDMI, MIPI etc.
    and still the 26-pin Raspberry Pi 1 header.

    Worrying about code size is so seventies unless you are into 10-cent
    RISC-V embedded SoCs (see noForth).


    >- anton

    Groetjes Albert
  • From albert@spenarnc.xs4all.nl@21:1/5 to no.email@nospam.invalid on Thu Mar 27 13:02:57 2025
    In article <87semzmwok.fsf@nightsong.com>,
    Paul Rubin <no.email@nospam.invalid> wrote:
    >albert@spenarnc.xs4all.nl writes:
    >>In hindsight my locals definition is not convincing, because carnal
    >>knowledge about the behaviour of the return stack is required.

    >It's ok if it's for a specific implementation. But what I'm having
    >trouble seeing is how the locals get popped in case of an exception.


    I showed it as an example of the pretty convincing usefulness
    of CO. For this the example had to be portable.

    A simpler example would be
    \ Temporarily set some-rounding-mode for the duration of this word.
    : rounding set-rounding-mode CO truncate-mode set-rounding-mode ;

    Most uses are ">R CO", if the stuff on the stack is a continuation
    ("nested-sys"). The arguments against giving this combination a
    name become weaker, while the arguments against "nested-sys" as
    a concept become weaker.

    >Do you not need to implement something like (LOCAL) ?
    I don't use locals. If someone adds it to a ciforth application, let
    them worry about the interaction between (LOCAL) and THROW.

    Groetjes Albert
  • From Paul Rubin@21:1/5 to Anton Ertl on Thu Mar 27 14:41:49 2025
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    >There is the E subspecification of the RISC-V specification with 16
    >registers. I don't know if anybody has implemented this.

    The CH32V003 uses it and is of some interest as a Forth target. It has
    16k of flash and 2k of ram.

    https://www.cnx-software.com/2022/10/22/10-cents-ch32v003-risc-v-mcu-offers-2kb-sram-16kb-flash-in-sop8-to-qfn20-packages/

    >More efficient in code size. In instruction execution, typically the
    >same speed.

    https://dl.acm.org/doi/10.1145/3578360.3580261 indicates some minor
    speed differences, not always favorable.

  • From Anton Ertl@21:1/5 to Paul Rubin on Fri Mar 28 08:27:03 2025
    Paul Rubin <no.email@nospam.invalid> writes:
    >anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    >>There is the E subspecification of the RISC-V specification with 16
    >>registers. I don't know if anybody has implemented this.

    >The CH32V003 uses it and is of some interest as a Forth target. It has
    >16k of flash and 2k of ram.

    Interesting. But with 2KB of RAM, one is even less likely to want to
    use so many stack pointers than in less restricted settings. One
    probably just wants to use a data and a return stack, that's all.

    >https://www.cnx-software.com/2022/10/22/10-cents-ch32v003-risc-v-mcu-offers-2kb-sram-16kb-flash-in-sop8-to-qfn20-packages/

    I see no mention of RV32E here, but searching further, I see

    https://wch-ic.com/downloads/CH32V003DS0_PDF.html

    which says "RV32EC instruction set".

    >https://dl.acm.org/doi/10.1145/3578360.3580261 indicates some minor
    >speed differences, not always favorable.

    The abstract says:

    |Binaries compiled for better compression show changes in their
    |execution time of at most ± 1.5 %. We analyze these against LLVM’s
    |spilling metrics, and conclude that the effect is probably not
    |systemic but a random fluctuation in the register allocation
    |heuristic.

    This sounds like what someone (Preston Briggs?) called IIRC
    "NP-completeness noise" (or something along these lines). The idea is
    that optimal register allocation is an NP-complete problem, so we use
    heuristics to get a good, but not necessarily optimal solution. Some
    of the decisions taken may lead to more suboptimality than others, and
    that's what the creator of the term saw.

    However, the performance effects may also be due to other effects that
    have little to do with register allocation itself, such as the effects
    of branch target alignment relative to instruction fetch granularity
    units, and of instructions straddling the fetch granularity boundaries
    or not.

    In any case, this work more supports than contradicts my statement
    "typically the same speed".

    - anton
  • From albert@spenarnc.xs4all.nl@21:1/5 to dxforth@gmail.com on Fri Mar 28 10:38:00 2025
    In article <19029ff0c8e7cf53335fe62639308e7f92d10240@i2pn2.org>,
    dxf <dxforth@gmail.com> wrote:
    >On 27/03/2025 11:02 pm, albert@spenarnc.xs4all.nl wrote:
    >>In article <87semzmwok.fsf@nightsong.com>,
    >>Paul Rubin <no.email@nospam.invalid> wrote:
    >>>albert@spenarnc.xs4all.nl writes:
    >>>>In hindsight my locals definition is not convincing, because carnal
    >>>>knowledge about the behaviour of the return stack is required.

    >>>It's ok if it's for a specific implementation. But what I'm having
    >>>trouble seeing is how the locals get popped in case of an exception.


    >>I showed it as an example of the pretty convincing usefulness
    >>of CO. For this the example had to be portable.

    >>A simpler example would be
    >>\ Temporarily set some-rounding-mode for the duration of this word.
    >>: rounding set-rounding-mode CO truncate-mode set-rounding-mode ;
    >>...

    >Actually it was that example which caused me to *not* go ahead
    >and implement ;: in the kernel despite a cost of only one header.
    >How many calls to 'rounding' will you encounter in an application?

    `rounding` is an internal word in the fp package and it is used 4
    times.

    >My guess is one. The usual example is HEX: but I already had (H.N)
    >that's more flexible. For me at least locals was more credible but
    >again it fell into a range. For a single use I'd do it manually;
    >for extensive use (where exceptions etc are likely) a proper locals
    >may be the only option. OTOH such decision-making is exactly what
    >Forth has always been about.



    You can't argue with
    :NONAME ." Before " .S CO ." After " .S ;
    ' proc_contains_Heisenbug decorated

    Study Python for the concept of decoration.
    This alone makes CO a worthwhile addition.

    I have (D.H) that uses CO. Undoubtedly your (H.N) is more complicated.

    Groetjes Albert
  • From albert@spenarnc.xs4all.nl@21:1/5 to dxforth@gmail.com on Sun Mar 30 14:01:44 2025
    In article <4b84665e38d5a523efc2479f48338ed55d142185@i2pn2.org>,
    dxf <dxforth@gmail.com> wrote:
    >On 28/03/2025 8:38 pm, albert@spenarnc.xs4all.nl wrote:
    >>In article <19029ff0c8e7cf53335fe62639308e7f92d10240@i2pn2.org>,
    >>dxf <dxforth@gmail.com> wrote:
    >>>On 27/03/2025 11:02 pm, albert@spenarnc.xs4all.nl wrote:
    >>>>In article <87semzmwok.fsf@nightsong.com>,
    >>>>Paul Rubin <no.email@nospam.invalid> wrote:
    >>>>>albert@spenarnc.xs4all.nl writes:
    >>>>>>In hindsight my locals definition is not convincing, because carnal
    >>>>>>knowledge about the behaviour of the return stack is required.

    >>>>>It's ok if it's for a specific implementation. But what I'm having
    >>>>>trouble seeing is how the locals get popped in case of an exception.


    >>>>I showed it as an example of the pretty convincing usefulness
    >>>>of CO. For this the example had to be portable.

    >>>>A simpler example would be
    >>>>\ Temporarily set some-rounding-mode for the duration of this word.
    >>>>: rounding set-rounding-mode CO truncate-mode set-rounding-mode ;
    >>>>...

    >>>Actually it was that example which caused me to *not* go ahead
    >>>and implement ;: in the kernel despite a cost of only one header.
    >>>How many calls to 'rounding' will you encounter in an application?

    >>`rounding` is an internal word in the fp package and it is used 4
    >>times.

    >>>My guess is one. The usual example is HEX: but I already had (H.N)
    >>>that's more flexible. For me at least locals was more credible but
    >>>again it fell into a range. For a single use I'd do it manually;
    >>>for extensive use (where exceptions etc are likely) a proper locals
    >>>may be the only option. OTOH such decision-making is exactly what
    >>>Forth has always been about.



    >>You can't argue with
    >>:NONAME ." Before " .S CO ." After " .S ;
    >>' proc_contains_Heisenbug decorated

    >>Study Python for the concept of decoration.
    >>This alone makes CO a worthwhile addition.

    >>I have (D.H) that uses CO. Undoubtedly your (H.N) is more complicated.

    >Possibly though much of the complication lies in what it needs to do - as
    >opposed to radix save/restore.

    >\ Convert unsigned number u to a hexadecimal string c-addr2 u2 in the
    >\ HOLD buffer beginning with the least-significant digits. Exactly
    >\ +n hexadecimal characters are returned with any unused positions
    >\ being filled with character '0'. BASE is preserved.
    >: (H.N) ( u +n -- c-addr2 u2 )
    >base @ >r hex <# 0 tuck ?do # loop #> r> base ! ;

    Same idea.
    ( Generate string with hex format of DOUBLE of LEN digits)
    : 4? 1+ 4 MOD 0= IF &, HOLD THEN ;
    : (DH.) HEX: <# 1- 0 ?DO # I 4? LOOP # #> ;

    ( Derive B. H. DH. from them)

    Only HEX: and DEC: are made a separate facility:
    : HEX: R> BASE @ >R >R HEX CO R> BASE ! ;
    : DEC: R> BASE @ >R >R DECIMAL CO R> BASE ! ;
    To live on and fight another day, e.g. in formatting, or in at-xy that
    outputs a position in decimal.
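
    A small usage sketch of HEX: as defined above, again assuming a
    threaded Forth where CO can be written as a swap of the two topmost
    return addresses. H. is a hypothetical user word, not part of ciforth.

    ```forth
    \ ASSUMPTION: return addresses are single cells on the return stack.
    : CO ( -- )  2R> SWAP 2>R ;

    \ Save BASE below the caller's return address, switch to hex, let the
    \ caller finish, then restore BASE.
    : HEX: ( -- )  R> BASE @ >R >R  HEX CO  R> BASE ! ;

    : H. ( u -- )  HEX: U. ;    \ hypothetical user word
    DECIMAL
    255 H.     \ prints FF
    255 .      \ prints 255 : BASE is decimal again
    ```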

    <SNIP>

    Groetjes Albert
  • From sjack@21:1/5 to dxf on Mon Mar 31 15:01:30 2025
    dxf <dxforth@gmail.com> wrote:
    >The question then is can HEX: DEC: be justified. While you might because
    >CO exists, I'm not sure I could.


    Toad extension:

    -- BHEX ( R: -> base -- )
    -- HEX Back Trek
    -- save BASE
    -- Trek: set HEX
    -- Back: restore BASE
    : BHEX PRO
    BASE @ >R
    BACK R> BASE !
    TREK HEX
    CONT ;

    : foo bhex . ;

    100 dup
    i. foo --> 64
    i. . --> 100
    :)
    --
    me
