Re: Calling conventions (particularly 32-bit ARM)

    From Tim Rentsch <tr.17687@z991.linuxsc.com> to comp.arch on Sat Feb 14 20:40:26 2026
    From Newsgroup: comp.arch

    George Neuner <gneuner2@comcast.net> writes:

    > On Mon, 27 Jan 2025 17:09:59 -0800, Tim Rentsch
    > <tr.17687@z991.linuxsc.com> wrote:
    >
    >> George Neuner <gneuner2@comcast.net> writes:
    >>
    >>> On Mon, 6 Jan 2025 20:10:13 +0000, mitchalsup@aol.com (MitchAlsup1)
    >>> wrote:
    >>>
    >>>> I looked high and low for codes using more than 8 arguments and
    >>>> returning aggregates larger than 8 double words, and about the
    >>>> only things I found were a handful of *print*() calls.
    >>>
    >>> Large numbers of parameters may be generated either by closure
    >>> conversion or by lambda lifting. These are FP language
    >>> transformations that are analogous to, but potentially more complex
    >>> than, the rewriting of object methods and their call sites to pass the
    >>> current object in an OO language.
    >>>
    >>> [The difference between closure conversion and lambda lifting is the
    >>> scope of the transformation: conversion limits code transformations to
    >>> within the defining call chain, whereas lifting pulls the closure to
    >>> top level making it (at least potentially) globally available.]
    >>>
    >>> In either case the original function is rewritten such that non-local
    >>> variables can be passed as parameters. The function's code must be
    >>> altered to access the non-locals - either directly as explicit
    >>> individual parameters, or by indexing from a pointer to an environment
    >>> data structure.
    >>>
    >>> While in a simple case this could look exactly like the OO method
    >>> transformation, recall that a general closure may require access to
    >>> non-local variables spread through multiple environments. Even if
    >>> whole environments are passed via single pointers, there still may
    >>> need to be multiple parameters added.
    >>
    >> Isn't it the case that access to all of the enclosing environments
    >> can be provided by passing a single pointer? I'm pretty sure it
    >> is.
    >
    > Certainly, if the enclosing environments somehow are chained together.
    > In real code though, in many instances such a chain will not already
    > exist when the closure is constructed. The compiler would have to
    > install pointers to the needed environments (or, alternatively,
    > pointers directly to the needed values) into the new closure's
    > immediate environment.
    > [essentially this creates a private "display" for the closure.]
    >
    > Completely doable: it is simply that, if there are enough registers,
    > passing the pointers as parameters will tend to be more performant.

    Sounds like you're saying that you agree that passing
    just one value is always feasible. Also that, depending
    on individual circumstances, either approach might have
    better performance.
    --- Synchronet 3.21b-Linux NewsLink 1.2
    From George Neuner <gneuner2@comcast.net> to comp.arch on Tue Feb 17 15:35:33 2026


    Hi Tim,

    On Sat, 14 Feb 2026 20:40:26 -0800, Tim Rentsch
    <tr.17687@z991.linuxsc.com> wrote:

    > Sounds like you're saying that you agree that passing
    > just one value is always feasible. Also that, depending
    > on individual circumstances, either approach might have
    > better performance.

    You are correct ... it always is possible to pass the closure
    environment to the function using a single pointer.

    But you may not want to do it that way.


    My point was about the structure of closure environments. In general,
    you want to minimize what data needs to be persisted - particularly in
    a program that generates lots of /related/ closures - while also
    keeping in mind that data may need to be shared among multiple
    closures [not just among multiple functions in a common closure].

    It may be necessary, e.g., to pull data out of a stack context and
    heap-allocate it instead. That requires changing the variable's slot in
    the stack context to hold a pointer rather than the value itself,
    rewriting any functions that expect the value to use the pointer
    instead, and constructing new persistent "environment" structures that
    can find the relocated data.

    This can require a lot of effort by the compiler.
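
    A minimal C sketch of that relocation, assuming a hypothetical
    make_counter whose local count is captured by two closures. A compiler
    doing this might move count from the stack frame to the heap, rewrite
    every use of the value into a use through a pointer, and build one
    persistent environment that both closures share. All names here are
    illustrative, not taken from any real compiler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical source (pseudo-syntax):
 *   make_counter() {
 *       count = 0                      -- would normally live on the stack
 *       inc  = lambda() { count += 1; return count }
 *       peek = lambda() { return count }
 *       return (inc, peek)
 *   }
 * Both closures outlive make_counter's frame and must see the same count. */

typedef struct { int *count; } CounterEnv;   /* shared persistent environment */

typedef struct {
    int (*code)(CounterEnv *env);
    CounterEnv *env;
} Closure;

/* Closure bodies rewritten to reach count through the pointer. */
static int inc_code(CounterEnv *env)  { return ++*env->count; }
static int peek_code(CounterEnv *env) { return *env->count; }

typedef struct { Closure inc, peek; } CounterPair;

static CounterPair make_counter(void) {
    /* Relocated variable: formerly 'int count = 0;' in the stack frame. */
    int *count = malloc(sizeof *count);
    CounterEnv *env = malloc(sizeof *env);
    assert(count != NULL && env != NULL);   /* sketch only: no real OOM handling */
    *count = 0;
    env->count = count;
    CounterPair p = { { inc_code, env }, { peek_code, env } };
    return p;                               /* the heap data outlives the frame */
}

/* A GC'd runtime would reclaim these; in C we free the shared data once. */
static void free_counter(CounterPair p) {
    free(p.inc.env->count);
    free(p.inc.env);
}
```

    Because both closures point at the same environment, an increment
    through inc is visible through peek, which is the sharing requirement
    described above.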


    OTOH, if the structure of the program is such that the closure's
    non-local data is guaranteed to be in scope when the closure is
    invoked, it often is simpler just to rewrite closure functions to
    access that data via a pointer parameter, and change the call sites to
    pass the required pointer(s).

    The closure may still need a persistent environment, but this method
    reduces or eliminates both the need for /chained/ environments and the
    need to rewrite other non-closure functions that happen to use the data.

    This also can require a lot of effort by the compiler, but the effort
    can be more focused on the closures, and less on "regular" code.
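
    By contrast, a minimal C sketch of the pointer-parameter approach,
    assuming (hypothetically) that the closure is only ever invoked while
    its defining function is still active. The captured variable stays in
    the stack frame, the closure body takes it through a pointer
    parameter, and the call site passes the address of the still-live
    local, so no heap environment is built. Names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source (pseudo-syntax):
 *   sum_scaled(xs, n) {
 *       factor = 3
 *       scale = lambda(x) { return x * factor }   -- only called below
 *       return sum of scale(x) for x in xs
 *   }
 * 'factor' is guaranteed to be live whenever 'scale' runs. */

/* Rewritten closure body: the non-local arrives as a pointer parameter. */
static int scale(const int *factor, int x) {
    return x * *factor;
}

static int sum_scaled(const int *xs, size_t n) {
    assert(xs != NULL || n == 0);
    int factor = 3;                        /* stays in this stack frame */
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += scale(&factor, xs[i]);    /* call site passes the pointer */
    return total;
}
```

    Only scale and its call sites change; any other function that uses
    factor is left alone, which is where the effort saving comes from.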


    What you really don't want in any case is to have to preserve entire
    stacks just to support creating closures. It doesn't matter whether
    the stack is linear or a chain of heap allocated structures[*]. Some
    rewriting and data relocation (out of the stack) will be necessary in
    any case.

    [*] yes, this actually is done in some GC'd language implementations.
    When the stack shrinks, discarded contexts are cleaned up by the GC.
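
    A small C model of why capturing whole frames is costly, assuming a
    hypothetical heap-allocated frame layout in which every frame links to
    its caller. In this model, a closure that captures its defining frame
    keeps that frame, and through the caller links every older frame,
    reachable by a tracing collector, even though it reads only one local.
    The types and functions are illustrative, not from any real runtime.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_SLOTS 32

/* Hypothetical heap-allocated activation record, linked to its caller. */
typedef struct Frame {
    struct Frame *caller;        /* dynamic link */
    int locals[FRAME_SLOTS];     /* every local slot of the function */
} Frame;

/* Naive closure: captures a pointer to the whole defining frame. */
typedef struct {
    Frame *frame;
    int slot;                    /* the one local the closure actually reads */
} FrameClosure;

static int frame_closure_get(FrameClosure k) {
    assert(k.slot >= 0 && k.slot < FRAME_SLOTS);
    return k.frame->locals[k.slot];
}

/* Bytes a tracing GC must keep alive while the closure is reachable, in
 * this model: the captured frame plus every frame reachable through the
 * caller links. */
static size_t retained_bytes(FrameClosure k) {
    size_t total = 0;
    for (const Frame *f = k.frame; f != NULL; f = f->caller)
        total += sizeof *f;
    return total;
}

/* After relocating the data: the closure holds only the value it needs. */
typedef struct { int value; } FlatClosure;
```

    With three frames on the chain the naive closure pins all three,
    while the relocated version keeps a single int.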
