• Re: stack sizes, Segments

    From John Levine@21:1/5 to All on Wed Jan 22 02:54:33 2025
    According to George Neuner <gneuner2@comcast.net>:
    Not standard compliant for sure, but you certainly can approximate
    stack use in C: just store (as byte*) the address of a local in your
    top level function, and check the (absolute value of) the difference
    to the address of a local in the current function.

    Ugh, but yes, that would work with the usual stack structure.
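A minimal sketch of the address-of-a-local trick described above, assuming a single contiguous stack and a flat address space. The names `stack_mark_top`, `stack_used_approx` and `probe_at_depth` are hypothetical, and converting pointers to `uintptr_t` for arithmetic is implementation-defined rather than standard-guaranteed, which is the "not standard compliant" part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of a local in the top-level function; set once at startup. */
static uintptr_t stack_anchor;

/* Call from the top-level function, passing the address of one of its locals. */
void stack_mark_top(const void *local_in_top_frame)
{
    stack_anchor = (uintptr_t)local_in_top_frame;
}

/* Approximate number of bytes between the anchor and the current frame.
   The absolute difference is taken so the stack growth direction does not matter. */
size_t stack_used_approx(void)
{
    volatile char here = 0;            /* volatile: keep this local in memory */
    uintptr_t now = (uintptr_t)&here;

    assert(stack_anchor != 0);         /* stack_mark_top must have been called */
    return (size_t)(now > stack_anchor ? now - stack_anchor
                                       : stack_anchor - now);
}

/* Hypothetical helper: recurse `depth` times, then report the estimate. */
size_t probe_at_depth(int depth)
{
    volatile char pad[256];            /* make every frame visibly large */
    size_t used;

    pad[0] = (char)depth;
    used = (depth > 0) ? probe_at_depth(depth - 1) : stack_used_approx();
    pad[1] = (char)used;               /* work after the call: no tail call */
    return used;
}
```

Typical use would be to call `stack_mark_top(&some_local)` once in `main` and `stack_used_approx()` wherever an estimate is wanted. The result says nothing about how much stack remains, which is the harder problem raised next.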

    The bigger problem is knowing how much stack is available to use -
    there may be no way (or no easy way) to find the actual size ... or
    the limit if the stack expands ... and circumstances beyond the
    program may have limited it to be smaller than the program requested.

    There's often no way to tell since it may depend on things like
    running out of swap space which depends on how much memory other
    programs are using.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to John Levine on Wed Jan 22 15:25:43 2025
    On Wed, 22 Jan 2025 02:54:33 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to George Neuner <gneuner2@comcast.net>:
    Not standard compliant for sure, but you certainly can approximate
    stack use in C: just store (as byte*) the address of a local in your
    top level function, and check the (absolute value of) the difference
    to the address of a local in the current function.

    Ugh, but yes that would work with the usual stack structure,

    The bigger problem is knowing how much stack is available to use -
    there may be no way (or no easy way) to find the actual size ... or
    the limit if the stack expands ... and circumstances beyond the
    program may have limited it to be smaller than the program
    requested.

    There's often no way to tell since it may depend on things like
    running out of swap space which depends on how much memory other
    programs are using.


    At least you can know the size of reserved VA space which is both an
    upper bound of the limit and in 99% of the cases an actual limit.

    On Windows: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getcurrentthreadstacklimits

    I suppose that similar functions are available on other OSes as well.

  • From Scott Lurndal@21:1/5 to Michael S on Wed Jan 22 15:01:34 2025
    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 22 Jan 2025 02:54:33 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to George Neuner <gneuner2@comcast.net>:
    Not standard compliant for sure, but you certainly can approximate
    stack use in C: just store (as byte*) the address of a local in your
    top level function, and check the (absolute value of) the difference
    to the address of a local in the current function.

    Ugh, but yes that would work with the usual stack structure,

    The bigger problem is knowing how much stack is available to use -
    there may be no way (or no easy way) to find the actual size ... or
    the limit if the stack expands ... and circumstances beyond the
    program may have limited it to be smaller than the program
    requested.

    There's often no way to tell since it may depend on things like
    running out of swap space which depends on how much memory other
    programs are using.


    At least you can know the size of reserved VA space which is both an
    upper bound of the limit and in 99% of the cases an actual limit.

    On Windows:
    https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getcurrentthreadstacklimits

    I suppose that similar functions are available on other OSes as well.


    https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_attr_getstack.html

    There is no equivalent function for the main process stack, other than
    the 'getrlimit(RLIMIT_STACK...' functionality.
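A minimal sketch of the `getrlimit(RLIMIT_STACK, ...)` query mentioned above. The wrapper name `stack_soft_limit` is hypothetical. The value is only an upper bound on how far the main stack may grow. As noted earlier in the thread, the usable amount can be smaller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Stores the soft RLIMIT_STACK value in *soft and sets *unlimited to 1
   when that value is RLIM_INFINITY. Returns 0 on success, -1 on failure. */
int stack_soft_limit(rlim_t *soft, int *unlimited)
{
    struct rlimit rl;

    assert(soft != NULL && unlimited != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    return 0;
}
```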

  • From Michael S@21:1/5 to Scott Lurndal on Thu Jan 23 01:45:16 2025
    On Wed, 22 Jan 2025 15:01:34 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 22 Jan 2025 02:54:33 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to George Neuner <gneuner2@comcast.net>:
    Not standard compliant for sure, but you certainly can approximate
    stack use in C: just store (as byte*) the address of a local in
    your top level function, and check the (absolute value of) the
    difference to the address of a local in the current function.

    Ugh, but yes that would work with the usual stack structure,

    The bigger problem is knowing how much stack is available to use -
    there may be no way (or no easy way) to find the actual size ...
    or the limit if the stack expands ... and circumstances beyond the
    program may have limited it to be smaller than the program
    requested.

    There's often no way to tell since it may depend on things like
    running out of swap space which depends on how much memory other
    programs are using.


    At least you can know the size of reserved VA space which is both an
    upper bound of the limit and in 99% of the cases an actual limit.

    On Windows:
    https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getcurrentthreadstacklimits

    I suppose that similar functions are available on other OSes as well.


    https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_attr_getstack.html

    There is no equivalent function for the main process stack,


    You mean "there is no equivalent *POSIX* function", right?
    But I sincerely hope that most Unix-like systems provide such
    functionality in a system-specific manner, because it looks useful.

    other than
    the 'getrlimit(RLIMIT_STACK...' functionality.

  • From Scott Lurndal@21:1/5 to Michael S on Thu Jan 23 01:07:02 2025
    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 22 Jan 2025 15:01:34 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 22 Jan 2025 02:54:33 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to George Neuner <gneuner2@comcast.net>:
    Not standard compliant for sure, but you certainly can approximate
    stack use in C: just store (as byte*) the address of a local in
    your top level function, and check the (absolute value of) the
    difference to the address of a local in the current function.

    Ugh, but yes that would work with the usual stack structure,

    The bigger problem is knowing how much stack is available to use -
    there may be no way (or no easy way) to find the actual size ...
    or the limit if the stack expands ... and circumstances beyond the
    program may have limited it to be smaller than the program
    requested.

    There's often no way to tell since it may depend on things like
    running out of swap space which depends on how much memory other
    programs are using.


    At least you can know the size of reserved VA space which is both an
    upper bound of the limit and in 99% of the cases an actual limit.

    On Windows:
    https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getcurrentthreadstacklimits

    I suppose that similar functions are available on other OSes as well.


    https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_attr_getstack.html

    There is no equivalent function for the main process stack,


    You mean "there is no equivalent *POSIX* function", right?
    But I sincerely hope that most Unix-like systems provide such
    functionality in a system-specific manner, because it looks useful.

    What would an application (portable or otherwise) use the process
    stack base address for?

    The data is available through the /proc/ filesystem.

    $ cat /proc/$$/maps | grep "[stack]"

    7fff77b52000-7fff77b73000 rw-p 00000000 00:00 0 [stack]

  • From George Neuner@21:1/5 to All on Wed Jan 22 20:28:33 2025
    On Wed, 22 Jan 2025 15:01:34 GMT, scott@slp53.sl.home (Scott Lurndal)
    wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 22 Jan 2025 02:54:33 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to George Neuner <gneuner2@comcast.net>:
    Not standard compliant for sure, but you certainly can approximate
    stack use in C: just store (as byte*) the address of a local in your
    top level function, and check the (absolute value of) the difference
    to the address of a local in the current function.

    Ugh, but yes that would work with the usual stack structure,

    The bigger problem is knowing how much stack is available to use -
    there may be no way (or no easy way) to find the actual size ... or
    the limit if the stack expands ... and circumstances beyond the
    program may have limited it to be smaller than the program
    requested.

    There's often no way to tell since it may depend on things like
    running out of swap space which depends on how much memory other
    programs are using.


    At least you can know the size of reserved VA space which is both an
    upper bound of the limit and in 99% of the cases an actual limit.

    On Windows:
    https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getcurrentthreadstacklimits

    I suppose that similar functions are available on other OSes as well.


    https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_attr_getstack.html

    There is no equivalent function for the main process stack, other than
    the 'getrlimit(RLIMIT_STACK...' functionality.

    pthread_attr_getstack works for POSIX threads ... but no clue if it
    would work for the main thread or for OS threads not started by
    pthread_create.

    Unfortunately I'm away from my Linux machine, so I can't try it.
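One way to try this on Linux with glibc is sketched below. It assumes the non-POSIX GNU extension `pthread_getattr_np`, which fills in the attributes of an already running thread. The wrapper name `current_thread_stack` is hypothetical. Whether the call succeeds for the initial thread on a given system is an assumption to verify, not something the thread establishes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* pthread_getattr_np is a GNU extension, not POSIX */
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stores the calling thread's stack base (lowest address) and size.
   Returns 0 on success, otherwise an error number from the pthread calls. */
int current_thread_stack(void **base, size_t *size)
{
    pthread_attr_t attr;
    int rc;

    assert(base != NULL && size != NULL);
    rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_getstack(&attr, base, size);
    pthread_attr_destroy(&attr);
    return rc;
}
```

On older glibc versions this needs to be linked with `-pthread`.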

  • From MitchAlsup1@21:1/5 to Scott Lurndal on Thu Jan 23 02:47:17 2025
    On Thu, 23 Jan 2025 1:07:02 +0000, Scott Lurndal wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 22 Jan 2025 15:01:34 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 22 Jan 2025 02:54:33 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to George Neuner <gneuner2@comcast.net>:
    Not standard compliant for sure, but you certainly can approximate
    stack use in C: just store (as byte*) the address of a local in
    your top level function, and check the (absolute value of) the
    difference to the address of a local in the current function.

    Ugh, but yes that would work with the usual stack structure,

    The bigger problem is knowing how much stack is available to use -
    there may be no way (or no easy way) to find the actual size ...
    or the limit if the stack expands ... and circumstances beyond the
    program may have limited it to be smaller than the program
    requested.

    There's often no way to tell since it may depend on things like
    running out of swap space which depends on how much memory other
    programs are using.


    At least you can know the size of reserved VA space which is both an
    upper bound of the limit and in 99% of the cases an actual limit.

    On Windows:
    https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getcurrentthreadstacklimits

    I suppose that similar functions are available on other OSes as well.


    https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_attr_getstack.html

    There is no equivalent function for the main process stack,


    You mean "there is no equivalent *POSIX* function", right?
    But I sincerely hope that most Unix-like systems provide such
    functionality in a system-specific manner, because it looks useful.

    What would an application (portable or otherwise) use the process
    stack base address for?

    As a place to put TLS (or &TLS) on register-starved architectures.

  • From Anton Ertl@21:1/5 to Scott Lurndal on Thu Jan 23 07:24:12 2025
    scott@slp53.sl.home (Scott Lurndal) writes:
    What would an application (portable or otherwise) use the process
    stack base address for?

    Finding roots for garbage collection.

    The data is available through the /proc/ filesystem.

    $ cat /proc/$$/maps | grep "[stack]"

    7fff77b52000-7fff77b73000 rw-p 00000000 00:00 0 [stack]

    That's Linux (I just tried it on AIX 7.3.3; there is no
    /proc/$$/maps). For the kind of application under discussion Linux
    also has /proc/self/maps, so you don't need to getpid() and convert it
    to decimal, but in the example that would produce the maps of the
    "cat" process, not that of the surrounding shell.

    I would use "grep -F", because grep by default searches for regexp and
    '[' and ']' are regexp meta-characters.
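For a program that wants its own stack mapping, a minimal Linux-only sketch is to read `/proc/self/maps` directly. The function name `main_stack_range` is hypothetical. `strstr` performs a literal substring match, so the brackets in "[stack]" need no escaping, which is the same idea as `grep -F`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Finds the "[stack]" line in /proc/self/maps (Linux-specific).
   Returns 0 and fills *lo and *hi on success, or -1 if the file
   cannot be opened or no such line is found. */
int main_stack_range(unsigned long *lo, unsigned long *hi)
{
    char line[512];
    int found = -1;
    FILE *f;

    assert(lo != NULL && hi != NULL);
    f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* literal match, then parse the "start-end" hex address pair */
        if (strstr(line, "[stack]") != NULL &&
            sscanf(line, "%lx-%lx", lo, hi) == 2) {
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

The range reported is the mapping as it currently stands. The kernel may still grow it, so it is a snapshot rather than a limit.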

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Thu Jan 23 14:00:41 2025
    mitchalsup@aol.com (MitchAlsup1) writes:
    On Thu, 23 Jan 2025 1:07:02 +0000, Scott Lurndal wrote:


    You mean "there is no equivalent *POSIX* function", right?
    But I sincerely hope that most Unix-like systems provide such
    functionality in a system-specific manner, because it looks useful.

    What would an application (portable or otherwise) use the process
    stack base address for?

    As a place to put TLS (or &TLS) on register-starved architectures.

    That's a function of the implementation, not the programmer.

  • From MitchAlsup1@21:1/5 to Scott Lurndal on Thu Jan 23 17:49:27 2025
    On Thu, 23 Jan 2025 14:00:41 +0000, Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    On Thu, 23 Jan 2025 1:07:02 +0000, Scott Lurndal wrote:


    You mean "there is no equivalent *POSIX* function", right?
    But I sincerely hope that most Unix-like systems provide such
    functionality in a system-specific manner, because it looks useful.

    What would an application (portable or otherwise) use the process
    stack base address for?

    As a place to put TLS (or &TLS) on register-starved architectures.

    That's a function of the implementation, not the programmer.

    The compiler needs to know a way of getting TLS in a register starved
    ISA. {This is why segmentation bled over into x86-64}

  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Thu Jan 23 19:45:11 2025
    mitchalsup@aol.com (MitchAlsup1) writes:
    On Thu, 23 Jan 2025 14:00:41 +0000, Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    On Thu, 23 Jan 2025 1:07:02 +0000, Scott Lurndal wrote:


    You mean "there is no equivalent *POSIX* function", right?
    But I sincerely hope that most Unix-like systems provide such
    functionality in a system-specific manner, because it looks useful.

    What would an application (portable or otherwise) use the process
    stack base address for?

    As a place to put TLS (or &TLS) on register-starved architectures.

    That's a function of the implementation, not the programmer.

    The compiler needs to know a way of getting TLS in a register starved
    ISA.

    The compiler is part of the "implementation". Very few programmers
    work on compilers for register starved architectures (of which few
    are still in common use). And very few of them care about the
    stack base address.


    {This is why segmentation bled over into x86-64}

    Well, it gave them a couple scratch registers for use as
    kernel and user-mode thread specific region pointers (fs, gs).

    However, I doubt that played a huge factor in AMD keeping what's
    left of 80286 segments, they could have just re-used the
    encodings for FS and GS for new GPRs and reserved them for
    TLS in ABI's for implementations that support threads.

  • From MitchAlsup1@21:1/5 to Scott Lurndal on Thu Jan 23 20:04:49 2025
    On Thu, 23 Jan 2025 19:45:11 +0000, Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    On Thu, 23 Jan 2025 14:00:41 +0000, Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    On Thu, 23 Jan 2025 1:07:02 +0000, Scott Lurndal wrote:


    You mean "there is no equivalent *POSIX* function", right?
    But I sincerely hope that most Unix-like systems provide such
    functionality in a system-specific manner, because it looks useful.

    What would an application (portable or otherwise) use the process
    stack base address for?

    As a place to put TLS (or &TLS) on register-starved architectures.

    That's a function of the implementation, not the programmer.

    The compiler needs to know a way of getting TLS in a register starved
    ISA.

    The compiler is part of the "implementation". Very few programmers
    work on compilers for register starved architectures (of which few
    are still in common use). And very few of them care about the
    stack base address.


    {This is why segmentation bled over into x86-64}

    Well, it gave them a couple scratch registers for use as
    kernel and user-mode thread specific region pointers (fs, gs).

    However, I doubt that played a huge factor in AMD keeping what's
    left of 80286 segments, they could have just re-used the
    encodings for FS and GS for new GPRs and reserved them for
    TLS in ABI's for implementations that support threads.

    We had long conversations about what to leave in and what to take
    (back) out.

  • From Anton Ertl@21:1/5 to Scott Lurndal on Fri Jan 24 08:11:34 2025
    scott@slp53.sl.home (Scott Lurndal) writes:
    Well, it gave them a couple scratch registers for use as
    kernel and user-mode thread specific region pointers (fs, gs).

    However, I doubt that played a huge factor in AMD keeping what's
    left of 80286 segments, they could have just re-used the
    encodings for FS and GS for new GPRs and reserved them for
    TLS in ABI's for implementations that support threads.

    Unfortunately, Mitch Alsup has not stated the reasoning behind their
    decisions, but my speculation why they decided on the current solution
    and not on what you outline is:

    * The hardware needs a 32+32+32+32-bit (i.e., four-input) adder in the
    address path for IA-32 anyway, and at least a three-input
    64+64+32-bit adder for AMD64, so the additional cost of requiring a
    64+64+64+32-bit adder for AMD64 is relatively small.

    * Also, decoding the FS:/GS: prefix as a segment prefix in IA-32 and
    as a GPR (for which register use in the instruction?) in AMD64
    complicates the decoder.

    * On the software side, having FS: as separate argument means that
    software can use the full power of the addressing modes to access
    TLS; however, thinking about usage scenarios (arrays in TLS), it
    seems to me that the TLS-base would often be a constant (fitting in
    the regular addressing modes), or is fetched from TLS (which can be
    done with three-address operations) and then used (which also does
    not need segment prefix if the fetched address is absolute rather
    than TLS-relative; and why should it be TLS-relative when the
    address is thread-specific rather than useful across threads).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From MitchAlsup1@21:1/5 to Anton Ertl on Fri Jan 24 14:50:57 2025
    On Fri, 24 Jan 2025 8:11:34 +0000, Anton Ertl wrote:

    scott@slp53.sl.home (Scott Lurndal) writes:
    Well, it gave them a couple scratch registers for use as
    kernel and user-mode thread specific region pointers (fs, gs).

    However, I doubt that played a huge factor in AMD keeping what's
    left of 80286 segments, they could have just re-used the
    encodings for FS and GS for new GPRs and reserved them for
    TLS in ABI's for implementations that support threads.

    Unfortunately, Mitch Alsup has not stated the reasoning behind their decisions,

    On purpose.

    but my speculation why they decided on the current solution
    and not on what you outline is:

    * The hardware needs a 32+32+32+32-bit (i.e., four-input) adder in the
    address path for IA-32 anyway, and at least a three-input
    64+64+32-bit adder for AMD64, so the additional cost of requiring a
    64+64+64+32-bit adder for AMD64 is relatively small.

    One can add the displacement and the segment base in DECODE, delaying
    the rest of address generation to AGEN, saving an input to the AGEN
    adder. Displacement is a constant, and segment base is a relative
    constant. Thus one never needs more than 3-input adders--even with segmentation.

    * Also, decoding the FS:/GS: prefix as a segment prefix in IA-32 and
    as a GPR (for which register use in the instruction?) in AMD64
    complicates the decoder.

    * On the software side, having FS: as separate argument means that
    software can use the full power of the addressing modes to access
    TLS; however, thinking about usage scenarios (arrays in TLS), it
    seems to me that the TLS-base would often be a constant (fitting in
    the regular addressing modes), or is fetched from TLS (which can be
    done with three-address operations) and then used (which also does
    not need segment prefix if the fetched address is absolute rather
    than TLS-relative; and why should it be TLS-relative when the
    address is thread-specific rather than useful across threads).

    - anton
