• First beta release of gawk 5.4.0

    From arnold@skeeve.com (Aharon Robbins) to comp.lang.awk on Mon Jan 19 13:21:31 2026
    From Newsgroup: comp.lang.awk

    This note is to announce the (first) BETA release of GNU Awk 5.4.0.

    It is available from:

    http://www.skeeve.com/gawk/gawk-5.3.65.tar.gz

    This is a major release.

    The important part of the NEWS file is below.

    In addition, I am attaching the README.matchers file, because what it
    has to say is very important.

    The documentation and code have largely hit the freeze point. The only
    port that is still work-in-progress is that for OpenVMS.

    So, why do a beta release? So that you, yes you, the end user, can see
    if anything I've done breaks gawk for you. Then you can TELL ME ABOUT
    IT so that I can fix it for the final release.

    The introduction of a new regexp matcher makes beta testing for this
    release doubly important. This is especially true for the GNU/Linux
    distributions. So please, test away!

    Much thanks,

    Arnold Robbins
    arnold@skeeve.com
    ---------------------------------------------
    Copyright (C) 2019, 2020, 2021, 2022, 2023, 2024, 2025
    Free Software Foundation, Inc.

    Copying and distribution of this file, with or without modification,
    are permitted in any medium without royalty provided the copyright
    notice and this notice are preserved.

    Changes from 5.3.x to 5.4.0
    ---------------------------

    1. This release now uses Mike Haertel's MinRX regular expression matcher
    as the default regexp engine. The old regex and dfa engines are still
    available. More detail is available in the manual, and in the file
    README_d/README.matchers. At the very least, read that file!

    2. There is now a new directive, @nsinclude, which works like @include
    but does not reset the namespace for the included file to "awk". See
    the manual for details.

    3. When using lshift() or rshift() and attempting to shift by as many
    or more bits than in a uintmax_t, gawk returns zero, instead of
    whatever the C compiler and hardware might have done.

    4. Gawk's use of persistent memory has changed somewhat:
       A. It's now possible to use persistent memory and dynamic extensions
          without problems. Gawk notices if an extension is being loaded from
          a different path than what was first used and produces a fatal error
          in this case.
       B. Gawk generates a warning if the version of gawk saved in the backing
          file doesn't match that of the current running gawk.
       C. Gawk now stores additional meta-information in the backing file.
          This means that if you have a backing file with important data
          in it, you should dump the data to a text file using the old version,
          create a new backing file, and then read your data back in with
          the new version.

    5. The ordchr extension now supports multibyte / wide characters.

    6. Per the 2024 POSIX standard, `length(array)' is no longer an extension,
    but a regular feature. Thus --posix no longer rejects it and --lint
    no longer warns about it.

    7. The --traditional option has been rationalized to bring gawk into
    sync with BWK awk. It no longer affects the return code from system(),
    and it no longer prevents using a regexp for RS. Internally, the
    code was cleaned up some as well.

    8. Assertions in the C code are now enabled. To disable them, manually
    edit the various Makefiles after running configure and before
    running make. You will need to add -DNDEBUG to the CFLAGS variable.

    9. PMA should now work on OpenBSD 7.*.

    10. Hexadecimal floating-point values may now be used in program source code,
    with strtonum(), and with the -n/--non-decimal-data option. See the
    manual for details.

    11. A large number of small "replacement" files for standard functions
    have been removed. These functions are now so standard that we
    simply expect them to always be available. This simplifies the
    distribution and the code maintenance.

    12. Support for UDP in gawk's networking support is now obsolete.
    It never worked very well. It will be removed in version 6.0.
    Gawk issues a warning when attempting to use it.

    13. Reading regular disk input files should be somewhat faster now,
    since gawk no longer checks for timeouts on such files. On one
    very large file, gawk '{ print }' saw approximately a 9% speedup.

    14. The MinGW port of gawk for MS-Windows now supports UTF-8 encoded
    non-ASCII text when the console window where gawk runs uses the
    Windows codepage 65001 for output, even if the system-wide locale
    specifies another codepage.

    15. There is a new option to configure: --enable-O3. This causes gcc to
    use -O3 instead of -O2 when compiling gawk. This is not the default
    because experience in some projects has shown (sadly) that -O3 can cause
    bugs.

    16. As usual, a number of small bugs have been fixed; see the ChangeLog
    for the details.

    Changes from 5.3.2 to 5.3.x
    ---------------------------

    1. The Hebrew translation has been revived.

    2. All non-standard variables are now not installed for --traditional
    and --posix.

    3. It's been discovered that persistent memory and dynamic extensions
    don't mix. For now, trying this combination produces a fatal error.
    It may one day get fixed. Or, it may not.

    4. A bug in the API has been fixed, so that using a numeric index to set
    an array element now works. As a result, the API minor version was
    increased to 1.

    ==================== README.matchers ========================
    Tue Dec 9 04:05:05 PM IST 2025
    ===============================

    * I * M * P * O * R * T * A * N * T *

    This release includes a new regular expression matcher, MinRX, written
    by Mike Haertel, the original author of GNU grep. It's available from
    https://github.com/mikehaertel/minrx.

    This matcher is fully POSIX compliant, which the current GNU matchers
    are not. In particular it follows POSIX rules for finding the longest
    leftmost submatches. It is also more strict as to regular expression
    syntax, but primarily in a few corner cases that normal, correct,
    regular expression usage should not encounter.

    Because regular expression matching is such a fundamental part of
    awk/gawk, the original GNU matchers are still included in gawk. In order
    to use them, give a value to the GAWK_GNU_MATCHERS environment variable
    before invoking gawk.
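
    One way to look for differences is to run the same program under both
    matchers and compare the output. The following is a minimal sketch,
    not part of the original README. The helper name compare_matchers and
    the sample data are made up for illustration, and the value 1 is
    arbitrary, since the README only says the variable must have a value.

```shell
# Sketch: run one gawk program under both matchers and compare the output.
# compare_matchers is a hypothetical helper, not something gawk provides.
compare_matchers() {
    prog=$1
    file=$2
    # Default matcher (MinRX in 5.4.0).
    new_out=$(gawk "$prog" "$file")
    # Original GNU matchers, selected by giving the variable any value.
    old_out=$(GAWK_GNU_MATCHERS=1 gawk "$prog" "$file")
    if [ "$new_out" = "$old_out" ]; then
        echo "same"
    else
        echo "different"
        echo "default matcher:  $new_out" >&2
        echo "GNU matchers:     $old_out" >&2
    fi
}

# Small sample input; replace with your own data and program.
sample=$(mktemp)
printf 'foo123bar\nno digits here\n' > "$sample"

result=$(compare_matchers '{ if (match($0, /[0-9]+/)) print RSTART, RLENGTH }' "$sample")
echo "matchers: $result"

rm -f "$sample"
```

    A simple pattern like the one here should give the same answer either
    way. The interesting cases are your own programs and data.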

    If you find a difference in behavior between the new and original
    matchers, please report it, particularly if it adversely affects your
    current application(s). Note that if the difference is due to MinRX being
    fully POSIX compliant, then you should consider revising your application.
    Please use the gawkbug script to report any issues, as would be done
    for any other bug. See node Bugs in the manual for more details; it's
    online at https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.

    PLEASE NOTE! The original GNU matchers will eventually be removed from
    gawk. So, please take the time to notice and report any issues in the
    MinRX matcher, so that they can be ironed out sooner rather than later.

    Thanks!