• Re: performance regressions in 15.0 [The Windows Dev Kit 2023 buildworld took about 6 minutes less time with jemalloc 5.3.0, not more, in non-debug contexts]

    From Mark Millard@marklmi@yahoo.com to muc.lists.freebsd.stable on Sun Dec 7 08:18:56 2025
    From Newsgroup: muc.lists.freebsd.stable

    On Dec 6, 2025, at 19:03, Mark Millard <marklmi@yahoo.com> wrote:
    On Dec 6, 2025, at 14:25, Warner Losh <imp@bsdimp.com> wrote:

    On Sat, Dec 6, 2025, 3:06 PM Mark Millard <marklmi@yahoo.com> wrote:

    On Dec 6, 2025, at 06:14, Mark Millard <marklmi@yahoo.com> wrote:

    Mateusz Guzik <mjguzik_at_gmail.com> wrote on
    Date: Sat, 06 Dec 2025 10:50:08 UTC :

    I got pointed at phoronix: https://www.phoronix.com/review/freebsd-15-amd-epyc

    While I don't treat their results as gospel, a FreeBSD vs FreeBSD test showing a slowdown most definitely warrants a closer look.

    They observed slowdowns when using iperf over localhost and when compiling llvm.

    I can confirm both problems and more.

    I found the profiling tooling for userspace to be broken again so I
    did not investigate much and I'm not going to dig into it further.

    Test box is AMD EPYC 9454 48-Core Processor, with the 2 systems
    running as 8 core vms under kvm.
    . . .



    Both of the below are instead from ampere3 (aarch64),
    from its 2 most recent "bulk -a" runs that completed,
    with elapsed times shown for the qt6-webengine-6.9.3 builds:

    150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
    135arm64-default qt6-webengine-6.9.3 38:43:36

    For reference:

    Host OSVERSION: 1600000
    Jail OSVERSION: 1500068

    vs.

    Host OSVERSION: 1600000
    Jail OSVERSION: 1305000

    The difference for the above is in the Jail's world builds,
    not in the boot's (kernel+world) builds.


    For reference:


    https://pkg-status.freebsd.org/ampere3/build.html?mastername=150releng-arm64-quarterly&build=88084f9163ae

    build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sun Nov 30 05:40:02 -00 2025
    build time: 2D:05:33:52


    https://pkg-status.freebsd.org/ampere3/build.html?mastername=135arm64-default&build=f5384fe59be6

    build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sat Nov 22 15:33:34 -00 2025
    build time: 1D:14:43:41
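    The two pkg-status pages give elapsed times as days plus HH:MM:SS (2D:05:33:52), while the summary above uses hours that may exceed 24 (53:33:46). A minimal sh sketch for putting both forms on one scale, assuming the leading "ND" field is a day count; to_seconds and ratio are hypothetical helper names:

```sh
#!/bin/sh
# Hypothetical helpers for comparing pkg-status elapsed times.

# to_seconds TIME: TIME is "ND:HH:MM:SS" (N days) or "HH:MM:SS",
# where HH may exceed 24; prints the total number of seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        days = 0; h = $1; m = $2; s = $3
        if ($1 ~ /D$/) {            # leading "ND" field: N days
            days = substr($1, 1, length($1) - 1)
            h = $2; m = $3; s = $4
        }
        print ((days * 24 + h) * 60 + m) * 60 + s
    }'
}

# ratio A B: prints A / B with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

r150=$(to_seconds 2D:05:33:52)    # 150releng-arm64-quarterly
r135=$(to_seconds 1D:14:43:41)    # 135arm64-default
echo "15.0 jail: $r150 s, 13.5 jail: $r135 s, ratio: $(ratio "$r150" "$r135")"
```

    As written it prints "15.0 jail: 192832 s, 13.5 jail: 139421 s, ratio: 1.38", i.e. roughly 38% more time with the 15.0 jail world. (The per-page figures differ from the summary's by a few seconds, so the last digits should not be read too closely.)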


    Expanding the notes to before and after jemalloc 5.3.0
    was merged to main: beefy18 was the main-amd64 builder
    before and somewhat after the jemalloc 5.3.0 merge from
    the vendor branch:

    Before: p2650762431ca_s51affb7e971 261:29:13 building 36074 port-packages, start 05 Aug 2025 01:10:59 GMT
    (jemalloc 5.3.0 merge from the vendor branch: 15 Aug 2025)
    After : p9652f95ce8e4_sb45a181a74c 428:49:20 building 36318 port-packages, start 19 Aug 2025 01:30:33 GMT
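    As a crude normalization (ignoring that the package sets differ, that port-packages build in parallel, and that both runs used a debug jail world), the bulk-run elapsed times can be turned into packages per hour. A sketch; summarize is a hypothetical helper name:

```sh
#!/bin/sh
# summarize TIME COUNT (hypothetical helper): TIME is "HHH:MM:SS"
# elapsed for a whole bulk run, COUNT the number of port-packages
# built. Prints total seconds and a crude packages-per-hour figure.
summarize() {
    awk -v t="$1" -v n="$2" 'BEGIN {
        split(t, f, ":")
        secs = (f[1] * 60 + f[2]) * 60 + f[3]
        printf "%d s, %.1f pkgs/hour\n", secs, n / (secs / 3600)
    }'
}

summarize 261:29:13 36074    # beefy18, before the jemalloc 5.3.0 merge
summarize 428:49:20 36318    # beefy18, after the jemalloc 5.3.0 merge
```

    This prints "941353 s, 138.0 pkgs/hour" and "1543760 s, 84.7 pkgs/hour": the elapsed time grew by a factor of about 1.64 while the package count barely changed.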

    (The log files are long gone for port-packages built.)

    main-15 used a debug jail world but 15.0-RELEASE does not.

    I'm not aware of such a port-package builder context for a
    non-debug jail world before and after a jemalloc 5.3.0 merge.

    A few months before I landed the jemalloc patches, I did 4 or 5 from-scratch buildworlds. The elapsed time was, iirc, within 1 or 2%. Enough to maybe see a diff with the small sample size, but not enough for ministat to trigger at 95%. I don't recall keeping the data for this and can't find it now. And I'm not even sure, in hindsight, that I ran a good experiment. It might be related, or not, but it would be easy enough for someone to set up two jails: one just before and one just after. Build the world from scratch (same hash) on both. That would test it, since you'd be holding all other variables constant.
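    The suggested two-jail experiment can be sketched in sh. This is an illustration under assumptions, not what was actually run: the chroot paths, SRC, RUNS, and the helpers time_cmd and build_in are hypothetical, and "make cleanworld" is used to force from-scratch builds even with META_MODE. ministat(1) then reports whether the two series differ at 95% confidence.

```sh
#!/bin/sh
# Sketch of the two-jail A/B test. Paths, RUNS and helper names are
# hypothetical; override them via the environment.
BEFORE_ROOT=${BEFORE_ROOT:-/usr/obj/DESTDIRs/jemalloc-5p3p0-before}
AT_ROOT=${AT_ROOT:-/usr/obj/DESTDIRs/jemalloc-5p3p0-at}
SRC=${SRC:-/usr/src-jemalloc-5p3p0-at}    # same source hash in both roots
RUNS=${RUNS:-5}

# time_cmd CMD...: run CMD quietly, print its wall-clock seconds,
# and return CMD's exit status.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo $((end - start))
    return "$rc"
}

# build_in ROOT: from-scratch buildworld of $SRC inside chroot ROOT.
# "make cleanworld" removes the old object tree first, so META_MODE
# (if enabled) cannot turn later runs into incremental builds.
build_in() {
    chroot "$1" /bin/sh -c "cd '$SRC' && make cleanworld && make -j8 buildworld"
}

if [ "${1:-}" = run ]; then
    : > before.txt
    : > at.txt
    i=1
    while [ "$i" -le "$RUNS" ]; do
        # Alternate the roots so slow drift affects both series.
        time_cmd build_in "$BEFORE_ROOT" >> before.txt || exit 1
        time_cmd build_in "$AT_ROOT" >> at.txt || exit 1
        i=$((i + 1))
    done
    # One value per line in each file; compare at 95% confidence.
    ministat -c 95 before.txt at.txt
fi
```

    Invoked with the single argument "run", it alternates before and at builds so that slow drift (thermal state, disk caching) lands on both series instead of one. With only 4 or 5 runs per side, a 1 to 2% difference may well stay below ministat's 95% threshold, as described above.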

    When we imported the tip of FreeBSD main at work, we didn't get a cpu change trigger from our tests that I recall...


    The range of commits looks like:

    • git: 9a7c512a6149 - main - ucred groups: restore a useful comment Eric van Gyzen
    • git: bf6039f09a30 - main - jemalloc: Unthin contrib/jemalloc Warner Losh
    • git: a0dfba697132 - main - jemalloc: Update jemalloc.xml.in per FreeBSD-diffs Warner Losh
    • git: 718b13ba6c5d - main - jemalloc: Add FreeBSD's updates to jemalloc_preamble.h.in Warner Losh
    • git: 6371645df7b0 - main - jemalloc: Add JEMALLOC_PRIVATE_NAMESPACE for the libc namespace Warner Losh
    • git: da260ab23f26 - main - jemalloc: Only replace _pthread_mutex_init_calloc_cb in private namespace Warner Losh
    • git: c43cad871720 - main - jemalloc: Merge from jemalloc 5.3.0 vendor branch Warner Losh
    • git: 69af14a57c9e - main - jemalloc: Note update in UPDATING and RELNOTES Warner Losh

    I've started a build of a non-debug 9a7c512a6149 world
    to later create a chroot to do a test buildworld in.

    I'll also do a build of a non-debug 69af14a57c9e world
    to later create the other chroot to do a test
    buildworld in.
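    One way to get two such source trees, assuming an existing git clone of the FreeBSD src repository at /usr/src (an assumption), is a pair of detached git worktrees. A sketch; REPO and add_tree are hypothetical, and the tree paths are the ones used later in this thread:

```sh
#!/bin/sh
# Sketch: create the two source trees as detached worktrees of one
# existing clone. REPO and add_tree are assumptions/hypothetical.
REPO=${REPO:-/usr/src}

# add_tree PATH COMMIT: check out COMMIT (detached HEAD) at PATH.
add_tree() {
    git -C "$REPO" worktree add --detach "$1" "$2"
}

if [ "${1:-}" = run ]; then
    add_tree /usr/src-jemalloc-5p3p0-before 9a7c512a6149 || exit 1
    add_tree /usr/src-jemalloc-5p3p0-at 69af14a57c9e || exit 1
    # Commits in the second tree but not the first, oldest first.
    git -C "$REPO" log --oneline --reverse 9a7c512a6149..69af14a57c9e
fi
```

    "git log A..B" lists the commits reachable from B but not from A, so its output should match the range listed above minus 9a7c512a6149 itself.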

    non-debug means my use of:

    WITH_MALLOC_PRODUCTION=
    WITHOUT_ASSERT_DEBUG=
    WITHOUT_PTHREADS_ASSERTIONS=
    WITHOUT_LLVM_ASSERTIONS=

    I've used "env WITH_META_MODE=" as it cuts down on the
    volume and frequency of scrolling output. I'll do the
    same later.

    If there is anything you want controlled in a different
    way, let me know.

    The Windows Dev Kit 2023 is booted (world and kernel)
    with:

    # uname -apKU
    FreeBSD aarch64-main-pbase 16.0-CURRENT FreeBSD 16.0-CURRENT main-n281922-4872b48b175c GENERIC-NODEBUG arm64 aarch64 1600004 1600004

    which is from an official pkgbase distribution. So the
    boot-world is a debug world but the boot-kernel is not.

    The Windows Dev Kit 2023 will take some time for such
    -j8 builds and I may end up sleeping in the middle of
    the sequence someplace. So it may be a while before
    I've any comparison/contrast data to report.

    Summary for jemalloc before vs. at 5.3.0, for
    *non-debug* contexts doing the buildworld:

    before 5.3.0: 9754 seconds (about 2.7 hrs)
    with 5.3.0:   9384 seconds (about 2.6 hrs)

    So: somewhat less time with 5.3.0, but nearly
    the same.

    It does not clarify what is going on for building
    qt6-webengine-6.9.3, other than suggesting that
    alternative sources of the issue be looked for
    as well.

    Also, it seems that the Mateusz Guzik
    microbenchmark results do not carry over to this
    specific type of activity on this specific type
    of platform.

    Details . . .

    My two source trees for creating the 2 chroots are:

    # ~/fbsd-based-on-what-commit.sh -C /usr/src-jemalloc-5p3p0-before/
    9a7c512a6149 (HEAD) ucred groups: restore a useful comment
    Author: Eric van Gyzen <vangyzen@FreeBSD.org>
    Commit: Eric van Gyzen <vangyzen@FreeBSD.org>
    CommitDate: 2025-08-15 13:29:18 +0000

    # ~/fbsd-based-on-what-commit.sh -C /usr/src-jemalloc-5p3p0-at/
    69af14a57c9e (HEAD) jemalloc: Note update in UPDATING and RELNOTES
    Author: Warner Losh <imp@FreeBSD.org>
    Commit: Warner Losh <imp@FreeBSD.org>
    CommitDate: 2025-08-15 21:57:59 +0000

    Both have src.conf:

    WITH_MALLOC_PRODUCTION=
    WITHOUT_ASSERT_DEBUG=
    WITHOUT_PTHREADS_ASSERTIONS=
    WITHOUT_LLVM_ASSERTIONS=

    since that works for the main 16 in use. (But
    /etc/src.conf needs to be used in the chroots.)

    Having main 16 build /usr/src-jemalloc-5p3p0-before/:

    World build completed on Sat Dec 6 21:24:09 PST 2025
    World built in 11817 seconds, ncpu: 8, make -j8

    Having main 16 build /usr/src-jemalloc-5p3p0-at/:

    World build completed on Sun Dec 7 00:46:25 PST 2025
    World built in 11996 seconds, ncpu: 8, make -j8

    (So: not much difference, as expected.)

    I then did installation and setup of the two chroot
    directory trees, creating:

    # ls -dC1 /usr/obj/DESTDIRs/jemalloc-5p3p0-*/
    /usr/obj/DESTDIRs/jemalloc-5p3p0-at/
    /usr/obj/DESTDIRs/jemalloc-5p3p0-before/

    Both got /etc/src.conf:

    WITH_MALLOC_PRODUCTION=
    WITHOUT_ASSERT_DEBUG=
    WITHOUT_PTHREADS_ASSERTIONS=
    WITHOUT_LLVM_ASSERTIONS=

    I then created and, via rsync, populated each of:

    # ls -dC1 /usr/obj/DESTDIRs/jemalloc-5p3p0-*/usr/src-jemalloc-5p3p0-*/
    /usr/obj/DESTDIRs/jemalloc-5p3p0-at/usr/src-jemalloc-5p3p0-at/
    /usr/obj/DESTDIRs/jemalloc-5p3p0-before/usr/src-jemalloc-5p3p0-at/

    (So both chroots hold the same src-jemalloc-5p3p0-at/
    tree, keeping the source being built constant.)

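    The install and setup steps are not spelled out above. A plausible sh sketch, assuming each chroot world is installed jail-style (installworld, then distribution) from the tree it was built from; write_srcconf and setup_chroot are hypothetical names, and the devfs note is an assumption:

```sh
#!/bin/sh
# Sketch of the chroot setup; the exact steps used are not given in
# the message, so these (jail-style installworld + distribution) are
# assumptions, as are the helper names.

# write_srcconf ROOT: put the non-debug knobs in ROOT/etc/src.conf.
write_srcconf() {
    printf '%s\n' \
        WITH_MALLOC_PRODUCTION= \
        WITHOUT_ASSERT_DEBUG= \
        WITHOUT_PTHREADS_ASSERTIONS= \
        WITHOUT_LLVM_ASSERTIONS= > "$1/etc/src.conf"
}

# setup_chroot BUILT_SRC ROOT TEST_SRC: install the world already
# built from BUILT_SRC into ROOT, then copy TEST_SRC into it.
setup_chroot() {
    mkdir -p "$2" || return 1
    make -C "$1" installworld DESTDIR="$2" || return 1
    make -C "$1" distribution DESTDIR="$2" || return 1
    write_srcconf "$2" || return 1
    mkdir -p "$2$3" || return 1
    rsync -a --delete "$3/" "$2$3/"
}

if [ "${1:-}" = run ]; then
    for which in before at; do
        # Both roots get the same "at" tree to build.
        setup_chroot "/usr/src-jemalloc-5p3p0-$which" \
            "/usr/obj/DESTDIRs/jemalloc-5p3p0-$which" \
            /usr/src-jemalloc-5p3p0-at || exit 1
    done
    # Assumption: builds inside a chroot generally also need devfs:
    #   mount -t devfs devfs /usr/obj/DESTDIRs/jemalloc-5p3p0-before/dev
fi
```

    printf is used instead of a here-document so the src.conf lines come out unindented no matter how the script itself is indented.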
    I then did:

    # chroot /usr/obj/DESTDIRs/jemalloc-5p3p0-before/
    # cd /usr/src-jemalloc-5p3p0-at/
    # env WITH_META_MODE= make -j8 buildworld

    It resulted in:

    World build completed on Sun Dec 7 12:25:45 UTC 2025
    World built in 9754 seconds, ncpu: 8, make -j8

    (So definitely less time-consuming than main 16's
    build of the src-jemalloc-5p3p0-at/ source, as
    expected.)

    After exiting that chroot, I then did:

    # chroot /usr/obj/DESTDIRs/jemalloc-5p3p0-at/
    # cd /usr/src-jemalloc-5p3p0-at/
    # env WITH_META_MODE= make -j8 buildworld

    It resulted in:

    World build completed on Sun Dec 7 15:36:41 UTC 2025
    World built in 9384 seconds, ncpu: 8, make -j8

    So, less time than before jemalloc 5.3.0.

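    For reference, the differences between the reported "World built in" figures, as seconds, minutes, and a percentage of the first figure. A small sh/awk sketch; compare is a hypothetical helper name and the labels are only annotations:

```sh
#!/bin/sh
# compare A B LABEL (hypothetical helper): print A - B in seconds and
# minutes, plus that difference as a percentage of A.
compare() {
    awk -v a="$1" -v b="$2" -v label="$3" 'BEGIN {
        d = a - b
        printf "%s: %d s (%.1f min, %.1f%%)\n", label, d, d / 60, 100 * d / a
    }'
}

compare 9754 9384 "non-debug chroots, before 5.3.0 minus at 5.3.0"
compare 11996 11817 "main 16 host, at-source minus before-source"
compare 11996 9754 "at-source, main 16 host minus before-5.3.0 chroot"
```

    The first line, 370 s or about 6.2 minutes (3.8%), is the "about 6 minutes less" of the subject line; the second, 179 s (1.5%), is the small difference when main 16 built the two source trees; the third, 2242 s (18.7%), compares main 16's build of the at-source with the before-5.3.0 chroot's build of the same source.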
    Note: the Microsoft Windows Dev Kit 2023 was using
    a 1.4 TByte Optane U.2 via a USB3 adapter, of all
    things.
    ===
    Mark Millard
    marklmi at yahoo.com