• Re: debian kernel compiler

    From pocket@homemail.com@21:1/5 to All on Sat Jan 4 18:50:01 2025
    Sent: Saturday, January 04, 2025 at 11:54 AM
    From: "Lee" <ler762@gmail.com>
    To: "Franco Martelli" <martellif67@gmail.com>
    Cc: debian-user@lists.debian.org
    Subject: Re: debian kernel compiler

    On Sat, Jan 4, 2025 at 7:52 AM Franco Martelli wrote:

    On 02/01/25 at 12:53, Istvan Toth wrote:
    amd 5700G cpu

    If you are new to kernel compiling maybe you don't know that you can optimize the kernel for your specific CPU architecture, if you are using the GCC compiler:

    first make a backup copy of the Makefile:
    $ cd linux-source-6.1
    $ cp arch/x86/Makefile arch/x86/Makefile.backup

    then edit "arch/x86/Makefile":
    $ cd linux-source-6.1
    $ vi arch/x86/Makefile

    at line 152 change:
    cflags-$(CONFIG_MK8) += -march=k8
    to
    cflags-$(CONFIG_MK8) += -march=znver3
    and below at line 159 change
    rustflags-$(CONFIG_MK8) += -Ctarget-cpu=k8
    to
    rustflags-$(CONFIG_MK8) += -Ctarget-cpu=znver3

    save and exit vim. "znver3" is the GCC's switch for the µarch of your GPU.

    GPU or central processing unit?

    As long as you're not cross-compiling, how is march=znver3 better than march=native ?

    On my machine, 'man gcc' has the "znver1" and "znver2" strings, but no "znver3" so it seems like "march=native" would be more correct.. or
    at least less chances of an error.
    ... assuming there are no drawbacks to using "march=native".

    TIA,
    Lee


    As someone that has compiled many a kernel on different platforms over 30 years......

    gcc has a facility to interrogate the system it is running on; it will tell you what is available on the platform, both hardware-wise and flag-wise.

    You should use that.
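The message does not name the facility. One plausible reading is GCC's `-Q --help=target` report, which, combined with `-march=native`, prints the target options the compiler resolves for the build host. A minimal sketch under that assumption (GCC on x86); the helper names `extract_march` and `native_march` are made up for illustration:

```shell
#!/bin/sh
# Print the value of the "-march=" line from `gcc -Q --help=target`
# output read on stdin.
extract_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Ask the compiler (default: gcc) what -march=native resolves to on this host.
native_march() {
    "${1:-gcc}" -march=native -Q --help=target 2>/dev/null | extract_march
}

# Only query the compiler if one is actually installed.
if command -v gcc >/dev/null 2>&1; then
    echo "gcc resolves -march=native to: $(native_march)"
fi
```

The value printed is whatever name the installed GCC version knows for the host, which may differ from the newest name for that CPU.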

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From pocket@homemail.com@21:1/5 to All on Sun Jan 5 01:30:01 2025
    Sent: Saturday, January 04, 2025 at 7:12 PM
    From: "Jeffrey Walton" <noloader@gmail.com>
    To: pocket@homemail.com
    Cc: "debian-user" <debian-user@lists.debian.org>
    Subject: Re: debian kernel compiler

    On Sat, Jan 4, 2025 at 4:17 PM <pocket@homemail.com> wrote:

    As someone that has compiled many a kernel on different platforms over 30 years......

    gcc has a facility to interrogate the system it is running on; it will tell you what is available on the platform, both hardware-wise and flag-wise.

    You should use that.

    The option to compile for "this machine" or "compiling machine" is -march=native. It should be added to CFLAGS and CXXFLAGS, assuming GNU makefile rules.

    The native option is available on certain targets. See, for example, <https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html>.

    Jeff

    30 years of building custom linux systems on multiple platforms says no.

    That will fail on many platforms.

    Again the correct way is as I stated above/previously

  • From Christian Groessler@21:1/5 to Istvan Toth on Sun Jan 5 12:50:01 2025
    Istvan,

    I suspect you've got a bad memory chip. Try running a memory test.

    regards.
    chris


    On 1/4/25 20:58, Istvan Toth wrote:
    Hi Marko,
    thank you for your detailed and thorough advice.
    I implemented them, everything ran without errors. But unfortunately the error persists,

    Building module:
    Cleaning build area...
    env NV_VERBOSE=1 make -j16 modules KERNEL_UNAME=6.1.119-fah105..................(bad exit status: 2)
    Error! Bad return status for module build on kernel: 6.1.119-fah105 (x86_64)
    Consult /var/lib/dkms/nvidia-current/535.183.01/build/make.log for more information.
    Error! One or more modules failed to install during autoinstall.
    Refer to previous errors for more information.
    dkms: autoinstall for kernel: 6.1.119-fah105 failed!
    run-parts: /etc/kernel/postinst.d/dkms exited with return code 11

    ...

    dpkg: error processing package linux-image-6.1.119-fah105 (--install):
     installed linux-image-6.1.119-fah105 package post-installation script subprocess returned error exit status 1
    Errors were encountered while processing:

    These lines are very familiar, there were two cases where I didn't see
    them, and linux-image.x.y.z ran without error.

    With thanks to you ti

    On 1/4/25 11:26, Franco Martelli wrote:
    On 02/01/25 at 12:53, Istvan Toth wrote:
    amd 5700G cpu

    If you are new to kernel compiling maybe you don't know that you can
    optimize the kernel for your specific CPU architecture, if you are
    using the GCC compiler:

    first make a backup copy of the Makefile:
    $ cd linux-source-6.1
    $ cp arch/x86/Makefile arch/x86/Makefile.backup

    then edit "arch/x86/Makefile":
    $ cd linux-source-6.1
    $ vi arch/x86/Makefile

    at line 152 change:
    cflags-$(CONFIG_MK8)            += -march=k8
    to
    cflags-$(CONFIG_MK8)            += -march=znver3
    and below at line 159 change
    rustflags-$(CONFIG_MK8)         += -Ctarget-cpu=k8
    to
    rustflags-$(CONFIG_MK8)         += -Ctarget-cpu=znver3

    save and exit vim. "znver3" is the GCC's switch for the µarch of your
    GPU.

    Clean the old kernel build with:

    $ make -j16 clean

    then with the Kernel configuration tool, I use:

    $ make -j16 menuconfig

    in: "Processor type and features  --->"
    in: "Processor family"
    choose: "Opteron/Athlon64/Hammer/K8"

    then select "Exit" and "Save" the kernel configuration.
    To build the kernel I use the command:

    $ time make -s -j16 bindeb-pkg

    This command generates Linux-image and Linux-header packages (.deb)
    that can be installed with "dpkg -i" command.

    Be aware that the objtool command may not support the "znver3" GCC
    switch and may show "warning ..." messages during the kernel
    compilation. In that case you can try another GCC switch: "znver2"
    or "znver1".
    If that also fails, restore the Makefile from the backup copy.

    Just my 2¢ tips, cheers.
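The manual Makefile edit described above can also be scripted. This is only a sketch, assuming GNU sed (for `-i`) and Makefile lines exactly as quoted in the message; `patch_k8_march` is a hypothetical helper name. Matching on the text rather than on line numbers 152 and 159 avoids depending on the layout of one kernel version:

```shell
#!/bin/sh
# Replace the K8 -march / -Ctarget-cpu values in an x86 kernel Makefile.
# Usage: patch_k8_march <makefile> <arch>
patch_k8_march() {
    makefile=$1
    arch=$2
    # Keep a backup copy first, as in the manual procedure.
    cp -- "$makefile" "$makefile.backup" || return 1
    # Only touch lines that belong to the CONFIG_MK8 processor family.
    sed -i \
        -e "/CONFIG_MK8/s/-march=k8/-march=$arch/" \
        -e "/CONFIG_MK8/s/-Ctarget-cpu=k8/-Ctarget-cpu=$arch/" \
        "$makefile"
}

# Example (run from the top of the kernel source tree):
#   patch_k8_march arch/x86/Makefile znver3
```

Running it a second time overwrites the backup with the already patched file, so restore from the backup before retrying with a different switch.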

  • From Thomas Schmitt@21:1/5 to Istvan Toth on Thu Jan 2 14:00:01 2025
    Hi,

    Istvan Toth wrote:
    Most times, about 100 times so far, the compile ran for about 40 minutes, but then the nvidia-current 183.216.01 or 535.183.01 module is not built. Because of this, no /boot/initrd.img file is created and the system will not boot.

    This would match a prematurely ended "make" run.
    Does it report any error messages ?


    [...] amd 5700G cpu [...]
    The other time interval is more than 4 hours, on December 5 and
    December 26, but then nvidia turned on too, the initrd.img was downloaded, and the kernel started without errors and works.

    4 hours on a ~4 GHz 8 core CPU seems a bit long.
    Can it be that your system lacks the necessary RAM, swaps heavily, and
    then lets the Out-Of-Memory killer do its work ?

    (If enough RAM:
    It's several years ago that i compiled kernels 5.X on a 4 core Xeon.
    I remember that "make" option -j8 speeded up compilation a lot.)
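Whether the Out-Of-Memory killer was involved can be checked in the kernel log after a failed run. A small sketch, assuming the kernel reports such kills with a line beginning "Out of memory: Kill..." (the wording may differ between kernel versions); `count_oom_kills` is a hypothetical helper name:

```shell
#!/bin/sh
# Count kernel log lines (read on stdin) that report an OOM kill.
# Always prints a number; "|| :" keeps the exit status at 0 when
# grep finds no match.
count_oom_kills() {
    grep -c -i 'out of memory: kill' || :
}

# Examples (reading the kernel log may require root):
#   journalctl -k -b | count_oom_kills
#   dmesg | count_oom_kills
#   free -h        # shows installed RAM and swap usage
```

A non-zero count right after a short, failed build would support the low-RAM theory; zero does not rule out other causes.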


    Could you help me with some ideas?

    You could try to catch the complete output of both kinds of runs in order
    to let "diff" compare them.
    At least you will have some possibly significant message lines to post
    here or to google for.

    Your report lets me assume that you will immediately get the messages of
    a short, bad run:

    make deb-pkg ...options... 2>&1 | tee -i "$HOME"/make_deb_pkg_log_1

    and then will have to re-try often to get a message log of a good long
    one:

    make deb-pkg ...options... 2>&1 | tee -i "$HOME"/make_deb_pkg_log_2

    Option -j might be unhelpful for the logging purpose, by making the
    sequence of messages non-deterministic.
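If the logs were nevertheless recorded from parallel runs, comparing them with the line order ignored is one way around the non-deterministic sequence. A sketch using sort and comm; `lines_only_in` is a hypothetical helper name, and the two log paths in the example are the ones from the commands above:

```shell
#!/bin/sh
# Print the lines that occur in log $1 but not in log $2,
# ignoring line order and duplicate lines.
lines_only_in() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u -- "$1" > "$a"
    sort -u -- "$2" > "$b"
    # comm -23 keeps only the lines unique to the first sorted file.
    comm -23 -- "$a" "$b"
    rm -f -- "$a" "$b"
}

# Example: messages that appear only in the short, bad run
#   lines_only_in "$HOME"/make_deb_pkg_log_1 "$HOME"/make_deb_pkg_log_2
```

This loses the ordering context that a plain "diff" gives, so it complements a single-job log rather than replacing it.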


    Have a nice day :)

    Thomas

  • From Thomas Schmitt@21:1/5 to Istvan Toth on Thu Jan 2 18:00:01 2025
    Hi,

    Istvan Toth wrote:
    when installing
    the image the build module does not run during the short-term (30-40 minute) compiler runtime (I attach it).
    [...]
    ~/kernel$ sudo dpkg -i linux-image-6.1.119-fah79_6.1.119-1_amd64.deb

    I might have misunderstood your problem description.
    I thought the difference of 40 minutes versus 4 hours was with the run
    time of
    make deb-pkg LOCALVERSION=fah79 ...
    as in 8.10.4. "Compiling and Building the Package" of
    https://debian-handbook.info/browse/stable/sect.kernel-compilation.html

    So i now wonder whether the difference in time and success happened
    with the same file linux-image-6.1.119-fah79_6.1.119-1_amd64.deb or with
    each time newly built .deb package files.

    If the same .deb file sometimes succeeds but in most tries fails, then
    the problem is out of my range of experience.


    env NV_VERBOSE=1 make -j16 modules KERNEL_UNAME=6.1.119-fah79

    (At least the "make" run makes generous use of your CPU's cores ...)


    Consult /var/lib/dkms/nvidia-current/535.216.01/build/make.log for more information.

    As Greg Wooledge already stated, this is probably the best place to
    look for hints about the reason for the failure.


    Have a nice day :)

    Thomas

  • From Greg Wooledge@21:1/5 to Istvan Toth on Thu Jan 2 17:20:03 2025
    On Thu, Jan 02, 2025 at 17:11:10 +0100, Istvan Toth wrote:
    ~/kernel$ sudo dpkg -i linux-image-6.1.119-fah79_6.1.119-1_amd64.deb
    Selecting previously unselected package linux-image-6.1.119-fah79.
    (Reading database ... 192934 files and directories currently installed.)
    Preparing to unpack linux-image-6.1.119-fah79_6.1.119-1_amd64.deb ...
    Unpacking linux-image-6.1.119-fah79 (6.1.119-1) ...
    Setting up linux-image-6.1.119-fah79 (6.1.119-1) ...
    dkms: running auto installation service for kernel 6.1.119-fah79.
    Sign command: /usr/lib/linux-kbuild-6.1/scripts/sign-file
    Signing key: /var/lib/dkms/mok.key
    Public certificate (MOK): /var/lib/dkms/mok.pub

    Building module:
    Cleaning build area...
    env NV_VERBOSE=1 make -j16 modules KERNEL_UNAME=6.1.119-fah79.................................(bad exit status: 2)
    Error! Bad return status for module build on kernel: 6.1.119-fah79 (x86_64)
    Consult /var/lib/dkms/nvidia-current/535.216.01/build/make.log for more information.
    Error! One or more modules failed to install during autoinstall.
    Refer to previous errors for more information.
    dkms: autoinstall for kernel: 6.1.119-fah79 failed!

    I would look in /var/lib/dkms/nvidia-current/535.216.01/build/make.log
    as it says.

    I don't believe linux-image-6.1.119-fah79_6.1.119-1_amd64.deb is a
    Debian kernel package. If you got this from a third-party source, or
    if you built it yourself, then you will need the linux-headers-*
    package that matches it, in order to build DKMS modules.
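A quick way to see whether a matching build tree is installed is to look at the conventional /lib/modules/<release>/build location, which out-of-tree module builds normally use. A sketch under that assumption; `headers_present` is a hypothetical helper name, and its optional second argument (a root prefix) exists only to make the function testable:

```shell
#!/bin/sh
# Succeed if a kernel build tree for release $1 is reachable through
# /lib/modules/<release>/build. The optional $2 is a root prefix.
headers_present() {
    [ -e "${2:-}/lib/modules/$1/build/Makefile" ]
}

# Example:
#   headers_present 6.1.119-fah79 || echo "no build tree for 6.1.119-fah79"
```

If the check fails for the custom kernel, installing the linux-headers package produced by the same build is the first thing to try.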

  • From Greg Wooledge@21:1/5 to Istvan Toth on Thu Jan 2 20:50:01 2025
    On Thu, Jan 02, 2025 at 20:12:09 +0100, Istvan Toth wrote:
    I am attaching the nvidia make.log. I made the fah79 image deb package
    myself. It also included - linux-headers-6.1.119-fah79_6.1.119-1_amd64.deb

    The log is quite large. I'm surprised the mailing list actually allowed
    this.

    The intent was for *you* to read the log and find the errors and fix
    them, not for you to just dump it all on us.

    But anyway...


    #error dma_buf_export() conftest failed!

    #error wait_on_bit_lock() conftest failed!

    #error radix_tree_replace_slot() conftest failed!

    gcc -Wp,-MMD,/var/lib/dkms/nvidia-current/535.216.01/build/nvidia/.nv-dma.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu11 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -mindirect-branch-cs-prefix -mfunction-return=thunk-extern -fno-jump-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-address-of-packed-member -O2 -fno-allow-store-data-races -Wframe-larger-than=2048 -fstack-protector-strong -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-dangling-pointer -fomit-frame-pointer -ftrivial-auto-var-init=zero -fno-stack-clash-protection -Wvla -Wno-pointer-sign -Wcast-function-type -Wno-stringop-truncation -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized -Werror -Wno-array-bounds -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned -I/var/lib/dkms/nvidia-current/535.216.01/build/common/inc -I/var/lib/dkms/nvidia-current/535.216.01/build -Wall -Wno-cast-qual -Wno-error -Wno-format-extra-args -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"535.216.01\" -Wno-unused-function -Wuninitialized -fno-strict-aliasing -ffreestanding -mno-red-zone -mcmodel=kernel -DNV_UVM_ENABLE -Werror=undef -DNV_SPECTRE_V2=0 -DNV_KERNEL_INTERFACE_LAYER -I/var/lib/dkms/nvidia-current/535.216.01/build/nvidia -DNVIDIA_UNDEF_LEGACY_BIT_MACROS -UDEBUG -U_DEBUG -DNDEBUG -DMODULE -DKBUILD_BASENAME='"nv_dma"' -DKBUILD_MODNAME='"nvidia"' -D__KBUILD_MODNAME=kmod_nvidia -c -o /var/lib/dkms/nvidia-current/535.216.01/build/nvidia/nv-dma.o /var/lib/dkms/nvidia-current/535.216.01/build/nvidia/nv-dma.c ; ./tools/objtool/objtool --hacks=jump_label --hacks=noinstr --orc --retpoline --rethunk --static-call --uaccess --module /var/lib/dkms/nvidia-current/535.216.01/build/nvidia/nv-dma.o
    gcc -Wp,-MMD,/var/lib/dkms/nvidia-current/535.216.01/build/nvidia/.nv-i2c.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu11 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -mindirect-branch-cs-prefix -mfunction-return=thunk-extern -fno-jump-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-address-of-packed-member -O2 -fno-allow-store-data-races -Wframe-larger-than=2048 -fstack-protector-strong -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-dangling-pointer -fomit-frame-pointer -ftrivial-auto-var-init=zero -fno-stack-clash-protection -Wvla -Wno-pointer-sign -Wcast-function-type -Wno-stringop-truncation -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized -Werror -Wno-array-bounds -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned -I/var/lib/dkms/nvidia-current/535.216.01/build/common/inc -I/var/lib/dkms/nvidia-current/535.216.01/build -Wall -Wno-cast-qual -Wno-error -Wno-format-extra-args -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"535.216.01\" -Wno-unused-function -Wuninitialized -fno-strict-aliasing -ffreestanding -mno-red-zone -mcmodel=kernel -DNV_UVM_ENABLE -Werror=undef -DNV_SPECTRE_V2=0 -DNV_KERNEL_INTERFACE_LAYER -I/var/lib/dkms/nvidia-current/535.216.01/build/nvidia -DNVIDIA_UNDEF_LEGACY_BIT_MACROS -UDEBUG -U_DEBUG -DNDEBUG -DMODULE -DKBUILD_BASENAME='"nv_i2c"' -DKBUILD_MODNAME='"nvidia"' -D__KBUILD_MODNAME=kmod_nvidia -c -o /var/lib/dkms/nvidia-current/535.216.01/build/nvidia/nv-i2c.o /var/lib/dkms/nvidia-current/535.216.01/build/nvidia/nv-i2c.c ; ./tools/objtool/objtool --hacks=jump_label --hacks=noinstr --orc --retpoline --rethunk --static-call --uaccess --module /var/lib/dkms/nvidia-current/535.216.01/build/nvidia/nv-i2c.o
    In file included from /var/lib/dkms/nvidia-current/535.216.01/build/common/inc/conftest.h:28,
    from /var/lib/dkms/nvidia-current/535.216.01/build/common/inc/nv_stdarg.h:29,
    from /var/lib/dkms/nvidia-current/535.216.01/build/common/inc/os-interface.h:40,
    from /var/lib/dkms/nvidia-current/535.216.01/build/nvidia/nv-cray.c:26:
    /var/lib/dkms/nvidia-current/535.216.01/build/conftest/functions.h:74:2: error: #error dma_buf_export() conftest failed!
    74 | #error dma_buf_export() conftest failed!
    | ^~~~~
    /var/lib/dkms/nvidia-current/535.216.01/build/conftest/functions.h:87:2: error: #error wait_on_bit_lock() conftest failed!
    87 | #error wait_on_bit_lock() conftest failed!
    | ^~~~~
    /var/lib/dkms/nvidia-current/535.216.01/build/conftest/functions.h:90:2: error: #error radix_tree_replace_slot() conftest failed!
    90 | #error radix_tree_replace_slot() conftest failed!
    | ^~~~~


    There are three errors here that you can search the Internet for
    solutions to. With any luck, finding the solution for one of them
    will help you fix the other two.

    My first few search results included:


    <https://github.com/NVIDIA/open-gpu-kernel-modules/issues/468>
    Failure to build kernel modules after 530.41.03 #468

    <https://forums.developer.nvidia.com/t/conftest-failed-error-while-trying-to-install-nvidia-455-450-driver-in-debian/165007>
    Conftest failed error while trying to install nvidia-455 / 450 driver in debian

    <https://bbs.archlinux.org/viewtopic.php?id=295666>
    Nvidia driver does not compile with latest kernel [Solved]


    Your own search results may vary. Applicability of answers in any of
    these web pages is not guaranteed.
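Pulling the distinct failures out of a large make.log can be done with grep rather than by reading the whole file. A minimal sketch, assuming the messages have the "#error name() conftest failed!" form shown in the excerpt above; `conftest_failures` is a hypothetical helper name:

```shell
#!/bin/sh
# List each distinct "#error ... conftest failed!" message found in a
# build log read on stdin.
conftest_failures() {
    grep -o '#error [A-Za-z0-9_]*() conftest failed!' | sort -u
}

# Example, using the log path from the dkms message:
#   conftest_failures < /var/lib/dkms/nvidia-current/535.216.01/build/make.log
```

Each message appears several times in the log (once in the diagnostic and once in the quoted source line), so "sort -u" reduces the output to one line per failed check.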

  • From Thomas Schmitt@21:1/5 to Greg Wooledge on Thu Jan 2 21:40:01 2025
    Hi,

    On Thu, Jan 02, 2025 at 20:12:09 +0100, Istvan Toth wrote:
    I am attaching the nvidia make.log. I made the fah79 image deb package myself. It also included - linux-headers-6.1.119-fah79_6.1.119-1_amd64.deb

    Greg Wooledge wrote:
    The log is quite large. I'm surprised the mailing list actually allowed this.

    I did not get that mail of 20:12:09 +0100 by Istvan Toth.
    The thread in the archive
    https://lists.debian.org/debian-user/2025/01/threads.html
    currently shows a hole before your mail of 14:45:14 -0500.

    The answers to my mails from Istvan Toth are Cc'ed to my address.
    Possibly you got such a Cc, too.


    Istvan Toth wrote:
    every time it's about newly built packages, because the .config file that I usually consider to be correct has not been able to compile properly.

    It would be interesting to know whether a .deb package which causes a
    good long-running "dpkg -i" does so reproducibly several times.
    That way you would learn that it is a problem at "make deb-pkg" time
    (if working once means working always) or a problem at "dpkg -i"-time
    (if working once is followed by failure next time).

    (Don't forget to record the output of your "make deb-pkg" runs in a file
    and to store one which causes immediate "dpkg -i"-failure and one which
    causes "dpkg -i" success at least once. Differences between those might
    give a clue why the nvidia code fails in conftest/functions.h.)


    Have a nice day :)

    Thomas
