• Re: [gentoo-user] Zombie Linux kernel

    From Grant Taylor@21:1/5 to gevisz on Thu Jan 30 22:40:01 2025
    On 1/30/25 11:49 AM, gevisz wrote:
    I have not updated my Gentoo system since May 31, 2024, so in the
    middle of October 2024 I had to install it anew.

    Why did you have to install it anew?

    I've pulled Gentoo systems more than three years forward. I've talked
    about how to do it on this mailing list in the past, including the
    scripts that I was using at the time.

    I posted about pulling a system last updated in March '21 to current
    last November.

    https://oldbytes.space/@drscriptt/113460218243172979

    In short, switch to a git-based Gentoo repo and do a bunch of tiny jumps
    from your last update through to current. It takes time and does a
    LOT of compilation. But it works.
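    For anyone wanting to try the same thing, the loop looks roughly like
    this. It's a sketch from memory, assuming the main tree lives at
    /var/db/repos/gentoo and has already been switched to git syncing
    (sync-type = git in repos.conf); the date is made up for the example.

```shell
# Sketch of the tiny-jump approach (paths and date illustrative).
cd /var/db/repos/gentoo

# Check out the tree as it looked shortly after the last successful
# update, then update the world set against that snapshot.
git checkout "$(git rev-list -1 --before='2024-07-01' origin/master)"
emerge --ask --update --deep --newuse @world

# Move the date forward a month or two at a time and repeat until the
# checkout reaches current master.
```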

    Just to remind you: during that time we all had to switch to the new
    Gentoo profile scheme, which made an update from my old system more
    difficult than a new install.

    I'm not able to log in and check the profile the system is on. But I
    feel like I could have handled the profile change in December '24 just
    as I could have when the profile change first came out. The repo just
    needed to have a branch checked out from shortly after the profile
    came out.

    On October 26, I compiled a new Linux kernel. It had version 6.6.52
    and worked quite well.

    However, with time it disappeared from the Gentoo portage tree. So,
    11 days ago I compiled kernel version 6.6.62 and successfully booted
    my Gentoo system with it over the next 9 days.

    The old 6.6.52 kernel was deleted from the /boot directory just after
    compiling kernel version 6.6.62, because it could no longer support my
    home ZFS disks (zfs-kmod has to be compiled against a specific kernel
    version and will not work with another one).

    Um ... kernel modules are kernel-version dependent. You should have had
    two different sets of kernel modules, one for each kernel.

    /lib/modules/6.6.52.../zfs.ko
    /lib/modules/6.6.62.../zfs.ko
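    A quick way to see this on any running Linux box (nothing
    Gentoo-specific about it):

```shell
# Each installed kernel gets its own module tree under /lib/modules/,
# named after the kernel's release string. The running kernel can only
# load modules from the tree that matches `uname -r`.
uname -r
ls /lib/modules/ 2>/dev/null || true
if [ -d "/lib/modules/$(uname -r)" ]; then
    echo "module tree present for the running kernel"
else
    echo "no module tree for the running kernel"
fi
```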

    But yesterday, after booting my Gentoo system, the uname -a command
    reported that I had been booted with the deleted old kernel version
    6.6.52 compiled on October 26, 2024! And, of course, it did not
    mount my ZFS /home.

    So the kernel is still there.

    Did you manually delete the zfs / spl kernel modules?

    I've had various other types of things break zfs / spl kernel module
    compatibility, even within the same version. I'd have to look at my
    notes on what I do to get the module to work again. I think it usually
    involves a round of make bzImage modules modules_prepare, and
    re-emerging the zfs kernel module. But I may be mis-remembering.
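    From memory, the sequence is roughly the following; treat it as a
    sketch rather than gospel, run as root from the kernel source tree.

```shell
# Rebuild the kernel image and prepare the module tree, then rebuild
# the out-of-tree ZFS bits against it (package atoms are Gentoo's).
cd /usr/src/linux
make bzImage modules modules_prepare
make modules_install
emerge --ask --oneshot sys-fs/zfs-kmod sys-fs/zfs
```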

    I have double checked everything: the old kernel of version 6.6.52,
    together with its initramfs, has been deleted from the /boot directory.
    Moreover, just a day before, I deleted the /usr/lib/modules/6.6.52-gentoo/
    directory.

    An initramfs complicates things, and I tend not to use them. They carry
    their own modules and other things inside them. They are usually built
    from the current system but can get out of sync relatively easily. So
    you could have modules coming from the initramfs rather than from the
    root file system.

    I tried to reboot and found out that GRUB menu had only an option of
    loading the old kernel of version 6.6.52.

    Did you try going to the GRUB command line and modifying it to match
    what should have been on the system?

    I've found that contemporary GRUB has tab completion and is generally
    nicer to work with than older Korn shells.
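    If memory serves, a manual boot from the GRUB prompt looks something
    like this; the device numbers and root= argument are illustrative, and
    tab completion helps fill in the real ones.

```
grub> ls                          # list detected disks / partitions
grub> set root=(hd0,1)            # partition holding /boot (illustrative)
grub> linux /vmlinuz-6.6.62-gentoo root=/dev/sda3 ro
grub> initrd /initramfs-6.6.62-gentoo.img
grub> boot
```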

    However, I soon understood that the latter was because I had attached
    an additional HDD before booting my Gentoo system, and as a result
    the system decided to boot from another HDD on which I had not
    installed an updated GRUB. So I fixed it, and my Gentoo system was
    finally able to boot the new 6.6.62 kernel.

    Ya, device ordering can be a problem. That's one of the main reasons
    that I like UUIDs to identify file systems et al. If you can get the
    boot loader to start off of the correct disk, you're usually in a place
    where you can pull yourself up by your bootstraps.
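    For reference, blkid prints each filesystem's UUID, and fstab can then
    refer to that instead of a device name; the UUID below is made up.

```
# `blkid` lists the UUIDs; then, in /etc/fstab (UUID illustrative):
UUID=0f3a9c1e-d2b4-4c6a-9a1f-7e5d8b2c4f10  /  xfs  defaults  0 1
```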

    But the mystery of loading my Gentoo system with deleted kernel and
    deleted modules remains. How could that happen at all?

    The kernel and modules are somewhere that GRUB could find them, else it
    wouldn't have booted them.

    My only explanation is that XFS actually had not deleted the old kernel
    and the modules directory but only marked them as such. So, the old
    GRUB file could load them even when they had been marked as deleted.
    But is this explanation actually correct?

    I don't know that it's not correct, but I am very suspicious of it. I
    would expect that GRUB's support for XFS would look at files that aren't
    marked for deletion.

    There are a number of things that come to mind, all pure speculation
    since you're no longer in that configuration to be able to look. Not
    the least of which is files in file systems on different devices; e.g.
    /boot vs /. Put a file in the /boot directory on the / (root) file
    system and it will be covered up when the /boot file system is mounted.
    Or something like that.
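    A one-liner check for that scenario, assuming a Linux system with
    /proc mounted:

```shell
# If /boot is a separate filesystem, anything under /boot on the root
# filesystem is shadowed while that filesystem is mounted.
if grep -q ' /boot ' /proc/mounts; then
    echo "/boot is a separate filesystem; files beneath it on / are hidden"
else
    echo "/boot lives on the root filesystem"
fi
```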



    --
    Grant. . . .

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to gevisz on Fri Jan 31 01:30:01 2025
    On 1/30/25 16:49, gevisz wrote:
    Because, as I wrote, it was easier than trying to change the profile
    with the procedure described in the corresponding news item.

    I take that as you chose to do the fresh install, not that something
    forced you to do the fresh install. It's perfectly fine if that's the
    choice you made.

    Moreover, it was written in the same news item that the result is not
    guaranteed. So why do a lot of compilation just to see if it would
    work, when I knew that starting from scratch would give a guaranteed
    result with less compilation?

    Each chooses their own. ;-)

    Well, maybe you are right. I did (NOT) know this and so deleted the old kernel immediately after compiling the new one.

    Now you know for the next time. :-)

    Yes, I manually deleted the /usr/lib/modules/6.6.52-gentoo/ directory
    just before the last shutdown before this happened. Moreover, at the
    same time I manually deleted everything from the
    /usr/src/linux-6.6.52-gentoo/ directory except for the .config file.

    I tend to keep my current and previous kernel source tree and module
    tree around for these very types of reasons. At least if I'm not tight
    on disk space.

    So far, I have had no problems with incompatibility between the kernel
    and the ZFS kernel module. But I always followed the instruction to
    recompile zfs-kmod after compiling the kernel.

    Usually, the incompatibility is when I've made a big change in the
    kernel that can fundamentally alter things. E.g. enabled / disabled an
    entire sub-system or made changes to any security related settings.
    That is usually the size of the change that needs to be made in the
    kernel config to render the existing ZFS / SPL module incompatible for
    me. Smaller changes usually don't impact ZFS / SPL.

    I do not know if I needed it, taking into account that I have compiled
    the XFS module into the kernel. But I created the
    initramfs-6.6.62-gentoo.img file with the command genkernel --install
    initramfs, and I hope that genkernel was wise enough to create it
    based on the /usr/lib/modules/6.6.62-gentoo/ directory and not the
    /usr/lib/modules/6.6.52-gentoo/ directory that was still present at
    that time.

    I don't use initramfs so I can't say. I try to keep what I need to boot
    and come up into single user mode in the kernel. But I've run into this
    type of discrepancy on other people's systems.
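    One way to check after the fact is to list the initramfs contents.
    lsinitrd ships with dracut, and genkernel images are typically gzip'd
    cpio archives, so either of these should work; the path is the one
    from this thread.

```shell
lsinitrd /boot/initramfs-6.6.62-gentoo.img | grep 'lib/modules/'
# or, for a plain gzip'd cpio archive:
zcat /boot/initramfs-6.6.62-gentoo.img | cpio -it | grep 'lib/modules/'
```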

    Maybe the /boot/vmlinuz-6.6.52-gentoo file was also present at the
    time of creating initramfs-6.6.62-gentoo.img, but I definitely deleted
    /boot/vmlinuz-6.6.52-gentoo just after that.

    No, I have never done it in my life, even with online documentation at
    hand, so I did not even try to do it with just a GRUB command line in
    front of me.

    Unexpectedly getting dropped at a GRUB prompt when booting is never fun.
    Some consider it a rite of passage, sort of like accidentally
    rebooting the wrong system, or causing data loss in production.

    All I managed to do was recall the commands
    grub-mkconfig -o /boot/grub/grub.cfg
    and
    grub-install /dev/sdc
    and execute them.

    But, as far as I know, a UUID does not help to identify the disk from
    which the system starts, as that is managed by the BIOS.

    Correct.

    BIOS based booting is ... or can be annoyingly complicated.

    My alternative explanation was that the uname -a command was actually accessing not the current kernel but the GRUB logs.

    My understanding is that the kernel part of the uname command's output
    comes from the running kernel, not something on disk.

    Currently, with my new kernel loaded, lsmod does not report XFS kernel
    module being loaded into the memory.

    I would expect that XFS wouldn't show in the lsmod output if XFS is
    compiled into the kernel.

    Remember, lsmod shows modules dynamically loaded into the running
    kernel, not things compiled in.
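    A built-in filesystem driver still registers itself with the kernel,
    so /proc/filesystems is the place to look instead of lsmod. A quick
    check:

```shell
# Lists xfs if the driver is present (built in or loaded as a module);
# prints a fallback message otherwise.
grep -iw xfs /proc/filesystems || echo "xfs not registered in this kernel"
```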



    --
    Grant. . .

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)