Doesn't anyone build their own kernel anymore?
Is the average GNU/Linux user just a distro slave with no
technical competence or curiosity?
I have a motherboard that contains an onboard sound chip
that the manual specifies as:
Realtek ALC1220-VB
The command "lspci -vv", using a live distro, also reports:
Intel Comet Lake PCH cAVS Audio
...
Kernel modules: snd_hda_intel, snd_soc_skl, snd_sof_pci_intel_cnl
This device should function with Alsa but absolutely
nowhere can I locate the kernel configuration parameters
for this device. The latest kernels do not seem to possess
the above modules.
Internet forums report nothing but the mainstream distros
which include every possible module in one giant, bloated mess.
No remedy is to be found.
Doesn't anyone build their own kernel anymore? Is the
average GNU/Linux user just a distro slave with no
technical competence or curiosity?
Fortunately I have a PCIe soundcard (SB_Audigy_5RX) for which
I was able to locate the exact kernel configuration parameters.
But it was not an easy task. Such information is very sparse.
GNU/Linux was begun as a project for the technical elite, but
now it seems that it has devolved into a definite idiocracy.
Leroy H wrote:
Doesn't anyone build their own kernel anymore?
I haven't built one since I stopped needing to build one for
Xen reasons, or Intel wifi driver reasons.
Doesn't anybody order from a menu at a restaurant any more?
Is that proof of some kind of techno-smarts? Because building a Linux
kernel is just the same: making selections from a menu. Only the
finished product arrives a bit faster.
On Mon, 11 May 2026 19:00:12 +0000, Leroy H wrote:
This device should function with Alsa but absolutely nowhere can I
locate the kernel configuration parameters for this device. The latest
kernels do not seem to possess the above modules.
Sounds like Pipewire got you by the balls.
This device should function with Alsa but absolutely
nowhere can I locate the kernel configuration parameters
for this device. The latest kernels do not seem to possess
the above modules.
Some configuration entries are hidden beneath several levels of nested
unchecked checkboxes, and the way to unlock them can be non-obvious at
times (e.g. how would you know that you need I2C drivers to drive your
computer's SMBus controller?).
There is a search function in menuconfig/nconfig to help find exactly
what you want and how to unlock it, but it isn't very easy to use.
Though this is far easier said than done (I doubt anyone would even be
willing to do it), the following could make it easier to build a kernel
tailored to a specific computer:
 - A program that sets Kconfig options based on what PCI devices
   are on your computer;
 - An index mapping PCI IDs, USB IDs, etc. to Kconfig options;
 - Tools for compiling and maintaining said index.
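A minimal sketch of what such an index and its lookup might look like in C. Everything here is hypothetical: the struct, the function name `kconfig_for_pci` and the table rows are made up for illustration, and the device IDs in particular are assumptions that would need checking against `lspci -nn`. Only the vendor IDs and the Kconfig symbol names are meant to match real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the hypothetical index: a PCI vendor/device pair and the
 * Kconfig symbol that enables a driver for it. */
struct pci_kconfig_entry {
    uint16_t vendor;
    uint16_t device;
    const char *symbol;
};

/* Illustrative rows only. 0x8086 (Intel) and 0x1102 (Creative Labs) are
 * PCI vendor IDs; the device IDs are assumptions, not verified values. */
static const struct pci_kconfig_entry example_index[] = {
    { 0x8086, 0x06c8, "CONFIG_SND_HDA_INTEL" },
    { 0x1102, 0x0008, "CONFIG_SND_EMU10K1" },
};
#define EXAMPLE_INDEX_LEN (sizeof example_index / sizeof example_index[0])

/* Linear search; returns the Kconfig symbol for the given ID pair,
 * or NULL when the index has no row for it. */
const char *kconfig_for_pci(const struct pci_kconfig_entry *index, size_t n,
                            uint16_t vendor, uint16_t device)
{
    assert(index != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (index[i].vendor == vendor && index[i].device == device)
            return index[i].symbol;
    }
    return NULL;
}
```

The hard part is the one already pointed out: somebody has to compile and maintain the table. The kernel tree's existing `make localmodconfig` target takes a different route, trimming the configuration to the modules that are currently loaded, so it depends on booting a distro kernel first rather than on an ID index.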
On 5/11/26 18:47, Lawrence D'Oliveiro wrote:
On Mon, 11 May 2026 20:12:03 +0100, Andy Burns wrote:
Leroy H wrote:
Doesn't anyone build their own kernel anymore?
NO. It shouldn't BE like that anymore !!!
I haven't built one since I stopped needing to build one for
Xen reasons, or Intel wifi driver reasons.
Doesn't anybody order from a menu at a restaurant any more?
You might have to TALK to someone !!!
Is that proof of some kind of techno-smarts? Because building a Linux
kernel is just the same: making selections from a menu. Only the
finished product arrives a bit faster.
Linux kernels have become WAY too big, broad and deep
at this point. 99.999% of users are NOT gonna make custom-built
versions.
Look, great if you CAN or WANT TO ... but after so
many years most of us simply want Something That Just
Works.
Hopefully, yes.
You remind me of the society I once joined ...
"We want to be much bigger"
"Well go online, get lots more members and stop behaving like an elite little club"
"But we want to stay an elite little club"
...I left...
I used to build my own kernels, but this was back in the days of
systems with 512MB of total RAM and i586 class CPUs running at
150-200MHz. Back in those days it /felt/ like there was a speedup by
building the kernel against the specific CPU one had in one's box, and
in the 512MB RAM days, there *was* a benefit to stripping out drivers
that one did not use, as a smaller kernel left more RAM available for
one's applications.
But somewhere along the path from i586 class CPUs to i3/i5/i7 class
CPUs, and from 512MB RAM to 24G RAM, it became a case where the
performance increase from "build with CPU optimizations for a specific
CPU" and "kernel is smaller" was no longer noticeable. The standard
kernel Slackware built felt just as fast as the custom built kernel,
and the difference in memory usage was a tiny sliver of the total, so
it wasn't worth it to continue. So somewhere around the "Pentium 3" to
Core era I quit compiling custom kernels for myself as I no longer felt
like I could detect the benefits from doing so.
It's fairly straightforward today to build performance critical code
multiple times for different targets and select the best one at runtime.
This includes using intrinsics, vector instructions or assembler
selected for particular classes of CPU, as well as just giving the
compiler a narrower target and letting it get on with it.
I don't know how much the Linux kernel does that, offhand, but as a
general statement, there are better options today than expecting end
users to rebuild from source for their specific CPU.
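As a rough illustration of the "build it more than once, pick at runtime" idea, here is a minimal sketch in C. It assumes GCC or Clang on x86, where `__attribute__((target(...)))` and `__builtin_cpu_supports()` are available; the function names (`sum_generic`, `sum_avx2`, `pick_sum`, `sum_u32`) are invented for the example, and other compilers or architectures simply fall back to the generic version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Baseline version, built for the compiler's default target. */
static uint64_t sum_generic(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler is allowed to use AVX2 in this one. */
__attribute__((target("avx2")))
static uint64_t sum_avx2(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#define HAVE_AVX2_VARIANT 1
#endif

/* Choose an implementation based on what the running CPU reports. */
sum_fn pick_sum(void)
{
#ifdef HAVE_AVX2_VARIANT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}

/* Public entry point: resolves the implementation on first use.
 * Not thread-safe as written; a real program would resolve at startup. */
uint64_t sum_u32(const uint32_t *v, size_t n)
{
    static sum_fn impl;
    assert(v != NULL || n == 0);
    if (!impl)
        impl = pick_sum();
    return impl(v, n);
}
```

Callers only ever see `sum_u32`, so a single distro binary can carry both variants and the end user never has to rebuild anything.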
Usually PipeWire "Just Works".
On Tue, 12 May 2026 00:39:53 -0400, c186282 wrote:
Usually PipeWire "Just Works".
After one Ubuntu update the 3.5mm jack I was using for my speakers
stopped working and I would only get the Default (non) Selection. I
found out more about pipewire, pulseaudio, wireplumber, and friends
than I ever wanted to know.
Solution: my Bluetooth earbuds worked so I bought Logitech Bluetooth
speakers. Problem solved.
Richard Kettlewell <invalid@invalid.invalid> wrote:
Rich <rich@example.invalid> writes:
But the reality is that the increase was, even back in the days of an
i386 at 33MHz, only a few percentage points anyway. And while a 2%
improvement on an i7 is a bigger raw increase than 2% on an
i386/33MHz, it 'felt' like a bigger increase on the i386.
The i7s are just so blindingly fast that the standard distro kernel
and a custom built one just feel like they perform identically.
And, in reality, for most of my system(s) uses, they sit waiting on me
rather than the other way around. And for when I do run something that
takes more time than I want to wait, I just toss it in the background
and go on about my business, the only actual noticeable event being the
fact that the cooling fan in the tower speeds up a bit as the CPU warms
up from the load.
On Mon, 11 May 2026 20:10:53 +0100, The Natural Philosopher wrote:
Hopefully, yes.
You remind me of the society I once joined ...
"We want to be much bigger"
"Well go online, get lots more members and stop behaving like an elite
little club"
"But we want to stay an elite little club"
...I left...
Did they feel threatened in their status as 1337 hax0r too?
I don't know the exact point but a build of our source code was
something that you fired off and went to lunch and hoped it was done and
hadn't errored out by the time you got back. The machine wasn't good
for anything else as it cranked away.
Then it got to a point where it took a few minutes and you could watch
cat videos while you waited. It didn't improve too much after that. The
company was frugal so we never got monster machines but I'm not sure
that would have made a huge difference. It might have for reprojecting
massive raster files. That still took hours.
If you do some task that is heavy on cpu, say, recoding a video with
ffmpeg, then it would be ffmpeg that would need rebuilding, not the
kernel. I can not think now of a task that puts the kernel CPU load at 100%.
On 13/05/2026 11:56, Carlos E.R. wrote:
I can not think now of a task that puts the kernel CPU load at 100%.
I can. Locks the machine up every time.
Of course is rogue code, but it exists
On Tue, 12 May 2026 13:52:51 -0000 (UTC), Rich wrote:
I used to build my own kernels, but this was back in the days of systems
with 512MB of total RAM and i586 class CPUs running at 150-200MHz.
Back in those days it /felt/ like there was a speedup by building the
kernel against the specific CPU one had in one's box, and in the 512MB
RAM days, there *was* a benefit to stripping out drivers that one did
not use, as a smaller kernel left more RAM available for one's
applications.
It's been a long, long time but iirc you could also screw yourself royally if you said 'I don't need that' for some obscure feature that you
definitely did need.
But cross compiling for a Pi Pico apart from the very first build is
fairly instant
The first time I made this mistake it also taught me to *never*
overwrite my existing, working, lilo boot entry for the current kernel
and instead to install the new kernel as a second entry first. Then if
the new kernel panicked because I forgot to turn something on, I had a
way out other than "reinstall from 30+ floppy disks".
That was my initial Linux experience. Download a few boxes worth of
floppies with the Slackware stuff over dialup and carefully assemble the mess. iirc my first pass didn't have gcc; back to Slackware for more
floppy images.
I THINK my first personal install was Red Hat from a CD...
Later I mostly burned DVDs and then switched to a USB drive.
The Natural Philosopher <tnp@invalid.invalid> wrote:
On 13/05/2026 11:56, Carlos E.R. wrote:
I can not think now of a task that puts the kernel CPU load at 100%.
I can. Locks the machine up every time.
Of course is rogue code, but it exists
I have several I perform on a regular basis:
video transcodes;
compression of data files (jpegxl, if one turns up the compression
effort, will saturate the CPU at 100%, or 8x100% at the points where
it parallelizes the compression run).
Carlos E.R. <robin_listas@es.invalid> wrote:
If you do some task that is heavy on cpu, say, recoding a video with
ffmpeg, then it would be ffmpeg that would need rebuilding, not the
kernel. I can not think now of a task that puts the kernel CPU load at 100%.
Yes, apply the "careful optimizing" to the programs that actually
benefit. But, as pointed out by R.K., ffmpeg is one of the
applications that is compiled in the "include the various
optimizations, select the appropriate code path at runtime based upon
the CPU executing the process" style, so it is already mostly ready to
take advantage of what it finds in the CPU it is using at the time.
Meaning I can just let Slackware install the standard package and not
feel any great need to "recompile" it to gain a noticeable speedup.
The benefits of "custom optimized compiles for your specific setup",
while still technically present, have narrowed so much vs. the "generic
distro version" that there is often not enough actual gain to be had to
justify the effort of "recompiling a specific version" anymore.
You missed out on all the *fun* of downloading 30+ floppies, then
writing to 30+ floppies, then booting and installing from 30+ floppies,
only to find out something was wrong with the media on floppy # 29 (it
was always the last, or one just a step or two before last that had a
media issue) such that the install failed, and you had to start over
again after writing #29 (or whichever) to a new disk (and hoping it or
one of the others didn't now fail too).
Carlos E.R. wrote:
I can not think now of a task that puts the kernel CPU load at 100%.
If you actually mean 100×$(nproc)%, I can think of several ways to do
it, video encoding and CG rendering being two obvious ones.
Another one is doing some large build with "make -j", and forgetting
to specify a process limit. ;)
On 2026-05-13 16:52, Rich wrote:
The Natural Philosopher <tnp@invalid.invalid> wrote:
On 13/05/2026 11:56, Carlos E.R. wrote:
I can not think now of a task that puts the kernel CPU load at 100%.
I can. Locks the machine up every time.
Of course is rogue code, but it exists
Can you think of a proper task that loads the kernel to 100%? Something
one can do once a month normally? Something that makes optimizing the
kernel worthwhile?
I have several I perform on a regular basis:
video transcodes;
compression of data files (jpegxl, if one turns up the compression
effort, will saturate the CPU at 100%, or 8x100% at the points where
it parallelizes the compression run).
...
But none of those put the kernel at 100%. They are userland.
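To put a number on the user vs. kernel distinction, one option on Linux is to sample the aggregate "cpu" line of /proc/stat twice and see what share of the elapsed ticks went to system, irq and softirq time. A minimal sketch, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the struct and function names are invented for the example and it is meant for the aggregate line only, not the per-CPU "cpu0" lines.

```c
#include <assert.h>
#include <stdio.h>

/* Cumulative tick counters from the aggregate "cpu" line of /proc/stat. */
struct cpu_times {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse one "cpu ..." line. Returns 0 on success, -1 if it does not match. */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
    assert(line != NULL && t != NULL);
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle,
                   &t->iowait, &t->irq, &t->softirq, &t->steal);
    return n == 8 ? 0 : -1;
}

static unsigned long long total_ticks(const struct cpu_times *t)
{
    return t->user + t->nice + t->system + t->idle +
           t->iowait + t->irq + t->softirq + t->steal;
}

/* Percentage of the ticks elapsed between two samples that were spent
 * in the kernel (system + irq + softirq). */
double kernel_share_percent(const struct cpu_times *before,
                            const struct cpu_times *after)
{
    unsigned long long total = total_ticks(after) - total_ticks(before);
    unsigned long long kern = (after->system - before->system) +
                              (after->irq - before->irq) +
                              (after->softirq - before->softirq);
    return total ? 100.0 * (double)kern / (double)total : 0.0;
}
```

A video transcode or a jpegxl run would be expected to push the user counter up while this percentage stays small, which is the point being made above.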
Rich <rich@example.invalid> wrote:
Carlos E.R. <robin_listas@es.invalid> wrote:
On 2026-05-13 00:00, Rich wrote:
If you do some task that is heavy on cpu, say, recoding a video with
ffmpeg, then it would be ffmpeg that would need rebuilding, not the
kernel. I can not think now of a task that puts the kernel CPU load at 100%.
Yes, apply the "careful optimizing" to the programs that actually
benefit. But, as pointed out by R.K., ffmpeg is one of the
applications that compiles with the "include the various optimizations,
select the appropriate code path at runtime based upon CPU executing
the process" so it is already mostly ready to take advantage of what it
finds in the CPU it is using at the time.
Huh, I didn't know that. I couldn't find documentation about it,
but indeed the assembly functions like in
libavfilter/x86/af_anlmdn.asm are called conditional on a macro
that seems to check CPU capabilities set by functions in
libavutil/cpu.c at run-time.
eg. libavfilter/x86/af_anlmdn_init.c
    int cpu_flags = av_get_cpu_flags();

    if (EXTERNAL_SSE(cpu_flags)) {
        s->compute_distance_ssd = ff_compute_distance_ssd_sse;
    }
With related files:
libavutil/cpu.h
libavutil/cpu.c
libavutil/cpu_internal.h
libavutil/x86/cpu.h
libavutil/x86/cpu.c
Most specifically ff_get_cpu_flags_x86() in libavutil/x86/cpu.c
does the actual capabilities checking for that.
Is that "fairly straightforward today" like RK said? I'd say
writing optimised code in assembly and decoding CPU-specific
capabilities identifiers to decide at run-time whether it can be
used is really pretty full-on compared to just letting the compiler
build pure C code optimised for one specific CPU. Also not much
easier today than any time since the CPUID instruction was
introduced for x86 with the Pentium 1. Did he mean some programming
languages or libraries have a better way than that now?
Heck it looks like it's actually got very complicated over the
years, FFmpeg have had to deal with stuff like this:
libavutil/x86/cpu.c
    if (!strncmp(vendor.c, "AuthenticAMD", 12)) {
        /* Allow for selectively disabling SSE2 functions on AMD processors
           with SSE2 support but not SSE4a. This includes Athlon64, some
           Opteron, and some Sempron processors. MMX, SSE, or 3DNow! are faster
           than SSE2 often enough to utilize this special-case flag.
           AV_CPU_FLAG_SSE2 and AV_CPU_FLAG_SSE2SLOW are both set in this case
           so that SSE2 is used unless explicitly disabled by checking
           AV_CPU_FLAG_SSE2SLOW. */
        if (rval & AV_CPU_FLAG_SSE2 && !(ecx & 0x00000040))
            rval |= AV_CPU_FLAG_SSE2SLOW;
Eight confusing special-cases like that just for x86.
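For anyone wondering what that kind of decoding boils down to, here is a stripped-down sketch of the same idea, not FFmpeg's actual code. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid()`; the `FEAT_*` flags and both function names are made up for the example. The bit positions are SSE2 in CPUID leaf 1 EDX bit 26 and SSE4a in leaf 0x80000001 ECX bit 6 (the 0x00000040 mask in the excerpt above).

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Hypothetical feature flags for this sketch only. */
#define FEAT_SSE2      (1u << 0)
#define FEAT_SSE4A     (1u << 1)
#define FEAT_SSE2_SLOW (1u << 2)

/* Pure decoding step, kept separate so it can be exercised on any host. */
unsigned decode_features(uint32_t leaf1_edx, uint32_t ext_ecx, int is_amd)
{
    unsigned f = 0;
    if (leaf1_edx & (1u << 26))
        f |= FEAT_SSE2;
    if (ext_ecx & (1u << 6))
        f |= FEAT_SSE4A;
    /* Same rule as the excerpt: AMD with SSE2 but without SSE4a. */
    if (is_amd && (f & FEAT_SSE2) && !(f & FEAT_SSE4A))
        f |= FEAT_SSE2_SLOW;
    return f;
}

/* Ask the running CPU; returns 0 where CPUID is not available. */
unsigned query_features(void)
{
#ifdef HAVE_CPUID_H
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    uint32_t leaf1_edx = 0, ext_ecx = 0;
    int is_amd = 0;

    /* Leaf 0 returns the vendor string "AuthenticAMD" in EBX, EDX, ECX. */
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        is_amd = (ebx == 0x68747541u && edx == 0x69746e65u &&
                  ecx == 0x444d4163u);
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        leaf1_edx = edx;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        ext_ecx = ecx;
    return decode_features(leaf1_edx, ext_ecx, is_amd);
#else
    return 0;
#endif
}
```

Even this toy version needs three CPUID leaves and a vendor check for one special case, which rather supports the "pretty full-on" assessment.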
But anyway, good to see someone bothered to do that work for
FFmpeg. I'd have assumed I would benefit significantly from
compiling FFmpeg myself if I had fancy hardware and needed the
performance (shouldn't they mention this in INSTALL.md?). That
might still depend on whether most of the possible manual
optimisation work has indeed been done for the architecture you're
using (impossible to tell without benchmarking?).
The benefits of "custom optimized compiles for your specific setup",
while still technically present, have narrowed vs. the "generic distro
version" that there is often not enough actual gain to be had to
justify the effort of "recompiling a specific version" anymore.
Yet many distros themselves are choosing to drop support for eg.
x86_64-v1 (and later) so they can compile software better optimised
for newer CPUs.
On our other target host platform however the vendor's 64-bit compiler
does not support inline assembler at all. If I want to do the same
optimizations there then it will be necessary to either write whole
functions in assembler, or persuade my colleagues that migrating to
clang-cl is a fine plan.
For an example of getting the compiler to build something multiple times
for different targets, so that an executable can select the best one at
runtime, see https://godbolt.org/z/PP1Mc3hfj. Other strategies are
possible, maybe better or worse depending on what you're doing.
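One way to have the compiler do the multiple builds itself is the `target_clones` function attribute in GCC and recent Clang. The sketch below is not the code behind the link above, just a minimal illustration of that attribute; the `MULTIVERSION` macro and the `saxpy` function are invented for the example, and the guard (x86-64 with glibc, since the mechanism relies on ifunc support) is a conservative assumption. Anywhere else the function is compiled once as ordinary C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* on glibc systems this also defines __GLIBC__ */

/* Enable multiversioning only where it is expected to work. */
#if defined(__GLIBC__) && defined(__x86_64__) && defined(__has_attribute)
#  if __has_attribute(target_clones)
#    define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#  endif
#endif
#ifndef MULTIVERSION
#  define MULTIVERSION
#endif

/* y[i] = a * x[i] + y[i]. With target_clones the compiler emits an AVX2
 * clone and a default clone, plus a resolver that picks one at load time. */
MULTIVERSION
void saxpy(float a, const float *x, float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The source stays plain C and the caller just calls `saxpy`; whether the AVX2 clone is actually faster for a given loop still has to be measured.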
On 14/05/2026 10:00, Richard Kettlewell wrote:
On our other target host platform however the vendor's 64-bit
compiler does not support inline assembler at all. If I want to do
the same optimizations there then it will be necessary to either
write whole functions in assembler, or persuade my colleagues that
migrating to clang-cl is a fine plan.
Ah. I see what you mean.
Back in the day I would in that case write some assembler as a C
function and link that in...
Inline code was pretty much not possible unless you replaced large
chunks of C with assembler
The Natural Philosopher <tnp@invalid.invalid> writes:
Ah. I see what you mean.
Back in the day I would in that case write some assembler as a C
function and link that in...
Writing whole functions is still a good strategy for some use cases.
It'd probably work well for the toy example from above, if you thought
you could do a better job than the compiler.
Inline code was pretty much not possible unless you replaced large
chunks of C with assembler
That's basically the issue I'm faced with - if I can't have inline
assembler in the innermost loop of a nontrivial function then I either
have to write the whole function in assembler (at least three times - we
care about performance on AMD64, Power and Aarch64) or pay a call/return
cost every time I call it, sacrificing much of the improvement.
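To make the trade-off concrete, here is a toy sketch of inline assembler sitting inside an inner loop, using GCC/Clang extended asm on x86 with a plain C fallback for every other compiler or architecture. The functions `rotl32` and `mix` are invented for the example, and a rotate is deliberately trivial (compilers already recognise the C idiom), so this shows the mechanism rather than a real win.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate left by n bits (n in 0..31). */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* x is read and written in a register; the count must be in CL. */
    __asm__("roll %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
#else
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
#endif
}

/* Toy inner loop: the asm is expanded in place on every iteration,
 * so there is no call/return cost per element. */
uint32_t mix(const uint32_t *v, size_t n)
{
    uint32_t h = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        h = rotl32(h, 5) ^ v[i];
    return h;
}
```

Without inline assembler, the equivalent of `rotl32` would have to live in a separate assembler file and be called once per iteration, or the whole of `mix` would have to be rewritten in assembler for each architecture.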
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
Carlos E.R. wrote:
I can not think now of a task that puts the kernel CPU load at 100%.
Perhaps an IPSec VPN endpoint with a very fast network interface and
lots of clients that (for some reason) use DES3-CBC as bulk cipher?
(DES3 because it's not accelerated and CBC mode because of its
dependency speedbump.)
I'm not going to do the experiment...
If you actually mean 100×$(nproc)%, I can think of several ways to do
it, video encoding and CG rendering being two obvious ones.
Another one is doing some large build with "make -j", and forgetting
to specify a process limit. ;)
Those are all user process load. The question was about kernel CPU load.
On 14/05/2026 11:25, Richard Kettlewell wrote:
The Natural Philosopher <tnp@invalid.invalid> writes:
Inline code was pretty much not possible unless you replaced large
chunks of C with assembler
That's basically the issue I'm faced with - if I can't have inline
assembler in the innermost loop of a nontrivial function then I either
have to write the whole function in assembler (at least three times - we
care about performance on AMD64, Power and Aarch64) or pay a call/return
cost every time I call it, sacrificing much of the improvement.
Exactly.
Or write the loop in assembler itself
I agree nothing is as easy as a #asm or whatever.
I am surprised you aren't using Gnu C?
The Natural Philosopher <tnp@invalid.invalid> writes:
On 14/05/2026 11:25, Richard Kettlewell wrote:
The Natural Philosopher <tnp@invalid.invalid> writes:
Inline code was pretty much not possible unless you replaced large
chunks of C with assembler
That's basically the issue I'm faced with - if I can't have inline
assembler in the innermost loop of a nontrivial function then I either
have to write the whole function in assembler (at least three times - we
care about performance on AMD64, Power and Aarch64) or pay a call/return
cost every time I call it, sacrificing much of the improvement.
Exactly.
Or write the loop in assembler itself
I agree nothing is as easy as a #asm or whatever.
I am surprised you aren't using Gnu C?
GCC and Clang for Linux, macOS and embedded targets, and Microsoft's
compiler for Windows, and it's only the latter that doesn't support
inline assembler.
For the most part, the embedded targets is where performance matters,
but there's a few places where it's relevant on the host platforms
too, and the code I'm working on currently is one of those places.
On 14/05/2026 12:28, Richard Kettlewell wrote:
The Natural Philosopher <tnp@invalid.invalid> writes:
On 14/05/2026 11:25, Richard Kettlewell wrote:
The Natural Philosopher <tnp@invalid.invalid> writes:
Inline code was pretty much not possible unless you replaced large
chunks of C with assembler
That's basically the issue I'm faced with - if I can't have inline
assembler in the innermost loop of a nontrivial function then I either
have to write the whole function in assembler (at least three times - we
care about performance on AMD64, Power and Aarch64) or pay a call/return
cost every time I call it, sacrificing much of the improvement.
Exactly.
Or write the loop in assembler itself
I agree nothing is as easy as a #asm or whatever.
I am surprised you aren't using Gnu C?
GCC and Clang for Linux, macOS and embedded targets, and Microsoft's
compiler for Windows, and it's only the latter that doesn't support
inline assembler.
Do you mean the targets are Windows, or the dev platform?
For the most part, the embedded targets is where performance matters,
but there's a few places where it's relevant on the host platforms
too, and the code I'm working on currently is one of those places.
I can't quite picture an arrangement that fits that description, but if
it's proprietary, it's not that important to me either :-)
Oh, wait. You have host user platforms running custom code over
e.g. Windows that need to talk to networked embedded stuff?
And gcc can't compile for/on a Windows target...??
I am thinking now of an applet in XFCE 4 called "multiload ng". I think
it comes from an old gnome one. Well, the thing is it can graph
processor load as "User, System, Nice, or I/O wait". Kernel load would
be "System". But maybe other things are also "System". Maybe libc?
On Thu, 14 May 2026 12:54:48 +0200, Carlos E.R. wrote:
I am thinking now of an applet in XFCE 4 called "multiload ng". I think
it comes from an old gnome one. Well, the thing is it can graph
processor load as "User, System, Nice, or I/O wait". Kernel load would
be "System". But maybe other things are also "System". Maybe libc?
I had to install it on openSUSE with
sudo zypper install sysstat
sudo systemctl enable sysstat
There are a bunch of flags described in the man page but
sar
Linux 6.17.0-23-generic (kropotkin)    05/14/2026    _x86_64_    (8 CPU)

12:00:15 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:18 AM     all      0.48      0.00      0.48      0.04      0.00     99.00
12:20:05 AM     all      0.50      0.08      0.51      0.07      0.00     98.84
12:30:03 AM     all      0.48      0.00      0.47      0.05      0.00     99.00
12:40:01 AM     all      0.49      0.00      0.49      0.06      0.00     98.96
12:50:01 AM     all      0.47      0.00      0.47      0.05      0.00     99.00
01:00:14 AM     all      0.49      0.00      0.48      0.06      0.00     98.98
That's cumulative but 'sar -u 2 10' will report every 2 seconds for 10 times.
The docs say 'system' means kernel. You also get iostat and pidstat. Pidstat is useful to see who is dipping into the kernel.
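For anyone who wants to see where such a user/system split comes from, the raw counters are exposed in the first line of /proc/stat. The following is a minimal sketch that parses two samples of that aggregate "cpu" line and reports the share of time spent in the "system" column between them. The names cpu_times, parse_cpu_line and system_percent are hypothetical, and this is not a claim about how sar itself computes %system; it only looks at the plain "system" field.

```c
#include <stdio.h>
#include <string.h>

/* The first eight counters of the aggregate "cpu" line in /proc/stat. */
struct cpu_times {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate line ("cpu  ..."), not the per-CPU "cpu0 ..." lines.
 * Returns 0 on success, -1 if the line does not look as expected. */
static int parse_cpu_line(const char *line, struct cpu_times *t) {
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    int n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle,
                   &t->iowait, &t->irq, &t->softirq, &t->steal);
    return n == 8 ? 0 : -1;
}

static unsigned long long total_ticks(const struct cpu_times *t) {
    return t->user + t->nice + t->system + t->idle +
           t->iowait + t->irq + t->softirq + t->steal;
}

/* Percentage of elapsed ticks between two samples spent in "system". */
static double system_percent(const struct cpu_times *before,
                             const struct cpu_times *after) {
    unsigned long long dt = total_ticks(after) - total_ticks(before);
    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(after->system - before->system) / (double)dt;
}
```

To use it, read the first line of /proc/stat twice a few seconds apart, parse both samples, and pass them to system_percent.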
not@telling.you.invalid (Computer Nerd Kev) writes:
Is that "fairly straightforward today" like RK said? I'd say
writing optimised code in assembly and decoding CPU-specific
capabilities identifiers to decide at run-time whether it can be
used is really pretty full-on compared to just letting the compiler
build pure C code optimised for one specific CPU.
The statement was:
| It's fairly straightforward today to build performance critical code
| multiple times for different targets and select the best one at
| runtime. This includes using intrinsics, vector instructions or
| assembler selected for particular classes of CPU, as well as just
| giving the compiler a narrower target and letting it get on with it.
For an example of getting the compiler to build something multiple times
for different targets, so that an executable can select the best one at runtime, see https://godbolt.org/z/PP1Mc3hfj.
Obviously the overall effort increases if you hand-optimize for specific targets, or add more architectures, but that's what you would expect;
the complexity scales with your ambitions. The basic point here is that putting together a design which allows for multiple targets to be
supported within a single executable or library is really quite easy
with modern compilers.
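As a rough illustration of that design (this is not the code behind the godbolt link, whose contents are not reproduced here), the sketch below compiles the same loop twice, once for the default target and once with AVX2 enabled, and picks one at runtime. It assumes GCC or Clang; on other compilers or architectures only the generic version is built. The names sum_generic, sum_avx2 and select_sum are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline version: built for whatever the default target is. */
static uint64_t sum_generic(const uint32_t *v, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Same source, but the compiler is allowed to use AVX2 in this function. */
__attribute__((target("avx2")))
static uint64_t sum_avx2(const uint32_t *v, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Pick the best available implementation for the CPU we are running on. */
static sum_fn select_sum(void) {
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}
```

A caller would typically run select_sum() once at startup, keep the returned pointer, and call through it in the hot path. Hand-written intrinsics or assembler for a given CPU class would slot in as further candidates behind the same pointer type.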
There's an advantage to running different compilers too: the warnings
they generate highlight different issues with the code. So more
different compilers = more warnings = more opportunities to spot bugs
before they reach the customers.
Richard Kettlewell <invalid@invalid.invalid> wrote:
The statement was:
| It's fairly straightforward today to build performance critical code
| multiple times for different targets and select the best one at
| runtime. This includes using intrinsics, vector instructions or
| assembler selected for particular classes of CPU, as well as just
| giving the compiler a narrower target and letting it get on with it.
For an example of getting the compiler to build something multiple times
for different targets, so that an executable can select the best one at
runtime, see https://godbolt.org/z/PP1Mc3hfj.
Thanks, yes that is a more approachable method of doing the
selection at run-time compared to that FFmpeg code. For those
playing along at home, I see the "target" attribute feature of GCC
is documented here (with an example that seems to avoid needing the
separate function to select for the CPU in use):
https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html
It'd probably work well for the toy example from above, if you thought
you could do a better job than the compiler.
On 2026-05-13 16:48, Rich wrote:
Yes, apply the "careful optimizing" to the programs that actually
benefit. But, as pointed out by R.K., ffmpeg is one of the
applications that compiles with the "include the various optimizations,
select the appropriate code path at runtime based upon CPU executing
the process" so it is already mostly ready to take advantage of what it
finds in the CPU it is using at the time. Meaning I can just let
Slackware install the standard package and not feel any great need to
"recompile" it to gain a noticeable speedup.
Good to know.
The benefits of "custom optimized compiles for your specific setup",
while still technically present, have narrowed so much vs. the "generic
distro version" that there is often not enough actual gain to be had to
justify the effort of "recompiling a specific version" anymore.
On 11-05-2026, Leroy H <lh@somewhere.net> wrote:
Internet forums report nothing but the mainstream distros
which include every possible module in one giant, bloated mess.
You just proved, once again that you know nothing about kernel. The
purpose of a module is to be loaded only if needed. So if you don't need
it, it's not loaded and no resource is wasted. And if you need it, it's
loaded because it's available. So it's the perfect world: everything
available without the bloat part.
It's not kernel 1.x anymore. You don't have to choose at compile time
whether you will need something just to avoid having a huge kernel.
Everything is modular, so you don't have to worry.
No remedy is to be found.
Agreed. You would need to learn the basics to be able to find the
remedy. But you lack the brain to do that.
Doesn't anyone build their own kernel anymore?
I won't spend half of my time to compile my kernel. I know how it's
done, that's enough. I don't have to learn the same thing every week.
Is the
average GNU/Linux user just a distro slave with no
technical competence or curiosity?
You don't have the technical competence, because otherwise you wouldn't
fail each week a new kernel is out. You don't have the curiosity, because
otherwise you would learn new things instead of failing endlessly to
learn how to compile your kernel.
GNU/Linux was begun as a project for the technical elite,
No.
On 15 May 2026 21:04:20 GMT, Stéphane CARPENTIER wrote:
You just proved, once again that you know nothing about kernel. The
purpose of a module is to be loaded only if needed.
Who/what loads it?
You forget that I don't use systemd. I use my own hand-crafted
boot scripts. Therefore *I* am the one that loads and not some
piece of foreign code that is beyond my control.
Richard Kettlewell <invalid@invalid.invalid> wrote:
It'd probably work well for the toy example from above, if you
thought you could do a better job than the compiler.
In the late 90's I knew some people who could improve the binary code
generated by the compiler. Not anymore. The compilers improved a lot,
the hardware is way more complex and the system is way more complex. So,
I don't believe anyone can do a better job than a compiler.
Stéphane CARPENTIER <sc@fiat-linux.fr> writes:
Richard Kettlewell <invalid@invalid.invalid> wrote:
It'd probably work well for the toy example from above, if you
thought you could do a better job than the compiler.
In the late 90's I knew some people who could improve the binary code
generated by the compiler. Not anymore. The compilers improved a lot,
the hardware is way more complex and the system is way more complex. So,
I don't believe anyone can do a better job than a compiler.
Have a look at OpenSSL's bignum arithmetic implementation, lots of
assembler code that was written because compilers don't do a good job.
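One commonly cited reason for hand-written bignum assembler is carry propagation: C has no direct way to say "add with carry", so portable code has to recover the carry with comparisons, whereas assembler can chain the CPU's add-with-carry instruction. The sketch below is not taken from OpenSSL; it is a minimal portable multi-word addition, with the hypothetical name add_limbs, to show what the compiler is being asked to optimize.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit words, least significant word first.
 * Returns the final carry (0 or 1). */
static uint64_t add_limbs(uint64_t *r, const uint64_t *a,
                          const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;   /* may wrap around */
        uint64_t c1 = s < carry;     /* 1 if adding the carry wrapped */
        uint64_t t = s + b[i];       /* may wrap around */
        uint64_t c2 = t < s;         /* 1 if adding b[i] wrapped */
        r[i] = t;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return carry;
}
```

Whether a given compiler turns this loop into a tight add-with-carry chain varies by compiler, version and target, which is the kind of gap that assembler implementations aim to close.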
Richard Kettlewell <invalid@invalid.invalid> wrote:
Stéphane CARPENTIER <sc@fiat-linux.fr> writes:
In the late 90's I knew some people who could improve the binary code
generated by the compiler. Not anymore. The compilers improved a
lot, the hardware is way more complex and the system is way more
complex. So, I don't believe anyone can do a better job than a
compiler.
Have a look at OpenSSL's bignum arithmetic implementation, lots of
assembler code that was written because compilers don't do a good job.
I'm not sure what you are talking about. A quick search on the subject
didn't give me anything interesting. Now, from what I heard, the issue
in security is that sometimes compilers do too good a job, which must
be avoided. [...]
Now, if you are speaking about compilers being bad at optimizing fast
code, I'd like a link. I'm always willing to learn new things, especially
when they don't agree with what I thought I knew. I'd like to know in
which cases (and so: why) a compiler can't be as good as a human
being.
On 15 May 2026 21:04:20 GMT, Stéphane CARPENTIER wrote:
You just proved, once again that you know nothing about kernel. The
purpose of a module is to be loaded only if needed.
Who/what loads it?
You forget that I don't use systemd. I use my own hand-crafted
boot scripts.
On 2026-05-16 02:37, Leroy H wrote:
On 15 May 2026 21:04:20 GMT, Stéphane CARPENTIER wrote:
You just proved, once again that you know nothing about kernel. The
purpose of a module is to be loaded only if needed.
Who/what loads it?
You forget that I don't use systemd. I use my own hand-crafted
boot scripts. Therefore *I* am the one that loads and not some
piece of foreign code that is beyond my control.
LOL. Now you proved your lack of knowledge. It is not systemd (nor
initd) who loads kernel modules (although they can do it). I thought you were knowledgeable, but not anymore.
You forget that I don't use systemd. I use my own hand-crafted
boot scripts.
Irrelevant. The kernel module auto-loader subsystem is separate from
the boot scripts.
Yep, Leroy just proved Stephanie's post about Leroy being stupid and technically incompetent.
On Sat, 16 May 2026 18:39:21 -0000 (UTC), Rich wrote:
You forget that I don't use systemd. I use my own hand-crafted
boot scripts.
Irrelevant. The kernel module auto-loader subsystem is separate from
the boot scripts.
The hell it is.
On my system, no module loads without *my* explicit permission.
You should brush up on your knowledge of Linux.
On 2026-05-16 21:37, Leroy H wrote:
On Sat, 16 May 2026 18:39:21 -0000 (UTC), Rich wrote:
You forget that I don't use systemd. I use my own hand-crafted
boot scripts.
Irrelevant. The kernel module auto-loader subsystem is separate from
the boot scripts.
The hell it is.
On my system, no module loads without *my* explicit permission.
You should brush up on your knowledge of Linux.
Ha, ha. :-D
Carlos E.R. <robin_listas@es.invalid> wrote:
On 2026-05-16 21:37, Leroy H wrote:
On Sat, 16 May 2026 18:39:21 -0000 (UTC), Rich wrote:
You forget that I don't use systemd. I use my own hand-crafted
boot scripts.
Irrelevant. The kernel module auto-loader subsystem is separate from
the boot scripts.
The hell it is.
On my system, no module loads without *my* explicit permission.
You should brush up on your knowledge of Linux.
Ha, ha. :-D
Yep, pointing out that Leroy is technically incompetent seems to have
struck a nerve (which would likely indicate that Leroy **is**
technically incompetent).
Leroy H <lh@somewhere.net> wrote:
On 15 May 2026 21:04:20 GMT, Stéphane CARPENTIER wrote:
You just proved, once again that you know nothing about kernel. The
purpose of a module is to be loaded only if needed.
Who/what loads it?
The kernel itself has supported module auto-loading upon use/need for a
very long time.
You forget that I don't use systemd. I use my own hand-crafted
boot scripts.
Irrelevant. The kernel module auto-loader subsystem is separate from
the boot scripts.
Who/what loads it?
That's impressive. You really have no clue about what a module is and
how they are managed by Linux?
On my system, no module loads without *my* explicit permission.
Ha, ha. :-D
On 16-05-2026, Rich <rich@example.invalid> wrote:
Thanks for your answer. My ISP didn't relay his message; without your
answer I would have missed it.
Leroy H <lh@somewhere.net> wrote:
On 15 May 2026 21:04:20 GMT, Stéphane CARPENTIER wrote:
You just proved, once again that you know nothing about kernel. The
purpose of a module is to be loaded only if needed.
Who/what loads it?
That's impressive. You really have no clue about what a module is and
how they are managed by Linux? And you pretend you can improve your
kernel by compiling it yourself? No way. You lack the basics of the
basics. You are the living proof that the kernel is so well designed it
can be compiled by a half-wit like you who selects options at random.
Leroy H <lh@somewhere.net> wrote:
Stéphane CARPENTIER wrote:
You just proved, once again that you know nothing about kernel. The
purpose of a module is to be loaded only if needed.
Who/what loads it?
The kernel itself has supported module auto-loading upon use/need for
a very long time.
You forget that I don't use systemd. I use my own hand-crafted
boot scripts.
Irrelevant. The kernel module auto-loader subsystem is separate from
the boot scripts.
The kernel loads modules via the init_module and finit_module
syscalls. As far as I can tell there are no other paths into the module
loader and no automatic loading by the kernel - all the automation
happens in user processes. If I've missed anything then please do point
me at the implementation.
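For readers who have not seen those syscalls used directly, here is a minimal sketch of what a user process does to hand a module file to the kernel with finit_module. It assumes Linux with headers that define SYS_finit_module; the function name load_module is hypothetical, and a real loader such as modprobe also resolves dependencies and aliases, which this sketch does not attempt.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel to load the module stored in the file at `path`.
 * `params` is the module parameter string ("" for none).
 * Returns 0 on success, -1 on failure with errno set.
 * Loading normally requires root (CAP_SYS_MODULE). */
static int load_module(const char *path, const char *params) {
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;                 /* errno set by open() */

    /* glibc has no wrapper for finit_module, so go through syscall(). */
    long rc = syscall(SYS_finit_module, fd, params, 0);

    int saved = errno;             /* keep the syscall's errno across close() */
    close(fd);
    errno = saved;
    return rc == 0 ? 0 : -1;
}
```

Called as an unprivileged user on a real module file, this would be expected to fail with a permission error; called with a path that does not exist, it fails in open() before the kernel's module loader is ever reached.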