Forum: Too Lazy BBS

Re: Ubuntu 22 Boot Errors

From Andy Burns@usenet@andyburns.uk to alt.os.linux,uk.comp.os.linux on Sun Dec 8 12:00:34 2024

From Newsgroup: uk.comp.os.linux

Java Jive wrote:

ERROR: Unable to locate IOAPIC for GSI 37

Possible dodgy config tables in BIOS/UEFI.

firmware upgrade?
or legacy options you can disable?
--- Synchronet 3.21d-Linux NewsLink 1.2

From vallor@vallor@cultnix.org to alt.os.linux,uk.comp.os.linux on Mon Dec 9 03:36:32 2024

From Newsgroup: uk.comp.os.linux

On Sun, 8 Dec 2024 11:41:05 +0000, Java Jive <java@evij.com.invalid> wrote
in <vj40kk$3p436$1@dont-email.me>:

Dec 06 21:18:47 HOSTNAME kernel: *BAD*gran_size: 128M chunk_size: 2G
num_reg: 10 lose cover RAM: -834M

I don't know why this would happen, but if it happened to me,
I'd run "lsmem" and "free" and make sure all the memory eventually
made it online...

Also, it looks like it might have something to do with mtrr. On
my host, dmesg reads:

[ 0.000000] total RAM covered: 3071M
[ 0.000000] Found optimal setting for mtrr clean up
[ 0.000000] gran_size: 64K chunk_size: 128M num_reg: 3
lose cover RAM: 0G
[ 0.000000] MTRR map: 7 entries (3 fixed + 4 variable; max 20), built
from 9 variable MTRRs

So these entries may be due to MTRR, which according to Documentation/arch/x86/mtrr.rst is getting phased out. On my
system, I see this when I cat /proc/mtrr:

$ sudo cat /proc/mtrr
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0ba0a0000 ( 2976MB), size= 64KB, count=1: uncachable

Give a gander to that document, it outlines what mtrr might be used for
in modern-day kernels.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/arch/x86/mtrr.rst?h=v6.12.3
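
As a rough cross-check of the /proc/mtrr listing above, the register sizes can be totalled per cache type. The following is a minimal Python sketch, not anything from the kernel tree: the names parse_mtrr and bytes_by_type are made up for illustration, and the regular expression assumes the "regNN: base=..., size=..., count=...: type" layout shown in the listing, with sizes given in KB or MB.

```python
import re

# Assumed layout of one /proc/mtrr line, as in the listing above:
#   reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+):\s*base=0x(?P<base>[0-9a-fA-F]+)\s*\(\s*\d+MB\),\s*"
    r"size=\s*(?P<size>\d+)(?P<unit>[KM])B,\s*count=\d+:\s*(?P<type>[\w-]+)"
)
UNIT_BYTES = {"K": 1024, "M": 1024 ** 2}


def parse_mtrr(text):
    """Return one dict per register line found in the text."""
    entries = []
    for line in text.splitlines():
        m = MTRR_LINE.search(line)
        if m:
            entries.append({
                "reg": int(m.group("reg")),
                "base": int(m.group("base"), 16),
                "size": int(m.group("size")) * UNIT_BYTES[m.group("unit")],
                "type": m.group("type"),
            })
    return entries


def bytes_by_type(entries):
    """Sum the register sizes for each cache type."""
    totals = {}
    for e in entries:
        totals[e["type"]] = totals.get(e["type"], 0) + e["size"]
    return totals


# The three register lines quoted above, copied as they appear in the post.
SAMPLE = """\
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0ba0a0000 ( 2976MB), size= 64KB, count=1: uncachable
"""

if __name__ == "__main__":
    for cache_type, size in bytes_by_type(parse_mtrr(SAMPLE)).items():
        print(f"{cache_type}: {size / 1024 ** 2:g} MiB")
```

For the three registers above this totals 3072 MiB of write-back coverage and 64 KiB uncachable, which is roughly in line with the "total RAM covered: 3071M" line in dmesg. On a live system the text would come from reading /proc/mtrr instead of SAMPLE.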
--
-v System76 Thelio Mega v1.1 x86_64 NVIDIA RTX 3090 Ti
OS: Linux 6.12.3 Release: Mint 21.3 Mem: 258G

From Java Jive@java@evij.com.invalid to alt.os.linux,uk.comp.os.linux on Mon Dec 9 20:33:25 2024

From Newsgroup: uk.comp.os.linux

On 2024-12-09 03:36, vallor wrote:

On Sun, 8 Dec 2024 11:41:05 +0000, Java Jive <java@evij.com.invalid> wrote
in <vj40kk$3p436$1@dont-email.me>:

Dec 06 21:18:47 HOSTNAME kernel: *BAD*gran_size: 128M chunk_size: 2G
num_reg: 10 lose cover RAM: -834M

I don't know why this would happen, but if it happened to me,
I'd run "lsmem" and "free" and make sure all the memory eventually
made it online...

My first reaction on seeing the messages was to boot into memcheck and
let a full cycle complete, but no memory problems were found. I should
have mentioned in my OP that I'd already done this, but it slipped my
mind.

Also, it looks like it might have something to do with mtrr.

Yes, as I originally linked.

On
my host, dmesg reads:

[ 0.000000] total RAM covered: 3071M
[ 0.000000] Found optimal setting for mtrr clean up
[ 0.000000] gran_size: 64K chunk_size: 128M num_reg: 3
lose cover RAM: 0G
[ 0.000000] MTRR map: 7 entries (3 fixed + 4 variable; max 20), built
from 9 variable MTRRs

So these entries may be due to MTRR, which according to Documentation/arch/x86/mtrr.rst is getting phased out. On my
system, I see this when I cat /proc/mtrr:

$ sudo cat /proc/mtrr
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0ba0a0000 ( 2976MB), size= 64KB, count=1: uncachable

Give a gander to that document, it outlines what mtrr might be used for
in modern-day kernels.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/arch/x86/mtrr.rst?h=v6.12.3

These are both Dell Precisions, an M6700 and an M6800, so no chance of
PAT. One of the links I gave suggests giving some kernel boot
parameters to suggest a good compromise setting which minimises unused
RAM and thereby saves the failed testing which gave rise to the testing
trace that I quoted. I wasn't very specific in my OP, but I think the
best I can do is determine such boot parameters, and any help with that
from someone more knowledgeable than myself would be much appreciated.
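
One way to approach the boot-parameter question is to read the candidate lines that the MTRR cleanup code prints and pick one that is not flagged *BAD* and loses no RAM. The sketch below is a hypothetical Python helper under several assumptions: the function names are invented, the selection rule (zero lost RAM, then fewest registers) is only a plausible heuristic and not necessarily what the kernel itself does, and the sample text simply reuses the two candidate lines quoted in this thread, which come from different machines. The parameters enable_mtrr_cleanup, mtrr_gran_size and mtrr_chunk_size are the ones listed in the kernel's kernel-parameters documentation.

```python
import re

# Matches one candidate line, tolerating the line wrap seen in the quoted logs.
CANDIDATE = re.compile(
    r"(?P<bad>\*BAD\*)?gran_size:\s*(?P<gran>\d+[KMG])\s+"
    r"chunk_size:\s*(?P<chunk>\d+[KMG])\s+"
    r"num_reg:\s*(?P<regs>\d+)\s+"
    r"lose\s+cover\s+RAM:\s*(?P<lose>-?\d+)[KMG]"
)


def parse_candidates(log_text):
    """Extract every gran_size/chunk_size candidate from dmesg-style text."""
    return [
        {
            "bad": m.group("bad") is not None,
            "gran": m.group("gran"),
            "chunk": m.group("chunk"),
            "regs": int(m.group("regs")),
            # Units differ between lines; only zero vs non-zero is used below.
            "lose": int(m.group("lose")),
        }
        for m in CANDIDATE.finditer(log_text)
    ]


def pick(candidates):
    """Heuristic (an assumption): no lost RAM, then the fewest registers."""
    usable = [c for c in candidates if not c["bad"] and c["lose"] == 0]
    return min(usable, key=lambda c: c["regs"]) if usable else None


def boot_params(candidate):
    """Format the chosen candidate as kernel command-line parameters."""
    return (
        "enable_mtrr_cleanup "
        f"mtrr_gran_size={candidate['gran']} "
        f"mtrr_chunk_size={candidate['chunk']}"
    )


# Illustration only: these two lines come from different machines in the thread.
SAMPLE = """\
*BAD*gran_size: 128M chunk_size: 2G
num_reg: 10 lose cover RAM: -834M
gran_size: 64K chunk_size: 128M num_reg: 3
lose cover RAM: 0G
"""

if __name__ == "__main__":
    best = pick(parse_candidates(SAMPLE))
    print(boot_params(best) if best else "no clean candidate found")
```

On a real system the input should be the dmesg output from a single boot of the affected machine, and any suggested values would still need testing before being made permanent in the bootloader configuration.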
--

Fake news kills!

I may be contacted via the contact address given on my website: www.macfh.co.uk


From Java Jive@java@evij.com.invalid to alt.os.linux,uk.comp.os.linux on Tue Dec 10 13:42:31 2024

From Newsgroup: uk.comp.os.linux

On 2024-12-08 12:00, Andy Burns wrote:

Java Jive wrote:

ERROR: Unable to locate IOAPIC for GSI 37

Possible dodgy config tables in BIOS/UEFI.

firmware upgrade?
or legacy options you can disable?

Yes, thanks, thinking that there might be something to any of your
suggestions above, I've spent the last day or so trying to determine
exactly which PCs were showing that fault. Originally, I'm fairly sure
that there were more than just one, but now it seems to be just one,
this one. I think the difference between the time of my OP and now is
that in the meantime I've continued trying to clean up the boot
messages, and this has involved uninstalling Virtualbox on all of the
PCs, including this one. So ...

My best guess for the other PCs is that Virtualbox was causing the messages.

As for this PC, which is a Dell Precision M6800, something that I've
noticed for the M6700/6800 series, which are of near identical design to
each other, is that sometimes the COM port shows up under Windows as
having problems and not installing properly. I'm not quite sure why
this should be, but as there is no external serial connector anyway, and
I haven't used an actual COM port [*] for around two decades either, I
haven't bothered to investigate this phenomenon further.

* I've used USB-to-low-voltage-serial cables, such as the Sony DKU-5
cable that used to be used to connect their phones to a PC, to flash
hardware such as routers, but not an actual COM port.

Putting the above together with the fact that I've noticed that, on a
Dell Inspiron, a daughterboard for connecting an NVMe only actually has
the connector on those model variants that were supplied originally with
an NVMe drive, other models have an otherwise identical daughterboard
but without the actual connector - thus saving a few cents per PC
sale! - I'm wondering if with these M6700/6800s Dell may have been
doing something similar, populating the boards with some, but not all,
of the hardware necessary for the COM port, this time saving on some of
the actual chips required instead of just a connector.

Either that or a PCB has a fault, but, being ATM in the middle of
'churning' my hardware, I have three of these machines, and two of them
show similar symptoms relating to the COM port. It seems rather
improbable that 2 out of 3 machines bought pseudo-randomly from
different eBay suppliers would have the same board fault on arrival.

At any rate, my best guess for this PC is that the original message that
I queried is being caused by the 'faulty' COM port.
--

Fake news kills!

I may be contacted via the contact address given on my website: www.macfh.co.uk


From Paul@nospam@needed.invalid to alt.os.linux,uk.comp.os.linux on Tue Dec 10 09:40:30 2024

From Newsgroup: uk.comp.os.linux

On Tue, 12/10/2024 8:42 AM, Java Jive wrote:

On 2024-12-08 12:00, Andy Burns wrote:

Java Jive wrote:

ERROR: Unable to locate IOAPIC for GSI 37

Possible dodgy config tables in BIOS/UEFI.

firmware upgrade?
or legacy options you can disable?

Yes, thanks, thinking that there might be something to any of your suggestions above, I've spent the last day or so trying to determine exactly which PCs were showing that fault. Originally, I'm fairly sure that there were more than just one, but now it seems to be just one, this one. I think the difference between the time of my OP and now is that in the meantime I've continued trying to clean up the boot messages, and this has involved uninstalling Virtualbox on all of the PCs, including this one. So ...

My best guess for the other PCs is that Virtualbox was causing the messages.

As for this PC, which is a Dell Precision M6800, something that I've noticed for the M6700/6800 series, which are of near identical design to each other, is that sometimes the COM port shows up under Windows as having problems and not installing properly. I'm not quite sure why this should be, but as there is no external serial connector anyway, and I haven't used an actual COM port [*] for around two decades either, I haven't bothered to investigate this phenomenon further.

* I've used USB-to-low-voltage-serial cables, such as the Sony DKU-5 cable that used to be used to connect their phones to a PC, to flash hardware such as routers, but not an actual COM port.

Putting the above together with the fact that I've noticed that, on a Dell Inspiron, a daughterboard for connecting an NVMe only actually has the connector on those model variants that were supplied originally with an NVMe drive, other models have an otherwise identical daughterboard but without the actual connector - thus saving a few cents per PC sale! - I'm wondering if with these M6700/6800s Dell may have been doing something similar, populating the boards with some, but not all, of the hardware necessary for the COM port, this time saving on some of the actual chips required instead of just a connector.

Either that or a PCB has a fault, but, being ATM in the middle of 'churning' my hardware, I have three of these machines, and two of them show similar symptoms relating to the COM port. It seems rather improbable that 2 out of 3 machines bought pseudo-randomly from different eBay suppliers would have the same board fault on arrival.

At any rate, my best guess for this PC is that the original message that I queried is being caused by the 'faulty' COM port.

This is just a random suggestion, with no evidence to back it up.

Power off the machine, remove one of the DIMMs and try your dmesg
readout a second time, and see if your granularity issue changes.

It could be that the address map on the chipset is defective somehow.

The machine I'm typing on has such a problem, and it used to freeze
in the graphics driver, because the shared memory was somehow double defined
or something. It's my suspicion that with less than max RAM installed,
it would behave itself.

*******

The other idea I tried out here: I figured that on a machine with Intel Management Engine,
the addressing may need to provide space for Minix to run, and maybe the offset caused by that is the problem. But when I tested that theory on the Optiplex 780,
dmesg was as clean as could be. It looked like PAT had been used. So that does not
look like a credible possibility.

Paul

From Theo@theom+news@chiark.greenend.org.uk to alt.os.linux,uk.comp.os.linux on Wed Dec 11 14:53:10 2024

From Newsgroup: uk.comp.os.linux

In uk.comp.os.linux Java Jive <java@evij.com.invalid> wrote:

I'm going round my Ubuntu 22 machines trying to remove error and fail messages from the boot, mostly successfully, but five anomalies are
proving hard to fix, despite which the PCs all seem to work ...

1) IOAPIC

This is occurring on more than one PC. Searching for ...

ERROR: Unable to locate IOAPIC for GSI 37

That means it can't work out which interrupt controller is used for a particular interrupt. Likely means the ACPI tables are incomplete. If everything works you can ignore this.

2) blkmapd

I think this is occurring on ALL my Ubuntu 22 machines. Appears to be related to NFS, but networking seems fine (apart from a minor seemingly unrelated issue already solved). Searching for ...

"blkmapd[717]: open pipe file /run/rpc_pipefs/nfs/blocklayout
failed: No such file or directory"

Are you actually running NFS? If not you can ignore this.

3) CUPS Scheduler

This also is occurring on many or all of my Ubuntu 22 machines, even a
while after a successful boot. Oddly the status of cups service always shows it to be working.

Dec 07 17:06:00 HOSTNAME systemd[1]: cups.service: start operation timed out. Terminating.
Dec 07 17:06:00 HOSTNAME systemd[1]: cups.service: Failed with result 'timeout'.
Dec 07 17:06:00 HOSTNAME systemd[1]: Failed to start CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: cups.service: Scheduled restart
job, restart counter is at 5.
Dec 07 17:06:00 HOSTNAME systemd[1]: Stopped CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: cups.path: Deactivated successfully.
Dec 07 17:06:00 HOSTNAME systemd[1]: Stopped CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: Stopping CUPS Scheduler...
Dec 07 17:06:00 HOSTNAME systemd[1]: Started CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: cups.socket: Deactivated successfully.
Dec 07 17:06:00 HOSTNAME systemd[1]: Closed CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: Stopping CUPS Scheduler...
Dec 07 17:06:00 HOSTNAME systemd[1]: Listening on CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: Starting CUPS Scheduler...
Dec 07 17:06:00 HOSTNAME audit[1760]: AVC apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=1760 comm="cupsd" capability=12 capname="net_>

I'm not seeing a problem there? CUPS didn't start first time round because something was busy, but tried again and succeeded.

4) UBSAN

This is on a laptop with two on-board GPUs, and I think is related to
that fact. However, there were no hits under DuckDuckGo, Google, or
Yahoo for ...

Ubuntu 22 "UBSAN: array-index-out-of-bounds in /build/linux-hwe-6.8-W0MdK2/linux-hwe-6.8-6.8.0/drivers/gpu/drm/radeon/radeon_atombios.c:633:33"

UBSAN is the Undefined Behaviour Sanitiser, ie a debugging tool. Something went wrong in the driver for AMD Radeon GPUs, ie a bug, maybe due to the relatively elderly GPU you have. If you aren't using the AMD GPU you can ignore it if it's not actually causing a crash (or could disable the driver
if you wanted).

5) Initiating RAM registers

The following is occurring very early in the logs on 2 machines with
32GB RAM, and seems to be about how to set up registers for RAM access, as
per these two links ...

This seems to be related to an MTRR problem - maybe the hardware doesn't let the kernel find the optimal memory layout with more RAM than it was
originally designed for. How much RAM does Linux show you have after it's booted? Does it lose any memory, and can you live with having the amount
that remains?

Most of these seem like 'new Linux, old hardware' issues, but nothing
actually to worry me there. I'd check you're on the latest BIOS as that
might help some of the ACPI related issues.

Theo

From Java Jive@java@evij.com.invalid to alt.os.linux,uk.comp.os.linux on Thu Dec 12 12:11:05 2024

From Newsgroup: uk.comp.os.linux

On 2024-12-11 14:53, Theo wrote:

In uk.comp.os.linux Java Jive <java@evij.com.invalid> wrote:

I'm going round my Ubuntu 22 machines trying to remove error and fail
messages from the boot, mostly successfully, but five anomalies are
proving hard to fix, despite which the PCs all seem to work ...

1) IOAPIC

This is occurring on more than one PC. Searching for ...

ERROR: Unable to locate IOAPIC for GSI 37

That means it can't work out which interrupt controller is used for a particular interrupt. Likely means the ACPI tables are incomplete. If everything works you can ignore this.

Everything of any importance seems to be working, but see also my reply
to Andy regarding the COM port.

2) blkmapd

I think this is occurring on ALL my Ubuntu 22 machines. Appears to be
related to NFS, but networking seems fine (apart from a minor seemingly
unrelated issue already solved). Searching for ...

"blkmapd[717]: open pipe file /run/rpc_pipefs/nfs/blocklayout
failed: No such file or directory"

Are you actually running NFS? If not you can ignore this.

Yes, and it *seems* to be running fine, so I'm not sure what is going on
here.

3) CUPS Scheduler

This also is occurring on many or all of my Ubuntu 22 machines, even a
while after a successful boot. Oddly the status of cups service always
shows it to be working.

Dec 07 17:06:00 HOSTNAME systemd[1]: cups.service: start operation timed
out. Terminating.
Dec 07 17:06:00 HOSTNAME systemd[1]: cups.service: Failed with result
'timeout'.
Dec 07 17:06:00 HOSTNAME systemd[1]: Failed to start CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: cups.service: Scheduled restart
job, restart counter is at 5.
Dec 07 17:06:00 HOSTNAME systemd[1]: Stopped CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: cups.path: Deactivated successfully.
Dec 07 17:06:00 HOSTNAME systemd[1]: Stopped CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: Stopping CUPS Scheduler...
Dec 07 17:06:00 HOSTNAME systemd[1]: Started CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: cups.socket: Deactivated successfully.
Dec 07 17:06:00 HOSTNAME systemd[1]: Closed CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: Stopping CUPS Scheduler...
Dec 07 17:06:00 HOSTNAME systemd[1]: Listening on CUPS Scheduler.
Dec 07 17:06:00 HOSTNAME systemd[1]: Starting CUPS Scheduler...
Dec 07 17:06:00 HOSTNAME audit[1760]: AVC apparmor="DENIED"
operation="capable" profile="/usr/sbin/cupsd" pid=1760 comm="cupsd"
capability=12 capname="net_>

I'm not seeing a problem there? CUPS didn't start first time round because something was busy, but tried again and succeeded.

I see, but then presumably it's an error by the folk who wrote the code
to flag it as an error.

4) UBSAN

This is on a laptop with two on-board GPUs, and I think is related to
that fact. However, there were no hits under DuckDuckGo, Google, or
Yahoo for ...

Ubuntu 22 "UBSAN: array-index-out-of-bounds in
/build/linux-hwe-6.8-W0MdK2/linux-hwe-6.8-6.8.0/drivers/gpu/drm/radeon/radeon_atombios.c:633:33"

UBSAN is the Undefined Behaviour Sanitiser, ie a debugging tool. Something went wrong in the driver for AMD Radeon GPUs, ie a bug, maybe due to the relatively elderly GPU you have. If you aren't using the AMD GPU you can ignore it if it's not actually causing a crash (or could disable the driver if you wanted).

I understand. I'm not aware of any problems with graphics under Ubuntu.

5) Initiating RAM registers

The following is occurring very early in the logs on 2 machines with
32GB RAM, and seems to be about how to set up registers for RAM access, as
per these two links ...

This seems to be related to an MTRR problem - maybe the hardware doesn't let the kernel find the optimal memory layout with more RAM than it was originally designed for. How much RAM does Linux show you have after it's booted? Does it lose any memory, and can you live with having the amount that remains?

lsmem gives ...

root@HOSTNAME:home# lsmem
RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x00000000bfffffff    3G online       yes   0-23
0x0000000100000000-0x000000083fffffff   29G online       yes 32-263

Memory block size: 128M
Total online memory: 32G
Total offline memory: 0B

... so Ubuntu seems to be able to access all the actual physical RAM.
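
The lsmem figures can be checked by hand from the address ranges: each range covers end - start + 1 bytes. A small Python sketch of that arithmetic follows; the function names are illustrative only, and it assumes the ranges appear in the "0xSTART-0xEND" form shown above.

```python
import re

GIB = 1024 ** 3
RANGE = re.compile(r"0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)")


def parse_ranges(text):
    """Return (start, end) address pairs, both inclusive, from lsmem text."""
    return [(int(a, 16), int(b, 16)) for a, b in RANGE.findall(text)]


def total_bytes(ranges):
    """Total memory covered by the ranges."""
    return sum(end - start + 1 for start, end in ranges)


def holes(ranges):
    """Address gaps between consecutive ranges, as inclusive pairs."""
    ordered = sorted(ranges)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end + 1
    ]


# The two ranges from the lsmem output above.
SAMPLE = """\
0x0000000000000000-0x00000000bfffffff 3G online yes 0-23
0x0000000100000000-0x000000083fffffff 29G online yes 32-263
"""

if __name__ == "__main__":
    rs = parse_ranges(SAMPLE)
    print(f"total: {total_bytes(rs) // GIB} GiB")
    for start, end in holes(rs):
        print(f"hole: {start:#x}-{end:#x} ({(end - start + 1) // GIB} GiB)")
```

The two ranges add up to 3 GiB + 29 GiB = 32 GiB, with a 1 GiB gap between 3 GiB and 4 GiB in the address map. A gap in that position is typically address space set aside for devices rather than missing RAM, which fits with lsmem reporting no offline memory.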

Most of these seem like 'new Linux, old hardware' issues, but nothing actually to worry me there. I'd check you're on the latest BIOS as that might help some of the ACPI related issues.

I see. Thanks very much for your help, Theo, much appreciated.
--

Fake news kills!

I may be contacted via the contact address given on my website: www.macfh.co.uk
