The cause was me booting up the machine with a rescue disk. This
assembled my RAID partitions /dev/md127 and /dev/md126 reversed, but
also wrote those wrong identifiers, 126 and 127, into the "preferred
minor" field of the partitions' super blocks. In essence, they got
swapped.
Just for the record, all my RAID arrays have metadata version 0.90, the
(old fashioned) one that allows auto-assembly by the kernel without the
need of an initramfs.
The moral of the story: if your system uses software RAID, be careful
indeed before you boot up with a rescue disk.
Alan Mackenzie:
...
The cause was me booting up the machine with a rescue disk. This
assembled my RAID partitions /dev/md127 and /dev/md126 reversed, but
also wrote those wrong identifiers, 126 and 127, into the "preferred
minor" field of the partitions' super blocks. In essence, they got swapped.
Just for the record, all my RAID arrays have metadata version 0.90, the (old fashioned) one that allows auto-assembly by the kernel without the need of an initramfs.
The moral of the story: if your system uses software RAID, be careful indeed before you boot up with a rescue disk.
So, why don't you simply add "root=902 md=2,/dev/sda2,/dev/sdb2" or similar to
your boot loader's kernel command line?
///
And... what is the need for dynamic minors now that dev_t is 32 bits:
$ grep dev_t /Net/git/linux-stable/include/linux/types.h
typedef u32 __kernel_dev_t;
typedef __kernel_dev_t dev_t;
$
and we have 20-bit minors:
$ grep -A1 MINORBITS /Net/git/linux-stable/include/linux/kdev_t.h
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int) ((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int) ((dev) & MINORMASK))
#define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi))
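As a rough illustration of the macros above (a sketch only, not taken from the kernel tree), here is the same bit arithmetic in POSIX shell, with MINORBITS set to 20 as in kdev_t.h. The function names mkdev, dev_major and dev_minor are made up for this example.

```shell
#!/bin/sh
# Mirrors MKDEV/MAJOR/MINOR from include/linux/kdev_t.h (MINORBITS = 20).
# mkdev, dev_major and dev_minor are hypothetical helper names.
MINORBITS=20
MINORMASK=$(( (1 << MINORBITS) - 1 ))

# mkdev MAJOR MINOR -> combined device number
mkdev() { echo $(( ($1 << MINORBITS) | $2 )); }

# dev_major DEV -> upper bits (the major number)
dev_major() { echo $(( $1 >> MINORBITS )); }

# dev_minor DEV -> low 20 bits (the minor number)
dev_minor() { echo $(( $1 & MINORMASK )); }

# Example: /dev/md127 is major 9, minor 127.
dev=$(mkdev 9 127)
echo "dev=$dev major=$(dev_major "$dev") minor=$(dev_minor "$dev")"
```

With 20 bits there is room for 1048576 minors per major, which is the point of the question: the number space itself is not scarce.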
Regards,
/Karl Hammar
On Fri, Dec 20, 2024 at 15:50:53 +0100, karl@aspodata.se wrote:...
Because I didn't know about it. I found out about it this morning, and immediately tested it by setting up an
"md=126,/dev/nvme0n1p4,/dev/nvme1n1p4" on the kernel command line, using
the rescue disk to make the "preferred minor"s wrong, and testing it.
It worked!
If I understand things correctly, with this mechanism one can have the
kernel assemble the RAID arrays at boot up time with a modern metadata,
but still without needing the initramfs. My arrays are still at
metadata 0.90.
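For reference, the argument form used above is md=<md device no.>,dev0,dev1,... What follows is a small shell sketch that builds such a string from a minor number and a list of component partitions. md_arg is a made-up helper name, and the device paths are simply the ones from this post. It only builds the string; putting it into the boot loader configuration is not shown.

```shell
#!/bin/sh
# md_arg MINOR DEV... -> prints "md=MINOR,DEV,DEV,..."
# Hypothetical helper; runs in a subshell so the IFS change stays local.
md_arg() (
    minor=$1
    shift
    IFS=,                      # "$*" joins the remaining arguments with commas
    printf 'md=%s,%s\n' "$minor" "$*"
)

md_arg 126 /dev/nvme0n1p4 /dev/nvme1n1p4
```

After a reboot, looking at /proc/cmdline shows whether the argument actually reached the kernel.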
And... what is the need for dynamic minors now when dev_t is 32bits:
Dynamic minors? I don't think I follow you, here.
By the way, do you know an easy way for copying an entire filesystem,
such as the root system, but without copying other systems mounted in
it? I tried for some while with rsync and various combinations of
find's and xargs's, and in the end booted up into the rescue disc to do
it. I shouldn't have to do that.
Alan Mackenzie:
On Fri, Dec 20, 2024 at 15:50:53 +0100, karl@aspodata.se wrote:...
Because I didn't know about it. I found out about it this morning, and immediately tested it by setting up an "md=126,/dev/nvme0n1p4,/dev/nvme1n1p4" on the kernel command line, using the rescue disk to make the "preferred minor"s wrong, and testing it.
It worked!
If I understand things correctly, with this mechanism one can have the kernel assemble the RAID arrays at boot up time with a modern metadata,
but still without needing the initramfs. My arrays are still at
metadata 0.90.
Please tell me if you get booting with metadata 1.2 to work.
I haven't tested that.
///
...
And... what is the need for dynamic minors now when dev_t is 32bits:
Dynamic minors? I don't think I follow you, here.
If you partition the md device, the partitions will get a device with a dynamic minor.
# mdadm -C /dev/md11 -n 1 -l 1 --force /dev/sdc2
# mdadm -C /dev/md10 -n 1 -l 1 -e 0 --force /dev/sdc1
... create partitions
# fdisk -l /dev/md10
...
Device       Boot Start    End Sectors  Size Id Type
/dev/md10p1        2048  22527   20480   10M 83 Linux
/dev/md10p2       22528 192383  169856 82.9M 83 Linux
# fdisk -l /dev/md11
...
Device       Boot  Start     End Sectors  Size Id Type
/dev/md11p1         2048  206847  204800  100M 83 Linux
/dev/md11p2       206848 1757183 1550336  757M 83 Linux
# cat /sys/block/md10/md10p1/dev
259:0
# cat /sys/block/md10/md10p2/dev
259:1
# cat /sys/block/md11/md11p1/dev
259:2
# cat /sys/block/md11/md11p2/dev
259:3
$ grep -A2 '259 block' /Net/git/linux-stable/Documentation/admin-guide/devices.txt
259 block Block Extended Major
Used dynamically to hold additional partition minor
numbers and allow large numbers of partitions per device
So booting from an md device partition (as /) might be hit-and-miss
unless you use some initramfs magic.
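To see which md partitions on a running system ended up on the extended major, something like the following could be used. It is only a sketch: it assumes sysfs is mounted at /sys, uses_ext_major is a hypothetical helper name, and 259 is taken from the devices.txt excerpt above.

```shell
#!/bin/sh
# 259 is the "Block Extended Major" from Documentation/admin-guide/devices.txt.
BLKEXT_MAJOR=259

# uses_ext_major MAJ:MIN -> exit status 0 if the major part is 259.
# Hypothetical helper name.
uses_ext_major() {
    [ "${1%%:*}" = "$BLKEXT_MAJOR" ]
}

# Walk the partitions of all md devices known to sysfs (assumes /sys is mounted).
for f in /sys/block/md*/md*p*/dev; do
    [ -r "$f" ] || continue    # glob matched nothing, or file unreadable
    devno=$(cat "$f")
    if uses_ext_major "$devno"; then
        echo "${f%/dev}: $devno (extended major, dynamic minor)"
    else
        echo "${f%/dev}: $devno"
    fi
done
```

On a machine without partitioned md devices the loop simply prints nothing.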
Regards,
/Karl Hammar
On Fri, Dec 20, 2024 at 08:19:55 +0000, Alan Mackenzie wrote:
By the way, do you know an easy way for copying an entire filesystem,
such as the root system, but without copying other systems mounted in
it? I tried for some while with rsync and various combinations of
find's and xargs's, and in the end booted up into the rescue disc to do
it. I shouldn't have to do that.
rsync -ax / /some-other-place
From man rsync:
--one-file-system, -x don’t cross filesystem boundaries
(-a, archive mode, is needed as well: without it rsync does not recurse into directories.)
Alan Mackenzie:
On Fri, Dec 20, 2024 at 18:44:53 +0100, karl@aspodata.se wrote:...
Please tell me if you get booting with metadata 1.2 to work.
I haven't tested that.
I've just tried it, with metadata 1.2, and it doesn't work. I got error messages at boot up to the effect that the component partitions were lacking valid version 0.0 super blocks.
People without initramfs appear not to be in the sights of the
maintainers of this software. They could so easily have made the
assembly of metadata 1.2 components on the kernel command line work.
:-(
The command-line handling and auto mounting seem to be handled in files
like these (depending on the kernel version, I guess):
drivers/md/md-autodetect.c
init/do_mounts_md.c
You can find the correct file with
find <kernel top dir> -type f -name \*.c | xargs grep MD_AUTODETECT
The problem might be that in format 1.2 the superblock sits 4K from the
start. Could format 1.1 (where the superblock is at the start) work?
Regards,
/Karl Hammar
I've now got working code which assembles a metadata 1.2 RAID array at
boot time. The syntax needed on the command line is, again,
md=124,1.2,/dev/nvme0n1p6,/dev/nvme1n1p6
In place of 1.2 you can use any of 0.90, 1.0, or 1.1, though I haven't tested
it with anything but 1.2 as yet.
On Fri, Dec 20, 2024 at 23:02:58 +0100, karl@aspodata.se wrote:
Alan Mackenzie:
On Fri, Dec 20, 2024 at 18:44:53 +0100, karl@aspodata.se wrote:...
Please tell me if you get booting with metadata 1.2 to work.
I haven't tested that.
I've just tried it, with metadata 1.2, and it doesn't work. I got error messages at boot up to the effect that the component partitions were lacking valid version 0.0 super blocks.
People without initramfs appear not to be in the sights of the maintainers of this software. They could so easily have made the assembly of metadata 1.2 components on the kernel command line work.
:-(
The pertinent functions are mainly in drivers/md/md-autodetect.c and
md.c (same directory).
Nevertheless, I might make the above enhancement, just because.
Regards,
/Karl Hammar
Alan Mackenzie:
...
I've now got working code which assembles a metadata 1.2 RAID array at
boot time. The syntax needed on the command line is, again,
md=124,1.2,/dev/nvme0n1p6,/dev/nvme1n1p6
In place of 1.2 you can use any of 0.90, 1.0, or 1.1, though I haven't tested it with anything but 1.2 as yet.
Fun! Which kernel? Can you send a patch?
Regards,
/Karl Hammar
I've just tried it, with metadata 1.2, and it doesn't work. I got error messages at boot up to the effect that the component partitions were
lacking valid version 0.0 super blocks.
People without initramfs appear not to be in the sights of the
maintainers of this software. They could so easily have made the
assembly of metadata 1.2 components on the kernel command line work.
:-(
No they couldn't. Not if they wanted (at the time) a kernel small enough
Hello, Karl.
On Sat, Dec 21, 2024 at 17:45:13 +0100, karl@aspodata.se wrote:
Alan Mackenzie:
...
I've now got working code which assembles a metadata 1.2 RAID array at boot time. The syntax needed on the command line is, again,
md=124,1.2,/dev/nvme0n1p6,/dev/nvme1n1p6
In place of 1.2 you can use any of 0.90, 1.0, or 1.1, though I haven't tested it with anything but 1.2 as yet.
Fun! Which kernel? Can you send a patch?
6.6.62. Patch enclosed. It should apply cleanly from the directory ..../drivers/md.
Have fun!
Regards,
/Karl Hammar
, where the extra bit is optional. This enhancement would not be
difficult. The trouble is more political. I think this code is
maintained by Red Hat. Red Hat's customers all use initramfs, so they
probably think everybody else should, too, hence would be unwilling to enhance it for a small group of Gentooers.
On 20/12/2024 17:44, karl@aspodata.se wrote:
If I understand things correctly, with this mechanism one can have the
kernel assemble the RAID arrays at boot up time with a modern metadata,
but still without needing the initramfs. My arrays are still at
metadata 0.90.
Please tell me if you get booting with metadata 1.2 to work.
I haven't tested that.
It is NOT supported. The kernel has no code to do so, you need an
initramfs. That said, nowadays I believe you can actually load the
initramfs into the kernel so it's one monolithic blob ...
By the way, as to the other point of putting /dev/sda etc on the kernel command line, it's the kernel that's messing up and scrambling which
physical disk is which logical sda sdb et al device, so explicitly
specifying that will have exactly NO effect when your hardware/software
combo changes again.
I guess it was the fact your rescue disk booted from CDROM or whatever
made THAT sda, and pushed the other disks out of the way.
sda, sdb, sdc et al are allocated AT RANDOM by the kernel.
It just so happens that the "seed" rarely changes, so in normal use
the same values happen to get chosen every time - until something DOES change, and then you wonder why everything falls over. The same is
also true of md127, md126 et al. If your raid counts up from md1, md2
etc then those I believe are stable, but I haven't seen them for
pretty much the entire time I've been involved in mdraid (maybe a
decade or so?)
You need to use those UUID/GUID things. I know it's a hassle finding out whether it's a guid or a uuid, and what it is, and all that crud, but
trust me they don't change, you can shuffle your disks, stick in another
SATA card, move it from SATA to USB (BAD move - don't even think of it
!!!), and the system will still find the correct disk.
Cheers,
Wol
The trouble [is] that a kernel command line, or /etc/fstab, using lots
of these is not human readable, and hence is at the edge of unmaintainability. This maintenance difficulty surely outweighs the
rare situation where the physical->logical assignment changes due to a
broken drive. That's what we've got rescue disks for.
On Sunday 22 December 2024 13:43:08 GMT Alan Mackenzie wrote:
The trouble [is] that a kernel command line, or /etc/fstab, using lots
of these is not human readable, and hence is at the edge of
unmaintainability. This maintenance difficulty surely outweighs the
rare situation where the physical->logical assignment changes due to a
broken drive. That's what we've got rescue disks for.
Hear, hear! I never could understand why everyone seems to want to jump onto that band-wagon.
On 22/12/2024 15:29, Peter Humphrey wrote:
On Sunday 22 December 2024 13:43:08 GMT Alan Mackenzie wrote:
The trouble [is] that a kernel command line, or /etc/fstab, using lots
of these is not human readable, and hence is at the edge of
unmaintainability. This maintenance difficulty surely outweighs the
rare situation where the physical->logical assignment changes due to a
broken drive. That's what we've got rescue disks for.
Hear, hear! I never could understand why everyone seems to want to jump onto
that band-wagon.
I have no problem with you saying all this long guid crap makes stuff unreadable (and yes, I agree, unreadable and unmaintainable aren't that
far different) BUT
surely outweighs the rare situation where the physical->logical assignment changes
THAT DEPENDS ON YOUR HARDWARE!
For normal consumer grade hardware, I agree. I've never known it change unless I've been mucking about with add-in SATA, PATA, whatever cards.
BUT. Especially on big server-grade hardware, where there's lots of trip switches so stuff doesn't all power up in one huge spike (and I've
worked with such), different parts of the system come up in a completely random order, and drives re-order themselves pretty much every single boot!
So yes, with our consumer hardware I'd agree with you. But the people
paying big bills for reliable top-range hardware would wonder what
you're smoking!
Cheers,
Wol
Alan Mackenzie:
...
By the way, do you know an easy way for copying an entire filesystem,
such as the root system, but without copying other systems mounted in
it? I tried for some while with rsync and various combinations of
find's and xargs's, and in the end booted up into the rescue disc to do
it. I shouldn't have to do that.
rsync as other people have suggested.
There is also
cp -x
dump/restore
find -xdev
etc.
You can also do it by reading the device file in /dev directly, like
dd if=source of=dest (cp works here also but dd is more the norm).
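A minimal sketch of the cp -x route, assuming GNU cp (for the -a and -x options) and a destination that does not lie inside the source tree on the same filesystem. copy_one_fs is a hypothetical name. Mount points inside the source show up as empty directories in the copy, and whatever is mounted on them is skipped.

```shell
#!/bin/sh
# copy_one_fs SRC DEST
# Copies the contents of SRC into DEST, preserving attributes (-a) and
# staying on the filesystem SRC lives on (-x). Assumes GNU cp.
# Hypothetical helper name.
copy_one_fs() {
    src=$1
    dest=$2
    mkdir -p "$dest" &&
    cp -ax "$src"/. "$dest"/   # "SRC/." copies dotfiles too
}

# Example (as root, with the target filesystem mounted on /mnt/newroot):
#   copy_one_fs / /mnt/newroot
```

find / -xdev lists the same set of files without copying anything, which is handy for checking what would be included.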
///
When something is mounted on a mount point, the files below the
mount point are hidden and the mounted filesystem is available
instead. Do you want to copy those hidden files as well?
I think any system admins reading this would long for the
predictability of "consumer hardware", having too often been
confronted with indistinguishable 32 hex digit identifiers. I would
imagine it quite likely that the said admins have written scripts to
make this more manageable.